Fun with OHLC comparisons
Below I’ve charted the differences in open, high, low, and close values between Yahoo Finance and PI Trading for the SPY ETF from 12/30/02 to 11/9/12 (the last date in my PI dataset).
I haven’t seen this kind of comparison done elsewhere (let me know if you have). I’ve done this to illustrate just how different various data sources can be. Enjoy!
(Note: Reader James reminded me of a similar comparison, which looked at Yahoo Finance versus IB, made here: http://www.quantifiedstrategies.com/the-importance-of-good-data-sets/ Check it out as well.)
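If you want to try this kind of comparison yourself, here is a minimal sketch of the calculation using only the Python standard library. The function name, the dictionary layout, and the prices are all made up for illustration, and the sign convention (first source minus second) is my assumption rather than anything dictated by the data.

```python
from datetime import date

FIELDS = ("open", "high", "low", "close")

def ohlc_differences(source_a, source_b):
    """Return {day: {field: a - b}} for the days present in both sources."""
    diffs = {}
    # Only compare days that both sources actually have.
    for day in sorted(source_a.keys() & source_b.keys()):
        diffs[day] = {f: source_a[day][f] - source_b[day][f] for f in FIELDS}
    return diffs

# Made-up daily bars, purely to show the shape of the inputs.
yahoo = {
    date(2012, 11, 8): {"open": 139.0, "high": 140.5, "low": 137.5, "close": 138.0},
    date(2012, 11, 9): {"open": 138.0, "high": 139.0, "low": 137.0, "close": 138.5},
}
pi = {
    date(2012, 11, 8): {"open": 139.0, "high": 140.0, "low": 138.0, "close": 138.0},
}

diffs = ohlc_differences(yahoo, pi)
for day, row in diffs.items():
    print(day, row)
```

With this convention, a positive high difference means the first source recorded the higher high, and a negative low difference means it recorded the lower low.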
I was surprised how close the opening quotes tended to be between the two datasets. There are a handful of big divergences. There seems to be an even spread between higher and lower values.
When the two datasets diverged on the high, Yahoo Finance almost always had higher highs. I think this is a result of bogus, non-tradeable quotes being recorded by Yahoo.
When the two datasets diverged on the low, Yahoo Finance almost always had lower lows. Again, I think this is a result of bogus quotes recorded by Yahoo.
The close quotes differed more than I was expecting, though the differences have been smaller since 2009ish. As with the open comparison, the distribution of values above and below 0 seems random.
I don’t know whether the PI Trading dataset is superior to other sources for testing. It seems reasonable to me that it is at least more accurate than Yahoo Finance, especially for high and low values.
The daily OHLC data from PI was constructed from minute-level data. The dataset is missing 1/30/07 entirely, and part of the morning of 1/31/07 is also missing.
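For reference, one common way to roll minute bars up into daily bars is sketched below. This is not necessarily the exact procedure used for the charts above; the function name and the tuple layout of the minute bars are assumptions for illustration, and the sample prices are invented.

```python
from datetime import datetime

def daily_ohlc_from_minutes(minute_bars):
    """Aggregate (timestamp, open, high, low, close) minute bars into daily bars.

    Open is taken from the first bar of the day, close from the last,
    high is the maximum high, and low is the minimum low.
    """
    daily = {}
    # Sort by timestamp so "first" and "last" bars of each day are well defined.
    for ts, o, h, l, c in sorted(minute_bars, key=lambda bar: bar[0]):
        day = ts.date()
        if day not in daily:
            daily[day] = {"open": o, "high": h, "low": l, "close": c}
        else:
            bar = daily[day]
            bar["high"] = max(bar["high"], h)
            bar["low"] = min(bar["low"], l)
            bar["close"] = c
    return daily

# Three invented minute bars for a single day.
bars = [
    (datetime(2012, 11, 9, 9, 31), 138.0, 138.5, 137.5, 138.0),
    (datetime(2012, 11, 9, 9, 30), 137.5, 138.0, 137.0, 138.0),
    (datetime(2012, 11, 9, 9, 32), 138.0, 139.0, 138.0, 138.5),
]
daily = daily_ohlc_from_minutes(bars)
print(daily)
```

Note that with this approach a day with missing minutes, like the morning of 1/31/07, would get its open from the first bar that does exist, and its high and low could be understated.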
When comparing the close, I excluded days where the markets closed early.
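A simple way to apply that kind of exclusion is to keep a hand-maintained set of early-close dates and filter on it, as in the sketch below. The function name and the difference values are hypothetical, and the single date in the set is only an example (the day after Thanksgiving); a real list should be checked against an exchange calendar.

```python
from datetime import date

# Example entry only; build the real set from an exchange calendar.
EARLY_CLOSE_DAYS = {date(2011, 11, 25)}

def exclude_early_closes(close_diffs, early_close_days=EARLY_CLOSE_DAYS):
    """Drop early-close days from a {day: close difference} mapping."""
    return {
        day: diff
        for day, diff in close_diffs.items()
        if day not in early_close_days
    }

# Invented close differences for three days.
close_diffs = {
    date(2011, 11, 23): 0.25,
    date(2011, 11, 25): -0.5,
    date(2011, 11, 28): 0.0,
}
filtered = exclude_early_closes(close_diffs)
print(filtered)
```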