Fun with OHLC comparisons

I’ve charted below differences in open, high, low, and close values between Yahoo Finance and PI Trading for the SPY ETF between 12/30/02 and 11/9/12 (last date in my PI dataset). 

I haven’t seen this kind of comparison done elsewhere (let me know if you have).  I’ve done this to illustrate just how different various data sources can be.  Enjoy!

(Note: Reader James reminded me of a similar comparison made here – http://www.quantifiedstrategies.com/the-importance-of-good-data-sets/ that compared Yahoo finance to IB. Check it out as well.)
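
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the differencing step. It assumes each source has already been loaded into a dict keyed by date; the function name `ohlc_differences`, the field names, and the sample prices are all made up for illustration and are not taken from either dataset.

```python
from datetime import date

FIELDS = ("open", "high", "low", "close")


def ohlc_differences(source_a, source_b):
    """Return {date: {field: a - b}} for the dates present in both sources."""
    diffs = {}
    # Only compare days that both sources report; missing days are skipped.
    for day in sorted(source_a.keys() & source_b.keys()):
        bar_a, bar_b = source_a[day], source_b[day]
        # Round to avoid floating-point noise in the cent-level differences.
        diffs[day] = {f: round(bar_a[f] - bar_b[f], 4) for f in FIELDS}
    return diffs


# Hypothetical bars, NOT real quotes from either vendor.
yahoo = {
    date(2012, 11, 8): {"open": 139.50, "high": 140.10, "low": 137.90, "close": 138.00},
    date(2012, 11, 9): {"open": 137.80, "high": 139.40, "low": 137.40, "close": 138.20},
}
pi = {
    date(2012, 11, 9): {"open": 137.80, "high": 139.15, "low": 137.55, "close": 138.16},
}

diffs = ohlc_differences(yahoo, pi)
for day, d in diffs.items():
    print(day, d)
```

A positive value means the first source reported the higher number for that field on that day. Each chart below is one of these four difference series plotted over time.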

The Open

I was surprised how close the opening quotes tended to be between the two datasets. There are a handful of big divergences.  There seems to be an even spread between higher and lower values.

SPYCompOpen

The High

When the two datasets diverged here, Yahoo Finance almost always had higher highs.  I think this is a result of bogus, non-tradeable quotes being recorded by Yahoo. 

SPYCompHigh

The Low

When the two datasets diverged here, Yahoo Finance almost always had lower lows. Again, I think this is a result of bogus quotes recorded by Yahoo.

SPYCompLow
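
One rough way to put a number on "almost always higher" or "almost always lower" is to tally the sign of each daily difference. The sketch below is only an illustration: `sign_tally` is a hypothetical helper, the half-cent tolerance is an assumption, and the sample differences are invented rather than taken from the charts.

```python
def sign_tally(diffs, tolerance=0.005):
    """Count differences above, below, or within `tolerance` of zero."""
    tally = {"higher": 0, "lower": 0, "equal": 0}
    for d in diffs:
        if d > tolerance:
            tally["higher"] += 1
        elif d < -tolerance:
            tally["lower"] += 1
        else:
            # Treat sub-tolerance gaps as agreement between the sources.
            tally["equal"] += 1
    return tally


# Hypothetical (first source minus second source) high differences, in dollars.
high_diffs = [0.25, 0.0, 0.31, -0.02, 0.0, 1.10]
print(sign_tally(high_diffs))
```

A lopsided tally (many "higher" and few "lower" for the highs, and the reverse for the lows) would be consistent with one source recording quotes outside the range the other source saw. A roughly even split would look more like the open and close charts.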

The Close

The close quotes differed more than I was expecting. The differences have been smaller though since 2009ish.  As with the open comparison, the distribution of values above/below 0 seems random.

SPYCompClose

Notes

I don’t know whether the PI Trading dataset is superior to other sources for testing.  It seems reasonable to me that it is at least more accurate than Yahoo Finance, especially for high and low values. 

The daily OHLC data from PI was constructed from minute level data. The dataset is missing a day for 1/30/07.  Part of the morning of 1/31/07 is also missing.
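
For reference, here is a minimal sketch of how daily bars can be built from minute bars. This is not PI's code or my exact script. It assumes each minute bar is stamped with its start time and that only the 9:30 to 16:00 session should count; the function name and the sample bars are made up.

```python
from datetime import datetime, time


def daily_from_minutes(minute_bars, session_start=time(9, 30), session_end=time(16, 0)):
    """Aggregate (timestamp, open, high, low, close) minute bars into daily bars."""
    daily = {}
    for ts, o, h, l, c in sorted(minute_bars, key=lambda bar: bar[0]):
        # Skip anything outside the assumed regular session.
        if not (session_start <= ts.time() < session_end):
            continue
        day = ts.date()
        if day not in daily:
            # First in-session bar of the day sets the open.
            daily[day] = {"open": o, "high": h, "low": l, "close": c}
        else:
            bar = daily[day]
            bar["high"] = max(bar["high"], h)
            bar["low"] = min(bar["low"], l)
            bar["close"] = c  # last in-session bar seen so far sets the close
    return daily


# Hypothetical minute bars; the 16:02 bar falls outside the session window.
bars = [
    (datetime(2013, 1, 2, 9, 30), 100.0, 100.5, 99.8, 100.2),
    (datetime(2013, 1, 2, 9, 31), 100.2, 100.9, 100.1, 100.7),
    (datetime(2013, 1, 2, 15, 59), 100.6, 100.8, 100.3, 100.4),
    (datetime(2013, 1, 2, 16, 2), 100.4, 101.4, 100.4, 101.0),
]
print(daily_from_minutes(bars))
```

The choice of session window matters. A source that includes trades after 16:00 can report a different high or low for the same day, and missing minutes (like the morning of 1/31/07) can shift the open, high, or low.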

When comparing the close, I excluded days where the markets closed early.
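
A minimal sketch of that exclusion step, assuming the differences are stored in a dict keyed by date. The helper name `exclude_days` is hypothetical, the two dates are just examples of scheduled half-day sessions (a full list would come from the exchange calendar), and the difference values are invented.

```python
from datetime import date


def exclude_days(diffs_by_day, excluded):
    """Return a copy of diffs_by_day without the dates listed in `excluded`."""
    return {day: d for day, d in diffs_by_day.items() if day not in excluded}


# Illustrative subset of early-close sessions, not a complete calendar.
EARLY_CLOSES = {date(2011, 11, 25), date(2012, 7, 3)}

# Hypothetical close differences in dollars.
close_diffs = {
    date(2012, 7, 2): 0.03,
    date(2012, 7, 3): 0.41,
    date(2012, 7, 5): -0.02,
}
print(exclude_days(close_diffs, EARLY_CLOSES))
```

On a shortened session, the last minute bar before 16:00 does not line up with the actual close, so those days would otherwise show up as spurious gaps in the close comparison.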

Posted on February 13, 2013, in Other. 24 Comments.

  1. Here’s a comparison between yahoo (which is really just CSI data) and IQFeed: http://i.imgur.com/TfSQeVk.png

    The open and the close are generally extremely close, usually to the tick. This makes me doubt the accuracy of PI’s data. Something quite interesting is the fact that the high/low difference is reversed here: Yahoo had higher lows and lower highs.

    Might run it again using daily data constructed from intraday data, would be interesting to check for differences.

    • Thanks for sharing the test with IQFeed. Interesting you found the opposite effects compared to Yahoo highs and lows. Definitely share again if you construct using intraday data.

      • Here we go: IQFeed daily vs IQFeed daily derived from minute data, and CSI daily vs IQFeed daily derived from minute data.

        The values are quite close over the last few months compared to earlier times. Perhaps they changed the way they collect data?

      • Major thanks for the share. It’s crazy how much the daily IQ data differs when it’s constructed from the minute data.

      • With respect to IQ, the following is a notice issued (I believe sometime in 2011) by a software developer for a program that I use:

        “IMPORTANT: IQFeed provides unfiltered data. This means that no tick is filtered out therefore you may see a lot of bad ticks. This will change when IQFeed implements bad tick filter.”

  2. Hi,

    Interesting and good work. About a year back I checked whether the quotes from Yahoo Finance were tradeable. I did this manually every day after the close. Unfortunately, I deleted the file by accident, but there were a lot of bogus highs and lows from Yahoo.

    • I’m doing the same thing manually now (that is checking at the end of the day to see if the stated SPY range from YF was tradeable). Would have loved to see your file as a check against my own data.

      FWIW, the PI data agrees with the findings in your post from last October. The PI data shows a low on 11/30/11 that is $3+ higher than YF. The PI data also shows a high that is $1+ lower than YF on 4/9/12.

  3. I should also add one more point about Yahoo – it looks like their RSI calculation isn’t done the same way as most people define it. Perhaps it was just the particular parameters I was using, but it sure was different. I ended up doing the calculation myself. I’ve been meaning to double-check if it’s still wrong and send them an email about it.

  4. The difference in closing prices is due to the fact that the PiTrading data doesn’t contain the daily settlement price while Yahoo data (= CSI Unfair Advantage) does.

  5. Hi,

    Just wanted to check with you and see if you have read this. It’s somewhat similar to your strategy.
    I am trying to figure out how to reduce draw downs.

    http://optionvue.com/files/Trading_the_VXX.pdf

    On a different note, I am still researching using options when entering VXX trades. Is there an Excel spreadsheet with all your trade entries and exits? What I am thinking is that VIX is only a short-term trade; in most cases it’s less than 2 months.

    Bill

    • Bill, thanks for passing that link along. Term structure and volatility are two good signals for timing. I’d like to see the performance of the strategy since being published and test it using simulated VXX/XIV data prior to 2009.

      At some point I’ll email you some entries/exits since 2012 to consider with your options testing.

  6. This article is outdated (1999) and probably not even applicable anymore but thought you might want to read it:

    http://www.csidata.com/?page_id=856

    • Nice, thanks for passing along. Even if it’s outdated it shows where some discrepancies can potentially come from (e.g. highs/lows recorded from bid/ask even if no actual trade takes place).

  7. The data displayed by Yahoo and PI are only as good as their suppliers. Where does PI get their data? Yahoo’s historical comes from CSI and intraday for the major exchanges directly from the exchange. All of their sources are listed on their website. Are PI’s data suppliers as solid as Yahoo’s?

    • I can’t say whether PI is better than other sources and you are right to question it. I can say that I’ve personally seen several non-tradeable quotes from Yahoo! for highs and lows. I personally don’t trust these values anymore.

      • It appears that you are talking about what Yahoo calls daily/current data (obtained in most cases directly from the exchange) rather than historical data which is also available on a daily basis. This has been beaten to death on many blogs. Yahoo’s daily data includes trading outside of RTH when non-RTH trading occurs. As a result, people that download this data often go back a few hours later and get the historical data. As far as the daily data goes I’ll bet that PI does not include non-RTH data? If so, you are comparing apples and oranges. If for some reason you want to use daily data then be sure that the data does not include non-RTH data and you will most probably find most data sources are comparable.

        I use Yahoo’s historical data without a problem and if something does not look right I double check it against StockChart.com and the like and confirm that the odd looking data is consistent with such sources. I suggest that anybody downloading Yahoo data for analysis purposes use data from their historical server, rather than data from their current/daily server. Your downloader should be able to distinguish between the two.

      • RTH stands for regular trading hours? If I understand correctly, you are saying that Yahoo historical data includes pre and aftermarket lows and highs? That would certainly cause the discrepancy if true.

        Yesterday is a good example with Yahoo and other sources showing a high of $151.42 for SPY, though it never traded above $151 during regular trading hours from what I can see.

      • That’s right regarding RTH. As for trading hours, about a dozen ETFs, including SPY, are exceptions and trade until 4:15. Your broker or the exchange website can provide the full list. SPY hit 151.42 just after 4:02. Yahoo’s historical data shows a high of 151.42, which is correct. It’s a bit more complex, however, as the close is for 4:00. If you want to keep things simple use the index. You can get the official historical data for ETFs like SPY from the NASDAQ website.

      • Appreciate the insights. I can see on the Nasdaq site where the large # of shares traded at that price at 4:02. I guess it’s technically right but is still almost $1 higher than other trades occurring in the same minute of time.

      • I should also note that I do not know if Yahoo’s daily/current data from the exchanges is put through a “bad tick filter” (see IQ note about this above).

    • What do you mean by “As for trading hours, about a dozen ETFs, including SPY, are exceptions and trade until 4:15”? Do you mean that RTH for these ETFs extends to 4:15, or that their after-market hours only extend until 4:15? If you mean that RTH for these ETFs extends to 4:15, do you know when CSI ends their non-RTH quotes (e.g., they don’t incorporate quotes after 7:00 P.M. into their daily data)?

      I compared Yahoo’s historical data with ThinkOrSwim’s intraday data (which includes pre-market and after-market quotes), and from what I found, CSI does not include pre-market quotes in their daily prices, only RTH and after-market quotes.

      Also, are there any RTH-only data vendors that you would recommend?

      • The ETFs that are exceptions have normal trading until 4:15. This is to keep them in synch with futures/options. Check the product specs.

        As for Yahoo, there are two independent servers – historical and intraday. The historical data from CSI uses RTH, including the H,L from the 4:15 period for the exception ETFs. Yahoo’s intraday data, as I understand from the following link, is direct from the exchange:

        http://help.yahoo.com/kb/index?page=content&y=PROD_FIN&locale=en_US&id=SLN2310&pir=jfV2gFhibUlT0LTbIZEoDju_TuTZ8lvatcsNANASkrt5bibzBOr43OKsx5bMhmlgAqpSdTF3OnIk

        As for good or bad suppliers, for historical I would not be concerned about any historical supplier used by Yahoo, MSN or Google which are listed on their websites and in some cases can be downloaded. Intraday data can be dirty and tricky especially if the supplier does not use a bad tick filter. Based on the disclaimer on Yahoo’s website, I suspect that they post raw data from the exchange which might include bad ticks. For intraday trading a good broker might be your best bet.

  8. Mike,

    Given these data problems that you’ve encountered, I was wondering if you’ve put the Alpha 2 strategy on permanent hold or whether you’ve figured out a way to resurrect it.
