Bill Vorhies
Predictive Analytics Series, #5

Summary:  Financial markets forecasting inherently involves time series data.  Here’s an interesting approach we used that doesn’t rely on sequential time series data to forecast turns in the NASDAQ.

A few years back we were challenged by a ‘quant’ investment firm to demonstrate the value of predictive modeling in forecasting financial markets.  The project we agreed on was to create a market timing predictor that used only the daily closing price to forecast whether a market turn on any given day would exceed 3%, giving the investment firm guidance to buy or sell the NASDAQ index QQQ on that day.

Adding other external variables relating to the economy, other markets, changes in other financial KPIs, or even multiple price points per day (open, high, low, close) would undoubtedly have improved the quality of the model, but the project limited us to the daily close.

The key to our success given this constraint was to use the daily closing price lagged from 0 to 60 days.  From these lagged prices we created a series of calculated variables: moving averages spanning the range from 2 to 60 days.  To keep the number of variables reasonable we used every moving-average length from 2 to 10 days and then 5-day increments out to 60 days.  Whatever information the model discovered to be predictive was therefore represented somewhere in the prices and patterns of the preceding 60 days.
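As a rough illustration of the feature set described above, here is a minimal Python sketch.  It assumes simple (unweighted) moving averages on raw closing prices, which the project description does not actually specify, and the names `build_features` and `MA_WINDOWS` are hypothetical.

```python
from typing import Dict, List

# Window lengths as described in the text: every length from 2 to 10 days,
# then 5-day steps out to 60 days (15, 20, ..., 60).
MA_WINDOWS: List[int] = list(range(2, 11)) + list(range(15, 61, 5))
MAX_LAG = 60


def build_features(closes: List[float], t: int) -> Dict[str, float]:
    """Build the inputs for day t using only closes[t-60 .. t].

    Assumption: simple moving averages on raw prices. The original
    model may have used a different averaging or scaling scheme.
    """
    if t < MAX_LAG or t >= len(closes):
        raise ValueError("need 60 prior closes and a valid day index")
    features: Dict[str, float] = {}
    # Lagged closes: lag 0 is today's close, lag 60 is 60 trading days back.
    for lag in range(MAX_LAG + 1):
        features[f"close_lag_{lag}"] = closes[t - lag]
    # Simple moving averages over the w most recent closes, ending on day t.
    for w in MA_WINDOWS:
        window = closes[t - w + 1 : t + 1]
        features[f"sma_{w}"] = sum(window) / w
    return features
```

Under these assumptions each trading day yields 61 lagged prices plus 19 moving averages, and nothing in the row looks past day t.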

We defined a turn in the market as any local high or low close followed by a change in value of at least 3% from that high or low before the next local high or low occurs.  The selection of 3% was a project constraint and could easily have been set to another value to match any preference for trading frequency.

Keep in mind that this definition captures any turn of 3% or greater.  If we had wanted to gauge whether the turn would be incrementally greater than 3%, presumably a more valuable opportunity, we would have needed separate models for each threshold, say 4%, 5%, and so on.
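The turn definition above can be sketched as a threshold-reversal ("zigzag") scan over the closing prices.  This is one plausible reading of the definition, not necessarily the labeling procedure we used on the project; the function name `find_turns` is hypothetical.  Note that a turn can only be confirmed with hindsight, once price has moved 3% away from the extreme, so this produces training labels rather than predictions.

```python
from typing import List, Tuple


def find_turns(closes: List[float], threshold: float = 0.03) -> List[Tuple[int, str]]:
    """Return (day index, 'high' or 'low') for each confirmed turn.

    A high is confirmed once price falls at least `threshold` below it;
    a low is confirmed once price rises at least `threshold` above it.
    """
    turns: List[Tuple[int, str]] = []
    if not closes:
        return turns
    hi_i = lo_i = 0  # candidate high / low since the last confirmed turn
    trend = 0        # +1 rising, -1 falling, 0 not yet known
    for i in range(1, len(closes)):
        price = closes[i]
        # Extend the candidate extreme in the direction of the current move.
        if trend >= 0 and price > closes[hi_i]:
            hi_i = i
        if trend <= 0 and price < closes[lo_i]:
            lo_i = i
        if trend >= 0 and price <= closes[hi_i] * (1 - threshold):
            # Day 0 is skipped: we cannot know it was a true local high.
            if hi_i > 0:
                turns.append((hi_i, "high"))
            trend, lo_i = -1, i
        elif trend <= 0 and price >= closes[lo_i] * (1 + threshold):
            if lo_i > 0:
                turns.append((lo_i, "low"))
            trend, hi_i = 1, i
    return turns


# Toy prices, not market data.
print(find_turns([100, 102, 104, 100, 99, 103, 105, 101]))
# [(2, 'high'), (4, 'low'), (6, 'high')]
```

Changing `threshold` to 0.04 or 0.05 gives the labels for the separate higher-threshold models mentioned above.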

The 3% value resulted in 25 market turns in the test year, with the moves between turns running from as little as one day to as long as several months.  This year was characterized by a strongly rising market with a few corrections of short duration; the longest correction lasted only 6 days.  However, whether bull or bear, every move changed the overall market value by at least 3%, creating a trading profit opportunity.

Results:  The model achieved a fitness (expressed as percentage hit rate) of 93.8% on the training and validation data, and 86.3% on the unseen test year data.  In practice, the prediction of market turns was actually better than these percentages indicate.  Of the 25 market turns in the evaluation year, our model predicted:

14 turns on the day of the turn
7 turns on the day following the turn
2 turns 2 days after the turn
1 turn 3 days after the turn
1 turn 5 days after the turn


The model never failed to predict the turn.  It was particularly accurate in calling the short-term corrections, calling most on the day of the turn, and the balance on the following day. 
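The lag figures listed above can be summarized with a few lines of arithmetic.  The counts come straight from the list; the mean-lag statistic is simply one convenient way to condense them.

```python
# Detection lag in trading days -> number of turns, as listed above.
lag_counts = {0: 14, 1: 7, 2: 2, 3: 1, 5: 1}

total_turns = sum(lag_counts.values())
within_one_day = lag_counts[0] + lag_counts[1]
mean_lag = sum(lag * n for lag, n in lag_counts.items()) / total_turns

print(f"turns: {total_turns}")                                 # turns: 25
print(f"same day: {lag_counts[0] / total_turns:.0%}")          # same day: 56%
print(f"within one day: {within_one_day / total_turns:.0%}")   # within one day: 84%
print(f"mean lag: {mean_lag:.2f} days")                        # mean lag: 0.76 days
```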

Procedures Used in Preparing this Model:

Our tool of choice was a genetic program that allows for either a ‘single best program’ or an ensemble (team) model of up to 9 independent voting algorithms.  In this case a 7-program ensemble proved to be marginally the most accurate.
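To make the idea of a voting ensemble concrete, here is a minimal sketch.  It assumes a simple majority vote, which may not be the exact rule the tool applied, and the seven "programs" are toy threshold rules standing in for evolved programs; `ensemble_vote` and `toy_programs` are hypothetical names.

```python
from typing import Callable, List, Sequence

# Each program maps a feature vector to 1 ("turn") or 0 ("no turn").
Program = Callable[[Sequence[float]], int]


def ensemble_vote(programs: List[Program], features: Sequence[float]) -> int:
    """Return the majority decision of the programs (assumed voting rule)."""
    if len(programs) % 2 == 0:
        raise ValueError("use an odd number of programs to avoid ties")
    votes = sum(p(features) for p in programs)
    return 1 if votes > len(programs) // 2 else 0


# Toy stand-ins for seven evolved programs: each fires above its own cutoff.
toy_programs: List[Program] = [lambda x, k=k: int(x[0] > k) for k in range(7)]

print(ensemble_vote(toy_programs, [3.5]))  # 4 of 7 vote "turn" -> 1
print(ensemble_vote(toy_programs, [2.5]))  # 3 of 7 vote "turn" -> 0
```

An odd team size such as 7 guarantees a majority on every day, which is one practical reason to prefer it over an even-sized team.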

The period we selected was four years during a generally expanding economy, so there were no external or rogue events creating changes in market direction greater than about 10%.  The first three years were used to build the model, which was then evaluated on its ability to predict the fourth, most recent, unseen year.
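The split described above is chronological rather than random, which keeps future prices out of the training data.  A minimal sketch, with placeholder dates (the actual years are not part of this write-up) and a hypothetical function name `chronological_split`:

```python
from datetime import date
from typing import List, Tuple

Row = Tuple[date, float]  # (trading day, closing price)


def chronological_split(rows: List[Row], test_start: date) -> Tuple[List[Row], List[Row]]:
    """Put every row before test_start in train and the rest in test."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < test_start]
    test = [r for r in ordered if r[0] >= test_start]
    return train, test


# Placeholder rows; the dates and prices are illustrative only.
rows = [
    (date(2001, 6, 1), 40.0),
    (date(2002, 6, 3), 42.0),
    (date(2003, 6, 2), 45.0),
    (date(2004, 6, 1), 47.0),
]
train, test = chronological_split(rows, test_start=date(2004, 1, 1))
print(len(train), len(test))  # 3 1
```

Because the inputs only look backward 60 days, the first rows of the test year can still draw on closing prices from the end of the training period without leaking any future information.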

Model Assessment: Of the 254 trading days, the model agreed with actual market direction 81% of the time.  This means that on 47 days you would have had a false indication.  This is a little misleading, since 11 of the 47 disagreements were mostly one- or two-day lags at the turn, as described above.

A one-day disagreement between the model and the actual market in the middle of a trend would not be difficult to evaluate, since the model would return to agreement with the market on the following day, presuming it is run daily on the new closing data.  16 of the disagreements were of this one-day variety and corresponded to a market move that ultimately did not meet the 3% criterion.

This leaves 20 days of disagreement where the model called for a change, typically for about 3 days in a row, that ultimately did not materialize at the 3% minimum level.  These are the true errors of the model, which experience or better modeling using more variables could reduce.  Counting only these 20 days as errors gives an effective accuracy in excess of 90% for this test.
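The breakdown of the 47 disagreement days can be checked with a short calculation.  The day counts are taken from the assessment above; treating only the residual days as errors is the reading behind the "effective accuracy" figure.

```python
trading_days = 254
disagreements = 47
lag_at_turn = 11      # short lags in calling a real turn
one_day_blips = 16    # single-day disagreements in mid-trend

# What remains are the multi-day false calls.
true_errors = disagreements - lag_at_turn - one_day_blips

raw_agreement = 1 - disagreements / trading_days
effective_accuracy = 1 - true_errors / trading_days

print(true_errors)                   # 20
print(f"{raw_agreement:.1%}")        # 81.5%
print(f"{effective_accuracy:.1%}")   # 92.1%
```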


December 9, 2013

Bill Vorhies, President & COO – Data-Magnum – © 2013, all rights reserved.


About the author:  Bill Vorhies is President & COO of Data-Magnum and has practiced as a data scientist and commercial predictive modeler since 2001.  He can be reached at:

818.257.2035 (C)


4701 Patrick Henry Drive, Bldg. 8

Great America Technology Park

Santa Clara, CA 95054