The fundamental idea behind predictive modeling is that indicators may contain information that can be used to predict a forward-looking variable, called a target. The task of a predictive model is to find and exploit any such information.
Date       trend   volatility  day_return
19950214   0.251   1.572        0.144
19950215   0.101   1.778        0.055
19950216  -0.167   2.004       -0.013
Suppose we provide several years of this data to a model and ask it to learn how to predict day_return, the one day forward return, from two indicators, one called trend and the other volatility. In the lingo of machine-learning this process is called model training. Then, we may at a later date calculate from recent prices that trend=0.225 and volatility=1.244 as of that day. The trained model may then make a prediction that the target variable day_return will be 0.152. (These are all made-up numbers.) Based on this prediction that the market is about to rise substantially, we may choose to take a long position.
Intuition tells us that we should put more faith in extreme predictions than in more common predictions near the center of the model’s prediction range. If a model predicts that the market will rise by 0.001 percent tomorrow, we would not be nearly as inclined to take a long position as if the model predicts a 5.8 percent rise. This intuition is correct: our research has shown that, in general, the magnitude of a prediction corresponds closely to the likelihood of success of the associated trade. Predictions of large magnitude are more likely to signal profitable market moves than predictions of small magnitude.
The standard method for making trade decisions based on predicted market moves is to compare the prediction to a fixed threshold. If the prediction is greater than or equal to an upper threshold (usually positive), take a long position. If the prediction is less than or equal to a lower threshold (usually negative), take a short position. The holding period for a position is implicit in the definition of the target. This will be discussed in detail in our book Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments (SSML).
It should be obvious that the threshold determines a tradeoff between the number of trades and their accuracy rate. If we set a threshold near zero, the magnitude of the predictions will frequently exceed the threshold, and a position will be taken often. Such trades carry a relatively high rate of failure. Conversely, if we set a threshold that is far from zero, predicted market moves will only rarely lie beyond the threshold, so trades will be rare but have a relatively high success rate.
Thus, by choosing an appropriate threshold, we can control whether we have a system that trades often but with only mediocre accuracy, or a system that trades rarely but with excellent accuracy.
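The threshold rule described above can be sketched in a few lines of Python. This is only an illustration, not TSSB code: the function names, the predictions, and the threshold values are all made up for the example.

```python
def position(prediction, upper, lower):
    """Return +1 (long), -1 (short), or 0 (no position) for one prediction."""
    if prediction >= upper:
        return 1
    if prediction <= lower:
        return -1
    return 0


def count_trades(predictions, upper, lower):
    """Count how many predictions lie at or beyond either threshold."""
    return sum(1 for p in predictions if position(p, upper, lower) != 0)


# Made-up predictions, for illustration only.
predictions = [0.152, 0.001, -0.040, 0.300, -0.250, 0.020]

# Thresholds near zero trade often; thresholds far from zero trade rarely.
loose = count_trades(predictions, upper=0.01, lower=-0.01)
tight = count_trades(predictions, upper=0.20, lower=-0.20)
print(loose, tight)
```

With these numbers the loose thresholds take a position on five of the six predictions, while the tight thresholds take a position on only two.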
TSSB automatically chooses optimal long and short thresholds so as to maximize the profit factor for long systems and short systems separately. Profit factor, a common metric of trading system performance, is the ratio of total gains on successful trades to total losses on failed trades. In order to prevent degenerate situations in which there is only one trade or very few trades, the user specifies a minimum number of trades that must be taken, either as an absolute number or as a minimum fraction of bars. In addition, TSSB has an option for using two thresholds on each side (long and short) so as to produce two sets of signals, one set for ‘normal reliability’ trades, and a more conservative set for ‘high reliability’ trades. Finally, in many applications, TSSB prints tables that show performance figures that would be obtained with varying thresholds.
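As a rough sketch of the idea, the following Python computes a profit factor and searches for the long threshold that maximizes it, subject to a minimum trade count. The exhaustive search over observed predictions is an assumption made for illustration; the article does not say how TSSB performs its search, and the function names and data are hypothetical.

```python
def profit_factor(returns):
    """Total gains on winning trades divided by total losses on losing trades."""
    gains = sum(r for r in returns if r > 0)
    losses = -sum(r for r in returns if r < 0)
    if losses == 0:
        return float("inf") if gains > 0 else 0.0
    return gains / losses


def best_long_threshold(predictions, targets, min_trades):
    """Try each observed prediction as the upper threshold.

    Returns (threshold, profit_factor, n_trades) for the best candidate
    that takes at least min_trades trades, or None if there is none.
    """
    best = None
    for thr in sorted(set(predictions)):
        trades = [t for p, t in zip(predictions, targets) if p >= thr]
        if len(trades) < min_trades:
            continue  # too few trades: skip this degenerate candidate
        pf = profit_factor(trades)
        if best is None or pf > best[1]:
            best = (thr, pf, len(trades))
    return best


# Made-up predictions and realized targets, for illustration only.
predictions = [0.30, 0.20, 0.10, 0.05, -0.10, -0.20]
targets = [0.5, 0.4, -0.2, 0.1, -0.3, 0.2]

best = best_long_threshold(predictions, targets, min_trades=3)
print(best)
```

Without the `min_trades` constraint, the search would pick a threshold that admits only the one or two winning trades at the top, which is exactly the degenerate situation the minimum is meant to prevent.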
Computation of thresholds and interpretation of trade results based on predictions relative to these thresholds are advanced topics that will be discussed in detail in SSML. For now, the reader needs to understand only the following concepts:
TSSB provides the ability to perform many tests of a predictive-model trading or filtering system. The available testing methodologies will be discussed in detail in SSML. However, so that the reader may understand the elementary trading/filtering system development and evaluation presented in the next chapter, we now discuss two general testing methodologies: cross validation and walkforward testing. These are the primary standards in many prediction applications, and both are available in TSSB in a variety of forms.
The principle underlying the vast majority of testing methodologies, including those in TSSB, is that the complete historical dataset available to the developer is split into separate subsets. One subset, called the training set or the development set, is used to train the predictive model. The other subset, called the test set or the validation set, is used to evaluate performance of the trained model. (Note that the distinction between the terms test set and validation set is not consistent among experts, so the increasingly common convention is to use them interchangeably. The same is true of training set and development set.)
The key here is that no data that takes part in the training of the model is permitted to take part in its performance evaluation. Under fairly general conditions, this mutually exclusive separation guarantees that the performance measured in the test set is an unbiased estimate of future performance. In other words, although the observed performance will almost certainly not exactly equal the performance that will be seen in the future, it does not have a systematic bias toward optimistic or pessimistic values. Having an unbiased estimate of future performance is one of the two main goals of a trading system development and testing operation. The other goal is being able to perform a statistical significance test to estimate the probability that the performance level achieved could have been due to good luck. This advanced concept is beyond the scope of this brief overview, but it is discussed in depth in SSML.
In the earliest days of model building and testing, when high speed computers were not readily available, splitting of the data into a training set and a test set was done exactly once. The developer would typically train the model using data through a date several years prior to the current date, and then test the model on subsequent data, ending with the most recent data available. This is an extremely inefficient use of the data. TSSB makes available both cross validation and walkforward testing. These techniques split the available data into training sets and test sets many times, and pool the performance statistics into a single unbiased estimate of the model-based trading system’s true capability. This extensive reuse of the data for both training and testing makes efficient use of precious and limited market history.
Walkforward testing is straightforward, intuitive, and widely used. The principle is that we train the model on a relatively long block of data that ends a considerable time in the past. We test the trained model on a relatively short section of data that immediately follows the training block. Then we shift the training and testing blocks forward in time by an amount equal to the length of the test block and repeat the prior steps. Walkforward testing ends when we reach the end of the dataset. We compute the net performance figure by pooling all of the test block trades. Here is a simple example of walkforward testing:
1) Train the model using data from 1990 through 2007. Test the model on 2008 data.
2) Train the model using data from 1991 through 2008. Test the model on 2009 data.
3) Train the model using data from 1992 through 2009. Test the model on 2010 data.
Pool all trades from the tests of 2008, 2009, and 2010. These trades are used to compute an unbiased estimate of the performance of the model.
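The stepping of training and test blocks in this example can be sketched as follows. This is a minimal illustration using whole years as the unit; the function name and its parameters are hypothetical and are not part of TSSB.

```python
def walkforward_splits(first_year, last_year, train_years, test_years=1):
    """Return a list of ((train_start, train_end), (test_start, test_end)).

    All bounds are inclusive years. After each step, both blocks move
    forward by the length of the test block.
    """
    splits = []
    train_start = first_year
    while True:
        train_end = train_start + train_years - 1
        test_start = train_end + 1
        test_end = test_start + test_years - 1
        if test_end > last_year:
            break  # the test block would run past the end of the data
        splits.append(((train_start, train_end), (test_start, test_end)))
        train_start += test_years
    return splits


splits = walkforward_splits(1990, 2010, train_years=18)
for train, test in splits:
    print(f"train {train[0]}-{train[1]}, test {test[0]}-{test[1]}")
```

Run on 1990 through 2010 with an 18-year training block, this reproduces the three steps listed above. In a real test, the trades from each test block would then be pooled before computing performance.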
The primary advantage of walkforward testing is that it mimics real life. Most developers of automated trading systems periodically retrain or otherwise refine their model. Thus, the results of a walkforward test simulate the results that would have been obtained if the system had been actually traded. This is a compelling argument in favor of this testing methodology.
Another advantage of walkforward testing is that it correctly reflects the response of the model to nonstationarity in the market. All markets evolve and change their behavior over time, sometimes rotating through a number of different regimes. Loosely speaking, this change in market dynamics, and hence in relationships between indicator and target variables, is called nonstationarity. The best predictive models have a significant degree of robustness against such changes, and walkforward testing allows us to judge the robustness of a model.
TSSB’s ability to use a variety of testing block lengths makes it easy to evaluate the robustness of a model against nonstationarity. Suppose a model achieves excellent walkforward results when the test block is very short. In other words, the model is never asked to make predictions for data that is far past the date on which its training block ended. Now suppose the walkforward performance deteriorates if the test block is made longer. This indicates that the market is rapidly changing in ways that the model is not capable of handling. Such a model is risky and will require frequent retraining if it is to keep abreast of current market conditions. On the other hand, if walkforward performance holds up well as the length of the test block is increased, the model is robust against nonstationarity. This is a valuable attribute of a predictive-model based approach to trading system development.
Figure 1 depicts the placement of the training and testing blocks (periods) along the time axis, and shows two situations. The top section of the figure depicts walkforward with very short test blocks. The bottom section depicts very long test blocks. It can be useful to perform several walkforward tests of varying test block lengths in order to evaluate the degree to which the prediction model is robust against nonstationarity.
Walkforward testing has only one disadvantage relative to alternative testing methods such as cross validation: it is relatively inefficient when it comes to use of the available data. Only cases past the end of the first training block are ever used for testing. If you are willing to believe that the indicators and targets are reasonably stationary, this is a tragic waste of data. Cross validation, discussed in the next section, addresses this weakness.
Rather than segregating all test cases at the end of the historical data block, as is done with walkforward testing, we can evenly distribute them throughout the available history. This is called cross validation. For example, we may test as follows:
1) Train using data from 2006 through 2008. Test the model on 2005 data.
2) Train using data from 2005 through 2008, excluding 2006. Test the model on 2006 data.
3) Train using data from 2005 through 2008, excluding 2007. Test the model on 2007 data.
4) Train using data from 2005 through 2008, excluding 2008. Test the model on 2008 data.
This idea of withholding interior ‘test’ blocks of data while training with the surrounding data is illustrated in Figure 2 below. In cross validation, each step is commonly called a fold.
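The four folds listed above can be generated with a short sketch like the one below. It holds out one year per fold and trains on the rest; the function name is hypothetical and the year-sized blocks simply mirror the example.

```python
def cross_validation_folds(years):
    """Return a list of (train_years, test_year), one fold per year."""
    folds = []
    for test_year in years:
        # Train on every year except the one held out for testing.
        train_years = [y for y in years if y != test_year]
        folds.append((train_years, test_year))
    return folds


folds = cross_validation_folds([2005, 2006, 2007, 2008])
for train_years, test_year in folds:
    print(f"train {train_years}, test {test_year}")
```

Every year appears exactly once as a test block, which is the efficiency advantage over walkforward testing discussed next.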
The obvious advantage of cross validation over walkforward testing is that every available case becomes a test case at some point. However, there are several disadvantages to note. The most serious potential problem is that cross validation is sensitive to nonstationarity. In a walkforward test, only relatively recent cases serve as test subjects. But in cross validation, cases all the way back to the beginning of the dataset contribute to test performance results. If the behavior of the market in early days was so different from its behavior in later days that the relationship between indicators and the target has seriously changed, incorporating test results from those early days may not be advisable.
Another disadvantage is more philosophical than practical, but it is worthy of note. Unlike a walkforward test, cross validation does not mimic the real-life behavior of a trading system. In cross validation, except for the last fold, we are using data from the future to train the model being tested. In real life this data would not be known at the time that test cases are processed. Some skeptics will raise their eyebrows at this, even though when done correctly it is legitimate, providing nearly unbiased performance estimates. Finally, overlap problems, discussed in the next section, are more troublesome in cross validation than in walkforward tests.
The discussions of cross validation and walkforward testing just presented assume that each case is independent of other cases. In other words, the assumption is that the values of variables for a case are not related to the values of other cases in the dataset. Unfortunately, this is almost never the situation. Cases that are near one another in time will tend to have similar values of indicators and/or targets. This generally comes about in one or both of the following ways:
These facts have several important implications. Because indicators change only slowly, the model’s predictions also change slowly. Hence market positions change slowly; if a prediction is above a threshold, it will tend to remain above the threshold for multiple bars. Conversely, if a prediction is below a threshold, it will tend to remain below that threshold for some time. If the target is looking ahead more than one bar, which results in serial correlation as discussed above, then the result of serial correlation in both positions and targets is serial correlation in returns for the trading system. This immediately invalidates most common statistical significance tests such as the t-test, ordinary bootstrap, and Monte-Carlo permutation test. TSSB does include several statistical significance tests that can lessen the impact of serial correlation. In particular, the stationary bootstrap and tapered block bootstrap will be discussed elsewhere in SSML. Unfortunately, both of these tests rely on assumptions that are often shaky. We’ll return to this issue in more detail later when statistical tests are discussed. For the moment, understand that targets that look ahead more than one bar usually preclude tests of significance or force one to rely on tests having questionable validity.
Lack of independence in indicators and targets has another implication, this one potentially more serious than just invalidating significance tests. The legitimacy of the test results themselves can be undermined by bias. Luckily, this problem is easily solved with a TSSB option called OVERLAP. Its details are discussed in SSML. For now we will simply explore the nature of the problem.
The problem occurs near the boundaries between training data and test data. The simplest situation is for walkforward testing, because there is only one (moving) boundary. Suppose the target involves market movement ten days into the future. Consider the last case in the training block. Its target spans the first ten days of the test block. This case, like all training set cases, plays a role in the development of the predictive model. Now consider the case that immediately follows it, the first case in the test block. As has already been noted, its indicator values will be very similar to the indicator values of the prior case. Thus, the model’s prediction will also be similar to that of the prior case. Because the target looks ahead ten days and we have moved ahead only one day, leaving a nine-day overlap, the target for this test case will be similar to the target for the prior case. But the prior case, which is practically identical to this test case, took part in the training of the model! So we have a strong prejudice for the model to do a good job of predicting this case, whose indicators and target are similar to those of the training case. The result is optimistic bias, the worst sort. Our test results will exceed the results that would have been obtained from an honest test.
This boundary effect manifests itself in an additional fashion in cross validation. Of course, we still have the effect just described when we are near the end of the early section of the training set and the start of the test set. This is the left edge of the red regions in Figure 2. But we also have a boundary effect when we are near the end of the test set and the start of the later part of the training set. This is the right edge of each red region. As before, cases near each other but on opposite sides of the training set / test set boundary have similar values for indicators and the target, which results in optimistic bias in the performance estimate. The bottom line is that bias due to overlap at the boundary between training data and test data is a serious problem for both cross validation and walkforward testing. Fortunately, the user can invoke the OVERLAP option to alleviate this problem.
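One common remedy for this kind of boundary bias is to omit training cases whose target window overlaps the target window of any test case. The sketch below illustrates that general idea only. It is an assumption for illustration, not a description of how TSSB’s OVERLAP option works, and it accounts only for target lookahead, not for indicator lookback. The function name and parameters are hypothetical.

```python
def purged_training_indices(n_cases, test_start, test_end, lookahead):
    """Return training case indices for one fold.

    The test block is cases test_start..test_end (inclusive). A target
    that looks ahead `lookahead` bars overlaps the targets of neighbors
    up to lookahead - 1 bars away, so that many cases are dropped on
    each side of the test block.
    """
    gap = lookahead - 1
    keep = []
    for i in range(n_cases):
        if test_start - gap <= i <= test_end + gap:
            continue  # test case, or training case too close to the boundary
        keep.append(i)
    return keep


# 30 cases, test block at cases 10..14, target looks ahead 3 bars.
train_idx = purged_training_indices(30, test_start=10, test_end=14, lookahead=3)
print(train_idx)
```

For walkforward testing only the left side of the gap matters, because no training cases follow the test block. With a one-bar lookahead the gap is zero and no training cases are dropped.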
— By David Aronson
Part 1 of this series can be found here, Predictive-Model Based Trading Systems, Part 1.
David Aronson is a pioneer in machine learning and nonlinear trading system development and signal boosting/filtering. Aronson is co-designer of TSSB (Trading System Synthesis and Boosting), a software platform for the automated development of statistically sound predictive-model-based trading systems. He has worked in this field since 1979 and has been a Chartered Market Technician certified by The Market Technicians Association since 1992. He was an adjunct professor of finance, and regularly taught MBA and financial engineering students a graduate-level course in technical analysis, data mining and predictive analytics. His recently released book, Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments, is an in-depth look at developing predictive-model-based trading systems using TSSB.