System Parameter Permutation Beats Data Mining Bias


Recently, an interesting new perspective has emerged regarding trading system development. The National Association of Active Investment Managers (NAAIM) has just announced a $10,000 Wagner Award for Dave Walton of StatisTrade for his pioneering work in exploring a new method for trading system development, which he calls System Parameter Permutation (SPP).

It’s a well-merited prize, since SPP offers a practical way to mitigate the age-old problem of data mining bias.

This article will summarize the implications of System Parameter Permutation, and the entire 33-page paper can be downloaded here. SPP is exciting because it opens up an entirely new horizon in trading system development. It has the power to help boost quant traders to the next level, so it’s worth taking a careful look.

Many traders have been frustrated by trading system performance that doesn’t live up to expectations, and it seems that data mining bias (DMB) is mostly to blame. The effects of DMB are misunderstood and underestimated by most traders and system developers.

Even well-seasoned institutional quants fall prey to the effects of DMB, and very few traders have been able to overcome it. The value of SPP lies in its power to mitigate DMB, which allows developers to design mechanical trading systems according to their probability of future success, not past success.

Permutation

System Parameter Permutation seems the perfect tool for active traders who only use mechanical trading systems, and it’s said to be effective for trading systems using any time frame. It helps traders answer these two fundamental questions:

What is the long-term performance expectation for a given trading system?

What is the worst case short-term performance (i.e. drawdown) that must be tolerated in order to achieve that long-term performance expectation?

Defining the parameters for System Parameter Permutation

SPP is intended to be used only with fully mechanical trading systems using quantitative algorithmic trading rules. So, it’s perfect for today’s traders.

To keep the explanation in his award-winning paper easy to follow, the author keeps the definitions and instructions simple.

For the sake of simplicity, the author’s primary study of System Parameter Permutation was based on ETFs using “long” trades only, since shorts would complicate the input assumptions regarding borrowed shares, callbacks, dividends, and interest charges. The author used a historical simulation period of about seven and a half years. Commissions and other costs were calculated at typical levels.

The results of the SPP-fueled simulations were limited to four metrics: Compounded annual returns, maximum drawdown, annualized information ratio, and the annualized standard deviation of daily returns. To compare the effectiveness of SPP in predicting trading system performance, the simulations were also checked using legacy out-of-sample (OOS) methods.

How to mitigate data mining bias?

Data mining bias, also called over-optimization, curve-fitting or over-fitting, is a trading system developer’s worst enemy.

Most developers build DMB into their systems without understanding what it is and how it poisons systems. As a result, their systems are doomed to perform worse in the future than historical back-testing would have them believe.

The issues caused by DMB result from two preconditions inherent in the system development process, namely randomness and the multiple-comparison approach. DMB inflates the resulting performance metrics on the side of success.

Data mining in order to find the best set of trading rules means that developers end up with only the best results of historical performance, which are not the same as the rules for best future performance.

In fact, the law of statistical regression toward the mean indicates that the “good luck” of the past is unlikely to be repeated in the future when using that same set of trading rules.
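
The effect is easy to reproduce in a toy simulation. The sketch below is not from the paper: it assumes a set of hypothetical rules with no edge at all (pure Gaussian noise with illustrative settings), picks the best one in-sample, and then scores that same rule on fresh data.

```python
import random
import statistics


def best_rule_in_vs_out(n_rules=50, n_days=100, n_trials=100, seed=0):
    """Average in-sample and out-of-sample daily return of the 'best' zero-edge rule."""
    rng = random.Random(seed)
    in_sample_best, out_of_sample_best = [], []
    for _ in range(n_trials):
        # Every rule has a true mean return of zero; only luck separates them.
        in_sample = [
            statistics.fmean(rng.gauss(0.0, 0.01) for _ in range(n_days))
            for _ in range(n_rules)
        ]
        in_sample_best.append(max(in_sample))
        # The selected rule still has zero edge, so fresh data is just new noise.
        out_of_sample_best.append(
            statistics.fmean(rng.gauss(0.0, 0.01) for _ in range(n_days))
        )
    return statistics.fmean(in_sample_best), statistics.fmean(out_of_sample_best)


avg_in, avg_out = best_rule_in_vs_out()
print(f"best rule, in-sample mean daily return:     {avg_in:.5f}")
print(f"best rule, out-of-sample mean daily return: {avg_out:.5f}")
```

The in-sample figure for the selected rule comes out clearly positive, while its out-of-sample figure hovers around zero. That gap is the inflation that selection alone introduces.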

Savvy system developers use various methods to mitigate DMB. One is cross-validation, which examines system performance on out-of-sample data, after regression to the mean has already occurred. Alternatively, they may attempt to compensate for the bias directly, or multiply their results by a deflation factor in hopes of neutralizing the effects of data mining bias.

Thus, DMB creates systematic, difficult-to-quantify errors by focusing on the results from good luck, while ignoring the likelihood of bad luck. In contrast, System Parameter Permutation accounts for the occurrence of both good and bad luck.

Using System Parameter Permutation to determine system performance

The limitations of DMB mitigation mean that developers often suffer from inaccurate predictions about trading system performance going forward. In contrast, SPP offers a handy method for accurately estimating performance as well as a way to test the statistical significance of results, free of data mining bias.

Best of all, SPP works very well alongside the standard optimization tools used in commercially-available trading software packages.

Beyond avoiding the issues caused by DMB, System Parameter Permutation also enables traders and developers to objectively check the performance of a trading system’s long-term “edge”. And, it allows them to determine a system’s short-term “worst case” performance. For systems already in use, SPP helps determine when and whether the old rules no longer work.

How SPP works

System Parameter Permutation works by generating sampling distributions of a system’s performance metrics. Each individual point in a distribution results from one historical simulation of a single system variant. From these sampling distributions, developers and traders can evaluate the system based on any desired performance metrics.

SPP uses the statistics from these sampling distributions to estimate the system’s performance and to measure the statistical significance of those estimates.

In contrast to standard optimization methods, System Parameter Permutation does not merely pick a single “ideal” set of parameters to be used to create a set of trading rules that would have been historically successful. Instead, SPP uses all the performance data for all sets of parameters that were evaluated during the optimization.

For each metric, SPP generates a sampling distribution that incorporates the hypothetical trade results from all combinations of parameters. This approach is far different from DMB compensation or cross-validation, since those methods use only the result of a single “best” set of trades in order to predict the system’s performance.

SPP relies on the median performance in each distribution for several reasons: (1) The median is not influenced by DMB; (2) the shape of the distribution curve is unimportant; and (3) the median is unaffected by outliers.

The steps to SPP

To generate the performance-metric sampling distributions, a developer must first determine an appropriate set of parameter ranges for the trading system, then create a sampling distribution. SPP is based on the following steps:

1. Determine the scan ranges of the parameters for the system;

2. Divide each individual parameter scan range into the desired number of observation points;

3. Perform exhaustive optimization of every possible combination of parameter values using a historical simulation during the chosen time period;

4. Combine together these simulated results from each and every variant to build a sampling distribution regarding each desired performance metric, such as compound annual return (CAR) and maximum drawdown.
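
As a rough illustration of Steps 1 through 4, here is a minimal sketch. Everything in it is an assumption made for the example rather than the author’s implementation: the synthetic random-walk prices, the long-only moving-average crossover system, the parameter names `fast` and `slow`, and the chosen scan ranges.

```python
import itertools
import random
import statistics


def make_prices(n_days=1500, seed=1):
    """Synthetic daily closes from a random walk (a stand-in for real market data)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))
    return prices


def sma(prices, t, length):
    """Simple moving average of the `length` closes ending at index t."""
    return sum(prices[t - length + 1:t + 1]) / length


def backtest(prices, fast, slow, warmup):
    """Long-only crossover: hold for the next day whenever fast SMA > slow SMA."""
    equity = [1.0]
    for t in range(warmup, len(prices) - 1):
        in_market = sma(prices, t, fast) > sma(prices, t, slow)
        day_return = prices[t + 1] / prices[t] - 1.0 if in_market else 0.0
        equity.append(equity[-1] * (1.0 + day_return))
    years = (len(equity) - 1) / 252.0
    car = equity[-1] ** (1.0 / years) - 1.0
    peak, max_drawdown = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        max_drawdown = max(max_drawdown, 1.0 - value / peak)
    return car, max_drawdown


prices = make_prices()

# Steps 1 and 2: scan ranges, divided into observation points.
fast_values = range(5, 30, 5)      # 5, 10, 15, 20, 25
slow_values = range(50, 175, 25)   # 50, 75, 100, 125, 150
warmup = max(slow_values)          # same start date for every variant

# Step 3: simulate every combination. Step 4: pool the results per metric.
car_distribution, drawdown_distribution = [], []
for fast, slow in itertools.product(fast_values, slow_values):
    car, max_drawdown = backtest(prices, fast, slow, warmup)
    car_distribution.append(car)
    drawdown_distribution.append(max_drawdown)

print("variants:", len(car_distribution))
print("median CAR:", round(statistics.median(car_distribution), 4))
print("median max drawdown:", round(statistics.median(drawdown_distribution), 4))
```

Note that no variant is singled out as the winner. All 25 results are kept, and the pooled lists are the sampling distributions.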

With the System Parameter Permutation method, each point in the sampling distribution is derived from the simulation run according to an individual system variant. Depending upon the time and computing power available to the trading system developer, any number of performance metrics may be checked.

The cumulative distribution function (CDF) is then examined for each metric, in order to estimate system performance and arrive at statistical inferences.

It’s critically important to choose SPP parameter scan ranges carefully to avoid data mining bias. For example, if SPP is repeated several times using different scan ranges in search of better results, then it may become infected by a positive bias.

Using SPP to estimate the long-term performance of a trading system

The most important question to be answered by System Parameter Permutation regards the long-term performance of a given trading system. The most accurate long-run estimates are obtained from the sampling distributions based on all available market data. As indicated above, the median value of the distribution offers the best performance estimate for each metric.

Also, traders and system developers can test the statistical significance of these performance estimates, whether in terms of absolute returns or measured against a benchmark. When using SPP, confidence levels and p-values can be estimated directly by using the CDF.
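
A minimal sketch of reading such an estimate off the empirical CDF follows. The CAR values and the benchmark are made up for illustration, and `empirical_cdf` is a hypothetical helper, not a function from the paper or any library.

```python
import statistics


def empirical_cdf(samples, x):
    """Fraction of samples at or below x."""
    return sum(1 for s in samples if s <= x) / len(samples)


# Hypothetical CARs, one per system variant (illustrative values only).
car_samples = [-0.02, 0.01, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.10, 0.12]
benchmark_car = 0.0

estimate = statistics.median(car_samples)
p_value = empirical_cdf(car_samples, benchmark_car)
print(f"median CAR estimate: {estimate:.3f}")
print(f"share of variants at or below benchmark: {p_value:.2f}")
```

Here one variant in ten fails to beat the benchmark, so the CDF value at the benchmark is 0.10. Treating that share as a p-value is the interpretation assumed in this sketch.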

System Parameter Permutation also estimates short-term and worst-case performance

Even though trading system developers are naturally focused on a system’s potential for long-term gains, SPP is also useful for estimating the drawdowns which must be endured in order to achieve those gains.

• All market data are divided into blocks of time equal to the length of the chosen short-term time period (t);

• These time blocks may overlap with adjoining blocks, depending on the time frame chosen for trading signals, e.g. hourly within a day, or monthly within a year;

• The result is a number (m) of time blocks;

• The above-listed Steps 1 through 4 are performed for all (m) time blocks individually.

As with estimating the long-term periods, the length of the short-term periods depends on the trader’s preferences and trading objectives.

So, if a trading system has (n) combinations of parameters, in total (m x n) optimization permutations are calculated from a historical time period t to generate a sampling distribution for each metric being examined during the chosen short-term time frame. Just as with the long-term performance study, the trader may choose any number of metrics for short-term study.
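
The m x n bookkeeping can be sketched as follows. The single-parameter momentum rule, the synthetic prices, and the 60-day window stepped every 20 days are all assumptions chosen to keep the example short. They are not the settings used in the paper.

```python
import random


def make_prices(n_days=1000, seed=2):
    """Synthetic daily closes from a random walk (a stand-in for real market data)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))
    return prices


def momentum_return(prices, start, end, lookback):
    """Compounded return over [start, end): long for one day whenever
    today's close is above the close `lookback` days earlier."""
    growth = 1.0
    for t in range(start, end):
        if prices[t] > prices[t - lookback]:
            growth *= prices[t + 1] / prices[t]
    return growth - 1.0


prices = make_prices()
lookbacks = [10, 20, 30, 40, 50]   # n = 5 parameter values
window, step = 60, 20              # t = 60 days; adjacent blocks overlap by 40
first_start = max(lookbacks)       # leave room for the longest lookback
block_starts = list(range(first_start, len(prices) - window, step))  # m blocks

# One sample per (block, parameter value) pair: m x n samples in total.
short_term_samples = [
    momentum_return(prices, start, start + window, lookback)
    for start in block_starts
    for lookback in lookbacks
]
print("m =", len(block_starts), "n =", len(lookbacks),
      "samples =", len(short_term_samples))
```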

Sampling distributions from the short-term SPP process produce far more individual samples that have a higher variation than those generated by the long-term SPP process. However, each distribution has a shorter time frame and therefore represents fewer closed trades in each sample.

So, the standard error of each short-term sample is greater. When the standard error increases, the variation of the sampling distribution likewise increases.

Armed with these sampling distributions, a trader can make probability-driven decisions about whether to trade a system or not. First, the trader decides on a probability level that he or she considers improbable yet tolerable as a worst-case scenario, perhaps 1% to 5%.

Or, the trader may determine the worst case in view of the least-favorable-yet-most-tolerable performance level. The CDF of the short-term sampling distribution is then examined according to the chosen level of performance desired.

If the trader is unable or unwilling to tolerate the indicated probability of loss, then he or she should not trade that system. Thus, System Parameter Permutation provides traders with an objective risk-assessment and risk-management tool.
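
A minimal sketch of that decision rule is shown below. The short-term returns, the 5% probability level, and the -10% tolerance are illustrative assumptions, and the nearest-rank percentile is just one simple way to read a value off the empirical CDF.

```python
import math


def nearest_rank_percentile(samples, pct):
    """Value at the given percentile (0 < pct <= 100) using the nearest-rank rule."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical short-term returns pooled from all blocks and variants.
short_term_returns = [
    -0.12, -0.08, -0.05, -0.03, -0.02, -0.01, 0.00, 0.01, 0.01, 0.02,
    0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.07, 0.09, 0.11,
]
probability_level = 5    # percent: the "improbable yet tolerable" level
tolerable_loss = -0.10   # worst short-term return the trader will accept

worst_case = nearest_rank_percentile(short_term_returns, probability_level)
tradable = worst_case >= tolerable_loss
print(f"worst case at the {probability_level}% level: {worst_case:.2f}")
print("trade the system" if tradable else "do not trade the system")
```

With these numbers the worst case at the 5% level is a 12% loss, which exceeds the 10% tolerance, so the rule says to pass on the system.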

Why SPP works so well

System Parameter Permutation works because it leverages the statistical law regarding regression to the mean, instead of ignoring it as most other system optimization methods do. As well, SPP takes advantage of modern computing power to quickly extract and utilize the maximum amount of information from all available market data.

Traditional optimization methods calculate performance metrics from the single best set of trades discovered during optimization. Yet, random re-sampling can result in problematic assumptions and data mining bias.

By using a large number of parameter-value combinations, SPP estimates the effect of mean-regression. Using all the available market data ensures that the system is exposed to the widest range of market conditions, and the results contain the smallest possible standard error.

In contrast to random re-sampling, when using SPP the random variations result from changing the entry and exit rules for hypothetical trades using actual market data. Thus, SPP accounts for the effects of both completed trades and randomly-skipped trades.

In other words, System Parameter Permutation lets trading system developers explore aspects of a system which otherwise would remain hidden yet possible during real trading.

SPP opens new doors for mechanical traders

Traditionally, developers have built their trading systems according to estimates of performance based on single-point optimization and measures of statistical significance inferred from a limited number of trades. Yet, System Parameter Permutation provides traders with more useful sampling distributions of performance metrics, and it accounts for all historical trades, whether or not they actually occurred.

SPP can help traders estimate both long-term gains and short-term drawdowns with more confidence. Best of all, it can help quantitative trading system developers avoid the data mining bias that robs them of both their confidence and their profits.

What methods do you use to avoid curve-fitting a system?

— by Eddie Flower from blog One Step Removed


  • Maverick says:

    Correct me if I am wrong but this SPP is not about helping to develop a system that is free of DMB, but merely to provide an early warning signal on a system that is statistically determined to be on the path of failing and thus the trader may choose not to proceed with the system and save himself potential future troubles. Am I right?

    • That’s correct. Neither SPP nor the newer SPR method mentioned in the podcast per Alan’s post is a system development method. Instead, they are tools for setting more realistic performance expectations for a trading system.

  • Sol says:

    You are correct. But there are more serious problems with the SPP: (a) the method is not new (b) data mining bias is not only curve fitting but mainly selection bias. Watch video from De Prado for a genuine account of data mining bias:

    https://www.youtube.com/watch?v=QxhxLwNbMMg

    There are many more problems with SPP, including the arbitrary selection of parameter ranges. In my opinion both Dalton and the author of this article do not understand statistics and data-mining bias. Sorry but this is true.

  • Alan says:

    This is a repost of an article dated 2014, when the paper was originally published – so not exactly timely…

    I agree there are many issues with this approach, but to give credit to Dave Walton he no longer even advocates this method himself.

    He recently appeared on the Better System Trader podcast, where he addresses some of the concerns and talks about how his thinking has evolved since the paper was first published. Listen here: http://bettersystemtrader.com/051-dave-walton/

    • I think I understand statistics and DMB pretty well. It is a bit presumptuous to make a statement like that. Have you even read the paper, listened to the podcast, or watched the MTA video? I think it is described pretty well and matches Marcos’s definition for the most part.

      I’ve talked to David Bailey on the topic and my business partner even helped them with the online tool Marcos mentions in the interview: http://datagrid.lbl.gov/backtest/

      I specifically discuss the selection of parameter ranges in the blog post on BetterSystemTrader: http://bettersystemtrader.com/system-parameter-permutation-a-better-alternative/ and I think Jeff plans on republishing it here.

      As Alan mentions, I readily acknowledge the assumptions and limitations and think there is a better approach than SPP. I think too many folks get wrapped around the axle because DMB, selection bias, and many other terms get used interchangeably and can confuse the real issue.

      The goal is to be generally correct, and minimize the probability of being precisely wrong. In other words, we are interested in accuracy much more than precision. Searching for precision is a fool’s errand given the large amount of noise in financial data.

      Given the high noise/signal level, in order for the “law of large numbers” to work, we need a very large number of samples. There simply isn’t enough historical data to achieve that. SPx generates many observations cross-sectionally, minimizing the probability of luck and maximizing the probability that the observed mean of the approach will converge to the population mean much more quickly.

      I think many would just like a tool or simple set of instructions on how to develop a trading system that will meet their performance objectives and not have to worry about the many pitfalls that include DMB. I’m sorry to say there is no panacea. If anyone offers you one, you should run away as quickly as possible.

    • Thanks Alan for the reference.

  • James says:

    I watched the video “Learning from the Insurance Industry: Using Stochastic Modeling to Improve Trading System Development” and thought it was excellent. What Dave is calling a Stochastic Simulation I’ve always thought of as a Monte Carlo Simulation. Stochastic Simulation is probably a better name for it because Monte Carlo Simulation implies different shaped distributions are being added together.

    The other variants of Monte Carlo Simulations that Dave refers to I consider to be fake Monte Carlo Simulations and agree that they are very misleading.

    A procedure that I’ve started doing that I believe to be useful is to evaluate one or a small set of related input series per modeling run. When the model building is complete I do not delete any models. First step is to check the out-of-sample performance of all models as a set. If greater than 90% produced a profit OOS and the level of profitability is strongly biased to the upside then this indicates the input series are probably useful for modeling the target series. If this test is successful then I go ahead and start analyzing individual models. If 50% of the models were profitable with no or little profit bias then the input series is probably not useful so do not consider any of the models.

    Kind Regards,

    James

    • James, thanks for the feedback!

      I agree that the words “Monte Carlo” are overused and, depending on the algorithm, can mean very different things. In its true sense, a Monte Carlo process is simply repeated random sampling. In a trading setting I’ve seen it mainly refer to trade and/or equity curve return series resampling which is why I draw a distinction and refer to stochastic modeling.

      Re: your OOS procedure… I fully agree that OOS is much more useful when you look at performance of all models, not just the optimized one– made that point in the MTA video and the podcast. Kaufman makes almost the same point you are that a high percentage of models should be profitable OOS.

  • aya says:

    My criticism of the SPP:

    1. It doesn’t tell you which parameters to use in actual trading;

    2. If there are many parameters, exhaustive permutation may not be tractable;

    3. If there are too few parameters, the SPP data has low statistical significance;

    4. The SPP isn’t applicable to parameter-less systems, e.g. adaptive/learning ones, which derive their parameters from market data.

    • First, all your points are valid. However, I address most of these in the Better System Trader podcast/blog post referenced above.

      However, SPP/SPR are methods to facilitate system evaluation, not system development. Therefore #1 was never a goal. With that said, I did offer some suggestions in the podcast.

      #4 is true; however, I made it clear in the original paper that the method applies only to systems that use parameters.

  • Mark says:

    I’m “just hanging on” a bit to understand this but I think it’s a great idea and something I’ve thought about a lot. In developing a trading system, I feel the entire parameter space needs to be included/tested otherwise you leave the door open to fluke results. In case the Kaufman Dave mentions above is Perry Kaufman, I feel he made this mistake in an article published in the April issue of _Modern Trader_ magazine.

    And if an exhaustive testing of the parameter space is not possible because of too many variables, large ranges, and/or limited degrees of freedom, then my choice would be to junk the system and come up with something simpler. Anything that can’t be exhaustively validated sounds more discretionary to me and I believe it’s easier to make any discretionary system seem as good or bad as anybody likes.
