What Trading System Failure Means: Defining and Quantifying Strategy Failure, Part 1

March 16, 2014

One of the most important aspects of algorithmic trading is the removal of trading strategies from live accounts when they fail. Knowing when a strategy has failed is extremely important: it allows us to avoid taking losses and missing opportunities, and it gives us the chance to reallocate our capital to strategies that may perform better under current market conditions. However, most people do not have a truly rational plan for system failure, and others consider failure in only a very limited scope that is better suited to hope than to a rational analysis of a trading strategy’s statistical characteristics. In these posts I want to discuss what trading system failure means and how it can be quantified, allowing you to discard trading strategies either progressively (next part) or entirely at a certain threshold (today’s article). We will go through problems with defining failure, problems with the removal of strategies, and potential solutions to these issues.



What does it mean for a system to fail? Many traders will tell you that they have some sort of “failure horizon” after which a system will be removed from their trading account. This is usually a measure of historical loss – such as a given multiple of the maximum drawdown – after which the trader is committed to stop trading the strategy. However, such views of “total loss” are excessively simplistic. They do not account for how the system gets to that drawdown or how long the losing period lasts, they do not account for loss of opportunity (failure to generate profit), and – most importantly – they have no strong statistical basis. Why would you use an N multiple of the maximum drawdown as a failure horizon? Why not 2N or 3N? It is therefore important to have a clear, non-subjective definition of system failure and then apply it rationally and to its full extent to the trading strategies being used.

How can we define failure? Failure is always a relative measure, because something only fails when compared against a standard we can call “success”. In the case of algorithmic trading, a system succeeds when it performs as well as or better than the historical simulations used to create it, and it fails when it performs worse. But how do we compare our system (which is trading live) against a historical result? We need to compare the distribution of returns of our live system with the distribution of returns obtained from our back-tests and judge objectively when one of them is fundamentally different from the other. You need to judge whether the distribution of returns you have obtained through your live trading is likely to be a sub-sample of the historical distribution of returns. If you can reject this hypothesis at a given level of confidence (usually 95%), you can say that your system has failed because – in all likelihood – it is no longer behaving in the same way as the system you created.
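As a minimal sketch of this idea, one common way to test whether one sample of returns is plausibly drawn from the same distribution as another is a two-sample Kolmogorov–Smirnov test. The return series below are hypothetical placeholders (the article does not specify a test or data), generated only to illustrate the mechanics:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-trade returns: a long back-test history and a shorter,
# clearly degraded live record (these numbers are illustrative, not real).
backtest_returns = rng.normal(loc=0.002, scale=0.01, size=1000)
live_returns = rng.normal(loc=-0.01, scale=0.01, size=60)

# Two-sample Kolmogorov-Smirnov test: can we reject the hypothesis that
# the live returns are a sub-sample of the historical distribution?
statistic, p_value = stats.ks_2samp(live_returns, backtest_returns)

if p_value < 0.05:  # reject at the 95% confidence level
    print("System failed: live returns no longer match the back-test distribution")
else:
    print("No statistical evidence of failure yet")
```

Note that failing to reject the hypothesis does not prove the system is healthy; with few live trades the test simply has little power, which is why the number of trades matters so much in what follows.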

Using this criterion is extremely powerful because it eliminates systems that fail for many different reasons: systems that get into drawdowns too fast, systems that stay in drawdowns for too long, systems that fail to generate a profit, and so on. It goes well beyond simple drawdown measures.

How do you judge whether the live distribution is a sub-sample of the back-testing distribution? There are many techniques, but it is extremely important to avoid falling into simplistic assumptions. For example, you can use Monte Carlo simulations to tie failure against a distribution to particular measures for your strategy. Failure within a Monte Carlo simulation should always be determined against the same number of trades as you have in your live record, because the technique generates worst case scenarios associated with specific trade numbers. A Monte Carlo worst case scenario based on 1000 trades will be extremely different from the worst case scenarios generated for 50 or 100 trades, because the expected drawdown depth, length, and so on change as the number of trades grows or shrinks. Getting into a 10% drawdown in 20 trades is not the same as getting into the same drawdown in 50 trades. A drawdown value that is not a worst case at 400 trades might well be one at 20 trades, because the likelihood of reaching a drawdown of that depth within a smaller number of trades, drawing from the historical distribution of returns, can be much lower. In general I have found Monte Carlo simulations to be impractical because they are computationally expensive and need to be updated very frequently (as they depend on trade number).
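The dependence of the worst case on trade count can be sketched with a small resampling exercise. The historical returns below are hypothetical, and `worst_case_drawdown` is an illustrative helper, not the author's actual implementation; it resamples the historical per-trade returns and reports the 99th-percentile maximum drawdown for a given trade count:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical per-trade returns (fractions of account equity).
historical_returns = rng.normal(loc=0.002, scale=0.01, size=500)

def worst_case_drawdown(returns, n_trades, n_sims=5000, percentile=99):
    """Resample n_trades returns n_sims times; return the maximum-drawdown
    depth exceeded in only (100 - percentile)% of the simulated equity curves."""
    worst = np.empty(n_sims)
    for i in range(n_sims):
        sample = rng.choice(returns, size=n_trades, replace=True)
        equity = np.cumsum(sample)
        # Drawdown = distance below the running equity high-water mark.
        drawdown = np.maximum.accumulate(equity) - equity
        worst[i] = drawdown.max()
    return np.percentile(worst, percentile)

# The worst-case threshold is not one number: it changes with trade count,
# which is why each checkpoint needs its own simulation.
for n in (20, 100, 400):
    print(f"{n} trades: 99th-pct max drawdown = {worst_case_drawdown(historical_returns, n):.3f}")
```

Because a fresh simulation is needed for every trade count at which you check the system, the maintenance burden the article mentions follows directly from this structure.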


In the case of highly linear trading systems (R > 0.98 and a normal distribution of residuals) you can also evaluate worst case scenarios as violations of the linear hypothesis: if the live equity falls more than 3.5 standard deviations of returns below the fitted line, you can say with 99% confidence that the trading strategy has failed. This way of evaluating worst cases is much simpler and does not require constant re-evaluation of the worst case parameters, because the linear equation used to evaluate system failure is a constant determined from the historical results of the strategy. With this technique you are evaluating the general failure hypothesis (is the live distribution a sub-sample of the historical distribution?) in a more specific way, taking advantage of the fact that the historical results of the strategy fit a very well defined statistical model.
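A minimal sketch of this linear-hypothesis check, using synthetic data and a hypothetical `has_failed` helper (the exact fitting procedure is not given in the article): fit a line to the historical equity curve once, measure the residual scatter, and flag failure whenever live equity drops more than 3.5 residual standard deviations below the line.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical, highly linear back-test equity curve (cumulative returns).
n_hist = 300
trade_index = np.arange(n_hist)
equity = 0.002 * trade_index + rng.normal(scale=0.005, size=n_hist)

# Fit the linear model once from the back-test; it is never re-fitted.
slope, intercept = np.polyfit(trade_index, equity, 1)
residual_std = np.std(equity - (slope * trade_index + intercept))
r = np.corrcoef(trade_index, equity)[0, 1]  # should satisfy R > 0.98

def has_failed(trade_number, live_equity, threshold=3.5):
    """Failure when live equity sits more than `threshold` residual standard
    deviations below the historical linear fit (~99% confidence band)."""
    expected = slope * trade_number + intercept
    return live_equity < expected - threshold * residual_std

print(has_failed(310, 0.62))  # near the fitted line: still healthy
print(has_failed(310, 0.55))  # far below the band: failed
```

Because `slope`, `intercept`, and `residual_std` are frozen from the back-test, the live check is a single comparison per trade, which is what makes this approach so much cheaper than re-running Monte Carlo simulations.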

I would also like to point out again that system failure should never be subjective. When you evaluate a worst case you cannot choose which results to accept and which to ignore based on some subjective perception of whether the system has failed or not. For example, you cannot use a 500 trade Monte Carlo simulation as a worst case while ignoring a worst case from a 50 trade Monte Carlo simulation, because you must consider in general whether your distribution of live returns is a sub-sample of the historical distribution. You cannot judge which ways “count” and which do not. Selective statistical testing usually happens when people have a strong reason to “believe” in a trading system. When the attachment – whether economic, psychological, or otherwise – is too great, a trader will always have trouble declaring that a system has failed, because the burden of admitting failure can feel greater than the burden of the financial loss incurred by continuing to trade it. For this reason it is also extremely important to have some method to quickly and efficiently generate strategies to replace those that have failed.

Finally, I would like to point out that failure from a statistical perspective is a point of no return. Once a strategy fails, it can never come back to the point where it again fits the historical distribution of returns. A strategy might recover from the drawdown or lack-of-profit scenario, but the change in statistics caused by that period has made the strategy a failure (forever) relative to its previous historical results. Sure, if the strategy recovers you can reconsider whether you want to trade it, but you will no longer measure failure against the previous standard: you will need to use the new statistics (the ones that failed against your original standard) as part of your new estimation. My experience tells me that it is better to simply generate new strategies to replace systems that have failed rather than trade a recovered strategy against a lowered standard for failure.

Note that there are other ways to discard strategies that do not involve “hard thresholds” (progressive discarding), something we will be discussing in my next post. If you would like to learn more about worst cases and system generation, and how you too can generate and replace strategies effectively, please consider joining Asirikuy.com, a website filled with educational videos, trading systems, development and a sound, honest and transparent approach towards automated trading in general. I hope you enjoyed this article! :o)

By Daniel Fernandez from Mechanical Forex

System Trader Success Contributor

Contributing authors are active participants in the financial markets and fully engrossed in technical or quantitative analysis. They desire to share their stories, insights and discoveries on System Trader Success and hope to make you a better system trader. Contact us if you would like to be a contributing author and share your message with the world.



  • Daniel, although I appreciate your emphasis on objectivity, I believe your approach is dangerous. You haven’t mentioned anything about your system development approach but many approaches involve data mining in some form and thus are subject to data mining bias. Standard statistical significance testing in the presence of data mining bias is invalid and inevitably leads to accepting a system whose historical performance was inflated via random good luck. Trading such a system live almost guarantees failure.

    Also, I agree with not recommending simple trade result resampling (what you call Monte Carlo simulation) to test whether a system is functioning properly, but not for the reason you state. You say that such an approach is computationally expensive, which is curious in this day and age. You can solve the # of trades problem by a priori choosing checkpoints that are meaningful to you (e.g. after 50 trades, 100 trades, etc) and simulating them separately.

    The real problem in applying simple resampling techniques to trading is in the following three assumptions: 1) the result of a SINGLE historical simulation is representative of the lifetime (includes the future) distribution of trade results; 2) trading results are independent and identically distributed; 3) real world portfolio effects combined with position sizing are accurately modeled. None of these assumptions are likely to be valid unfortunately.

  • “You need to judge if the distribution of returns you have obtained through your live trading is likely to be a sub-sample of the historical distribution of returns”

    By the time they have a distribution, most are ruined already. This is the wrong approach.

  • I actually like the approach. Assuming that we have taken care of data mining bias through adequate testing (I suggest checking other posts on this site) we are not usually going to meet with immediate catastrophic loss. But many systems in my experience begin to decay gradually (sometimes rapidly). I can sit and hope they will come back, or I can use an objective approach to prune the systems I have running.

    I like this approach as it is just what I needed – simple and objective.


