When you see the performance of a trading system, how do you know it’s good? How do you know it’s the right system for you? Many people simply look at the net profit, assuming the system with more profit must be the better system. This is often far from a good idea. When comparing trading systems during the development process or before making a purchase, it is nice to have a few metrics on hand that allow you to compare a system either to a hypothetical benchmark or against another system. There is no single score that will work for everyone, since we all have unique risk tolerances and definitions of what we consider tradable. Likewise, not all scoring systems are equal or perform well under all circumstances. However, in this article I’m going to talk about my favorite methods used to score and rank trading systems. These are the key system performance metrics that I use during the system development process.

Any trading system should have a “significant” number of trades. What is significant? Well that varies. For a swing system that takes no more than 10 trades a year, having 100 trades is good. This represents about 10 years of historical testing. As a given trading system starts to produce more trades per year, I would expect to see more trades utilized during backtesting.

While net profit can be a factor in your decision about a particular trading system, profit factor is often even more important in my opinion. Profit factor measures the efficiency of your trading system. It is calculated by dividing the gross profit (the sum of all winning trades) by the gross loss (the sum of all losing trades). A profit factor of 1.5 indicates that for every two dollars lost, three dollars are gained ($3 won / $2 lost = 1.5). Obviously a number above 1.0 means you are making money. I like to see a profit factor of 1.5 or higher.
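As a minimal sketch of the profit factor calculation described above, assuming the trade results are available as a plain list of per-trade profit and loss values in dollars. The function name `profit_factor` and the sample trades are hypothetical, chosen only for illustration.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss for a list of per-trade P/L values."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)  # expressed as a positive number
    if gross_loss == 0:
        # No losing trades: the ratio is undefined, reported here as infinity.
        return float("inf")
    return gross_profit / gross_loss


# Hypothetical trade results in dollars (illustrative values only).
trades = [200.0, -100.0, 100.0, -100.0]
print(profit_factor(trades))  # $300 won / $200 lost = 1.5
```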

Like profit factor, the average profit per trade tells me if a system is making enough money on each trade. When designing a trading system I like to see, at an absolute minimum, an average profit per trade above $50 before commissions and slippage are deducted. If the average net profit is above $50 with commissions and slippage deducted, that’s even better. The higher the average profit per trade the better.

I don’t follow the percentage of winning trades too much. I make note of it, but it’s not all that important to me. The percent winning trades is simply the number of trades that generated a positive net profit divided by all trades taken. This factor can be important if you don’t like to have a large string of losers. For example, longer term trend following systems can often be very profitable, but only have a win rate of 40% or less. Can you handle many losing trades? Maybe you are only comfortable with systems that tend to produce more winning trades than losing trades. If so, then a system with a win rate of 60% or higher would be better for you. Percent winning trades is a psychological tolerance indicator that will vary between people.
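The average profit per trade and the percentage of winning trades can both be read straight off the trade list. The sketch below assumes a plain list of per-trade net results in dollars; the function names and sample values are hypothetical.

```python
def average_trade(trade_pnls):
    """Average net profit per trade in dollars."""
    return sum(trade_pnls) / len(trade_pnls)


def percent_winners(trade_pnls):
    """Percentage of trades that closed with a positive net profit."""
    winners = sum(1 for p in trade_pnls if p > 0)
    return 100.0 * winners / len(trade_pnls)


# Hypothetical trade results in dollars (illustrative values only).
trades = [300.0, -100.0, -50.0, 150.0, -50.0]
print(average_trade(trades))    # 250 / 5 = 50.0
print(percent_winners(trades))  # 2 winners out of 5 = 40.0
```

Note that this sample is profitable on average even though only 40% of its trades win, which is the trend-following situation described above.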

The compound annual growth rate (CAGR) describes the growth as if it were a steady, fixed rate of return. Obviously this does not happen when trading, as your trading system produces a jagged equity curve over time. Yet it is a way to smooth your return over the trading period. Let’s say your trading system produces a 5% CAGR over a 10 year period, and over that same period you have a bank CD that also yields a 5% return. Does this make the CD a better investment? Maybe. One thing to keep in mind is this: the CAGR calculation does not take into account the time your money is at risk. For example, while the trading system may be returning 5% CAGR over 10 years, your money is only actively in the market for a fraction of the time. Most of the time it’s sitting idle in your brokerage or futures account waiting for the next trading signal. Remember, a 5% return in the CD is realized only if your money is locked away 100% of the time. With our example trading system, our cash is also freed up to be put to use in other instruments.

The risk-adjusted return takes into account the time your money is at risk in the market. It is calculated by dividing the CAGR by exposure. Exposure is the percentage of time (over the test period) that your money was actively in the market. I like to see a value of 50% or better.
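To make the CAGR and the CAGR-divided-by-exposure calculation concrete, here is a small sketch. It assumes the standard CAGR formula (ending equity over starting equity, raised to one over the number of years, minus one) and that exposure is supplied as a fraction between 0 and 1. The function names and account figures are hypothetical.

```python
def cagr(starting_equity, ending_equity, years):
    """Compound annual growth rate as a decimal (0.05 means 5%)."""
    return (ending_equity / starting_equity) ** (1.0 / years) - 1.0


def risk_adjusted_return(cagr_value, exposure):
    """CAGR divided by exposure, the fraction of time capital was in the market."""
    if not 0.0 < exposure <= 1.0:
        raise ValueError("exposure must be a fraction in (0, 1]")
    return cagr_value / exposure


# Hypothetical account: $100,000 grows at roughly 5% a year for 10 years
# while being in the market only 10% of the time (illustrative values only).
growth = cagr(100_000.0, 162_889.46, 10)
print(f"CAGR: {growth:.2%}")
print(f"CAGR / exposure: {risk_adjusted_return(growth, 0.10):.2%}")
```

A system with the same 5% CAGR but 100% exposure would score only 5% on this measure, which is why the two numbers are worth reading together.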

How big are the system’s drawdowns? Can I mentally handle such a drawdown? Along these lines I also look at the shape of the equity curve. Does it climb with shallow pullbacks or does it have steep pullbacks? Are there long extended periods with no new equity highs? Ideally, the equity curve should rise as time goes by, creating new equity highs with shallow pullbacks.
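Both questions, how deep the pullbacks are and how long the curve goes without a new high, can be answered from the equity curve itself. The following is a sketch only, assuming the equity curve is a list of account values sampled at regular intervals (for example, once per closed trade or once per day). The function names and sample curve are hypothetical.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-valley decline in dollars."""
    peak = equity_curve[0]
    worst = 0.0
    for equity in equity_curve:
        peak = max(peak, equity)           # highest equity seen so far
        worst = max(worst, peak - equity)  # deepest drop below that high
    return worst


def longest_stretch_without_new_high(equity_curve):
    """Longest run of consecutive samples that failed to set a new equity high."""
    peak = equity_curve[0]
    current = longest = 0
    for equity in equity_curve[1:]:
        if equity > peak:
            peak = equity
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest


# Hypothetical equity curve in dollars (illustrative values only).
equity = [10000.0, 10500.0, 10200.0, 11000.0, 9900.0, 10400.0, 11200.0]
print(max_drawdown(equity))                      # 11000 - 9900 = 1100.0
print(longest_stretch_without_new_high(equity))  # 2 samples (9900, 10400)
```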

The t-Test is one you don’t see much of. It is a statistical test used to gauge how likely it is that your trading system’s results occurred by chance alone. You would like to see a value greater than 1.6, which indicates the trading results are more likely not to be based on chance. Any value below that indicates the trading results might be based upon chance. The t-Test value should be calculated with no fewer than 30 trades. Below is the t-Test calculation.

t = square root ( number of trades ) * ( average profit per trade / standard deviation of trades )
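The formula above translates almost directly into code. This sketch assumes the sample standard deviation (`statistics.stdev`) is the intended one, which the article does not specify, and it enforces the 30-trade minimum mentioned above. The function name `t_score` and the sample trades are hypothetical.

```python
import math
import statistics


def t_score(trade_pnls, min_trades=30):
    """t = sqrt(number of trades) * (average profit per trade / std dev of trades)."""
    n = len(trade_pnls)
    if n < min_trades:
        raise ValueError(f"need at least {min_trades} trades, got {n}")
    mean = statistics.mean(trade_pnls)
    stdev = statistics.stdev(trade_pnls)  # sample standard deviation (an assumption)
    if stdev == 0:
        raise ValueError("all trades are identical; t is undefined")
    return math.sqrt(n) * mean / stdev


# Hypothetical trade list: the same three results repeated to reach 30 trades.
trades = [150.0, -100.0, 50.0] * 10
print(round(t_score(trades), 2))
```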

Expectancy is a concept that was described in Van Tharp’s book “Trade Your Way To Financial Freedom”. Expectancy tells you, on average, how much you expect to make per dollar at risk. Expectancy might also be a value that you optimize when testing different strategy input combinations. While computing the true expectancy of a trading system is beyond the scope of this article, it can be estimated with the following simple formula.

Expectancy = Average Net Profit Per Trade / | Average losing trade in dollars |

For those not too familiar with mathematics, the vertical lines around the “Average losing trade in dollars” indicate that the absolute value should be used. This simply means that if the number is negative, we drop the negative sign, thus making the value positive.
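The estimate above can be sketched as follows, assuming a plain list of per-trade net results in dollars in which losing trades are negative. The function name `expectancy` and the sample trades are hypothetical.

```python
def expectancy(trade_pnls):
    """Average net profit per trade divided by |average losing trade|."""
    losers = [p for p in trade_pnls if p < 0]
    if not losers:
        raise ValueError("at least one losing trade is needed for this estimate")
    average_net_profit = sum(trade_pnls) / len(trade_pnls)
    average_loss = sum(losers) / len(losers)       # a negative number
    return average_net_profit / abs(average_loss)  # abs() drops the negative sign


# Hypothetical trade results in dollars (illustrative values only).
trades = [300.0, -100.0, -100.0, 150.0]
print(expectancy(trades))  # 62.5 / 100 = 0.625
```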

The Expectancy Score is an annualized expectancy value that produces an objective number for comparing various trading systems. In essence, the Expectancy Score factors “opportunity” into the value by taking into account how frequently the given trading system produces trades. Thus, this score allows you to compare very different trading systems. The higher the expectancy, the more profitable the system.

Expectancy Score = Expectancy * Number of Trades * 365 / Number of strategy trading days
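A short sketch of the score formula above. It assumes that “Number of strategy trading days” means the number of calendar days covered by the test, which the factor of 365 suggests but the article does not state outright. The function name and the input values are hypothetical.

```python
def expectancy_score(expectancy, number_of_trades, strategy_days):
    """Expectancy * number of trades * 365 / number of strategy trading days."""
    if strategy_days <= 0:
        raise ValueError("strategy_days must be positive")
    return expectancy * number_of_trades * 365 / strategy_days


# Hypothetical system: expectancy of 0.625 over 100 trades in a
# 3,650-day test, i.e. about 10 trades per year (illustrative values only).
print(expectancy_score(0.625, 100, 3650))  # 0.625 * 10 trades per year = 6.25
```

Two systems with the same expectancy but different trade frequency end up with different scores, which is the “opportunity” effect described above.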

With the above values we can get a decent picture of how the system will perform. There are, of course, other values you could evaluate, and there is even more you can do, such as passing the historical trades through a Monte Carlo simulator. But the values discussed in this article are the important ones I utilize when designing a system or when evaluating a third party trading system.

Jeff is the founder of System Trader Success - a website and mission to empower the retail trader with the proper knowledge and tools to become a profitable trader in the world of quantitative/automated trading.

Hi Jeff,

you correctly put “number of trades” at the top of your list, because all other metrics depend on the validity of your sample.

Unfortunately you appear to follow a line of reasoning (as many others do) that I believe to be false: that is giving importance to the length of the time span that the sample is derived from. You argued that 100 trades are good if they cover 10 years.

Well, 100 trades are 100 trades independently of the time span. So the important question is whether or not a sample of 100 has validity.

For an eye-opening read I strongly recommend: Kahneman, “Thinking Fast And Slow”, pg. 109ff

The point he makes is that small samples can easily be impacted by (rare) outliers. This effect is in no way reduced simply because your sample covers a longer time period!

Just as an example: How valid are system results based on 1000 years of trades? Very valid?

Well, what if the rules are to buy the major stock index at the close of last day of the century and sell on the open the following day? That would give you a whopping 10 trades to base your analysis on. Now how valid is that?

So how can we deal with the outlier problem?

My suggestion is to cut away the top x% of your winning trades (5%-10% appears reasonable to me) and examine how much your performance degrades (measured by whatever metrics you want to apply). This is called a sensitivity analysis and will show you how much your total performance relies on a few great trades (likely to be outliers that will not repeat as much or at all in the future).
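The sensitivity analysis suggested here can be sketched in a few lines. This version assumes that “top x%” means x% of the winning trades by count, rounded up to a whole trade, and it uses net profit as the metric being compared. The function name `drop_top_winners` and the sample trades are hypothetical.

```python
import math


def drop_top_winners(trade_pnls, fraction=0.10):
    """Return the trade list with the largest `fraction` of winning trades removed."""
    winner_idx = [i for i, p in enumerate(trade_pnls) if p > 0]
    n_drop = math.ceil(len(winner_idx) * fraction)  # round up to a whole trade
    # Indices of the biggest winners, largest first.
    biggest = sorted(winner_idx, key=lambda i: trade_pnls[i], reverse=True)
    to_drop = set(biggest[:n_drop])
    return [p for i, p in enumerate(trade_pnls) if i not in to_drop]


# Hypothetical system: 10 winners, one of them a "home run", and 10 losers.
trades = [1000.0, 50.0, 40.0, 30.0, 20.0, 10.0, 60.0, 70.0, 80.0, 90.0] + [-50.0] * 10
trimmed = drop_top_winners(trades, fraction=0.10)

print(sum(trades))   # net profit with all trades: 950.0
print(sum(trimmed))  # net profit without the top 10% of winners: -50.0
```

In this made-up sample a single trade carries the whole result, which is exactly the dependence on outliers the procedure is meant to expose. Any other metric can be recomputed on the trimmed list in the same way.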

A second method is the t-test that you mentioned. The only difference I would make is to apply it on the risk adjusted profit (or loss) per trade, not on the absolute P/L. Basically you divide the result (in amount of dollars) by the risk as defined by the initial stop (also in amount of dollars). From those numbers you then calculate the average and standard deviation that go into the formula you show above.
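As an illustration of this variant, the sketch below first converts each trade into a risk-adjusted result (dollar P/L divided by the dollar risk defined by the initial stop) and then applies the same t formula from the article. It assumes the sample standard deviation is used and that the initial risk of every trade is known and positive. All names and values are hypothetical.

```python
import math
import statistics


def risk_adjusted_results(trade_pnls, initial_risks):
    """Divide each trade's dollar P/L by its initial dollar risk (stop distance)."""
    if len(trade_pnls) != len(initial_risks):
        raise ValueError("need one initial risk per trade")
    if any(r <= 0 for r in initial_risks):
        raise ValueError("initial risk must be positive")
    return [p / r for p, r in zip(trade_pnls, initial_risks)]


def t_score(values):
    """t = sqrt(n) * mean / standard deviation (sample std dev, an assumption)."""
    return math.sqrt(len(values)) * statistics.mean(values) / statistics.stdev(values)


# Hypothetical trades: dollar results and the dollar risk taken on each one.
pnls = [150.0, -100.0, 50.0] * 10
risks = [100.0, 100.0, 50.0] * 10
ratios = risk_adjusted_results(pnls, risks)

print(ratios[:3])                 # [1.5, -1.0, 1.0]
print(round(t_score(ratios), 2))
```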

Anyway, try to get a copy of that book, which I strongly recommend to all traders!

Cheers,

TK

@TK

I would generally agree with what you are saying about sample size, and one question I was going to address to Jeff was whether, like most statistical procedures carried over to analysis of market data, a larger sample size than 10 is not necessary for the T-Test?

Regarding your suggestion above though (“cut away the top x% of your winning trades”), is this not dependent on the nature of the strategy in question? For instance, given a sample size of 100 and an extreme outlier-dependent strategy with a 10% win rate, then cutting the top 10% means disregarding one single trade. Though, over an average of 10,000 trades, the top X% may on average be only slightly more profitable than the top X+n%, in the case of the sample size of 100 the top X% trade (i.e. the one single trade selected), may be many multiples more profitable than the X+n% remaining.

In other words, what you describe, with certain types of systems and without a very large sample size, risks creating a counter-productive “black swan” style exclusion that is not representative of the intended effect of this process. Instead of mitigating the impact of outliers, you risk introducing a further outlier counter-measure.

Would be interested to hear your thoughts on this if I have explained myself well enough.

Regards,

BlueHorseshoe

Of course, above should have read X-n% – there’s no “edit” button like on the forums!

TK,

Thanks again for the thoughtful reply. So sorry for getting back to this thread days later. It was a hectic week last week. I’ve heard a lot of good things about “Thinking Fast and Slow”. I’m currently reading “The Big Short” and will add your recommendation to my reading list.

@BH

I am not sure if I understood your comment correctly, so forgive me if my answer should not match what you meant.

The purpose of the “cutting procedure” is to find out to what degree the test results were impacted by outliers. If there is a big impact then you have a high risk that the results of the SAMPLE (test) are NOT representative of the real life performance later on.

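>>>” …is this not dependent on the nature of the strategy in question?”<<<

Actually this procedure REVEALS the nature of the system. Is it made up of evenly sized profits (small impact of outliers => smooth equity curve with little variance) or a few “homeruns” (big impact of outliers => rugged equity curve with high variance).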

>>>”For instance, given a sample size of 100 and an extreme outlier-dependent strategy with a 10% win rate, then cutting the top 10% means disregarding one single trade. Though, over an average of 10,000 trades, the top X% may on average be only slightly more profitable than the top X+n%, in the case of the sample size of 100 the top X% trade (i.e. the one single trade selected), may be many multiples more profitable than the X+n% remaining.”<<<

>>>”In other words, what you describe, with certain types of systems and without a very large sample size, risks creating a counter-productive “black swan” style exclusion that is not representative of the intended effect of this process. Instead of mitigating the impact of outliers, you risk introducing a further outlier counter-measure.”<<<

I am not sure what you meant with this paragraph. What I talk about is eliminating the trades from the statistical evaluations. Of course you should NOT introduce any rules to your system that cut "home runs" short if they should really happen. My point is to see if the system can hold up if the excellent trades are a lot less frequent (in real life) than the sample may make you believe they might be.

Looking forward to your reply,

TK

@Jeff

Any comment regarding the lookback period vs. sample size? By the way: the system I described is called "Millenium Bull" and available for a few Galactic Credits at http://www.JabbaTheHut.com =:p

Sorry, there appears to be a problem. I try to post the missing section now:

” …is this not dependent on the nature of the strategy in question?”

Actually this procedure REVEALS the nature of the system. Is it made up of evenly sized profits (small impact of outliers => smooth equity curve with little variance) or a few “homeruns” (big impact of outliers => rugged equity curve with high variance).

TK

P.S. Jeff, feel free to delete the first repost.

What about assessing OOS performance or verifying whether any OOS performance was done at all?

While this article does not talk specifically about out-of-sample vs. in-sample, the same metrics apply. OOS performance will not always be available when you are looking to buy a system; however, this is an important step in testing. If you are developing a system, you should always test on OOS data, as it gives you a better idea of how your system performs. The next step is to actually test it on live data. Many people are shocked to see their system fail to perform on the live market as bars form in real time. Often this is due to an incomplete understanding of how bars are built tick-by-tick and how the trading code is executed against that data. This is very important for intraday trading systems.