Measuring Success: Key Performance Metrics

About the Author Jeff Swanson

Jeff is the founder of System Trader Success, a website and mission dedicated to empowering the retail trader with the proper knowledge and tools to become a profitable trader in the world of quantitative/automated trading.

  • AnTZ_TK says:

    Hi Jeff,

    you correctly put “number of trades” at the top of your list, because all other metrics depend on the validity of your sample.

    Unfortunately, you appear to follow a line of reasoning (as many others do) that I believe to be false: giving importance to the length of the time span that the sample is drawn from. You argued that 100 trades are good if they cover 10 years.

    Well, 100 trades are 100 trades regardless of the time span. So the important question is whether or not a sample of 100 is valid.

    For an eye-opening read I strongly recommend: Kahneman, “Thinking Fast And Slow”, pg. 109ff

    The point he makes is that small samples can easily be impacted by (rare) outliers. This effect is in no way reduced simply because your sample covers a longer time period!

    Just as an example: How valid are system results based on 1000 years of trades? Very valid?
    Well, what if the rules are to buy the major stock index at the close of the last day of the century and sell on the open the following day? That would give you a whopping 10 trades to base your analysis on. Now how valid is that?

    So how can we deal with the outlier problem?

    My suggestion is to cut away the top x% of your winning trades (5%-10% appears reasonable to me) and examine how much your performance degrades (measured by whatever metrics you want to apply). This is called a sensitivity analysis and will show you how much your total performance relies on a few great trades (likely to be outliers that will not repeat as much or at all in the future).
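
    As a rough sketch of the cutting procedure described above (not code from the article or this thread), the Python below removes the top x% of winning trades and recomputes a few metrics. The trade list, the function names `drop_top_winners` and `summarize`, and the choice to round the number of removed trades up are all assumptions made for illustration.

```python
import math

def drop_top_winners(pnl, percent):
    """Return a copy of pnl without the largest `percent` % of winning trades.

    The number of trades removed is rounded up (an assumption of this sketch).
    """
    winner_idx = [i for i, p in enumerate(pnl) if p > 0]
    n_cut = math.ceil(len(winner_idx) * percent / 100)
    # Indices of the n_cut most profitable trades.
    biggest = sorted(winner_idx, key=lambda i: pnl[i], reverse=True)[:n_cut]
    to_drop = set(biggest)
    return [p for i, p in enumerate(pnl) if i not in to_drop]

def summarize(pnl):
    """Compute a few basic metrics for a list of per-trade P/L values."""
    gross_profit = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)
    return {
        "trades": len(pnl),
        "net_profit": sum(pnl),
        "avg_trade": sum(pnl) / len(pnl),
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
    }

# Hypothetical per-trade P/L in dollars, invented for this example.
pnl = [120, -80, 45, -60, 900, 30, -75, 60, -50, 1500,
       25, -90, 70, -40, 55, -65, 35, -85, 80, -45]

full = summarize(pnl)
trimmed = summarize(drop_top_winners(pnl, 10))

for name in full:
    print(f"{name:>14}: full={full[name]:10.2f}  trimmed={trimmed[name]:10.2f}")
```

    In this made-up sample the net profit turns negative once the two largest winners are removed, which is the kind of dependence on a few great trades the procedure is meant to expose.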

    A second method is the t-test that you mentioned. The only change I would make is to apply it to the risk-adjusted profit (or loss) per trade, not to the absolute P/L. Basically, you divide each trade's result (in dollars) by its risk as defined by the initial stop (also in dollars). From those numbers you then calculate the average and standard deviation that go into the formula you show above.
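
    A minimal sketch of that risk adjustment follows. The article's formula is not reproduced in this thread, so the code assumes the standard one-sample t-statistic against a mean of zero, t = mean / (stdev / sqrt(n)), with the sample standard deviation. The trade results, stop distances, and function names are hypothetical.

```python
import math
import statistics

def r_multiples(pnl, initial_risk):
    """Divide each trade's P/L by the dollar risk set by its initial stop."""
    return [p / r for p, r in zip(pnl, initial_risk)]

def t_statistic(values):
    """One-sample t-statistic for the hypothesis that the true mean is zero."""
    n = len(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return mean / (stdev / math.sqrt(n))

# Hypothetical trades: result in dollars and initial stop risk in dollars.
pnl = [250.0, -100.0, 400.0, -200.0, 150.0, -100.0, 600.0, -150.0]
risk = [100.0, 100.0, 200.0, 200.0, 150.0, 100.0, 200.0, 150.0]

r = r_multiples(pnl, risk)
t = t_statistic(r)

print("R-multiples:", r)
print("mean R:", statistics.mean(r))
print("t-statistic:", round(t, 3))
```

    The resulting t value would still have to be compared against a critical value for n - 1 degrees of freedom, and eight trades are far too few for a meaningful test; the small list only keeps the example readable.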

    Anyway, try to get a copy of that book, which I strongly recommend to all traders!

    Cheers,
    TK

    • BlueHorseshoe says:

      @TK

      I would generally agree with what you are saying about sample size. One question I was going to put to Jeff was whether, as with most statistical procedures carried over to the analysis of market data, the T-Test does not need a sample size larger than 10?

      Regarding your suggestion above though (“cut away the top x% of your winning trades”), is this not dependent on the nature of the strategy in question? For instance, given a sample size of 100 and an extreme outlier-dependent strategy with a 10% win rate, then cutting the top 10% means disregarding one single trade. Though, over an average of 10,000 trades, the top X% may on average be only slightly more profitable than the top X+n%, in the case of the sample size of 100 the top X% trade (i.e. the one single trade selected), may be many multiples more profitable than the X+n% remaining.

      In other words, what you describe, with certain types of systems and without a very large sample size, risks creating a counter-productive “black swan” style exclusion that is not representative of the intended effect of this process. Instead of mitigating the impact of outliers, you risk introducing a further outlier counter-measure.

      Would be interested to hear your thoughts on this if I have explained myself well enough.

      Regards,

      BlueHorseshoe

      • BlueHorseshoe says:

        Of course, above should have read X-n% – there’s no “edit” button like on the forums!

    • TK,

      Thanks again for the thoughtful reply. So sorry for getting back to this thread days later. It was a hectic week last week. I’ve heard a lot of good things about “Thinking Fast and Slow”. I’m currently reading “The Big Short” and will add your recommendation to my reading list.

  • AnTZ_TK says:

    @BH

    I am not sure if I understood your comment correctly, so forgive me if my answer does not match what you meant.

    The purpose of the “cutting procedure” is to find out to what degree the test results were impacted by outliers. If there is a big impact then you have a high risk that the results of the SAMPLE (test) are NOT representative of the real life performance later on.

    >>>” …is this not dependent on the nature of the strategy in question?”<<<

    >>>”For instance, given a sample size of 100 and an extreme outlier-dependent strategy with a 10% win rate, then cutting the top 10% means disregarding one single trade. Though, over an average of 10,000 trades, the top X% may on average be only slightly more profitable than the top X+n%, in the case of the sample size of 100 the top X% trade (i.e. the one single trade selected), may be many multiples more profitable than the X+n% remaining.”<<<

    >>>”In other words, what you describe, with certain types of systems and without a very large sample size, risks creating a counter-productive “black swan” style exclusion that is not representative of the intended effect of this process. Instead of mitigating the impact of outliers, you risk introducing a further outlier counter-measure.”<<<

    I am not sure what you meant with this paragraph. What I talk about is eliminating the trades from the statistical evaluations. Of course you should NOT introduce any rules to your system that cut "home runs" short if they should really happen. My point is to see if the system can hold up if the excellent trades are a lot less frequent (in real life) than the sample may make you believe they might be.

    Looking forward to your reply,
    TK

    @Jeff

    Any comment regarding the lookback period vs. sample size? By the way: the system I described is called "Millenium Bull" and available for a few Galactic Credits at http://www.JabbaTheHut.com =:p

  • AnTZ_TK says:

    Sorry, there appears to be a problem. I'll try to post the missing section now:

    ” …is this not dependent on the nature of the strategy in question?”

    Actually this procedure REVEALS the nature of the system. Is it made up of evenly sized profits (small impact of outliers = smooth equity curve with little variance) or a few “homeruns” (big impact of outliers => rugged equity curve with high variance).

    TK

    P.S. Jeff, feel free to delete the first repost.

  • Mark says:

    What about assessing OOS performance or verifying whether any OOS performance was done at all?

    • While this article does not talk specifically about out-of-sample vs. in-sample, the same metrics apply. OOS performance will not always be available when you are looking to buy a system; however, it is an important step in testing. If you are developing a system, you should always test it on OOS data, as that gives you a better idea of how your system performs. The next step is to actually test it on live data. Many people are shocked to see their system fail to perform on the live market as bars form in real time. Often this is due to an incomplete understanding of how bars are built tick-by-tick and how the trading code is executed against that data. This is very important for intraday trading systems.
