In the previous article, “Testing A Simple Gap Strategy”, we looked at two other filters in an attempt to improve the Gap #1 strategy. The first was a day-of-the-week filter; the other was based upon the size of the gaps. These filters were tested on our in-sample data segment and appeared to help our trading performance. In this article, let’s see how these filters perform on the out-of-sample (OOS) data segment.
We’ll need a benchmark to compare our new filters against, and the original baseline system will be perfect. We’ll apply the baseline system to the OOS data segment and use the results as our benchmark.
The baseline is performing as expected based upon the in-sample results (see table below). The profit factor, percent winners and annual return are similar between the two data segments. The OOS segment actually shows a larger profit per trade.
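For readers who want to reproduce this kind of side-by-side comparison, here is a minimal Python sketch of how profit factor, percent winners and average profit per trade can be computed from a list of per-trade results. The function name `summarize_trades` and the sample P&L values are made up for illustration; they are not the code or the results behind the table.

```python
from typing import Dict, List


def summarize_trades(pnl: List[float]) -> Dict[str, float]:
    """Summarize a list of per-trade net profits (in dollars)."""
    if not pnl:
        raise ValueError("need at least one trade")
    gross_profit = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)  # stored as a positive number
    return {
        "trades": len(pnl),
        "net_profit": sum(pnl),
        # Profit factor = gross profit / gross loss (infinite if no losers).
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "percent_winners": 100.0 * sum(1 for p in pnl if p > 0) / len(pnl),
        "avg_profit_per_trade": sum(pnl) / len(pnl),
    }


# Hypothetical P&L values, purely for illustration (not the article's data).
in_sample = [120.0, -80.0, 60.0, -40.0, 90.0, -70.0]
out_of_sample = [150.0, -90.0, 40.0, -30.0]

for name, trades in (("in-sample", in_sample), ("OOS", out_of_sample)):
    print(name, summarize_trades(trades))
```

Running the same function over both segments gives numbers that can be compared directly, which is all the benchmark comparison requires.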
Now we’ll apply our trading model which contains the day-of-week filter to the OOS data segment. The results are below.
Well, this does not look that great. Our equity curve flounders around like a fish out of water. I’d like to see the equity continue to climb and make new equity highs. Not happening here.
Now we’ll apply our trading model which contains the gap size filter to the OOS data segment. The results are below.
This looks better than the day-of-week filter, but the improvement is not much. We have 140 trades on our OOS data segment, yet the results are about the same as the baseline.
It appears our filters provide no real edge or, at least, no long-lasting edge. I would like to see our OOS performance with a given filter outperform our baseline. It’s not enough to be a little better; we would like to see a significant difference for the better. In this case we see worse performance with our day-of-week filter and about the same performance with our gap size filter. This may very well mean we should scrap our filter ideas and test other possible improvements.
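One way to make “significantly better” concrete is to require the filtered system to beat the baseline by a margin chosen in advance. The sketch below is only an illustration of that idea: the function `beats_baseline` and the 25% default for `min_improvement` are my own assumptions, not a threshold used in this series, and the filtered values in the example are made up.

```python
def beats_baseline(baseline_avg: float, filtered_avg: float,
                   min_improvement: float = 0.25) -> bool:
    """Return True if the filtered system's average profit per trade exceeds
    the baseline's by at least `min_improvement` (a fraction, 0.25 = 25%)."""
    if baseline_avg <= 0:
        # A relative margin is meaningless against a non-positive baseline,
        # so fall back to requiring a positive filtered result.
        return filtered_avg > 0
    return filtered_avg >= baseline_avg * (1.0 + min_improvement)


# Baseline of $26 per trade; the filtered figures are hypothetical.
print(beats_baseline(26.0, 27.0))  # a little better, but not by the margin
print(beats_baseline(26.0, 40.0))  # clears the assumed 25% margin
```

The point of fixing the margin before looking at the OOS results is that it removes the temptation to call a marginal improvement a win after the fact.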
A word of caution! What we don’t want to do is go back and tweak our current filters to make them perform better on the OOS data. This might be your first natural instinct, and the temptation can be especially strong when the OOS results look decent, as with our gap size filter. While going back to our in-sample segment and adjusting the filters until they look better is tempting, it’s a big mistake! By making decisions based upon the OOS data segment and using that information to make changes to our system, we’re actually transforming our OOS segment into our in-sample segment. We are taking knowledge gained from the OOS segment and transferring it into the code as a modification. Think about that! We want our OOS segment to remain “clean”, yet every time we make a decision based on the OOS results to change our system, we destroy the very reason we have an OOS segment in the first place. We must be careful here.
Now granted, you can’t avoid this completely. I may decide to test another filter, and that decision is itself based upon our OOS results. So there is some level of information leakage that will taint our design process. This is largely unavoidable. We all bring biases and future information into our historical development. What we hope to avoid are large, glaring mistakes.
Since our OOS performance is not an improvement, I see three possible actions we can take.
First, we could decide our filters provide no real edge and we must go back to the in-sample segment to try different filters. Note, if we follow this line of action we are taking information obtained during our OOS segment and using that knowledge to modify our system on the in-sample segment. However, this is a reasonable risk I’m willing to accept. I would not go back and tweak the existing day-of-week or gap size filters. We already selected reasonable values based upon our in-sample data segment. It’s a done deal and I would not modify them. I either accept them or discard them.
Second, maybe our filters are working and our system is simply experiencing a normal drawdown or flat period. It may very well resume its upward trend later. Maybe in another six months one of our new filters (or both) will start producing better returns than our baseline and continue to do so for the next five years. It’s possible. Over the years I’ve seen this happen a handful of times. It’s not a likely case, but if we were to pursue this I would first go back through the in-sample data segment and see if we can find a similar period where our performance languished. I want to see if there have been periods in our in-sample backtest where the equity curve looks similar to our OOS curve over a similar time period or number of trades. This would help tell me if the poor performance on the OOS segment is something this system has experienced before. If it has, just maybe we are experiencing nothing more than a pause before a new push up to equity highs.
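The search for a comparable flat period can be sketched as a sliding window over the in-sample trade list: for every run of N consecutive trades, check whether the net profit was no better than what we saw on the OOS segment. This is a rough Python sketch under my own assumptions; `find_similar_stretches` is a hypothetical helper and the trade values are invented for illustration.

```python
from typing import List, Tuple


def find_similar_stretches(in_sample_pnl: List[float], window: int,
                           oos_net: float) -> List[Tuple[int, float]]:
    """Return (start_index, net_profit) for every run of `window` consecutive
    in-sample trades whose net profit was no better than `oos_net`."""
    if window <= 0 or window > len(in_sample_pnl):
        return []
    matches = []
    running = sum(in_sample_pnl[:window])
    for start in range(len(in_sample_pnl) - window + 1):
        if start > 0:
            # Slide the window: add the newest trade, drop the oldest.
            running += in_sample_pnl[start + window - 1] - in_sample_pnl[start - 1]
        if running <= oos_net:
            matches.append((start, running))
    return matches


# Hypothetical per-trade results; a real test would use the in-sample trade
# list and a window equal to the number of OOS trades.
pnl = [10.0, -20.0, -5.0, 30.0, -15.0, 5.0]
print(find_similar_stretches(pnl, window=3, oos_net=0.0))
```

If the list comes back empty, the system never went through a comparable stretch in-sample, which makes the “normal flat period” explanation harder to defend.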
To further test this concept, I will set these systems aside and check on them in 6 months or so and see how they are performing.
Third, it’s likely our filters provide no edge, so we could simply accept the baseline system as good enough and move on to the next stage of testing. However, as the baseline stands, $26 net profit per trade is not large enough for me to risk my capital.
While all three ideas are valid, I’m going to go back to our in-sample segment and try a different filter. That will be the subject of the next article in this series.