The Ivy Portfolio article posted several weeks back generated a lot of good discussion. The main thrust of many criticisms of the Ivy Portfolio came down to curve fitting. Was the Ivy Portfolio curve fitted to produce such great results? There was also some question about the quality of the historical data provided by ETF Replay, but that is beyond the scope of this article, as I have no evidence that their data is tainted. Besides, the curve fitting aspect is much more relevant to our main theme here at System Trader Success.
Curve fitting can be a huge problem as there are many ways to curve fit a system. It can even be introduced into your testing without you knowing about it. In fact, it’s hard to get out of bed in the morning without curve fitting. Within the comments of the original article there were some good examples of where curve fitting can creep into your testing. For example, curve fitting could be introduced by…
- Choosing the lookback period for the simple moving average filter
- Choosing the number of top instruments to trade
- Choosing the instruments to trade
- Choosing the ranking formula
Just about anytime you make a decision when building a trading system, you introduce the possibility of curve fitting. That’s why it’s important to think carefully about how you choose particular values. In my personal endeavors I always test trading system parameters independently over a range of values. When looking at the results I want to see some consistency among the tested values. That is, if I’m testing a lookback period for a moving average, I don’t want to see a large variation between a 10-period and a 20-period lookback. I’ve seen systems where you change one value by a single increment and the results change dramatically. This is a clear warning flag. Instead, I prefer to see a clustering of results, which gives me confidence that the system is not curve fitted to a single select value.
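As a rough illustration of that check, here is a minimal Python sketch that measures how much a result changes between neighboring parameter values. The function name `neighbor_sensitivity` and the numbers are hypothetical and are not taken from any test in this article; the idea is only that a small value suggests clustering, while a large value points to a single "magic" setting.

```python
def neighbor_sensitivity(results):
    """Return the largest relative change between adjacent parameter values.

    `results` maps a parameter value (e.g. a lookback period) to a
    performance figure (e.g. total return in percent).
    """
    params = sorted(results)
    worst = 0.0
    for a, b in zip(params, params[1:]):
        base = max(abs(results[a]), abs(results[b]))
        if base == 0:
            continue  # both results are zero, so there is no change to measure
        change = abs(results[a] - results[b]) / base
        worst = max(worst, change)
    return worst


# Hypothetical numbers, for illustration only.
smooth = {4: 100.0, 5: 110.0, 6: 105.0, 7: 98.0}  # results cluster together
spiky = {4: 100.0, 5: 250.0, 6: 95.0, 7: 98.0}    # one value stands alone

print(neighbor_sensitivity(smooth))  # below 0.10
print(neighbor_sensitivity(spiky))   # above 0.60
```

What counts as "too sensitive" is a judgment call; the point is to look at the neighbors of the chosen value rather than at the chosen value alone.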
In this article I would like to perform some simple quick-and-dirty testing to see if we can gain more confidence in the performance of the Ivy Ten Portfolio. The tests are far from exhaustive but should shed some light on the robustness of the trading system which was both created and tested in the original article. The system is a simple relative strength portfolio that consists of 10 different ETFs. Every month the top three producing ETFs are picked from the 10 potential ETFs. A simple 5-period moving average is also applied to a monthly chart to act as a regime filter. Trades are only taken if a given ETF is above the moving average. The system is called the Ivy 10 and is explained in more detail here. The same testing assumptions and methods used during the original article are used for all the tests within this article.
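To make the rules concrete, here is a minimal Python sketch of one monthly selection step. It rests on several assumptions of mine: the ranking uses a plain three-month return (the actual ranking formula is not reproduced here), the top three are ranked first and then filtered by the moving average (so a filtered-out slot simply stays in cash), and the tickers and prices are made up. The names `sma` and `pick_holdings` are hypothetical.

```python
def sma(values, period):
    """Simple moving average of the last `period` values, or None if too few."""
    if len(values) < period:
        return None
    return sum(values[-period:]) / period


def pick_holdings(monthly_closes, top_n=3, sma_period=5, rank_lookback=3):
    """Pick up to `top_n` tickers for the coming month.

    monthly_closes: dict of ticker -> list of month-end closes, oldest first.
    Ranking by simple return over `rank_lookback` months is an assumption.
    """
    ranked = []
    for ticker, closes in monthly_closes.items():
        if len(closes) <= rank_lookback:
            continue  # not enough history to rank this ticker
        ret = closes[-1] / closes[-1 - rank_lookback] - 1.0
        ranked.append((ret, ticker))
    ranked.sort(reverse=True)  # strongest first

    holdings = []
    for _, ticker in ranked[:top_n]:
        closes = monthly_closes[ticker]
        average = sma(closes, sma_period)
        # Regime filter: only hold the ETF if it closed above its moving average.
        if average is not None and closes[-1] > average:
            holdings.append(ticker)
    return holdings


# Made-up tickers and prices, for illustration only.
prices = {
    "AAA": [10, 11, 12, 13, 14, 15],
    "BBB": [20, 20, 20, 20, 20, 21],
    "CCC": [30, 29, 28, 27, 26, 25],
    "DDD": [5, 5, 5, 5, 5, 5],
}
print(pick_holdings(prices))  # ['AAA', 'BBB']
```

In this example DDD ranks third but is not above its moving average, so only two positions are taken.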
Out of Sample Results
The concepts and trading instruments that make up the Ivy Ten trading system were published in early 2009. Thus, the book was written in 2008 or earlier. Just for the sake of testing, let’s assume 2007 as our starting point for out-of-sample results. Yes, there may be some overlap which includes some in-sample results, but starting from 2007 will give us 191 trades over 1,500 days. This should be a reasonable number of trades for out-of-sample testing. The following trading results are from 2007 through 2012.
Ivy-10 Trading System
Total Return: 123%
Below is the equity graph of the trading system in green, and our benchmark (SPY) in blue. Overall, it appears the trading system has held up nicely since it was first conceived in 2008.
It’s my impression the Ivy Portfolio was designed to mimic the trading instruments of the Ivy endowments by picking ETFs which represent the large asset classes utilized by the endowments. However, were the instruments picked so as to show good results? In other words, is the trading system curve fitted, performing well only on the portfolio of ETFs proposed by the author? To test the robustness of the system I created several different portfolios of broad market ETFs. The assumption here is that the system is less likely to be curve fitted if it demonstrates solid performance over different portfolios of ETFs. The following test was conducted on data from 2002 through December 28, 2012.
International Portfolio – ECH, EGPT, EPHE, EPU, EWZ, FXI, GXG, IDX
U.S. Leveraged Portfolio – DDM, DUG, QLD, ROM, SSO, URE, UYG, UYM
U.S. Major Market Indices Portfolio – DIA, IWM, NYC, QQQ, SPY
Sample Portfolio – This was an example portfolio that was included when I joined ETF rewind. It included: EWC, GLD, IEF, SHY, SPY
U.S. Sectors and Group Portfolio – IGN, XLB, XLE, XLF, XLI, XLK, XLP, XLU, XLV, XLY
Bond Portfolio – BLV, BND, CFT, LQD, PCY, SHY, TIP, TLT, WIP
Commodities Portfolio – DIA, IWM, NYC, QQQ, SPY
Currency Portfolio – CEW, FXA, FXC, FXE, FXF, FXY, UDN
The Ivy-10 portfolio is on the high side, but it’s certainly not an outlier the way the International Portfolio is. Five of the other portfolios also generated double-digit CAGR results. The U.S. Leveraged Portfolio produced a nearly identical CAGR but a lower Total Return because its ETFs have only existed for a short time. What was striking to me as I performed this test is that the trading system performed the two main tasks it was designed to do when compared to the benchmark: 1) increase total returns and 2) reduce drawdown. All portfolios executed against the relative strength strategy did just that.
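The relationship between CAGR and Total Return explains why two portfolios can share a similar CAGR yet show very different total returns. The sketch below uses the standard compounding formula; the 12% rate and the two holding periods are hypothetical and are not results from the tests above.

```python
def cagr(total_return_pct, years):
    """Compound annual growth rate (percent) from a total return (percent)."""
    growth = 1.0 + total_return_pct / 100.0
    return (growth ** (1.0 / years) - 1.0) * 100.0


def total_return(cagr_pct, years):
    """Total return (percent) from a compound annual growth rate (percent)."""
    return ((1.0 + cagr_pct / 100.0) ** years - 1.0) * 100.0


# Hypothetical: the same 12% CAGR compounded over a long and a short history.
long_history = total_return(12.0, 11)
short_history = total_return(12.0, 5)

print(round(long_history, 1), round(short_history, 1))
```

With the same annual rate, the portfolio with the shorter history simply has fewer years in which to compound, so its total return is lower.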
Testing Lookback Periods
The trading system utilizes a simple moving average applied to a monthly chart to determine if a given instrument is in a bullish or bearish mode. A given ETF will only be purchased if it’s above this moving average. In this test I looked at modifying the lookback period of this moving average over the values two through 16. Below is a chart containing the results as well as a graph depicting the total return as the lookback period was modified.
The following test was conducted on data from 2002 through December 28, 2012.
The recommendation of using a lookback period of 10, as stated in the Ivy Portfolio book, is clearly not an optimal value. In my version of the rotational system I halved this value to produce a lookback period of five. The value five produces similar results to lookback periods of two, three, four, and seven. The lookback period of six produces the best results. The lookback period of five does not appear to be an outlier and there is a clear orderly pattern to the values. All lookback periods produced positive returns. It’s interesting to note that the longer the lookback period, the lower the total return achieved. This is clearly seen in the falling trendline in red. This makes sense: the longer you wait to jump into a momentum trade, the less of the move you’re likely to capture.
Testing Number of Top Ranked
The trading system will pick the top three ranked instruments to trade. In this test I will vary the number of top instruments to test.
The following test was conducted on data from 2002 through December 28, 2012.
Again, there is a clear orderly progression of reduced returns as you increase the number of top ranked instruments. This makes sense: as you increase the number of top ranked instruments to trade, you are “diversifying” your holdings, which reduces both returns and drawdown. Notice the drawdown also falls as you increase the number of instruments chosen. This is a great example of balancing returns vs. drawdown, a classic dilemma. All values produce positive returns. The recommended value of three does not appear to be an outlier.
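For readers who want to measure drawdown on their own equity curves, here is a minimal Python sketch of the usual peak-to-trough calculation. The function name `max_drawdown` and the equity values are hypothetical; ETF Replay may compute its drawdown figures differently (for example, on daily rather than monthly data).

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, in percent."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)             # highest equity seen so far
        drawdown = (peak - value) / peak    # decline from that peak
        worst = max(worst, drawdown)
    return worst * 100.0


# Hypothetical equity curve, for illustration only.
curve = [100, 120, 90, 110, 130, 117]
print(max_drawdown(curve))  # 25.0
```

Here the worst decline is the fall from 120 to 90, which is 25% of the prior peak; the later dip from 130 to 117 is only 10%.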
It appears the trading concept can be applied to other portfolios of ETFs. It also appears two of the key trading parameters (the lookback period and the number of top ranked instruments to pick) are not highly optimized. While this is far from a complete test, it should give some confidence in the relative strength trading system proposed in the book, The Ivy Portfolio.
Here is an Excel document that contains the individual trade information as generated by ETFReplay.