The Ivy Portfolio

December 10, 2012 5:00 am

Several months ago I finished reading a very interesting book called “The Ivy Portfolio.” It was written by two money managers, Mebane Faber and Eric Richardson, who work at Cambria Investment Management. The authors wanted to answer the question of why the money managers who run the endowments of some of the world’s best Ivy League schools produce such consistent results. The Harvard and Yale endowments routinely produce double-digit annual returns. Since 1985 Yale University has returned around 16% annually and Harvard over 15% annually. Not only did they produce outstanding returns, but they did so while also reducing volatility and drawdown.

Wouldn’t it be nice to mimic the investing strategy utilized by these endowments? Well, the authors do just that. Faber and Richardson set out to explore how these endowments produce such great returns and minimize both volatility and drawdown. They go a step further by providing several simplified, yet effective, models to mimic the trading results of these professionals. The heart of one of their proposed models is a simple relative strength, asset allocation strategy using ETFs.

This really caught my attention. Such a simple model could be a great way to invest within a retirement account as most people have access to ETFs. It’s a long-only strategy (no shorting) and you don’t need to place trades very often.

In this article I want to create models based upon Faber’s and Richardson’s recommendations. To do this I’m going to use the fantastic ETF service called ETF Replay. This web service allows you to create trading strategies and backtest them on a portfolio of ETFs. The website charges a monthly fee, but it’s very reasonable. Between the book’s recommendations and the ETF Replay service, we can produce a trading model.

The Ivy Trading System

We are going to trade a basket of ETFs. Exactly which ETFs will be explained later. We are not going to simply trade all the ETFs at once. We will instead rotate into the three best performers every month. The concept behind the top three performers is they will likely continue to perform into the near-term future. However, we also wish to reduce drawdowns and avoid holding our ETFs when within a bear market because even the best ranked ETFs during this environment will most likely be falling. Sure they are falling slower than the others in the basket, but we don’t want to be holding any of our ETFs if they are in a bear market. In other words, we wish to preserve our capital during a bear market. Remember, this is a long-only strategy. To avoid holding positions in a bear market we use a 100-day (5-month) simple moving average to filter our trades. We will only buy an ETF if it’s above this average.

Here is a summary of the relative strength ranking model:

1) Rank the ETFs based upon their relative strength. There are many ways to do this, but a very straightforward method is to rank each ETF based upon two historical results, a 3-month return and a 1-month (20-day) return. ETFs are ranked for each of these two returns. A weight is then applied to each rank to compute an overall rank. You can find an example on the ETF Replay website. By using two historical returns, we are taking into account both a short-term return and a longer-term return. The ranking score is computed as the equally weighted sum of the 20-day return and the 3-month return. These numbers are completely arbitrary. They are not optimized.

Overall Rank Score =  ( 20-Day Return ) *.5 + ( 3-Month Return ) *.5
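As a rough illustration of the score above, here is a minimal Python sketch. The function names and the sample returns are made up for the example, and this is not ETF Replay's actual implementation; it only applies the 50/50 weighting to returns expressed as fractions.

```python
def overall_rank_score(ret_20d, ret_3m, w_short=0.5, w_long=0.5):
    """Weighted blend of the 20-day and 3-month returns (both as fractions)."""
    return w_short * ret_20d + w_long * ret_3m


def rank_etfs(returns):
    """returns: {ticker: (ret_20d, ret_3m)}. Result: tickers, strongest first."""
    scores = {t: overall_rank_score(r20, r3m) for t, (r20, r3m) in returns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up returns, purely for illustration.
sample = {"VTI": (0.02, 0.06), "BND": (0.00, 0.01), "DBC": (-0.01, 0.03)}
ranking = rank_etfs(sample)
```

With these sample numbers VTI scores highest, followed by DBC and then BND.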

2) Apply regime filter and only hold ETFs within a bull market regime. This is our old familiar regime filter that I’ve written about time and time again. The authors of the book use a 10-month simple moving average. This is similar to a 200-day simple moving average if you estimate about 20 trading days in a month. However, I choose to pick half that value simply because I would like my model to be a bit more responsive when taking into account the possible onset of a bear market. This value is not optimized.

Bull Market = Close > Average( Close, 100 days);
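The filter above can be sketched in Python as follows. This is only an illustration: the helper names are hypothetical, the price lists are synthetic, and treating an ETF with fewer than 100 closes as "not bull" is an assumption of the sketch rather than something the article specifies.

```python
def simple_moving_average(closes, length):
    """Average of the last `length` closes, or None if there is too little history."""
    if len(closes) < length:
        return None
    return sum(closes[-length:]) / length


def in_bull_regime(closes, length=100):
    """True when the latest close is above its simple moving average."""
    sma = simple_moving_average(closes, length)
    return sma is not None and closes[-1] > sma


# Synthetic price series, purely for illustration.
rising = [100 + i for i in range(120)]
falling = [220 - i for i in range(120)]
rising_is_bull = in_bull_regime(rising)
falling_is_bull = in_bull_regime(falling)
```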

3) Re-balance the portfolio on a monthly schedule. At this time we evaluate our entire basket of ETFs based upon the two methods above. We then take action.

a. SELL all ETFs that no longer rank in the top three positions and/or have fallen below the regime filter.
b. BUY the top three ranking ETFs that are within a bull regime. Each ETF is allocated 1/3 of the available account equity.

4) Our money remains in “Cash” (SHY) whenever not being allocated to a specific asset class.

Given this model it’s possible to hold 1, 2, 3 or zero positions. When money is not allocated to an ETF we move it into cash.
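One way to read the rebalancing rules above is sketched below. The function name and inputs are hypothetical, and the sketch assumes the interpretation that only ETFs ranked in the top three are eligible, so a top-three ETF that fails the regime filter leaves its third of the equity in SHY rather than passing the slot to the fourth-ranked ETF.

```python
def target_weights(ranked, bull, top_n=3, cash="SHY"):
    """ranked: tickers, strongest first. bull: {ticker: True if above its average}."""
    slot = 1.0 / top_n
    # Keep only top-ranked ETFs that also pass the regime filter.
    weights = {t: slot for t in ranked[:top_n] if bull.get(t, False)}
    # Whatever is not allocated sits in the cash proxy.
    idle = 1.0 - sum(weights.values())
    if idle > 1e-12:
        weights[cash] = idle
    return weights


# Made-up ranking and regime flags, purely for illustration.
ranked = ["VTI", "VNQ", "DBC", "VEU", "BND"]
bull = {"VTI": True, "VNQ": False, "DBC": True, "VEU": True, "BND": True}
weights = target_weights(ranked, bull)
```

In this example VTI and DBC each receive one third and the remaining third stays in SHY, which matches the "1, 2, 3 or zero positions" outcome described above.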

Ivy Five Trading System

We have our trading model ready to go, but what basket of ETFs are we going to trade? The authors start us out with a very simple basket of five ETFs. These ETFs represent the broadest asset classes we wish to diversify over. Our relative strength rotational model will allow us to ride the best performing asset classes while preserving our capital during a bear market. The Ivy Five are:

BND – Vanguard Total bond market (4-5 year)
DBC – PowerShares DB Commodity Index
VEU – Vanguard FTSE All-World ex-US
VNQ – Vanguard MSCI U.S. REIT
VTI – Vanguard MSCI Total U.S. Stock Market

This basket of ETFs gives us a broad exposure to corporate/credit bonds (BND), commodities (DBC), international equity (VEU), REITs, Preferreds and MLPs (VNQ) and U.S. Equity (VTI). Let’s now use the ETF Replay website to test our Ivy Five Portfolio (green line) on our model. We will use the SPY ETF (blue line) as our benchmark. Returns include dividends but exclude commissions and slippage.

 

We can see our portfolio outperforms the benchmark in several ways. First, it produces a higher total return of 204% vs. 95%. But maybe even more important, during the bear market of 2008 we can see a significant difference in the drawdown levels: the benchmark was down around 55%, while our portfolio was down less than half that, at 21%. Overall volatility is also significantly reduced with our portfolio. In the end our portfolio returns an 11.8% CAGR while the benchmark returns a 7.0% CAGR. Our model increases returns while reducing both drawdown and volatility.
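Total return and CAGR are two views of the same equity curve, linked by the length of the test. The short Python sketch below shows the conversion in both directions. The length of the backtest is not stated in this article, so the `years` value is an assumed placeholder and not a figure from ETF Replay.

```python
def cagr(total_return, years):
    """Compound annual growth rate from a total return (2.04 means +204%)."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


def total_return_from_cagr(rate, years):
    """Total return produced by compounding `rate` for `years` years."""
    return (1.0 + rate) ** years - 1.0


years = 10.0  # assumption for illustration; the article does not give the test length
example_cagr = cagr(2.04, years)
```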

Ivy Ten Trading System

Trading only five ETFs is rather restrictive when you consider the vast number of ETFs available. Let’s continue to work with the same asset classes but introduce a few specialized ETFs to expand our diversification. The authors recommend the following ten ETFs:

BND – Vanguard Total bond market (4-5 year)
DBC – PowerShares DB Commodity Index
GSG – iShares S&P Commodity-Indexed Trust
RWX – SPDR DJ International Real Estate
TIP – iShares Barclays TIPS (4-8 years)
VB – Vanguard MSCI U.S. Small Cap
VEU – Vanguard FTSE All-World ex-US
VNQ – Vanguard MSCI U.S. REIT
VTI – Vanguard MSCI Total U.S. Stock Market
VWO – Vanguard MSCI Emerging Markets

The results of running this portfolio through our model are below. Again, the SPY ETF is our benchmark. Returns include dividends but exclude commissions and slippage.

We can see our portfolio, once again, outperforms the benchmark in several ways. First, it produces a higher total return of 291% vs. 95%. Notice this is significantly higher than our Ivy Five Portfolio. With regard to drawdown, the benchmark was down around 55% while our portfolio was down 29%. It’s interesting to note we do generate a higher return when compared to our Ivy Five Portfolio, but it comes at a cost of more drawdown. In the end our portfolio returns a respectable 14.7% CAGR while the benchmark returns a 7.0% CAGR. We double our returns while significantly reducing drawdown.

So there you have it. The Ivy Ten trading system in a nutshell. It’s dead simple with some very nice returns. The authors also demonstrate a 20-ETF portfolio that I may look at in a later article. For now, this should give you some insight into how to mimic the returns and low drawdown of the best Ivy League schools. Can you trade this in your retirement accounts? Well, that’s up to you but I’ve been seriously considering something like this. I find the simplicity and monthly rebalancing very convenient. Fundamentally, I’ve always liked the idea of momentum investing and knowing when to exit a position during a bear market. This model seems to do a decent job of capturing these two aspects.

Jeff is the founder of System Trader Success – an inBox magazine dedicated to sharing great ideas and concepts from the world of automated trading systems.


39 Comments

  • Nice find Jeff. I’ve been very interested in multi-sector relative strength strategies. I’m also researching doing this with pairs, i.e., going long the strongest sectors and short the weakest sectors (or long the inverse ETF). I’m guessing this would provide even less volatility but might reduce profits. I’m not sure, haven’t run the numbers yet.

    • I’m planning on introducing a few short ETFs into the mix and see how it affects the results. Those articles will most likely appear early next year. I like the pairs idea as well. I’ll have to look into that.

      • December’s TAS&C (page 10, “Reducing Risk While Finding Profit”) has a nice article on tracking pairs, particularly with sector ETFs as a way to play sector rotation.

        • Hi Shodson,

          That article looks interesting as I am also researching the use of pairs as trading vehicles. Do you know where I can get it as a .pdf?

          Regards,

          ACP

  • Hi guys,

    just a word of warning: As nice as ETF Replay looks, the quality of the simulations is not very good. I know a trader who ran a simulation there and in parallel used his own software and data. When he compared the results the differences were huge (i.e., 400%-points less profit than shown by ETF Replay). He couldn’t find out what caused this, but his best guess is that there is something wrong with their data (or how they deal with splits & dividends). Some errors could be caused by logical errors in their ranking and filtering.

    I don’t want to bash ETF Replay and maybe they fixed whatever problems there were. Yet I recommend double-checking any results with software that is transparent to you in what it does and that uses data that is adjusted (or not) in the way that is appropriate for the task.

    TK

  • All ETFs have different starting days. In the first portfolio of 5 ETFs, VEU starts after March 2007, BND after April 2007 and DBC after Feb 2006. Thus, results are skewed by the positive performance of the VNQ and VTI ETFs and as a result this is selection bias.

    I would recommend to you not to rely on backtests of other websites because they may not even know their own assumptions. Try selecting at random 5 ETFs from a universe of 100. Then backtest with specific position sizing rules. You will see that it does not work and buy-and-hold is much better. Fooled by randomness.

    • Great points, Basha. I checked into ETF Replay about a year ago and was disappointed about limited choices of starting day. Rather than a backtest that starts on the first of each month or the 31st, I’d like to run 31 different backtests where each started on a different day of the month. What we should see is similar performance among all. If some perform significantly better then look for a reason why. Maybe the expected performance going forward should be the average of them all.

      Another thing you could do is run some Monte Carlo simulations and then plot equity curves that represent averages. Drawdown analysis could be studied the same way.

      Fooled by randomness, indeed!

    • Basha, out of curiosity I ran the ETF Replay backtest from 2007 and the portfolio still generated very solid performance. The Ivy-10 generates 122% return while the SPY generates a 14% return. Drawdown is 29% vs 55% respectively.

  • Basha,

    Maybe I misunderstand what you are saying? If you randomly select 5 from a universe of 100, you are not following the idea of selecting 5 that are indicating a positive up trend in an uptrending market.

    And, if you randomly select any 5 of 100, shouldn’t you do a Monte Carlo distribution to truly compare the random approach to the Ivy Approach?

    What am I missing?

    • What is the starting capital and position sizing used? Do they reinvest profits?

      • This comment reply was for Jeff actually.

        To Red: that is what I am saying. You also have to find out which portfolios that met the ranking criteria did not work out and then average the results.

      • Hello Basha,

        My understanding about the service is this: The starting equity is 100. Each ETF is weighted equally. In the case of my example Ivy models, each ETF is given 33% of the equity since it picks, at most, the top three relative performers. Returns are total returns, which include profits and dividends. Commissions and slippage are not included.

        • Hi Jeff,

          Having now read the book, I’ve just finished implementing this within TS (although I have to export the equity from the print log and assemble the portfolio equity curve in Excel).

          Once the portfolio is increased to 10 ETFs, can you suggest any reason not to also increase the number of instruments in which one is invested each month to 6? And similarly, to 12 for the 20 ETF portfolio? It struck me that increasing the tradeable instruments in the portfolio isn’t actually diversifying if one can only ever be long three of those at any given time.

          Would be interested to hear your thoughts on this . . .

          BlueHorseshoe

          • Hello BlueHorseshoe,

            As you know, anytime you begin to introduce changes you increase the chance of curve fitting so my first inclination is not to change it. However, I would perform a study testing a full range of top instruments (say 1,2,3…10) simply to see how stable the results are. I would also do this across each of the portfolios. Then I could see how selecting the top three instruments within the 5-ETF portfolio, the 10-ETF portfolio and 20-ETF portfolio performs. I would guess off the top of my head that you would find that increasing/decreasing the number of tradable instruments will directly affect the total return and drawdown while the portfolio remains profitable. I have not tested this however.

            My impression is the purpose of increasing the total number of tradable instruments is not to increase diversification, but to increase the potential for total return.

          • Hi BlueHorseShoe,

            Would you be willing to share the TS code or give us an overview of how the code works?

            Apparently TS can only trade data1. How do you initiate trades in alternate data streams based on a filter?

            Thanks,

            –Stan

    • @Red

      The point that Basha makes is about curve fitting. Not only can you curve fit by selecting the PARAMETERS of your system (like the length of averages; threshold values etc.), but also by selecting TRADING INSTRUMENTS that perform well under your system rules and then run your system on those instruments only.

      The question is whether your system really identifies an edge or just a statistical fluke that appears in a few instruments.

      If there is a general edge (here the assumption is that strength in the past is likely to be followed by further strength in the future) it is reasonable to expect that this behavior shows up in almost any instrument. So if you randomly pick the ETFs to run the system on you should get good results most of the time. If not, well then it is highly likely that you didn’t find a general edge and that the “Ivy 5 Portfolio” or “Ivy 10 Portfolio” could be a curve fit selection.

      As you suggested a good way to test the edge is to randomly select 5 ETFs and repeat this process several times as a Monte Carlo simulation.

      I don’t know what you meant with “If you randomly select 5 from a universe of 100, you are not following the idea of selecting 5 that are indicating a positive up trend in an uptrending market.”

      The “Ivy 5 Portfolio” is the same all the time. They are not replaced by other ETFs, so there is no “selecting 5” once the initial choice was made.

      Or am I missing your point?
      TK

      • I’ve been wondering a lot about this recently, TK. I think what you said about curve fitting here is spot on. However, do we also have to watch out for curve-fitting by the criteria used to select trade candidates for the rotational system?

        In a nebulous way, this is kind of what I am getting at. Monte Carlo simulation basically says trade #X could have been trade #Y–it’s just by luck that they ended up ordered as they did. Therefore, to get a better feel for how a system might perform, let’s randomize the order of trades and do it 100s or 1000s of times and then take averages for the equity curves and drawdowns.
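The trade-order shuffle described above can be sketched in a few lines of Python. The trade returns below are made up for illustration, the function names are hypothetical, and the shuffle assumes that trades are independent of one another, which is exactly the assumption this kind of test rests on.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough loss (as a fraction) of the compounded equity curve."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def shuffled_drawdowns(returns, runs=1000, seed=0):
    """Max drawdown of `runs` random re-orderings of the same trades."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        order = list(returns)
        rng.shuffle(order)
        results.append(max_drawdown(order))
    return results


# Made-up per-trade returns, purely for illustration.
trade_returns = [0.04, -0.02, 0.03, -0.05, 0.06, 0.01, -0.03, 0.02]
drawdowns = shuffled_drawdowns(trade_returns)
average_drawdown = sum(drawdowns) / len(drawdowns)
```

The total return is the same in every run because the same trades are compounded; only the path, and therefore the drawdown, changes with the ordering.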

        When a rotational system has identified trades through a position scoring algorithm, though, it is identifying one and only one set of candidates. For some reason, this doesn’t feel entirely right to me–it feels like we could be leaving the door open to “fluke.”

        Mark

        • @Mark

          Maybe you are mixing things up a little. Please allow me to “unmix” what I understood from your comment (sorry, if you meant it differently and you already know what I am going to write).

          I am referring to “do we also have to watch out for curve-fitting by the criteria used to select trade candidates for the rotational system?”:
          Depending on what you meant the answer is “yes”… =;-)
          a) Yes, selecting the 5 ETFs that the system is run on can be curve fitting (as described in my comment above).
          b) Yes, the design of the system rule(s) that pick a max. of 3 out of the “starting 5” can also be a source of curve fitting (i.e., calculation factors, threshold levels, # of ETFs to pick, etc.).

          Referring to “Monte Carlo simulation basically says trade #X could have been trade #Y–it’s just by luck that they ended up ordered as they did.”:
          Yes, the assumption is that trade #X does in no way influence the outcome of trade #Y. Therefore #Y might as well have happened before #X. Personally I don’t think this is 100% correct, because I tend to believe that winners and losers are somewhat clustered. For example you might have several winners in a row during a bullish market phase and then a string of losers when the market gets choppy or bearish. So it might be somewhat unrealistic to mix the order of the trades in a Monte Carlo simulation completely at random (but elaborating on that would lead too far for now).

          Just beware that the Monte Carlo simulation that I talked about in the previous comment was not mixing the trades, it was mixing the ETFs that are in the “starting 5”. For each simulation run a set of 5 would be picked at random out of all suitable ETFs (maybe 100 to 200). Then you simulate the system over a certain time period and leave the order of trades untouched. But on the next simulation run you would do the same thing with a different set of 5 ETFs. So no shuffling of trades, but shuffling of traded instruments that are presented to the system rules.

          Regarding “when a rotational system has identified trades through a position scoring algorithm, though, it is identifying one and only one set of candidates. For some reason, this doesn’t feel entirely right to me”: In my opinion it is OK for a system to choose the trades it wants to take or ignore. This decision may also very well be based not only on the instrument itself, but also on the behavior of the other instruments in the mix. You probably don’t like the idea that at a specific moment ETF “XYZ” would have produced a trade if combined with ETFs “A”, “B”, “C” & “D”, but that it would NOT have produced a trade if combined with ETFs “E”, “F”, “G” & “H”. This is one effect that the choice of the “starting 5” has. But if the “position scoring” really has some kind of edge then it will tend to pick the best of all available trades. Sometimes the alternatives matter. Just like you might have picked a different girlfriend if there had been some super models available in the “starting 5”… =;-)

          TK

          P.S. No offence – you certainly picked the best girl in the world!!! :-)

          • @AnTZ_TK:

            You make a good point about an assumption for Monte Carlo testing that all trades be independent. What kind of statistical test could you run to check for this?

            On a related note, I sometimes struggle with whether to run a backtest including only the first buy/sell signal or whether to include every buy/sell signal as a trade. For example, consider a 20-day breakout system where today we get a new high followed by new highs tomorrow and the day after. Most backtesting would take today’s signal and not allow for “redundant” signals of tomorrow and the day after. Certainly a challenge would be how to position size if you were to allow for redundant signals but I often think “is there anything inherently different between today’s new high and tomorrow’s or the day after?” If you believe in mean reversion then you say yes–tomorrow’s is more likely to be followed by a pullback and the day after tomorrow is even more likely to be followed by a pullback. The fact that trend systems work sometimes means mean reversion does not always hold, though, which means each signal should be included in the backtest.

            Any thoughts on this concept of redundant signals?

            The other big issue I see here is how to validate a position scoring algorithm. Consider a simple one-rule trading system based off a 20-MA, for example. I think it’s critical to optimize in order to know performance of the 20-MA is not a fluke. How does the system perform with MA periods between 10-30 by increments of two, for example? If I plot performance vs. MA period then I shouldn’t see a spike high at 20 surrounded by losses. I want to see a high plateau region with 20 roughly in the middle. This suggests that even neighboring values of MA period traded successfully in the past, which makes me think the edge might persist into the future.
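The "plateau, not spike" idea above can be checked mechanically once the sweep results are in hand. The Python sketch below is only one possible formulation: the function name, the 70% retention threshold and the performance numbers are all assumptions made for illustration, not results from any real backtest.

```python
def on_plateau(perf_by_period, chosen, neighbors=2, keep=0.7):
    """True if every tested period within `neighbors` steps of `chosen`
    retains at least `keep` of the chosen period's (positive) performance."""
    best = perf_by_period[chosen]
    if best <= 0:
        return False
    periods = sorted(perf_by_period)
    i = periods.index(chosen)
    nearby = periods[max(0, i - neighbors): i + neighbors + 1]
    return all(perf_by_period[p] >= keep * best for p in nearby)


# Made-up annual returns for MA periods 10 to 30 in steps of two.
plateau = {10: 0.05, 12: 0.07, 14: 0.09, 16: 0.10, 18: 0.11, 20: 0.11,
           22: 0.10, 24: 0.10, 26: 0.08, 28: 0.06, 30: 0.05}
spike = {10: -0.02, 12: 0.01, 14: -0.01, 16: -0.02, 18: -0.01, 20: 0.15,
         22: -0.03, 24: 0.00, 26: -0.01, 28: 0.01, 30: -0.02}

plateau_ok = on_plateau(plateau, 20)
spike_ok = on_plateau(spike, 20)
```

Here the first sweep passes because the neighbors of 20 perform about as well as 20 itself, while the second fails because 20 is an isolated high surrounded by losses.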

            I think all trading rules and filters can be studied this way (the technical term for this process is “evolutionary operation”). How would this apply to validation of the position scoring algorithm itself, however?

          • @Mark
            I don’t know about any kind of test for the “clustering” characteristics of trades. For most purposes the normal MC method should do OK, but I find it important to be aware of the assumptions that any method is based on and treat the results with the appropriate grain(s) of salt.

            As for your 2nd question regarding “redundant” signals:
            The answer depends on what you are examining.
            - If you examine whether a system has an edge then you need to simulate a trade for any valid signal that is produced. You don’t want the start of a simulated trade to depend on whether or not another trade has already ended (because you will have different trades to compare with every time you change the exit rules, so that you will not know if the exit rule in itself is better or simply better trades happened to get started). Of course you could add a rule to your system that prevents giving any further signals until some kind of condition is met thereby reducing the number of VALID signals.
            - If you examine what kind of performance you would get from trading the system you MUST simulate in your backtest exactly what would have happened in a real account. So if you would take the first and any following signals in your account then you simulate that. If you would only take the first signal then this is what you simulate.

            Just beware that any skipping of VALID signals is adding randomness, which is pretty bad. Each trade that you take or simulate should be independent from any other trade that’s going on at the same time. If you don’t take a trade after a VALID signal for any reason then you can easily get into trouble.

            Last thing: You can validate the scoring algorithm by running a series of MC simulations with the same set of instruments (“starting 5”), but instead of the scoring algorithm you pick 3 ETFs at random each time a reshuffle is possible. Repeat that 1000 times and then compare the key figures that you get from the scoring algorithm (expectancy, max. DD, …) with the range that the random picks produced. If your algorithm does NOT end up in the top range for each of the key figures it probably doesn’t have much merit (if any) and you might as well make random picks.

            Hope that helped,
            TK

          • Ah a new week so renewed potential for understanding, right?

            Two thoughts on this post, TK. First:

            > b) Yes, the design of the system rule(s) that pick a max. of 3 out of
            > the “starting 5” can also be a source of curve fitting (i.e., calculation
            > factors, threshold levels, # of ETFs to pick, etc.).

            If even the number of trades to take may be a source for curve
            fitting then I start to wonder if maybe the whole concept of a
            position scoring algorithm for rotation trading is invalid with regard
            to backtesting. I generally think about keeping all factors but
            one constant and then plotting some metric of performance vs.
            different values of the variable. If the variable is “Top N tickers”
            then it seems far too granular for comparison purposes. Does it
            make sense to compare results from “Top 1” to “Top 2” to “Top 3”?
            It seems like the standard error bars might be way too large.

            Second thought:

            > Just beware that the Monte Carlo simulation that I talked about
            > in the previous comment was not mixing the trades, it was mixing
            > the ETFs that are in the “starting 5”. For each simulation run a
            > set of 5 would be picked by random out of all suitable ETFs
            > (maybe 100 to 200).

            It sounds like you’re suggesting comparing the equity curve
            generated by taking the Top 3 each month with an equity
            curve generated by taking three randomized tickers each
            month. How do you compare this, though? There is only
            one Top 3 but there are C(n,3) randomized three where
            n = total number of tickers from which to trade (C denotes
            combinations as opposed to permutations).

            Mark

          • @Mark

            >>Ah a new week so renewed potential for understanding, right?

            That’s up to you! :)

            >>If even the number of trades to take may be a source for curve
            >>fitting then I start to wonder if maybe the whole concept of a
            >>position scoring algorithm for rotation trading is invalid with regard
            >>to backtesting. I generally think about keeping all factors but
            >>one constant and then plotting some metric of performance vs.
            >>different values of the variable. If the variable is “Top N tickers”
            >>then it seems far too granular for comparison purposes. Does it
            >>make sense to compare results from “Top 1” to “Top 2” to “Top 3”?
            >>It seems like the standard error bars might be way too large.

            Don’t get too frustrated, but ANY part of a trading system can be used to curve fit (anything that has an effect can be changed so that the outcome is “better”). But that’s not the real issue. What matters is whether or not it is likely that “new” data (i.e., the real trades that you might take) will show a performance similar to your backtest.

            “New” data could be either data not used for the system design so far or future testing in real-time. Typically people use up so much data for the design that there’s not much data left to produce a sufficient number of trades for verification purposes. Real-time testing can have the disadvantage that it may take months or years to get a sufficient number of trades. You may want to take a look at the comments on one of the previous articles here called “Measuring Success: Key Performance Metrics”. Never underestimate the number of trades to fulfill the requirement to be “sufficient”!

            >>Second thought:
            >>It sounds like you’re suggesting comparing the equity curve
            >>generated by taking the Top 3 each month with an equity
            >>curve generated by taking three randomized tickers each
            >>month. How do you compare this, though? There is only
            >>one Top 3 but there are C(n,3) randomized three where
            >>n = total number of tickers from which to trade (C denotes
            >>combinations as opposed to permutations).

            Here’s how you do it: Let’s say you test the years 2001 to 2005 (60 months). You always have the same 5 ETFs, but on a monthly basis you pick 3 ETFs at random that you hold for the next month. Do this 60 times and you have the FIRST simulation run. Take note of whatever metrics are important to you (e.g., expectancy). Now you start again with 2001 to simulate 60 trades where you again pick the 3 ETFs at random. Most of the 60 trades in this second run should be different from the ones in the first run. Keep doing that until you have a “large” number of runs (e.g., 1000 runs). Then you simulate the 60 months with the rule for picking the 3 ETFs in effect. This is what you compare to what the other 1000 runs produced (average, standard deviation). If your rule-based result is not more than 1 standard deviation (the more the better) away from the average then it’s highly likely that it adds NO VALUE to your system.

            But if it HAS VALUE then it might be that this rule is curve fit to the “starting 5” (or the “starting 5” to the selection rule). In order to find that out you can do what was actually the suggestion that I wrote above:
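The random-pick comparison above can be sketched in Python along these lines. Everything in the example is an assumption made for illustration: the monthly returns are synthetic random numbers rather than ETF data, the function names are hypothetical, and the "rule" is a simple stand-in (hold last month's three strongest tickers), not the ranking model from the article.

```python
import random
import statistics


def total_return(monthly_returns, picks_by_month):
    """Compound an equal-weight portfolio of each month's picks."""
    equity = 1.0
    for month, picks in enumerate(picks_by_month):
        avg = sum(monthly_returns[t][month] for t in picks) / len(picks)
        equity *= 1.0 + avg
    return equity - 1.0


def random_runs(monthly_returns, months, hold=3, runs=1000, seed=0):
    """Total return of `runs` simulations that pick `hold` tickers at random each month."""
    rng = random.Random(seed)
    tickers = sorted(monthly_returns)
    results = []
    for _ in range(runs):
        picks = [rng.sample(tickers, hold) for _ in range(months)]
        results.append(total_return(monthly_returns, picks))
    return results


def deviations_from_random(rule_result, random_results):
    """How many standard deviations the rule-based result sits from the random average."""
    mean = statistics.mean(random_results)
    return (rule_result - mean) / statistics.stdev(random_results)


# Synthetic monthly returns for five hypothetical tickers (not real ETF data).
data_rng = random.Random(42)
MONTHS = 60
data = {t: [data_rng.gauss(0.005, 0.04) for _ in range(MONTHS)] for t in "ABCDE"}

# Stand-in rule: hold last month's three strongest tickers (month 0 holds all five).
rule_picks = [sorted(data)]
for m in range(1, MONTHS):
    rule_picks.append(sorted(data, key=lambda t: data[t][m - 1], reverse=True)[:3])

baseline = random_runs(data, MONTHS)
score = deviations_from_random(total_return(data, rule_picks), baseline)
```

Because the data here is pure noise, the stand-in rule has no real edge and its score should not be read as meaningful; the point is only the shape of the comparison.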

            Pick at random the “Starting 5” out of all ~200 ETFs and then apply the rules for picking the top 3 to those 5 (the same 5 for each of the 60 months). This gives you one run. Then you make another random pick and again simulate the 60 months to get run #2. Repeat that and then compare the “Ivy 5” with the 1000 random runs. This shows you whether the “Ivy 5” are the only ETFs that produce good results. If the system rules should really HAVE VALUE then it shouldn’t matter much which starting 5 you use. Beware if there are only a few runs that produce satisfactory results, because then the “Ivy 5” might be a curve fit selection.
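A harness for the random "starting 5" test might look like the Python sketch below. The `backtest` argument is a placeholder for whatever function runs the full system on a basket and returns one performance number; the toy version used here just averages a fake per-ticker figure, so all names and numbers are hypothetical.

```python
import random


def share_of_baskets_beaten(all_tickers, backtest, reference,
                            basket_size=5, runs=1000, seed=0):
    """Fraction of randomly drawn baskets whose result is below the reference basket's."""
    rng = random.Random(seed)
    reference_result = backtest(reference)
    random_results = [backtest(rng.sample(all_tickers, basket_size))
                      for _ in range(runs)]
    beaten = sum(r < reference_result for r in random_results)
    return beaten / runs, random_results


# Placeholder "backtest": a fake per-ticker number stands in for a real system run.
fake_result = {f"ETF{i:03d}": i / 1000 for i in range(200)}


def toy_backtest(basket):
    return sum(fake_result[t] for t in basket) / len(basket)


universe = sorted(fake_result)
hand_picked = universe[-5:]  # deliberately the five best: a curve-fit basket
share, results = share_of_baskets_beaten(universe, toy_backtest, hand_picked)
```

In this toy setup the hand-picked basket beats nearly every random basket, which is the warning sign described above: good results that depend on one particular basket rather than on the rules.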

            Clear or even more confused??? =;-)

            TK

            P.S. Did you see that I wrote another reply to you at the very bottom?

          • Hi AnTZ_TK and Mark,

            Your discussion is very interesting and really insightful.
            For me there are 2 main issues on this topic on which I would like to hear Jeff’s opinion:
            1. ETF replay doesn’t give me confidence on their data/results and so it’s not a tool that I’ll use
            2. We curve fit either if we choose the best equities for a set of rules or the best rules for a set of equities.

            Regards,

            ACP

      • @AnTZ_TK wrote: “The point that Basha makes is about curve fitting. Not only can you curve fit by selecting the PARAMETERS of your system (like the length of averages; threshold values etc.), but also by selecting TRADING INSTRUMENTS that perform well under your system rules and then run your system on those instruments only.”

        That is exactly what they have done, and it is not only curve-fitting but borderline cheating.

        This whole thing is based on Post Hoc fallacy.

        • @Basha: I think we have a plethora of different discussion points here.

          MetaStock advertises a software package with dozens of preloaded systems. You can test a system on a wide variety of tickers, find the tickers that perform well with a particular system, and then trade the system going forward.

          Do you see anything wrong with that logic?

          • Yes, it is called survivorship bias. You should take the mean return of a specific system performance for a great number of tickers, form a distribution of means and then test if the mean of the new distribution could have been achieved by chance. Otherwise you are curve-fitting to ticker selection. This is also what the Ivy portfolio did. You should read the book by Aronson.

          • @Mark

            You asked what is wrong with the logic that MetaStock advertises. Try to look at it this way: You sit a million monkeys in front of typewriters. After watching them hack away on their keyboards for a while, would you pick the monkey that “wrote” something that actually is correct English and offer him a job as a professional writer for a newspaper???

            In other words: throw enough symbols at any kind of “system” and you will have some that show a profitable backtest. But the million dollar question is: Does that have any predictive value regarding FUTURE profits, or was it just a result of randomness that doesn’t give you any indication of future outcomes?

            Therefore I second Basha’s testing suggestion and also recommend Dr. Aronson’s book “Evidence-Based Technical Analysis”.

            TK

          • TK–I do see your reply at the bottom, here. The flip side of that approach to backtesting (which I disagree with, too) is that every ticker may have its own personality due to the institutions that trade it. This would allow for certain strategies to work on some tickers but not others.

            With regard to your post above, I’m going to mull it over for a few days. I hope to understand better if I can sleep on it a few times.

            I did read Aronson’s book a few years ago. Good stuff but not very encouraging as far as finding systems that really work, as I remember. Now that I’m working toward putting all the pieces of system development together, though, I see it *is* that difficult and I see how many alluring dead ends await to trap me in systems I might think are likely profitable that are actually flukes.

            Basha, thanks for your insightful comments on this thread too.

            Thanks to Jeff as well for posting the Ivy Portfolio article in the first place!

      • We had an outstanding discussion in the comments below, TK, but upon rereading this over a month later I want to focus on another point. You write:

        > If there is a general edge (here the assumption is that strength in the past is likely to be
        > followed by further strength in the future) it is reasonable to expect that this behavior
        > shows up in almost any instrument.

        Isn’t this claim debatable?

        As a challenge, I will state the hypothesis that each ticker has its own trading personality. Why might this be? I think we would all agree that institutions are the big players that command market action. For any ticker, the bulk of trading activity is commanded by X institutions. Suppose a number of those institutional traders, for example, follow 50/200-SMA crossovers. We are likely to see some edge when using a 50/200-SMA crossover trading system for that ticker. Different tickers are traded by different institutions (institutional traders) and so different tickers may be more or less influenced by different technical [or other types of] criteria applied by those traders. For this reason, I may find a trading system that works well for one ticker but no others. This trading system may continue to work well for this one ticker until the institutions that trade that ticker significantly change or until the traders responsible for those institutions’ trades significantly change (e.g. fund managers are replaced).

        What are your thoughts?

        Mark

  • fyi, ETFreplay returns should not be tested against other software. The returns should be tested against a source that has correct data in the first place. Such data is available on the ETF providers’ websites, such as ishares.com and vanguard.com.

    ETFreplay’s data matches those websites, so if your other software doesn’t match ETFreplay’s, it is likely because your software is incorrect. Or use Morningstar data. I tested 200 of the largest ETFs vs Morningstar and they were all correct. Clearly, effort is made to ensure data accuracy.

  • Interesting stuff here. I’ve been looking at the Gone Fishin’ portfolio and the Permanent Portfolio and the Ivy but have not been completely impressed, as they all look like they need some tweaks. The 5 month MA makes sense, though I might change that to 200-day, simply because that is widely followed and triggers HFTs (high frequency trades based on algos) and is a compromise between 5 and 10 month MAs. I like the idea of combining a momentum strategy with asset diversification. What I have in mind is bonds, stocks (domestic and world), commodities (including gold), and REITs (domestic and world) as a basic asset structure. Add sector ETFs and country ETFs as an option to boost returns. Also as an option, add inverse (but not leveraged) ETFs. You want the best in class in each category, looking at 1-month, 3-month and 6-month returns, with more emphasis on 3-month. But also look at the charts, because some of the ETFs that are lower down on the scale might be touching support and breaking out. Those would be better than the ones that are top ranking but overbought and headed for a correction. Monitor monthly or quarterly. Quarterly might be a better idea as you give things time to play out and don’t act hastily.
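
    One way the 1-, 3- and 6-month ranking with extra emphasis on 3-month returns might look in code is sketched below. The weights (25% / 50% / 25%) are purely an assumption for illustration, as no exact weighting is given here, and `price_history` is a hypothetical dict mapping each ticker to a list of month-end closes (at least 7 values, oldest first).

```python
def period_return(prices, months):
    """Total return over the last `months` month-end closes."""
    return prices[-1] / prices[-1 - months] - 1.0

def momentum_score(prices, weights=((1, 0.25), (3, 0.50), (6, 0.25))):
    """Weighted blend of 1-, 3- and 6-month returns (weights are assumed)."""
    return sum(w * period_return(prices, m) for m, w in weights)

def rank_etfs(price_history):
    """Tickers sorted from strongest to weakest blended momentum."""
    return sorted(price_history,
                  key=lambda ticker: momentum_score(price_history[ticker]),
                  reverse=True)
```

    The chart-based judgment calls (support, breakouts, overbought readings) are not captured by a score like this and would still need to be checked by eye.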

    • Ross, these are all great ideas to test. I tested a few ideas in this post. Some of your ideas could be tested in ETFReplay. In future articles I plan on testing how adding inverse ETFs changes the performance.

  • The IVY articles and portfolio risk/return are always good reads.

    On the IVY 10 you use GSC and DBC. These are both broad commodity-tracking ETFs/ETNs, so the back-test is doubling up on the allocation to commodities. It would be interesting to see what happens if you remove one of these and then run the back test again.

  • I think everyone is looking at ETFreplay, and momentum in general, incorrectly. Momentum in general has no predictive quality in and of itself. One can simply confirm this by using the ETFreplay screener and choosing the top ETF out of all available. My back tests show this to be far less effective than using a smaller basket of uncorrelated ETFs. I think you need to think of ETFreplay as a modified buy and hold strategy. You must want to hold all of the individual assets, but you just don’t want to hold them in a downtrend. You also want to concentrate your money each month in the assets that are presently doing the best.

    ETFreplay is without a doubt a curve fit, but that doesn’t mean it doesn’t work.

    First of all, you should only be choosing asset classes that you want to invest in.

    You want uncorrelated funds so that there is always something strong in any market condition. The Ivy Portfolio works because they are uncorrelated. An all currency momentum strategy doesn’t work well because sometimes all the currencies available as ETFs go down together.

    You also want types of funds that have lengthy trends. For instance, individual commodities tend to be poor funds to pick because the trends typically don’t last for months on end.

    You also need funds that work well together. 2X funds rarely work well with 1X funds because of the volatility difference.

    Remember, ETFreplay is looking for RELATIVE strength. Relative strength is a curve fit.

    So yes, ETFreplay is very much a curve fit. Momentum is just investing in an asset in an uptrend. In my opinion it still works and beats buy and hold by a mile.

  • daniel morton

    Last attempt, it is removing my syntax which makes my scenarios appear illogical.

    Thanks Jeff. I am trying to understand the constraints of the GTAA 13 AGG 3 system. Faber writes “The assets are only included if they are above their long-term moving average, otherwise that portion of the portfolio is moved to cash. We also include the effects of only investing in the top three out of thirteen assets.”

    I don’t see how this is possible, surely I am missing something obvious. If you are investing only in the top 3 of the 13 assets then how is that not contradictory to “otherwise that portion of the portfolio is moved to cash”? Imagine you have $3 to invest. I am trying to confirm what happens in each scenario:

    1. All 13 10 Month SMA then you are 33% in that 1 asset and rest in cash?
    3. 3 Assets > 10 month SMA and 10 assets < 10 month SMA then you are 100% invested with 1/3 in each asset in which case you have 0 in cash which contradicts the rule stated before.

    Thanks for any clarity.

    Daniel

    • Daniel-
      Your questions are a bit odd (some formatting lost in the posting?) but I think I can answer your questions and guess at what you meant. I believe the answers are as follows:
      1. If all 13 are below the 10 month MA, then you are 100% in cash.
      2. If, say, 6 are above the 10 month MA then you pick the best 3 of those 6 to invest in (and ditch other positions).
      3. If only 2 of the 13 are above the MA then you are 33% cash and have positions in those 2.

      Think of it like this:
      1. Find “qualifying” ETFs from your list (i.e. above 10 month MA). Ignore the rest for now.
      2. Rank the qualifiers using formula provided above.
      3. Take top 3 qualifiers and allocate 33% to each. If list is smaller than 3, allocate 33% to each qualifier and hold the rest in cash until the next time you check the system.
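
      A minimal sketch of those three steps, assuming each ETF comes with its current price, its 10 month MA, and a ranking score that has already been computed elsewhere (the `etfs` dict layout, the tickers, and the `"CASH"` key are all hypothetical):

```python
def allocate(etfs, n_slots=3):
    """etfs maps ticker -> (price, moving_average, score). Returns weights."""
    # Step 1: keep only the qualifiers trading above their moving average.
    qualifiers = [t for t, (price, ma, _score) in etfs.items() if price > ma]
    # Step 2: rank the qualifiers by score, best first.
    qualifiers.sort(key=lambda t: etfs[t][2], reverse=True)
    # Step 3: one equal slot per top qualifier; unfilled slots stay in cash.
    slot = 1.0 / n_slots
    weights = {t: slot for t in qualifiers[:n_slots]}
    weights["CASH"] = 1.0 - slot * len(weights)
    return weights
```

      With all 13 below the MA this returns 100% cash, with only 2 qualifiers it leaves one third in cash, and with 6 qualifiers it holds only the best 3.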
