There are a lot of people on the web selling algorithms (for many assets) that show an outstanding past return but end up performing really badly when used in real time. This gap between past and future performance usually has its roots in how the algorithm was designed and tested. As a university professor who works in machine learning, I always have a hard time trusting a predictive/reactive algorithm when I know nothing about how it was designed. If the person/website tells you nothing about that part, I think it’s better to run away.
My reasoning comes from one of the main issues found in machine learning/artificial intelligence/complex algorithms. For those not familiar with that field, machine learning is a way to train a network/algorithm on past data so that it can recognize, predict, or react to future events. The best-known example is Apple’s Siri on your phone, which learned the relation between phonemes and words in order to eventually recognize speech. Learning to recognize important events in your past data (in our case, previous drawdowns or market tops) is usually not that hard. An algorithm can easily model its training data with almost 100% accuracy. But we don’t really care how good we are at recognizing the past. The real reason we build these systems is to perform well in the future, in situations we haven’t exactly seen. For that reason we always divide the data we have (for example, the Bitcoin data of the past 11 years) into 3 buckets: one that we use for training/designing the algorithm, one that we use to optimize its performance (validation), and one last untouched bucket that we only use for testing at the very end.
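As a minimal sketch of the three-bucket idea for time-ordered data, the Python below splits dated rows chronologically. The function name `chronological_split`, the cutoff dates, and the placeholder rows are all assumptions made for illustration; they are not taken from any real pipeline or price data.

```python
from datetime import date


def chronological_split(rows, train_end, validation_end):
    """Split (date, value) rows into train / validation / test by date.

    Rows dated before train_end go to train, rows dated before
    validation_end go to validation, and everything later is held out.
    """
    train, validation, test = [], [], []
    for row in rows:
        day = row[0]
        if day < train_end:
            train.append(row)
        elif day < validation_end:
            validation.append(row)
        else:
            test.append(row)
    return train, validation, test


# Hypothetical placeholder rows, one per year; the values are NOT real prices.
rows = [(date(year, 1, 1), float(i)) for i, year in enumerate(range(2011, 2023))]

# Illustrative cutoffs only.
train, validation, test = chronological_split(
    rows, train_end=date(2015, 1, 1), validation_end=date(2018, 1, 1)
)
print(len(train), len(validation), len(test))
```

The split is by date rather than random, so nothing from the test period can leak into the training or validation buckets.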
Why is this approach of training, validation, and testing phases so important? As an example, I have a current Master’s student who worked on recognizing imperfections in wood panels to assess the quality of the wood. I’m skipping the details, but he benchmarked 6 different algorithms. The one that worked best on the first two buckets of data had an accuracy of 95% at spotting whether or not there were defects in the wood. When he tried that algorithm on never-before-seen data, the accuracy fell to only 52%, which is barely better than flipping a coin. Another algorithm had a success rate of 78% on the first two buckets, but was able to maintain 76% accuracy on the unseen data of the third bucket. This is what we actually want, since only the real-world, unseen, future performance matters.
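To make the comparison concrete, here is a small sketch that uses only the accuracies quoted above. The labels `model_a` and `model_b` and the helper `generalization_gap` are hypothetical names; the student's benchmark code is not shown here.

```python
# Accuracies from the wood-panel example: "dev" is the first two buckets
# (training + validation), "test" is the untouched third bucket.
candidates = {
    "model_a": {"dev": 0.95, "test": 0.52},
    "model_b": {"dev": 0.78, "test": 0.76},
}


def generalization_gap(scores):
    """How much accuracy is lost when moving from seen to unseen data."""
    return round(scores["dev"] - scores["test"], 2)


best_on_dev = max(candidates, key=lambda name: candidates[name]["dev"])
best_on_test = max(candidates, key=lambda name: candidates[name]["test"])
gaps = {name: generalization_gap(scores) for name, scores in candidates.items()}

print(best_on_dev, best_on_test, gaps)
```

The model that looks best on the data it was built on is not the one that holds up on unseen data, and the size of the gap is the warning sign.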
Going back to our problem, if you are not aware of this issue, it’s easy to use all the historical Bitcoin data to build your algorithm and think that the success you have will be repeatable in the future. The dumbest strategy (and this is a pitfall that machine learning can very easily fall into) would be to simply learn/memorize the date of every drawdown and when it ended. You would then get an outstanding return, but one that would not be repeatable in the future.
When we worked on our Bitcoin strategy, we initially focused only on the data from 2011-2015. Only when we were happy with the results did we test it on a longer period. Applying what we did on the 2011-2015 time frame to the 2015-2018 dataset instantly gave (without any optimization) 90-95% of the results of the final strategy. The last 5-10% came after re-optimizing the strategy on the 2015-2018 data. In any case, we had a clear rule: touching the 2018-2022 data, even if it was just to test once, was forbidden. Only when we considered ourselves done with designing the algorithm, on the afternoon of Friday, October 14th, 2022, did we allow ourselves to see the end result on the actual 2018-2022 cycle. In that sense, it’s as if these last 4 years are the future for the algorithm, so they give some indication of its potential real-time performance.
Naturally, there are one or two trades in that period that are not perfect, and it would be possible to change the algorithm to get better results on that cycle. But there is no point in doing so, since we could easily fall into the trap of overfitting (over-designing to cater to specific events), which could actually degrade future performance. This doesn’t mean that we will never improve the algorithm, but before doing that we will need either more data or some new on-chain metrics that we think will add redundancy to the algorithm by tracking an aspect of the blockchain we had not considered before.
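The memorization pitfall can be illustrated with a deliberately silly toy model. This is only a sketch: the class `DateMemorizer`, the helper `drawdowns_caught`, and the made-up labels (integer day indices, with every third day marked as a drawdown) are assumptions for illustration and have nothing to do with real Bitcoin data or with our strategy.

```python
class DateMemorizer:
    """A 'strategy' that only remembers which past days were drawdowns."""

    def fit(self, labeled_days):
        self.drawdown_days = {day for day, in_drawdown in labeled_days if in_drawdown}
        return self

    def predict(self, day):
        # Flags a drawdown only if this exact day was seen during fitting.
        return day in self.drawdown_days


def drawdowns_caught(model, labeled_days):
    """Fraction of true drawdown days that the model flags."""
    drawdown_days = [day for day, in_drawdown in labeled_days if in_drawdown]
    flagged = sum(model.predict(day) for day in drawdown_days)
    return flagged / len(drawdown_days)


# Hypothetical toy labels: (day index, is_drawdown).
past = [(d, d % 3 == 0) for d in range(0, 30)]
future = [(d, d % 3 == 0) for d in range(30, 60)]

model = DateMemorizer().fit(past)
past_score = drawdowns_caught(model, past)      # 1.0: every past drawdown is "predicted"
future_score = drawdowns_caught(model, future)  # 0.0: no future drawdown is caught

print(past_score, future_score)
```

A perfect backtest and a useless live result come from the same model, which is why the untouched test bucket is the only number worth trusting.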
In summary, I know that this part about design methodology can be a bit dry or boring, but I hope that you learned two things: 1. That with complex algorithms and machine learning there is a fine line between memorizing the past and actually learning something from the past that can be applied to the future, so be careful before trusting some random internet algorithm; and 2. That we followed a proper methodology to maximize the chance that the past performance will be repeatable in the future. The future will definitely give us unseen surprises, so I don’t expect to always have the exact same performance, but I sincerely trust the algorithm enough that I have already put 100% of WU capital behind its shield.
One last thing that I would highlight is that I am incredibly confident that our Bitcoin strategy will capture the big moves down and still unhedge in time to not miss the important parts of new uptrends. These big moves are really the easiest to get. The hardest situations are usually the pessimistic sideways markets. These market conditions trigger a lot of drawdowns that start brutally, thus triggering the hedge flag, but that find their floor pretty quickly. This often leads to going back into the market having lost 1-3%.
Now that you know how and why we made our algorithm, as well as our analysis of its strengths and limitations, we hope that our strategy and indicators can help you make the most informed decisions about how best to protect your capital.
(P#7.3_2022)