ML Workflow II (Prado inspired)

marco · March 4, 2024, 1:27am

Thank you all for the great discussion in ML workflow. It inspired me to learn more from Marcos Lopez de Prado. Check out his presentation below, where he gives tons of suggestions. It’s long, but high level and easy to follow. And it’s NOT ONLY FOR AI. Lots of the suggestions could be applied to improve our current tools, which are really dangerous if misused.

# The 7 Reasons Most Machine Learning Funds Fail (YouTube)
Marcos Lopez de Prado from QuantCon 2018

Here’s a list of things we haven’t considered for our AI implementation:

Triple barrier targets rather than targets with fixed period.
F1 score as the evaluation metric: it combines precision and recall (their harmonic mean)
Weighting by uniqueness to combat serial correlation of features and targets (especially if using weekly samples)
Embargo in addition to gaps to prevent leakage
Deflated Sharpe Ratio: he provides a formula that discounts the Sharpe ratio more the more experiments you run
Fractionally Differentiated Features (this one I’m still digesting)
Backtest is not research and should be used sparingly.
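
To make the first item concrete, here is a rough sketch of triple-barrier labeling as I understand it from the talk. The function name, the percentage barriers, and the choice to label a time-out as 0 are my own assumptions for illustration (de Prado usually scales the barriers by volatility, and the time-out can also be labeled by the sign of the return), so treat it as a toy, not his implementation.

```python
from typing import Sequence


def triple_barrier_label(prices: Sequence[float], start: int,
                         profit_take: float, stop_loss: float,
                         max_holding: int) -> int:
    """Label one observation (hypothetical helper, not from the talk).

    +1 if the upper (profit-take) barrier is touched first,
    -1 if the lower (stop-loss) barrier is touched first,
     0 if neither is touched before the vertical (time) barrier.
    """
    entry = prices[start]
    upper = entry * (1.0 + profit_take)   # horizontal barrier above entry
    lower = entry * (1.0 - stop_loss)     # horizontal barrier below entry
    # Vertical barrier: stop looking after max_holding bars (or at end of data).
    end = min(start + max_holding, len(prices) - 1)
    for t in range(start + 1, end + 1):
        if prices[t] >= upper:
            return 1
        if prices[t] <= lower:
            return -1
    return 0


# Made-up prices, 5% barriers on both sides, 3-bar holding limit.
prices = [100, 101, 103, 99, 94, 96, 100, 106]
labels = [triple_barrier_label(prices, i, 0.05, 0.05, 3)
          for i in range(len(prices) - 1)]
print(labels)
```

The contrast with a fixed-period target is that the label depends on the path the price takes, not just on where it ends up after N bars.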

He really drives home the point on just how easy it is to curve fit. Basically you can get a high Sharpe ratio out of anything with only a couple hundred experiments (permutations). He also completely destroys research papers by saying that they are all false, and he does it convincingly.
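
You can check the curve-fitting point yourself with a small simulation. This is only a sketch under my own assumptions: each "strategy" is one year of pure-noise daily returns (zero true edge, 1% daily volatility), and the function names and the seed are made up for illustration. The best Sharpe ratio across the trials still grows with the number of experiments.

```python
import random
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of a return series (risk-free rate assumed 0)."""
    mean = statistics.fmean(returns)
    sd = statistics.stdev(returns)
    return mean / sd * periods_per_year ** 0.5


def best_sharpe_of_random_strategies(n_trials, n_days=252, seed=0):
    """Best Sharpe among n_trials strategies whose returns are pure noise."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best = float("-inf")
    for _ in range(n_trials):
        # Zero-mean Gaussian daily returns: no skill by construction.
        daily = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, annualized_sharpe(daily))
    return best


for n in (1, 10, 200):
    print(n, round(best_sharpe_of_random_strategies(n), 2))
```

Every strategy here is worthless by construction, so whatever the best one shows is purely a selection effect. That is the inflation the Deflated Sharpe Ratio is meant to correct for.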

Cheers

Jrinne · March 5, 2024, 10:32am

De Prado slideshow with F1 scores of additional methods to control or quantitate the false discovery rate of investment strategies (topics discussed in the video): Detection of False Investment Strategies through FWER and FDR