If I were managing money, I would hedge with SH and then lever up 3x. The expectation would be roughly 50% better annual return with half the drawdown relative to the S&P 500 - if the model keeps working!
Yes, I have a variable hedging strategy and crash protection based on CNN Fear and Greed momentum.
Usually I hedge with my short strategy, but right now I am getting ~4% refunds on hedging the S&P, so I am shorting mostly with S&P futures; the slippage and transaction costs are also ~100x smaller.
Wow! Options as a way of reducing transaction costs and/or getting leverage? Whatever your answer, very advanced. Nice! I am not capable of that in my SEP-IRA account, and I am not sure I could do it with a regular account. Are you already selling options, or is this all S&P futures, which I guess are a related kind of derivative? Anyway, a pretty uninformed question, but pretty cool.
I scraped it using the Wayback Machine; I also recreated it, but it is not 100% the same.
I do want to offer my crash-protection indicator as a service soon, though; I am not willing to give it away for free.
Now I am working on an enhanced one, which uses additional datasets and machine learning to recognize hidden stock-market downside risk.
Getting back to your original question about testing your model's validity: I fully agree with Yuval. I focused on the number of years of excess returns because it was a new idea (and a good one), but I felt I was probably already getting wonky with my mathematical answer.
The best way to formalize what Yuval suggests might be a Wilcoxon signed-rank test, as you probably already know. Use monthly or weekly data if you are not happy with the small n you get from annual data.
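To make that concrete, here is a minimal sketch of a paired Wilcoxon signed-rank test on monthly returns using `scipy.stats.wilcoxon`. The return series are synthetic placeholders - you would substitute your strategy's and the benchmark's actual monthly returns.

```python
# Sketch: paired Wilcoxon signed-rank test on monthly returns.
# Both series below are synthetic stand-ins, not real data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
n_months = 120  # ten years of monthly observations

benchmark = rng.normal(0.007, 0.04, n_months)             # S&P-like monthly returns
strategy = benchmark + rng.normal(0.005, 0.02, n_months)  # hypothetical excess

# One-sided paired test: is the strategy's median monthly return
# greater than the benchmark's?
stat, p_value = wilcoxon(strategy, benchmark, alternative="greater")
print(f"W = {stat:.1f}, p = {p_value:.4f}")
```

The test is paired (same months for both series) and non-parametric, so it does not assume the return differences are normally distributed - which is exactly why it suits fat-tailed return data.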
It would also be a walk in the park for you to try some Bayesian methods, bootstrapping, etc.
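Bootstrapping, for instance, takes only a few lines: resample the monthly excess returns with replacement and see whether the confidence interval for the mean excludes zero. The excess-return series here is a synthetic placeholder.

```python
# Sketch: bootstrap confidence interval for mean monthly excess return.
# The excess series is synthetic; substitute strategy-minus-benchmark data.
import numpy as np

rng = np.random.default_rng(0)
excess = rng.normal(0.004, 0.03, 120)  # hypothetical monthly excess returns

n_boot = 10_000
boot_means = np.array([
    rng.choice(excess, size=excess.size, replace=True).mean()
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"95% CI for mean excess return: [{lo:.4%}, {hi:.4%}]")
```

If the interval excludes zero, the edge at least survives resampling noise - though, as with the p-value discussion below, that is still necessary rather than sufficient.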
IMHO, p < 0.01 (or BF > 10) is necessary but not sufficient (using a Wilcoxon signed-rank test, say). A p-value better than that could still be noise or pure garbage, but anything worse is almost guaranteed to be useless going forward out-of-sample. Remember, we are NOT randomly selecting anything (including features). We are starting with cherry-picked factors and already over-optimized public ranking systems. Some of this stuff is even published, which is not necessarily a good thing on the topic of cherry-picking - or, getting back to my point, on what p-value you should choose. To restate that simply: a published feature (or anything published) should be held to a stricter (not looser) p-value threshold. Much, much stricter if you think about it. Support and a source: the Bonferroni correction.
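The Bonferroni arithmetic is trivial but worth seeing. The number of factors "tried" below is an assumed illustration - in practice you would estimate how many ideas you (and the community publishing them) effectively tested before settling on this one.

```python
# Sketch: Bonferroni-adjusted significance threshold.
# m_tested is an assumed count of factor ideas implicitly tried.
m_tested = 50            # hypothetical number of factors screened
family_alpha = 0.01      # overall false-positive rate you are willing to accept

per_test_alpha = family_alpha / m_tested
print(f"Require p < {per_test_alpha:.5f} instead of p < {family_alpha}")
# Require p < 0.00020 instead of p < 0.01
```

This is why a published factor deserves a much stricter threshold: publication implies many researchers have already run many tests on the same data, so the effective m is large even if you personally ran only one backtest.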
Just my approach to your question, now that I think you more than understand the math. You probably have your own mathematical methods that you have found useful, and I would be interested to hear about them. FWIW.