Hype vs. Reality: How Are AI-Driven ETFs Performing?

I'm trying to find out whether ML will help me generate better trading results, and as part of this I have been studying the AI-driven ETFs that are available. There are not many of them yet, but the ones that are out there seem to do poorly. https://www.dividend.com/active-etfs-channel/how-are-ai-driven-etfs-performing/

(BUZZ seems to be the exception.)

Any thoughts on why? Is there anyone who is already using ML actively to get some good results? Are there any of the DMs that use ML, or is there any other public backtest that uses ML?

Where do I begin?

First, I suggest that AI enthusiasts throw out the 2020-2022ish time frame. In our generation we have never experienced a pandemic of this magnitude, and the chances of seeing these market conditions again are practically zero. Do not make any judgements based on this time frame, and don't develop models using data from that period.

Second, the article referenced was biased. The author should have chosen a relevant benchmark for each ETF analyzed. For example, don't compare value against broad growth, as that makes no sense.

Third, I would be careful about drawing conclusions based on the performance of ETFs without understanding the motives or qualifications of the AI model developer. In many cases, the mastermind behind these ETFs probably has less practical knowledge and skill than many P123 users. The ETFs have to be first movers in a unique area. Once they are established and have large assets under management, ETF performance is almost irrelevant. They can run for years with substandard results.

2 Likes

I don't see how slapping the latest buzzword like "AI" on top of an ETF makes it any better or more interesting. What is important is to understand the approach the ETF is using and whether it makes sense.

I believe ML can generate better results. I view it as an automation tool for going through vast amounts of data and testing ideas more quickly, but it's still hard work. Besides the trading idea, I have two big concerns: 1. overfitting 2. amount and quality of data. How do the ETFs address these?

1 Like

Thank you both for the replies. Yes, I agree there could be weaknesses in the way these AI ETFs are constructed (overfitting, short trading period, bad data quality), and I know 2020-2022 is a short time frame, but it still raises the question: why can't even professionals get this approach to work?

It will be exciting to see how members here will use ML. Maybe our results will be better...

From your reply, it doesn't seem that you understood my response.

It isn't that 2020-2022 is a short time frame. The period is so different from anything seen before that one cannot rationally expect a system (AI) trained on prior data to work well.

Second, the comparison of some of the AI ETFs to a broad growth strategy is simply wrong. The benchmark has to be relevant.

I don't understand this comment. All of our ranking systems are trained on prior data and most of them continued to do well during this period. Are ranking systems more persistent than ML systems? Certainly 2020-2022 were terrific years for me, and for a number of other P123 users too, and it wasn't because I suddenly came up with new approaches.

I'm very much looking forward to trying and using P123's ML systems once they're up and running, but I confess to being a bit confused about what advantages they'll offer to ranking systems. Both the ML systems and ranking systems take a bunch of factors and apply them to the process of stockpicking. I'm guessing that ML systems do this quite differently from ranking systems, but I'd love for someone to point out the shortcomings of ranking systems that ML improves on. As a long-time proponent of comprehensive ranking systems, I'm really interested in what, if any, defects they have compared to ML.

Admittedly this is a Python example, but my presently funded model was bootstrapped 10,000 times with a clear improvement in the results.

This took about 3 minutes with Python. I admit to not using the optimizer in a long time to be able to compare my final results to results using the optimizer. I do believe the optimizer is a good tool that can give good results. But time-wise, Mod() might have been a little slower for me I think.

Also, my ML model is not a common one. Any success (or failure) of my models is not really a comment either way on P123's AI/ML models. But what I am using could never be done manually in 1,000 years. I do respect and appreciate that.

1 Like

I believe from your response that you used ML to create a ranking system. I was asking about using ML instead of a ranking system. The folks who write these ML papers and create these ETFs are not creating ranking systems. Instead they're using ML to buy and sell stocks without ranking.

1 Like

Yuval,

Thank you for the clarification. My ranking system outperforms everything else I tried. My little studies actually support what you said.

Best,

Jim

Thank you for replying Yuval.

"I don't understand this comment. All of our ranking systems are trained on prior data."

First, there is no point in comparing ranking systems with AI-generated predictions. This is an apples-to-oranges comparison. Ranking systems are not "trained on data" and the strategy is different.

Second, I think that all of us could use some education on how to evaluate performance. "Continued to do well" is not a scientific assessment. One needs to consider the performance objectives and the benchmark used to evaluate results. The latter is sadly missing from pretty much every analysis done here at P123.

BTW - all of the P123 live strategies were replaced in 2023. Probably a coincidence...

I do want to point out that @marco will be creating ranks with his AI/ML and continue to use ranks as signals to buy and sell stocks. He has been using ranks before most of us and is well aware of the advantages, I think. Happy to be corrected if something different is planned regarding the use of ranks at P123.

Also I have great respect for what people do with the optimizer. Personally, I would like to try that along with cross-validation. If time and resources allowed I would like to see Marco automate that process. I think it could be automated. Once it was fully automated it would be called: Machine Learning. Machine learning with the use of ranks if you prefer.

The classic optimizer with cross-validation has some serious advantages. Bootstrapping, early stopping and model averaging (ensemble learning) with P123's optimizer, along with cross-validation, would be nice. I just do not have the time to do that using spreadsheets myself.

I appreciate the easy (automated) rebalance in the mornings, and I think that is just the start of what Marco has automated if I were to look under the hood.

I appreciate Marco's continued use of computers and automation, myself. I don't really care if he uses the old P123 optimizers or uses gradient descent to optimize. XGBoost happens to use gradient descent, but the developers of XGBoost could have done it with P123's optimizer. They literally could have used P123's optimizer, and would have used it if it were a more efficient method than gradient descent.

I think the optimizers (P123 classic optimizer and a gradient descent optimizer) arrive at the same answer if both are given enough time and resources.

I don't see how it is a question of whether ranks are used or not since ranks will be used in either case. I see it as more a question of whether you prefer full automation of the process and whether you have a rational preference for one method of optimization over another (one method arriving at a more accurate answer and/or being faster than the other).

2 Likes

My understanding, and I could be wrong, is that AI will be used to create potentially better factors that could be used in ranking systems.

3 Likes

Jim - ranking systems are optimized either by hand or using some automatic process. This process maximizes the top bucket or the slope of buckets. But this is different than NN which attempts to minimize the error in some forecasted factor. Apples and oranges.

Gradient descent is a versatile optimization algorithm that can be applied to a wide array of problems. For example gradient descent can be used to optimize linear regression as Sklearn does here: sklearn.linear_model.SGDRegressor
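To make the SGDRegressor mention concrete, here is a minimal sketch of linear regression fitted by stochastic gradient descent. The data is synthetic and the coefficients (3 and -2) are assumptions chosen purely for illustration; nothing here comes from P123 data.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic data (an assumption for illustration): y = 3*x0 - 2*x1 + noise
X = rng.normal(size=(2000, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=2000)

# SGDRegressor fits a linear model by stochastic gradient descent.
# Inputs are standardized first because SGD is sensitive to feature scale.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-4, random_state=0),
)
model.fit(X, y)

coefs = model.named_steps["sgdregressor"].coef_
r2 = model.score(X, y)
print("Fitted coefficients (on standardized inputs):", coefs)
print("In-sample R^2:", r2)
```

The fitted coefficients should land close to the ones used to generate the data, which is the same answer ordinary least squares would give, just reached by a different optimizer.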

Neural-nets do use gradient descent also as does XGBoost.

BTW, the method of optimization used by many at P123 is an "evolutionary algorithm" or "genetic algorithm."

With regard to genetic algorithms, ChatGPT has only good things to say:

"In portfolio optimization, genetic algorithms can help identify the optimal or near-optimal combination of assets or factor weightings that maximize returns, minimize risk, or achieve a balance of both, according to the user's objectives. The iterative, adaptive nature of genetic algorithms makes them particularly suited for exploring complex, multidimensional optimization problems where traditional optimization techniques may struggle."

Gradient descent is the winner for "convex problems." Genetic algorithms are fine for any problem and may be better for non-convex problems, as far as I can tell.
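As a toy illustration of the two optimizer families on a convex problem, the sketch below minimizes the same bowl-shaped loss with plain gradient descent and with a very simple mutate-and-select evolutionary search (no crossover, so it is not a full genetic algorithm). The loss function, step size, population size and mutation scale are all assumptions picked for the demo; this is not P123's optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])  # known minimizer of the toy problem


def loss(w):
    """Convex bowl: squared distance from `target`."""
    return float(np.sum((w - target) ** 2))


# Gradient descent: follow the analytic gradient 2 * (w - target).
w_gd = np.zeros(3)
for _ in range(200):
    w_gd -= 0.1 * 2.0 * (w_gd - target)

# Simple evolutionary search: mutate candidates, keep the fittest (elitist).
population = rng.normal(size=(10, 3))
for _ in range(300):
    # Each of the 10 parents produces 5 randomly mutated children.
    children = np.repeat(population, 5, axis=0) + rng.normal(scale=0.05, size=(50, 3))
    pool = np.vstack([population, children])
    fitness = np.array([loss(w) for w in pool])
    population = pool[np.argsort(fitness)[:10]]  # survivors: 10 lowest-loss candidates
w_ga = population[0]

print("gradient descent :", np.round(w_gd, 3), "loss", loss(w_gd))
print("evolutionary     :", np.round(w_ga, 3), "loss", loss(w_ga))
```

On a problem this simple both end up near the same point; gradient descent just gets there with far fewer loss evaluations. The picture can change on non-convex problems, where a gradient can lead to a local minimum.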

I am good with P123 expanding genetic algorithms. But whatever optimization algorithm P123 chooses to use, I would prefer to be able to use it with cross-validation, early stopping, regularization, and ensemble techniques such as bootstrapping.

@marco is already making that possible. Mostly with gradient descent for now, I believe. I am grateful.

Jim

1 Like

Using AI tools to create more ranking systems has some appeal in that these tools may improve efficiency and speed and offer some performance improvements. In addition, using AI offers the possibilities of creating dynamic ranking systems that change organically over time based on new inputs.

But ranking works in a very specific way (with a huge number of variations). I would be very interested in learning about the alternative approaches to stockpicking that AI might offer that are not based on ranking systems. I have read about AI algorithms in various places, but I have not found an easy-to-understand explanation of how AI can be used to pick stocks to buy and sell. And I am reasonably confident that the managers of AI-based ETFs and the authors of academic papers (e.g. Lopez de Prado) are not using ranking systems when they apply AI to stockpicking. I suppose I could be wrong, but they never seem to mention ranking. At any rate, if anyone comes across an explanation of how AI is applied to stockpicking without ranking, I'd love to read that.

I was kind of dabbling in AI before the pandemic, trying to build better analyst estimates for specific market niches. I found that I could improve the estimates by a small amount up to the start of the pandemic but once the pandemic hit, the results were worse than the original analyst estimates. I can't speak for the modeling that Jim is talking about, but if I get back into AI development I will exclude the period of the pandemic because I believe it would distort the results.

Yuval,

More of a refinement of ranking methods than a substitution for using ranks. You could sort the predicted returns of any model (using machine learning or whatever). Then use the sorted returns to create a rank. A rank where a stock with a rank of 100 has the largest predicted returns over the rebalance period and a rank of 0 has the smallest predicted returns (probably a negative number).
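A minimal sketch of that idea: take predicted returns from any model and rescale their ordering to a 0-100 rank. The tickers and numbers are made up, and the linear 0-100 scaling is an assumption; P123's own rank calculation may differ.

```python
import pandas as pd

# Hypothetical tickers and predicted returns (illustrative numbers only).
preds = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "PredictedReturn": [0.021, -0.013, 0.004, 0.012, -0.002],
})

# Scale the ordering so the best prediction gets 100 and the worst gets 0.
n = len(preds)
order = preds["PredictedReturn"].rank(method="average")  # 1 = lowest, n = highest
preds["Rank"] = (order - 1) / (n - 1) * 100

print(preds.sort_values("Rank", ascending=False).to_string(index=False))
```

Once the predictions are expressed this way, they can be used anywhere a rank is expected, such as buy and sell rules based on rank thresholds.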

Perhaps this is what P123 will be doing to create a rank with AI/ML but y'all are pretty secretive and we have never had any interaction with your AI expert in the forum.

Anyway, you could then predict the transaction costs for selling the lowest-ranked stock you are holding and buying the highest-ranked stock you do not hold. Go ahead and make the trade if and only if: (the predicted return of the purchased stock over the rebalance period) - (the predicted return of the lowest-ranking held stock) - (transaction costs for making the trade) is greater than zero, giving you a positive net expected value for the trade. Or set a threshold above zero for this equation before making a trade.
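The rule above can be written down in a few lines. This is only a sketch: the function names are hypothetical, the numbers are invented, and all quantities are assumed to be fractions over the same rebalance period.

```python
def net_expected_value(pred_buy, pred_sell, transaction_cost):
    """Expected gain from the swap; all inputs are fractions over the rebalance period."""
    return pred_buy - pred_sell - transaction_cost


def should_trade(pred_buy, pred_sell, transaction_cost, threshold=0.0):
    """Trade only when the net expected value exceeds the threshold."""
    return net_expected_value(pred_buy, pred_sell, transaction_cost) > threshold


# Hypothetical numbers: candidate predicted at +2.0%, held stock at +0.5%,
# total cost of both legs (commissions plus slippage) estimated at 0.8%.
trade_no_hurdle = should_trade(0.020, 0.005, 0.008)
trade_with_hurdle = should_trade(0.020, 0.005, 0.008, threshold=0.01)

print("Trade with zero threshold:", trade_no_hurdle)
print("Trade with a 1% threshold:", trade_with_hurdle)
```

With these inputs the net expected value is about 0.7%, so the swap passes a zero threshold but not a 1% hurdle.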

Your posts make me think you are probably doing some of this already by looking at the slippage. Your calculations could be made more exact by using the predicted returns in your calculations.

The data will be pretty noisy, but if there is little bias in the data it should be a modest improvement over just using ranks in a heuristic way based on backtests. Using ranks or RankPos does work, on average, in backtests using inexact slippage data and no information on the expected or predicted returns. This new method would incorporate a little more information, including how well the ticker is expected to do and more exact slippage calculations (which you already use, I think).

Even if you were to make no changes from what you are doing now, you could probably run a machine learning algorithm alongside of what you are doing and make a quick calculation of whether there is a positive expected value in making the trade. Especially, if you are already doing slippage calculations.

Maybe just red-flag some transactions. Like not selling a stock to buy a stock with RankPos 15 with low liquidity. I.e., not a higher-ranked stock with a greater predicted return and incurring a high transaction cost.

Applying this methodology could prove particularly insightful for managing large Volume Weighted Average Price (VWAP) transactions, potentially mitigating the adverse market impact of trades executed towards the day's end that do not contribute net positive value. In other words not making trades at the end of that day that you calculate would have little positive expected value due to the market impact of your trades earlier in the day. Making a smaller VWAP order in the morning or closing the VWAP order later in the day based on the market impact you have seen during the day. You could calculate when or if you might want to close a VWAP trade. You could even do the calculation of when you might want to consider closing a VWAP order proactively in the morning.

Not sure this helps. Just an idea that loosely addresses your question of how machine learning could add to (but not replace) using ranking systems.

BTW, I don't see this as an answer to how de Prado might make trades. He seems to use more technical time-series data and likes to classify the trades using his "triple-barrier" approach, as you point out. :)

Jim

1 Like

Fascinating. If the predicted returns are based on some ML algorithm, why not use them instead of ranking? Just buy the stocks with the highest predicted returns and sell the ones with the lowest. That would truly be an interesting alternative to ranking systems. I don't know how ML predicts returns based on factors, but if it does so without multifactor ranking, then it's a true alternative and I'd love to learn more about it.

1 Like

That is the main thing to do. They may not use the word "ranking" but they say to buy the assets with the highest predictions and sell the ones with the lowest predictions.

You can also use the ranking system as an ensemble where instead of ranking factors, it would rank multiple ML predictions and generate a final rank based on that (Another ML model could be used instead of the ranking system for ensembling).
You can generate an infinite number of ML models by changing the algorithm used, the algorithm parameters, the factors used as inputs and the label (what you want to predict, which could be short-, mid- or long-term return, change in volatility, increase in volume...).
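Here is a rough sketch of the ensemble idea: several models with different labels each produce a prediction, every prediction is converted to a percentile rank, and the ranks are averaged into a final rank. The model names, the numbers, the equal weighting, and the choice that a lower predicted volatility change is more attractive are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical predictions from three models for the same stocks.
# The labels differ (return vs. a volatility-change signal), so the raw numbers
# are not comparable; ranking puts them on a common scale first.
preds = pd.DataFrame(
    {
        "model_a_return": [0.020, 0.010, -0.010, 0.000],
        "model_b_return": [0.015, 0.020, -0.005, 0.001],
        "model_c_vol_change": [0.10, -0.05, 0.20, 0.00],  # lower treated as better (assumption)
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Convert every model's output to a percentile rank where higher = more attractive.
ranks = pd.DataFrame({
    "model_a": preds["model_a_return"].rank(pct=True),
    "model_b": preds["model_b_return"].rank(pct=True),
    "model_c": preds["model_c_vol_change"].rank(pct=True, ascending=False),
})

# Equal-weight ensemble: average the per-model ranks, then rank the average.
ranks["ensemble_score"] = ranks[["model_a", "model_b", "model_c"]].mean(axis=1)
ranks["final_rank"] = ranks["ensemble_score"].rank(ascending=False).astype(int)

print(ranks.sort_values("final_rank"))
```

The equal weights are the simplest choice. Replacing the average with another model trained on the per-model predictions would be the stacking variant mentioned above.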

2 Likes

Riccardo, Marco, ChatGPT, the AI expert P123 hired and some texts might be your best resources now that predicted returns have piqued your interest in the subject. I am sure I cannot give subjects like 'gradient descent' and 'backpropagation' the in-depth discussion they deserve for anyone serious about the subject. Not in one post in the forum anyway. But as to the usefulness of predicted returns:

Yuval,

I am not sure this will interest you but I was running HistGradientBoostingRegressor yesterday on a DataMiner download. The predictions were all kept in a Pandas DataFrame that I never looked at until now.

Your post made me interested in what the predicted returns actually were. The ranks here are for the P123 ranking system, but the PredictedReturns are from the boosting model. There is a lot to consider here. For example, there seems to be a correlation between the ranks and the predicted returns, but the correlation is not 1 (one). This suggests to me that both methods probably have some predictive value, but which is actually better? I doubt that the answer will always be the same for every P123 member or set of features. Hmmm... but if you average the methods or stack them as Azouz suggests, maybe you could leverage the strengths of all of the diverse methods, including P123 classic. Now that is an idea worth looking at!

Or in other words, use what works for a given situation. Maybe combine what works into a new model.

Also, in the example below with predicted returns, you might sell a lower-ranked stock to buy ZIM as its predicted return is 2.46% and you might expect this return to outweigh the calculated transaction costs. But you might not sell a stock to buy WGO as its predicted return is only 1.12%. While the PredictedReturn for WGO may be better than that of a ticker you are holding, the transaction costs could make that trade a net expected loss. Anyway, I wanted to look at this myself and thought it may interest you too. I think it makes the whole thing less hypothetical for me to see this:

BTW, if programming were easy for me I probably would have incorporated the slippage into the predicted returns or had a separate column that I could sort... Use that sorted column to streamline the decision as to whether I should trade a lower ranking stock for a higher ranking stock based on the predicted returns and incorporate the estimates of slippage into that column.

And actually Python could do a "for loop" with a printout of the transactions with a positive expected net gain and the expected net gain for each of those transactions. Add that to the downloaded spreadsheet and leave the final decisions to you.

In other words you could rank the transactions too if you want to keep ranks (i.e., rank the predicted benefit of selling one stock and buying another as a transaction). :)
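A small sketch of that loop, with made-up tickers, predicted returns and one-way cost estimates (all hypothetical). It lists every sell/buy pair with a positive expected net gain and ranks the pairs, leaving the final decision to the person.

```python
import pandas as pd

# Hypothetical inputs: predicted returns and one-way trading costs, as fractions.
held = pd.DataFrame(
    {"PredictedReturn": [0.002, 0.009], "Cost": [0.002, 0.001]},
    index=["HELD1", "HELD2"],
)
candidates = pd.DataFrame(
    {"PredictedReturn": [0.025, 0.011, 0.006], "Cost": [0.004, 0.003, 0.001]},
    index=["CAND1", "CAND2", "CAND3"],
)

rows = []
for sell, s in held.iterrows():
    for buy, b in candidates.iterrows():
        # Net gain = return picked up by switching, minus the cost of both legs.
        net = b["PredictedReturn"] - s["PredictedReturn"] - (s["Cost"] + b["Cost"])
        if net > 0:
            rows.append({"Sell": sell, "Buy": buy, "ExpectedNetGain": net})

swaps = pd.DataFrame(rows, columns=["Sell", "Buy", "ExpectedNetGain"])
swaps = swaps.sort_values("ExpectedNetGain", ascending=False).reset_index(drop=True)
swaps["TransactionRank"] = swaps.index + 1  # 1 = most attractive swap

for row in swaps.itertuples(index=False):
    print(f"Sell {row.Sell:<6} Buy {row.Buy:<6} expected net gain {row.ExpectedNetGain:.2%}")
```

The list does not enforce that each stock is traded only once, so the same candidate can appear in several rows. Picking among overlapping swaps is exactly the final decision that stays with you.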

Jim

2 Likes