Beta LLM for prediction: a cost-effective roadmap

Training an LLM for prediction from scratch would be expensive and difficult.

Even if P123 did this, people would cite the paper that Trendyist posted questioning everything P123 had done.

Meanwhile, there is a paper that already has extensive backtesting of a method: Financial Statement Analysis with Large Language Models.

To be sure, Trendyist's concerns apply here. But the authors were aware of those concerns and addressed them to some extent. People will have different prior beliefs about how good a job the authors did at addressing them. Personally, I don't think I will ever be 100% convinced either way unless I can see some out-of-sample results!

Rather than redoing, and trying to improve on, what the Chicago Booth School of Business has done, you could simply duplicate their method with ChatGPT's or DeepSeek's API to make predictions going forward (getting some out-of-sample results in the process).

So I would just replicate what they have done each time a new earnings report comes out; earnings reports are public information that can be obtained through EDGAR or other low-cost sources. People can look at the Booth School paper for a pretty good backtest and form their own belief as to how useful it is.
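Replicating the paper going forward could be a small script: pull the new filing, standardize and anonymize the statements the way the paper describes, and send them to the API with a CoT-style instruction. Here is a minimal sketch. The prompt wording is my paraphrase, not the paper's exact template, and the model name and helper names are assumptions:

```python
# Sketch: one prediction per new earnings report, mimicking the Booth paper's
# setup. Prompt text is a paraphrase of the paper's approach, not its exact
# wording.

def build_prompt(statements: str) -> str:
    """Wrap anonymized, standardized financial statements in a
    CoT-style instruction asking for the direction of future earnings."""
    return (
        "You are a financial analyst. Below are a firm's standardized, "
        "anonymized financial statements.\n\n"
        f"{statements}\n\n"
        "Think step by step: analyze the trends, compute key ratios, and "
        "then predict whether earnings will INCREASE or DECREASE next "
        "period. End your answer with one word: INCREASE or DECREASE."
    )


def parse_direction(answer: str) -> int:
    """Map the model's final word to +1 (increase) or -1 (decrease)."""
    return 1 if "INCREASE" in answer.upper().split()[-1] else -1


# The actual API call (requires the openai package and an OPENAI_API_KEY):
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": build_prompt(filing_text)}],
#     temperature=0,
# ).choices[0].message.content
# direction = parse_direction(reply)
```

Run once per new filing and you accumulate exactly the out-of-sample record the backtest debates are missing.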

Then I would do 2 things:

  1. Give members access, with all appropriate caveats. I would at least paper-trade a model that uses these predictions in the buy rules myself, as would anyone who thinks LLMs might work. I would probably use the prediction in the buy rules and exclude any stock with a predicted negative return.

  2. Keep track of the predictions to build an out-of-sample record myself, and P123 could keep detailed records as well.
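The two steps above can be sketched together: filter the buy list on the predicted direction, and log every (prediction, outcome) pair so a hit rate accumulates out of sample. Everything here (tickers, data shapes) is hypothetical:

```python
# Hypothetical sketch: exclude negative predictions from the buy list and
# keep a running out-of-sample record.

def apply_buy_rule(candidates: dict) -> list:
    """candidates maps ticker -> predicted direction (+1 or -1).
    Keep only tickers whose predicted direction is positive."""
    return [ticker for ticker, direction in candidates.items() if direction > 0]


def hit_rate(records: list) -> float:
    """records is a list of (predicted_direction, realized_direction) pairs,
    each +1 or -1. Returns the fraction of correct predictions."""
    hits = sum(1 for pred, real in records if pred == real)
    return hits / len(records)
```

After enough quarters, `hit_rate` is the number everyone actually cares about, and it doesn't depend on anyone's backtest.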

Note: Chain-of-Thought (CoT) prompting does not require any actual training.
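That point is worth making concrete: CoT here is just a fixed instruction attached to every question, with no model weights updated anywhere. A hypothetical example:

```python
# CoT as a script: the same instruction is appended to every question.
# No model is trained; only the prompt changes.

COT_SUFFIX = (
    "\n\nLet's think step by step. Lay out your reasoning before giving "
    "a final one-word answer."
)


def with_cot(question: str) -> str:
    """Attach the CoT script to any question before sending it to the API."""
    return question + COT_SUFFIX
```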

Obviously, if you think you can do a better, more credible backtest than the Chicago Booth School of Business did, you should backtest it yourself.

But it is a tested method: members can decide how convinced they are by the results, and P123 and its members can test it out of sample to update their beliefs. After a while there will be enough out-of-sample results that the Booth School paper, and any debates about its methods, will be irrelevant.

If it were me, I would also test a few other ideas out of sample, with CoT where needed.

DeepSeek would have the advantage of being open source, and in a few years machines and programmers may be cost-effective enough to bring it in-house, which an open-source model makes possible. DeepSeek's API is also cheaper right now. But the paper used ChatGPT 4o, so that is the API that has actually been tested for this particular method.

Just an idea; I don't care much what P123 does with it. But I certainly would not redo what the Chicago Booth School of Business has already done, and I wouldn't train an LLM either when CoT performed so well. CoT is just a script attached to each question.

Or maybe just skip the LLM for prediction for now (or use a better, more cost-effective idea).