A simple answer is that AI is stupid and that our most advanced technique here really amounts to throwing things at the wall and seeing what sticks. Any long, dull technical analysis of this problem you get is very likely to be wrong.
In fact, the still-immature research community in this emerging field (almost all current papers are preprints on arXiv) doesn't really know why things like what you describe happen, just as we don't really know why momentum works. We only know that it does work, maybe.
Btw, contrary to what many people think, even in economics, pretty much the most mathematical and theoretical field outside of math and physics, we still don't know why the economy grows, or even really know whether it does.
Would you be willing to share how ChatGPT built this for you?
Here's what I think I would do:
(1) Ask an LLM to come up with a Python script to scrape the factor reference page of Portfolio123, including all the factor descriptions, and output it as a CSV (a rough sketch of such a script follows after this list). (https://www.portfolio123.com/doc/doc_index.jsp?vocabType=)
(2) Upload the scraped factor reference from (1) and an XML export of a ranking system that I have built in the past.
(3) Ask an LLM to come up with individual factors based on (1) and (2). Probably my prompt would be: "Use the factor reference of Portfolio123 (csv) and my example ranking system in Portfolio123 that contains factors that I am content with (xml) to suggest additional factors. The factors should make financial and/or commercial sense and I expect an explanation why each of the suggested factors would work".
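For step (1), a minimal scraping sketch might look like the following. I'm assuming the factor reference renders as plain HTML tables, so the selectors and the two-column CSV layout are guesses that would need adjusting against the real page:

import csv
import requests
from bs4 import BeautifulSoup

# Rough sketch of step (1): scrape the P123 factor reference into a CSV.
# Assumption: the page renders factors and descriptions as HTML table rows;
# the selectors below are guesses and may need adjusting.
URL = "https://www.portfolio123.com/doc/doc_index.jsp?vocabType="

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

rows = []
for table in soup.find_all("table"):
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 2:  # expecting at least (factor, description)
            rows.append(cells[:2])

with open("factor_reference.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Factor", "Description"])
    writer.writerows(rows)

print(f"Wrote {len(rows)} rows to factor_reference.csv")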
So first of all, DeepSeek isn't my favorite LLM, and if someone wants to argue that Claude 3 is better, I'd simply agree.
That said, maybe the reduced power consumption of a Mixture of Experts (MoE) architecture, something Grok already uses, is a step in the right direction for those of us who care about climate change and energy efficiency in AI.
For sure, I'm not interested in using LLMs to count the number of r's in strawberry or for writing a high school report on China and Taiwan.
Here's something DeepSeek can do that this forum probably wouldn't help me with, and it was remarkably fast:
Q: "I trade at Interactive Brokers and want to compare two different trading strategies. For simplicity, I'll define success as getting a buy fill better than yesterday's close and failure as not doing so. Iāll track results in a CSV spreadsheet. Can you give me Python code for a Thompson sampling explore-exploit algorithm to help maximize my use of the best strategy?"
A:
import numpy as np
import pandas as pd
import random

# Load the CSV file (start an empty log if it doesn't exist yet)
csv_file = 'trading_results.csv'
try:
    df = pd.read_csv(csv_file)
except FileNotFoundError:
    df = pd.DataFrame(columns=['Strategy', 'Success'])

# Ensure both strategies are in the DataFrame
for strategy in ['Strategy A', 'Strategy B']:
    if strategy not in df['Strategy'].values:
        df = pd.concat([df, pd.DataFrame({'Strategy': [strategy], 'Success': [0]})], ignore_index=True)

# Count successes and failures for each strategy
success_counts = df.groupby('Strategy')['Success'].sum()
failure_counts = df.groupby('Strategy')['Success'].apply(lambda x: len(x) - x.sum())

# Thompson Sampling: draw from each strategy's Beta(successes + 1, failures + 1)
# posterior and play the strategy with the highest draw
def thompson_sampling(success_counts, failure_counts):
    return max(success_counts.index, key=lambda s: np.random.beta(success_counts[s] + 1, failure_counts[s] + 1))

# Choose a strategy
chosen_strategy = thompson_sampling(success_counts, failure_counts)
print(f"Chosen Strategy: {chosen_strategy}")

# Simulate a trade (replace with real trade logic)
trade_success = random.choice([0, 1])

# Log the result
df = pd.concat([df, pd.DataFrame({'Strategy': [chosen_strategy], 'Success': [trade_success]})], ignore_index=True)
df.to_csv(csv_file, index=False)

print(f"Trade Result: {'Success' if trade_success else 'Failure'}")
print(f"Updated CSV saved to {csv_file}")
Point being: LLMs can do some things better than my wife, and even better than this forum at times. It all depends on what you ask them.
Just trying to figure out what to do with them, and to focus our efforts on achievable goals.
Lots of noise about LLMs for stock picking, making predictions, managing portfolios, understanding news better than humans, outperforming analysts, etc. This seems like a waste of time since they are naive and gullible: not good traits in a market that is all head fakes.
So first steps: definitely use LLMs as a productivity tool for P123, training them on our forums + reference so they can answer questions and write formulas.
After that, not sure yet.
Thanks
Where have I been? I had never even heard of WPS, the Chinese counterpart to MS Office, used by 500M people. It seems much slicker than the bloated MS suite. China is quietly pulling ahead in many areas, now also in nuclear fusion.
DeepSeek was developed by High-Flyer and uses a Mixture of Experts (MoE) architecture, meaning it likely has a finance-specific module, which DeepSeek has confirmed (assuming it has insight into its own internal workings).
However, as far as I can tell, this does not mean that anyone using DeepSeek has access to the exact same finance module that High-Flyer uses. I'm fairly certain it's not the same, though it's probably not too heavily dumbed down either.
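For context, MoE just means a small gating network routes each token to one or a few "expert" sub-networks, so only a fraction of the model's parameters run per token, which is where the power savings come from. Here is a toy top-1-routing sketch in PyTorch (purely illustrative, not DeepSeek's actual architecture):

import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    # Toy Mixture-of-Experts layer: a gate scores the experts per token
    # and only the single best-scoring expert runs for each token.
    def __init__(self, dim=64, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)  # the router
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(n_experts)]
        )

    def forward(self, x):  # x: (tokens, dim)
        scores = self.gate(x).softmax(dim=-1)  # (tokens, n_experts)
        best = scores.argmax(dim=-1)           # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = best == i
            if mask.any():  # run an expert only on the tokens routed to it
                out[mask] = expert(x[mask]) * scores[mask][:, i:i+1]
        return out

print(ToyMoE()(torch.randn(10, 64)).shape)  # torch.Size([10, 64])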
How You'd Really Use It for Finance
To really get value out of it, you'd need to train it on domain-specific data. For example, you could:
(1) Use FactSet sentiment data
(2) Train it with historical early-morning earnings reports
(3) Use future returns as a target variable
(4) Build a model to predict whether buying at the open or close would yield a better price
In other words, this is just like any other neural-net machine learning approach, except you'd be starting with a pre-trained LLM that already understands financial reports, which gives you a serious head start.
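As a very rough sketch of what that might look like with the Hugging Face stack (the base model, file names, and the binary "did the open beat the close" label are all placeholders I made up; the FactSet data would need its own plumbing):

import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder base model; a finance-pretrained LLM would be the better start
MODEL = "distilbert-base-uncased"

# Hypothetical file: one row per report, 'text' = early-morning earnings
# report, 'label' = 1 if buying at the open beat buying at the close
df = pd.read_csv("earnings_with_returns.csv")

tokenizer = AutoTokenizer.from_pretrained(MODEL)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=512)

ds = Dataset.from_pandas(df).map(tokenize, batched=True).train_test_split(test_size=0.2)

model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=ds["train"],
    eval_dataset=ds["test"],
)
trainer.train()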
What's High-Flyer Likely Doing?
I doubt Portfolio123 (P123) can do this out-of-the-box, but I'd guess High-Flyer is doing exactly this (and more) behind the scenes.
The Key Takeaway
Nobody is saying that simply getting API access to ChatGPT, Claude 3, or DeepSeek, without any fine-tuning or additional training, will give you high-quality financial predictions. Training would have to be part of any realistic plan.
I prompt-engineered it and fed it the XML structure / dictionary; using the paid version, I keep all the P123 material in a project.
(So yea - your guess is basically correct).
The key is knowing how to prompt - gotta speak LLM. Not perfect by any means, but it did find strategies that I have not been able to find in 10+ years here, nor do I know of anyone else who has.
Got this in one day's work, along with 20+ other models.
Hope you'll provide us a text editor with the AI tool, Marco.
Based on current research, the "seems like" could be removed.
An LLM's ability to make news-based predictions only lasts a day, which does not offset the transaction costs.
An LLM is suitable as a secretary, not a manager or an expert. Tasks such as carrying out programming instructions, teaching novices, or extracting numbers from lengthy financial reports are among its comparative advantages.
To this day, AI remains unable to replace even low-end artists and writers, never mind the phantom dreams of AGI, let alone ASI.
I think this would be a challenge for anyone, and if there were an additional cost, I'm not sure I'd sign up. Using LLMs for investing would require a real commitment with an uncertain ROI, which is beyond my pay grade to fully evaluate.
Okay, that's an exaggeration. I have no idea if this would actually pay off for P123, which is fine, considering I don't have to make that decision.
That said, I do use ChatGPT, and I'm still exploring its full potential. At the very least, it's great for technical assistance, like setting up a random forest correctly if you haven't done it before. And in some cases, I've gotten solid insights on more complex questions as well.
But can LLMs actually work for stock prediction?
(Not necessarily for P123, but in general.) That's an interesting question.
So again, if someone were to implement some version of this, they might end up using "stacking", where the output from the LLM (a rank) would be just one input into a larger model, with whatever weight it deserves.
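Concretely, the stacking step could be as simple as this sketch (the feature columns, the llm_rank input, and the CSV are all hypothetical):

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical file: one row per stock-period, conventional factor ranks
# plus an LLM-derived rank, and a binary 'outperformed' target
df = pd.read_csv("features.csv")
X = df[["value_rank", "momentum_rank", "quality_rank", "llm_rank"]]
y = df["outperformed"]

# The meta-model learns how much weight the LLM rank deserves
meta = LogisticRegression(max_iter=1000)
print(cross_val_score(meta, X, y, cv=5).mean())  # does llm_rank add anything?
print(meta.fit(X, y).coef_)                      # the learned weights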
First, integrate them to help users get the most out of P123 tools and automate tasks. That's obvious.
Second, use them in novel ways to improve alpha: either directly, as an additional input into a larger model, or indirectly, by extracting new data points from textual sources, or perhaps charts.
The MS-Top10-Cap-GPT didn't even perform better than the S&P 100, not to mention that its sample (horizon and number of stocks) is too small and there are too many parameters that could be quietly overfit.
It is a small sample size too (which the authors acknowledge), meaning anything positive, if there is anything, would have no statistical significance whatsoever.
It's also clear that the authors trained their model, rather than using the same ChatGPT version available to me. My main takeaway from the paper was understanding what training an LLM for stock selection might entail, rather than treating it as evidence that LLMs can actually outperform. The training seemed pretty involved.
I also recognize the potential publication bias and bias in my own search, even though I wasn't intentionally trying to filter the literature. It was a very brief search.
And, as ChatGPT itself pointed out:
"Without training, LLMs will mostly regurgitate general financial logic, which is far less useful than your carefully validated process."
So at best, LLMs would need significant training to be useful for stock prediction. And even then, there's no proof that training alone would work in any studies I have seen.
For now, given my budget constraints, I'll continue using LLMs as a productivity tool, but I won't be relying on ChatGPT 4o (or similar non-API versions) for any stock predictions.
Are you referring to how features are entered? You can copy features to the clipboard or download to csv, do your edits, and import. When you import you can choose to overwrite.
DeepSeek appears to be pushing competitors to release new models earlier than planned, such as ChatGPT o3 and now Gemini 2.0.
It may also be contributing to the delay of Grok 3's release on X (formerly Twitter) as they work to incorporate CoT reasoning and other enhancements before launch.
The new Gemini 2.0 Flash Thinking Experimental seems to have the most complete chain-of-thought (CoT) output I've seen so far. Unlike ChatGPT, which briefly flashes a reasoning preview, Gemini 2.0 keeps the full chain of reasoning visible throughout the response.
I was impressed. However, the Flash version struggled to fully understand some of my questions. That said, it explicitly carried out what it thought I was asking, and I could clearly see its reasoning in the chain-of-thought. Hoping for faster and more refined responses, I signed up for the Gemini 2.0 Pro Experimental version (first month free) after spending all morning testing Gemini 2.0 Flash Thinking Experimental.
I was able to get responses that I couldn't obtain with any other online LLM. Gemini (part of Google) seems to have more seamless access to online data, reducing the need to prompt excessively for up-to-date information.
As far as I can tell, Gemini has the highest token allotment for context among current LLMs. This means you can upload multiple research PDFs, ask numerous related questions, and expect it to maintain context throughout the discussion, something that can be a challenge for other models.
Note: I am not claiming I now know how this can be used and which LLM is actually best for what purpose.