Has anyone successfully tried other LLM models? Such as FinGPT, FinBERT, or Llama 2?

A simple answer is that AI is stupid, and our most advanced technique here is really just throwing something at the wall to see if it sticks. Any long, dull technical analysis of this problem you get is very likely to be wrong.

In fact, the still-immature research community in this emerging field (almost all current papers are on arXiv) doesn't really know why things like what you describe happen, just as we don't really know why momentum is useful. We only know that it does work, maybe.

Btw, contrary to what many people think, even in economics, which is pretty much the most mathematical and theoretical field outside of math and physics, we still don't know why the economy grows, or even really know whether it does.

1 Like

Hi @korr123,

Would you be willing to share how ChatGPT built this for you?

Here's what I think I would do:

(1) Ask an LLM to come up with a Python script to scrape the factor reference page of Portfolio123, including all the factor descriptions, and output it as CSV (see the sketch after this list). (https://www.portfolio123.com/doc/doc_index.jsp?vocabType=)
(2) Upload the scraped factor reference of (1) and an XML of a ranking system that I have built in the past.
(3) Ask an LLM to come up with individual factors based on (1) and (2). Probably my prompt would be: "Use the factor reference of Portfolio123 (csv) and my example ranking system in Portfolio123 that contains factors that I am content with (xml) to suggest additional factors. The factors should make financial and/or commercial sense and I expect an explanation why each of the suggested factors would work".
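For illustration, here's roughly what the step (1) script might look like. This is only a sketch: the parsing logic and CSV columns are my assumptions, since the actual page structure would need to be inspected first (the full descriptions may require following each factor's link).

import csv
import requests
from bs4 import BeautifulSoup

# Factor reference page from step (1); the parsing below is hypothetical
URL = 'https://www.portfolio123.com/doc/doc_index.jsp?vocabType='

html = requests.get(URL, timeout=30).text
soup = BeautifulSoup(html, 'html.parser')

# Assumption: each factor appears as a link whose text is the factor name;
# descriptions may require following each link or parsing a sibling element
rows = []
for link in soup.select('a'):
    name = link.get_text(strip=True)
    if name:
        rows.append({'factor': name, 'url': link.get('href', '')})

with open('p123_factors.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['factor', 'url'])
    writer.writeheader()
    writer.writerows(rows)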

Best,

Victor

So first of all, DeepSeek isn't my favorite LLM, and if someone wants to argue that Claude 3 is better, I'd simply agree.

That said, maybe the reduced power consumption of a Mixture of Experts (MoE) architecture (something Grok already uses) is a step in the right direction for those of us who care about climate change and energy efficiency in AI.

For sure, I'm not interested in using LLMs to count the number of r's in strawberry or for writing a high school report on China and Taiwan.

Here's something DeepSeek can do that this forum probably wouldn't help me with, and it was remarkably fast:

Q:
"I trade at Interactive Brokers and want to compare two different trading strategies. For simplicity, I'll define success as getting a buy fill better than yesterday's close and failure as not doing so. Iā€™ll track results in a CSV spreadsheet. Can you give me Python code for a Thompson sampling explore-exploit algorithm to help maximize my use of the best strategy?"

A:

import numpy as np
import pandas as pd
import random

# Load the CSV file
csv_file = 'trading_results.csv'
try:
    df = pd.read_csv(csv_file)
except FileNotFoundError:
    df = pd.DataFrame(columns=['Strategy', 'Success'])

# Count successes and failures for each strategy.
# Note: seeding the DataFrame with a Success=0 row per strategy (as a naive
# version might) would silently count a phantom failure; reindex avoids that.
strategies = ['Strategy A', 'Strategy B']
grouped = df.groupby('Strategy')['Success']
success_counts = grouped.sum().reindex(strategies, fill_value=0)
failure_counts = (grouped.count() - grouped.sum()).reindex(strategies, fill_value=0)

# Thompson sampling: draw once from each strategy's Beta posterior and pick
# the strategy with the largest draw; the +1s are a uniform Beta(1, 1) prior
def thompson_sampling(success_counts, failure_counts):
    return max(success_counts.index,
               key=lambda s: np.random.beta(success_counts[s] + 1,
                                            failure_counts[s] + 1))

# Choose a strategy
chosen_strategy = thompson_sampling(success_counts, failure_counts)
print(f"Chosen Strategy: {chosen_strategy}")

# Simulate a trade (replace with real trade logic)
trade_success = random.choice([0, 1])

# Log the result
df = pd.concat([df, pd.DataFrame({'Strategy': [chosen_strategy], 'Success': [trade_success]})], ignore_index=True)
df.to_csv(csv_file, index=False)

print(f"Trade Result: {'Success' if trade_success else 'Failure'}")
print(f"Updated CSV saved to {csv_file}")

Point being: LLMs can do some things better than my wife, and even better than this forum at times. It all depends on what you ask them.

LLMs are a nice productivity tool, like Office/WPS.

Just trying to figure out what to do with them and focus our efforts on achievable goals.

Lots of noise about LLMs for stock picking, making predictions, managing portfolios, understanding news better than humans, outperforming analysts, etc. This seems like a waste of time since they are naive and gullible; not good traits in a market that is all head fakes.

So as a first step, definitely use LLMs as a productivity tool for P123: train them on our forums plus the reference, answering questions, writing formulas.

After that, not sure yet.

Thanks

Where have I been? I'd never even heard of WPS, the Chinese counterpart to MS Office, used by 500M people. It seems much slicker than the bloated MS suite. China is quietly pulling ahead in many areas, now including nuclear fusion.

DeepSeek was developed by High-Flyer and uses a Mixture of Experts (MoE) architecture, meaning it likely has a finance-specific expert module; DeepSeek itself has confirmed this (assuming it has insight into its own internal workings).

However, as far as I can tell, this does not mean that anyone using DeepSeek has access to the exact same finance module that High-Flyer uses. I'm fairly certain it's not the same, though it's probably not too heavily dumbed down either.

How You'd Really Use It for Finance

To really get value out of it, you'd need to train it on domain-specific data. For example, you could (see the sketch after this list):
:white_check_mark: Use FactSet sentiment data
:white_check_mark: Train it with historical early-morning earnings reports
:white_check_mark: Use future returns as a target variable
:white_check_mark: Build a model to predict whether buying at the open or close would yield a better price
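A minimal sketch of what that training loop might look like, assuming the Hugging Face stack and FinBERT as the pre-trained starting point. earnings.csv, its columns, and the up/down label are hypothetical placeholders for whatever data you actually license:

import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical dataset: one row per early-morning earnings report,
# with the stock's forward return as the raw target
df = pd.read_csv('earnings.csv')                    # columns: text, fwd_return
df['label'] = (df['fwd_return'] > 0).astype(int)    # 1 = positive forward return

tok = AutoTokenizer.from_pretrained('ProsusAI/finbert')
ds = Dataset.from_pandas(df[['text', 'label']])
ds = ds.map(lambda b: tok(b['text'], truncation=True, padding='max_length'),
            batched=True)
ds = ds.train_test_split(test_size=0.2)

# Re-initialize FinBERT's 3-class sentiment head as a 2-class return head
model = AutoModelForSequenceClassification.from_pretrained(
    'ProsusAI/finbert', num_labels=2, ignore_mismatched_sizes=True)

args = TrainingArguments(output_dir='out', num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args,
        train_dataset=ds['train'], eval_dataset=ds['test']).train()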

In other words, this is just like any other neural-net machine learning approach, except you'd be starting with a pre-trained LLM that already understands financial reports, which gives you a serious head start.

What's High-Flyer Likely Doing?

I doubt Portfolio123 (P123) can do this out of the box, but I'd guess High-Flyer is doing exactly this (and more) behind the scenes.

The Key Takeaway

Nobody is saying that simply getting API access to ChatGPT, Claude 3, or DeepSeek, without any fine-tuning or additional training, will give you high-quality financial predictions. Training would have to be part of any realistic plan.


What are your goals?

Happy to share.

I prompt-engineered it and fed it the XML structure / dictionary; using the paid version, I keep everything P123-related in a project.

(So yea - your guess is basically correct).

The key is knowing how to prompt; you've gotta speak LLM. Not perfect by any means, but it did find strategies that I haven't been able to find in 10+ years here, nor do I know of anyone else who has.

Got this in one day's work, along with 20+ other models.

Hope you'll provide us a text editor with the AI tool, Marco. :slight_smile:

Sorry, what do you mean by this?

Signals that I haven't seen before, or things that were tried unsuccessfully in the past.

Based on current research, the "seems like" could be removed.

LLMs' ability to make news-based predictions only lasts a day, which does not offset the transaction costs.

An LLM is suitable as a secretary, not a manager or expert. Tasks such as carrying out programming instructions, teaching novices, or extracting numbers from lengthy financial reports are among its comparative advantages.

To this day, AI remains unable to replace even entry-level artists and writers, let alone deliver the phantom dreams of AGI, never mind ASI.

I think this would be a challenge for anyone, and if there were an additional cost, I'm not sure I'd sign up. Using LLMs for investing would require a real commitment with an uncertain ROI, which is beyond my pay grade to fully evaluate.

Okay, that's an exaggeration. I have no idea if this would actually pay off for P123, which is fine, considering I don't have to make that decision.

That said, I do use ChatGPT, and I'm still exploring its full potential. At the very least, it's great for technical assistance, like setting up a random forest correctly if you haven't done it before. And in some cases, I've gotten solid insights on more complex questions as well.

But can LLMs actually work for stock prediction?

(Not necessarily for P123, but in general.) That's an interesting question.

Here are some relevant studies and discussions:

Excerpt from the first reference:

"This process involved incorporating a ranking mechanism into the systemā€¦."

So again, if someone were to implement some version of this, they might end up using "stacking," where the output from the LLM (a rank) would be just one input into a larger model, with whatever weight it deserves, as sketched below.
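To make the idea concrete, here's a toy sketch with synthetic data; the factor columns and the 0-100 LLM rank are made-up stand-ins, not anything from the paper or from P123:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Features: two ordinary factors plus the LLM's rank as one more column
X = np.column_stack([
    rng.normal(size=n),           # e.g. a value factor (synthetic)
    rng.normal(size=n),           # e.g. a momentum factor (synthetic)
    rng.uniform(0, 100, size=n),  # hypothetical 0-100 rank from the LLM
])
y = 0.1 * X[:, 2] / 100 + rng.normal(scale=1.0, size=n)  # toy future return

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
print('R^2 on held-out data:', model.score(X_te, y_te))
# The fitted model, not the user, decides how much weight the LLM rank deserves.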

Still brainstorming

First, integrate them to help users get the most out of P123 tools and automate tasks. That's obvious.

Second, use them in novel ways to improve alpha: either directly, as an additional input into a larger model, or indirectly, by extracting new data points from textual sources, or perhaps charts.
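As a toy example of the text-extraction idea: ask an LLM API to turn a headline into one structured field. The model name and the JSON contract here are my assumptions, not anything P123 provides.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

headline = 'Acme Corp guides Q3 revenue below consensus, citing weak demand'
resp = client.chat.completions.create(
    model='gpt-4o-mini',  # hypothetical choice; any chat model would do
    messages=[{
        'role': 'user',
        'content': 'Answer with JSON {"guidance": "up" | "down" | "none"} '
                   'for this headline: ' + headline,
    }],
)
# A real pipeline would validate the JSON before trusting it
print(json.loads(resp.choices[0].message.content)['guidance'])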

Can you elaborate?

Thanks

The MS-Top10-Cap-GPT portfolio didn't even perform better than the S&P 100, not to mention that its sample (horizon and number of stocks) is too small and there are too many parameters that could be quietly overfitted.

It is a small sample size too (which the authors also acknowledge), meaning anything positive (if there is anything) would have no statistical significance whatsoever.

It's also clear that the authors trained their model, rather than using the same ChatGPT version available to me. My main takeaway from the paper was understanding what training an LLM for stock selection might entail, rather than treating it as evidence that LLMs can actually outperform. The training seemed pretty involved.

I also recognize the potential publication bias and bias in my own search, even though I wasn't intentionally trying to filter the literature. It was a very brief search.

And, as ChatGPT itself pointed out:

"Without training, LLMs will mostly regurgitate general financial logic, which is far less useful than your carefully validated process."

So at best, LLMs would need significant training to be useful for stock prediction. And even then, no study I have seen proves that training alone would work.

For now, given my budget constraints, I'll continue using LLMs as a productivity tool, but I won't be relying on ChatGPT 4o (or similar non-API versions) for any stock predictions.

To clarify, by "not better than" I don't mean statistically insignificant; I mean the Sharpe ratio is not higher.

1 Like

Of course. I was referring to this feature in the ranking system interface, where we can insert XML rather than pointing and clicking.

It'd be wonderful if we could input features into the AI tool rather than using the point-and-click interface we have now.

I hope that cleared it up. Please let me know if you have any more questions!

Are you referring to how features are entered? You can copy features to the clipboard or download them to CSV, make your edits, and import them back. When you import, you can choose to overwrite.

I appreciate you pointing that out! I'll spare my excuse to keep this thread clean and on topic :slight_smile:


Just an update:

DeepSeek appears to be pushing competitors to release new models earlier than planned, such as OpenAI's o3 and now Gemini 2.0.

It may also be contributing to the delay of Grok 3's release on X (formerly Twitter) as they work to incorporate chain-of-thought (CoT) reasoning and other enhancements before launch.

The new Gemini 2.0 Flash Thinking Experimental seems to have the most complete CoT output I've seen so far. Unlike ChatGPT, which briefly flashes a reasoning preview, Gemini 2.0 keeps the full chain of reasoning visible throughout the response.

I was impressed. However, the Flash version struggled to fully understand some of my questions. That said, it explicitly carried out what it thought I was asking, and I could clearly see its reasoning in the chain of thought. Hoping for faster and more refined responses, I signed up for the Gemini 2.0 Pro Experimental version (first month free) after spending all morning testing Gemini 2.0 Flash Thinking Experimental.

I was able to get responses that I couldn't obtain with any other online LLMs. Gemini (part of Google) seems to have more seamless access to online data, reducing the need to prompt excessively for up-to-date information.

As far as I can tell, Gemini has the highest token allotment for context among current LLMs. This means you can upload multiple research PDFs, ask numerous related questions, and expect it to maintain context throughout the discussion, something that can be a challenge for other models.

Note: I am not claiming I now know how this can be used and which LLM is actually best for what purpose.

2 Likes