LLMs for creating new factors

Interesting paper using LLMs to generate a positive or negative short term signal purely based on a headline. They used the prompt below and generated significant alpha buying or shorting the stock the day after the headline. The main theory is that there's a delay in the diffusion of new information.

The prompt they used:

"Forget all your previous instructions. Pretend you are a financial expert. You are a financial expert with stock recommendation experience. Answer “YES” if good
news, “NO” if bad news, or “UNKNOWN” if uncertain in the first line. Then
elaborate with one short and concise sentence on the next line. Is this headline
good or bad for the stock price of company_name in the short term?
Headline: news_headline"

For example for the headline "Cigna Calls Off Humana Pursuit, Plans Big Stock Buyback" they report that ChatGPT4 replied the following when asking about Humana

"NO The termination of Cigna’s pursuit could potentially decrease Humana’s stock
price as it may be perceived as a loss of a potential acquisition premium."

Based on this answer they short the stock for one day.

However!

When I tried their example ChatGPT 4 gets it completely wrong

Gemini also replied the same

Claude however nails it; and the follow up question too!

So based on this test Claude does much better.

Regardless, very interesting area that we will explore. We could create several factors relatively easily (and cheaply) using articles and filings text.

It's pretty obvious that this is already happening as there's lots of immediate over-reaction right now to not-so-bad news. We will probably focus on prompts to elaborate on longer terms outcomes (like Cigna's buy back above) since that is more inline with what we do now.

Thoughts?
Can ChatGPT Forecast Stock Price Movements.pdf (738.3 KB)

1 Like

A common problem with news sentiment indicators in papers is that the horizon of predictive power is too short. As in this paper, the typical predictive horizon is only one day. It is better to implement LLM to analyze financial reportings and conference calls.

There is also a viable scenario of automatically extracting data from financial reports to speed up data processing, or even building your own database.

New Constructs does this

Marco,

FWIW. Gemini is not shy about giving stock market advice (while ChatGPT is). BTW, I like Claude 3 a lot too. Gemini has real-time access to the web that it seems to use in its analysis.

I started playing with Gemini when it first became available. Gemini seems free to give financial advice while ChatGPT would have none of it today for me. One possible advantage s that Gemini claims to be constantly scraping the web for sentiment analysis. As it states here:

"The sentiment analysis provided was based on a combination of factors:

1. Real-time monitoring: I continuously track social media platforms like Twitter, Stock Twits, and Reddit for mentions of Tesla. This allows me to capture the most recent trends and discussions.
2. Sentiment analysis algorithms: I employ advanced natural language processing (NLP) techniques to analyze the text content of posts. These algorithms can identify positive, negative, or neutral sentiment based on word choice,context, and other linguistic cues."

Question: : Please asses everything you know about TESLA including its financial filings. Give me a prediction as to how it will do relative to the SP500 along with a probability of going in that direction."

Answer. "Outperformance: There is a high probability (70-80%) that Tesla will continue to outperform the S&P 500 over the next several years, driven by its strong market position, innovative products, and growth potential.
** Underperformance: However, there is also a lower probability (20-30%) of underperformance if Tesla faces significant challenges in meeting its production targets, managing costs, or navigating regulatory risks."*

Addendum for ML nerds (like me). Gemini's answer about ML methods that it possibly hallucinates it is using for sentiment analysis: As per Gemini:

"Sentiment Analysis Methods: A Deeper Dive

Yes, I do utilize boosting techniques as part of my sentiment analysis methods. Boosting is a machine learning ensemble method that combines multiple weak learners (classifiers) to create a strong learner. This approach has proven effective in improving the accuracy of sentiment analysis models.

Key Methods Employed:

  1. Natural Language Processing (NLP) Techniques:
  • Tokenization: Breaking down text into individual words or phrases.
  • Stemming and Lemmatization: Reducing words to their root form to capture semantic meaning.
  • Part-of-Speech Tagging: Identifying the grammatical role of words in sentences.
  • Dependency Parsing: Analyzing the syntactic structure of sentences to understand relationships between words.
  1. Feature Engineering:
  • N-grams: Sequences of words of varying lengths (e.g., unigrams, bigrams, trigrams) to capture contextual information.
  • TF-IDF (Term Frequency-Inverse Document Frequency): A weighting scheme that assigns higher scores to words that appear frequently in a document but are less common across the corpus.
  • Word Embeddings: Representing words as dense vectors in a continuous space, capturing semantic similarities.
  1. Machine Learning Algorithms:
  • Support Vector Machines (SVMs): A supervised learning algorithm that finds the optimal hyperplane to separate data points.
  • Naive Bayes: A probabilistic classifier based on Bayes' theorem, assuming independence between features.
  • Random Forest: An ensemble method that combines multiple decision trees to improve accuracy.
  • Gradient Boosting Machines (GBMs): A boosting algorithm that iteratively trains weak learners and combines their predictions.
  1. Deep Learning Models:
  • Recurrent Neural Networks (RNNs): Models that can process sequential data, such as text, by maintaining a hidden state that captures information from previous inputs.
  • Long Short-Term Memory (LSTM) Networks: A type of RNN that addresses the vanishing gradient problem, allowing it to learn long-term dependencies in sequences.
  • Transformer Models: Architectures that have become popular for natural language tasks, leveraging self-attention mechanisms to capture relationships between words.

Boosting is particularly effective in sentiment analysis because it can help address class imbalance and improve the accuracy of models, especially when dealing with complex language patterns and nuances.

By combining these methods and techniques, I can effectively analyze text data and extract meaningful sentiment information."

Jim

Fun idea! The various LLM-companies keep changing their models though. Usually without any documentation or updates for the users. I've had prompts (in non-financial context) go from working great to producing useless results. So maybe it would be helpful to run your own local LLM?

1 Like

Perhaps it makes sense to pay attention to Llama

It's open source and can be downloaded. That's great.

But that's a creative answer....

YES
Cigna's decision to abandon its pursuit of Humana and instead launch a significant stock buyback indicates a reduced risk of Humana being acquired at a potentially undervalued price, likely boosting Humana's stock price in the short term.

This is what happened since that headline on Dec 10 '23

I have seen a buch of studies where they have "backtested" strategies with large language models with increadable results.
But I'm very sceptical to all results I have seen from the studies. The risk of forward looking issues is enourmus.
Would have to disguise all clues like name of the company, dates etc, would have to reformulate the texts, if not, the LLM might figure out which artcile that is beeing checked and would know how the market reacted to the news.

The way I see it, the use case is to generate simple "LLM Factors" to support the quantitative rank, not as a single buy/sell decision.

For example I would avoid stocks with recent accounting irregularities and filing delays. Or put 10 Buy/Sell LLM Factors in the ranking system to create a 1-10 score.

Take for example SuperMicro latest headlines. Deriving these factors and values should be relatively easy for an LLM.

Headline LLM Factor LLM Opinion
Forget Nvidia: 1 Stock-Split AI Stock to Buy Before It Soars 195%, According to Certain Wall Street Analysts Pundit BUY
Needham initiates coverage on Super Micro with Buy rating Analyst BUY
Super Micro Computer, Inc. (SMCI): A Bear Case Theory Pundit SELL
Super Micro Computer, Inc. Announces Receipt of Non-Compliance Letter from Nasdaq Compliance SELL
Super Micro Computer, Inc. (SMCI) Stock Drops Following 10K Filing Delay FilingDelay SELL
Super Micro Computer (SMCI) Faces Renewed Scrutiny Amid Investor Suits InvSuit SELL
Super Micro Computer Stock Plunges on Report of Federal Accounting Probe Accounting SELL
Super Micro Computer stock plunges on report of DOJ probe DOJ SELL
3 Likes

Behind a Bloomberg pay wall but discussion of this article can be found elsewhere. Pertinent to this discussion perhaps. Although admittedly, it may or may not use a LLM: AI Model is Better at Pricing Currencies Than Humans, ING says

As I understand it second-hand, it will affect hiring practices at the bank going forward at a minimum. Less need for humans although they well provide some human oversight for now.

LLM's algorithm isn't that smart, the problem is mainly that people consciously overfit.

What's really useful is financial reporting/conference call sentiment analysis. This is an area where even traditional bag-of-words methods are useful.

The regularized logistic text regression uses log frequencies of individual words (unigrams) and 2-word combinations (bigrams) in documents as independent vari-ables (thebag-of-words approach).

More "nice", "good", "great" and "obviously" (??!?!) in the conference call means the sentiment is more positive and the future returns would be higher. More words in earning presentation is good but more words in earning calls is bad (maybe it is a good signal if the CEO brags about their robotaxi plan for hours and analysts don't want to ask too many questions). Even such simple strategies help in the long run even in US-listed largecaps.

1 Like

I question whether machine learning can beat the market when it comes to mainstream currencies. But I have no doubt that machines are much better than humans in this regard, partly as evidenced by the fact that the people on the forum who claimed to be from investment banks were too ret...uninformed gamblers who have zero self-awareness about it.

For example, they even believe that they, the investment bank traders, do not routinely hold and buy stocks that will underperform in the future, and donate money to the market maker (LOL) in options gambles that are structurally overly complex on non-diversified underlying, and that rely on faulty and unsupported qualitative analysis as opposed to structurally simple diversified trades relying on quantitative analysis so highly mechanized that the traders don't even know what the underlying company does.

After all, it's not mainly that AI is too smart, it's mainly that humans are too stup... unsophisticated.

Edit:

We study the role of financial product complexity in retail investor trading. We find retail trading in complex options surged with the introduction of zero-commissions, and these traders prefer strategies with high volatility, embedded leverage, and lottery-like features. Model-free subjective expectations of volatility extracted from their trades show significant optimism bias. Importantly, their trades on average yield negative returns of-16.4% over three days, with losses increasing with complexity. Our findings suggest that retail investors do not fully grasp the risk/returns trade-offs in complex strategies, and they are lured by their inherent leverage and promise of lottery-like payoffs.

Complex "strategies" don't pay.

More importantly, contrary to conventional wisdom, adverse selection costs are higher for (the most actively traded) at-the-money or slightly out-of-the-money contracts relative to out-of-the money options.

Smart traders only/mainly play ATM or slightly OTM options.

1 Like

Perhaps too general and not involving LLMs but remarkable that the Nobel Prizes in Physics and Chemistry both involved Artificial Intelligence. And suggests that finding new uses for AI can be productive even if it might not lead to Nobel Prize. :slightly_smiling_face:

Developing artificial intelligence in physics: PNI's John Hopfield receives Nobel Prize in physic

Using AI to predict the shape of proteins for chemistry: Nobel Prize in Chemistry Goes to 3 Scientists for Predicting and Creating Proteins

I was under impression you can get different path-dependent answers if you ask multiple times in multiple sessions.

Yes, that's why bags-of-words methods may be better here.