P123 may already have significant free (and easy) competition for some technical data!

Now I might surprise you all, but I do not think AI/ML here at P123 will make a big difference.

Reinforcement-style machine learning will only automate the process of creating a ranking system that produces the best results on a selected universe.

Like an optimizer that has all possible combinations of parameters, factors, and factor combinations.

Unfortunately, these models get overfitted easily, and the amount of work needed to create a model that works on out-of-sample data and is not overfitted is not dramatically different from creating a good ranking system in the first place.

I think the first step for P123 should simply be to extend their optimizer tool to allow for parameterized, simple machine learning.

Judith,

I am not sure that I actually disagree with your conclusion about what P123 should do.

But I have used XGBoost. I do not think it is bad. It manages the problem of multicollinearity well (completely, really) and of course it is non-linear. Non-linearity CAN lead to overfitting, so it is a potential problem, I agree.

So you may be right. But in the right hands (e.g., Marcos López de Prado) it may work: Advances in Financial Machine Learning.

Anyway, I would be interested in your thoughts on XGBoost if you have any experience.

Some options for cross-validation are built into the program.
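
If it helps, here is a minimal sketch of what XGBoost's built-in k-fold cross-validation (xgb.cv) looks like. The factor data is synthetic and the feature names are made up; in practice X would be factor ranks and y forward returns.

```python
# Minimal sketch: XGBoost regression with its built-in k-fold cross-validation.
# The factor data below is synthetic and only for illustration.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))            # e.g., five ranking factors
y = X @ np.array([0.5, -0.2, 0.1, 0.0, 0.3]) + rng.normal(scale=1.0, size=1000)

dtrain = xgb.DMatrix(X, label=y, feature_names=[f"factor_{i}" for i in range(5)])

params = {
    "objective": "reg:squarederror",
    "max_depth": 3,          # shallow trees help limit overfitting
    "eta": 0.1,              # learning rate
    "subsample": 0.8,
}

# 5-fold cross-validation with early stopping on the held-out folds
cv_results = xgb.cv(
    params,
    dtrain,
    num_boost_round=500,
    nfold=5,
    early_stopping_rounds=25,
    seed=42,
)
print(cv_results.tail(1))    # train/test RMSE at the best iteration
```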

I have been for machine learning with proper support, FWIW. But to my mind Yuval's model is machine learning that could be automated and improved. I have a broad definition of machine learning and reinforcement learning.

Let me amend that a little. I think P123 is going ahead with machine learning, last I heard, and I want to support that.

First, I have no idea what P123 will be doing. I do not want to speculate one way or another.

Any concern about overfitting can be handled by having multiple machine learning options (linear and non-linear) as well as a fully functioning cross-validation and holdout test method. Any pro can help P123 with that. Literally anyone who has any professional training.

Having multiple linear options was a recommendation of Druckruck, I think. This is one method to reduce overfitting. Cross-validation and a holdout test set will help determine if this is best. We don't actually have to decide that here, ahead of time.
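
To make that concrete, here is a rough Scikit-Learn sketch (with made-up data) of one linear and one non-linear option scored by cross-validation, with a holdout test set looked at only once at the end. It is only an illustration of the workflow, not a claim about how P123 would implement it.

```python
# Sketch: compare a linear and a non-linear model with cross-validation,
# then confirm the chosen model once on a holdout test set.
# Data is synthetic; in practice X would be factor ranks and y forward returns.
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = np.tanh(X[:, 0]) + 0.3 * X[:, 1] + rng.normal(scale=1.0, size=2000)

# The holdout test set is set aside and not touched during model selection
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "ridge (linear)": Ridge(alpha=1.0),
    "boosting (non-linear)": GradientBoostingRegressor(max_depth=2),
}

# Cross-validation scores on the training data only
cv_scores = {
    name: cross_val_score(m, X_train, y_train, cv=5, scoring="r2").mean()
    for name, m in models.items()
}
best_name = max(cv_scores, key=cv_scores.get)
print(cv_scores)

# One look at the holdout, only for the model chosen by cross-validation
best = models[best_name].fit(X_train, y_train)
print(best_name, "holdout R^2:", best.score(X_test, y_test))
```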

P123 had, and still has, the potential to be bigger than Quantopian was. They are in fact better.

I believe XGBoost and neural nets can be used for "panel data," which is basically what you can put into a spreadsheet if it is done right. I have done it, though not a lot. I have done enough to see that it works.
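
By panel data I just mean one row per (date, ticker) with factor columns and a forward-return label, which is the spreadsheet-style layout these libraries expect. A made-up example of that layout:

```python
# Sketch of the "panel data" layout: one row per (date, ticker),
# factor columns as features, next-period return as the label.
# All values here are invented for illustration.
import pandas as pd

panel = pd.DataFrame(
    {
        "date":   ["2023-01-31", "2023-01-31", "2023-02-28", "2023-02-28"],
        "ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
        "value_rank":    [72.0, 55.0, 70.0, 58.0],      # e.g., ranking-system factors
        "momentum_rank": [64.0, 81.0, 60.0, 85.0],
        "fwd_return":    [0.021, -0.004, 0.013, 0.030], # label: next-period return
    }
)

features = panel[["value_rank", "momentum_rank"]]
label = panel["fwd_return"]
print(panel)
```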

XGBoost is fast, and while I am not sure, I suspect P123's servers can handle it.

The title of this post is only directed at the idea that ChatGPT and Bard make it hard to set up a machine learning algorithm without cross-validation and regularization. They are better than the present forum at machine learning advice.

Better still would be a dedicated machine learning moderator who, for a start, would let people know what they need to know to use the methods P123 will provide, whatever she thought was important. At least why and how P123 decided to implement it.

I have no evidence that they do not plan to do that at this time. P123 has done well with its rollouts in the past. I actually have confidence in what they will end up doing. But they will have to hire someone new, or use ChatGPT or Bard, to explain cross-validation and regularization, IMHO.

Maybe I am wrong or maybe they have already done that. I am optimistic and cannot wait to see it.

Best,

Jim

Yea, P123 won't be able to get a beta out on that soon.

Would you have them automate anything in the meantime? If so what?

Do you prefer downloading a ranking system into a spreadsheet, randomizing some factors and uploading it back into P123?

Do you feel like looking at the spreadsheet each time you download it adds something? Makes you feel like you are contributing maybe?

Assuming you want any changes at all at P123. If you want eventual progress at P123 (however you would define it), what would you do? Short of what you are suggesting above, I guess. Is it all or nothing? A massive intelligence better than the CEO of each company that is analyzed? I agree that sounds great and is very realistic, but what do we do in the meantime?

Maybe debate whether bag-of-words is the best language-processing model while we wait on the massive artificial intelligence to arrive here at P123?

Seriously, what practical improvements would you suggest?

Jim

I was probably not clear, so I will clear it up.

The next step for P123, which is easy and might have some benefits:

Reinforcement-style machine learning that will automate the process of creating a ranking system that produces the best results on a selected universe. Like an optimizer that has all possible combinations of parameters, factors, and factor combinations, including custom factors.

A person would probably select the method of dividing the data into training data, testing data, and human-bias testing data. You could divide it based on random stock selection from the universe, random timeframe selection, or random period selection (like every second month, every second year, etc.), as sketched below. Then the number of epochs, the number of layers, the reinforcement parameters, etc.
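
A rough sketch (assuming a pandas DataFrame with ticker and date columns; the column names are placeholders) of the three split styles just described:

```python
# Rough sketch of three ways to divide the data into training and testing sets,
# assuming a pandas DataFrame "panel" with "ticker" and "date" columns.
import numpy as np
import pandas as pd

def split_by_random_stocks(panel: pd.DataFrame, test_frac=0.3, seed=0):
    """Hold out a random subset of tickers (all of their history) for testing."""
    rng = np.random.default_rng(seed)
    tickers = panel["ticker"].unique()
    test_tickers = rng.choice(tickers, size=int(len(tickers) * test_frac), replace=False)
    test_mask = panel["ticker"].isin(test_tickers)
    return panel[~test_mask], panel[test_mask]

def split_by_timeframe(panel: pd.DataFrame, cutoff="2018-01-01"):
    """Train on everything before the cutoff date, test on everything after."""
    dates = pd.to_datetime(panel["date"])
    return panel[dates < cutoff], panel[dates >= cutoff]

def split_by_alternating_periods(panel: pd.DataFrame):
    """Train on even years, test on odd years (an 'every second year' split)."""
    years = pd.to_datetime(panel["date"]).dt.year
    return panel[years % 2 == 0], panel[years % 2 == 1]
```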

Where I think the cutting-edge investing models will go:

A combined language model with an alpha-predicting model. Something like an analyst with experience from analyzing millions of companies over hundreds of years.


IMHO, this is excellent.

In addition to being a good idea, Judith is using words that will be recognized by the machine learning community (e.g., training and test set).

This will help, and will actually be necessary I think, if P123 ever wants to market to the machine learning community.

FWIW, I am more concerned about having training and test sets (preferably automated) than the particular model used. This is not blazing new territory. It has been done literally millions of times with Scikit-Learn.

If one wants to use Duckruck's linear models, Yuval's optimization models, Judith's models if she expands on them, etc., P123 will need terms recognized by the machine learning community.

Like training and test set. Well said, I think. Helpful for both practical and marketing reasons.

And in keeping with this thread: ChatGPT and Bard are not fully developed, but they DO understand a training and test set (if anyone has trouble with that, they should ask Judith or ChatGPT for now).

Jim

This is a good podcast on the subject from this week.

As language-based predictive models, ChatGPT and Bard could be utilized to go through earnings call texts to pick up bullish or bearish sentiment. For math and numerics? They will just make up numbers to provide a neat and tidy answer.

I actually agree with this. Bard is particularly bad about making stuff up.

But you discuss two things ChatGPT and Bard might be asked to do. There is a third thing that Bard and ChatGPT do better than the forum for now.

If you ask it to add a train/test split to your code it will. With no extended debate. And it will probably run. You will not have to wait years to see if it is implemented or even understood.

This is something done millions of times (per day, maybe, considering the iterations modern computers can produce) using Scikit-Learn. But it remains controversial here. And the forum will not help you implement it.

Or help you much with anything you might want to automate as a feature request, for that matter. As an example, making random with a seed could, as a P123 feature, help someone make a train/test split at P123 or do subsampling within P123 (because MOD() is good but limited), but that request has been out there for a while. It would be pretty easy to implement, I think. I doubt that it will ever happen.
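
To show what I mean by random with a seed, here is a small sketch. The universe DataFrame is made up, but the point is that the same seed gives the same subsample or train/test assignment every time, which MOD() cannot quite do.

```python
# Sketch: reproducible subsampling / splitting with a seeded random generator.
# "universe" is a made-up DataFrame standing in for a downloaded P123 universe.
import numpy as np
import pandas as pd

universe = pd.DataFrame({"ticker": [f"TICK{i:04d}" for i in range(1000)]})

# Same seed -> same random subsample every time it is run
subsample = universe.sample(frac=0.5, random_state=123)

# Or a reproducible 70/30 train/test assignment per ticker
rng = np.random.default_rng(123)
universe["split"] = np.where(rng.random(len(universe)) < 0.7, "train", "test")
print(universe["split"].value_counts())
```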

It remains to be seen whether there will be a train/test split (or any type of cross-validation) when P123 makes AI/ML available.

But for now you are better off going to ChatGPT. There is not much useful information in the forum on this subject now for sure. And the forum will not give you any code.

TL;DR: Judith may have been the very first person to mention a train/test split in the forum. But it is hard to get ChatGPT or Bard to write code without it. You do not even have to ask ChatGPT or Bard to do it, and they will be able to explain why they added it if you have any interest.

Jim

ChatGPT-4 reportedly uses 100 trillion parameters; this one uses 50 billion, so it should be at least 2,000 times dumber.

What I am talking about is using something like ChatGPT-5, then taking the language model and refining it for analyzing stocks.

You have to have an understanding of the world before you start analyzing stocks.


I get that some think I might be better off implementing a bag-of-words program in Python when I get around to it. I question whether that might turn into a can of worms for me. But even when I get around to trying it, ChatGPT could offer a different perspective from whatever I might end up programming, I would guess. Occasionally, they might have a better computer with more GPUs, as much as I like my laptop. So language AIs might do some things better than my program, perhaps.

cmaxmagee has already looked into what is available today in a serious manner and has an impression based on some experience.

I think some of those programmers may have some insights or coding experience that I may not be able to match on my computer. Probably just me with my coding skills.

As far as practical solutions now, I wonder if anyone else at P123 has looked into this: "Anthropic's Claude is even better in this respect - you can upload large PDFs of filings and ask questions of them and get quite sophisticated answers."

Judith's comments about the parameters, as well as cmaxmagee's comments about ChatGPT with the code interpreter, are important. A rational approach to the subject.

The machine learning skills of ChatGPT are already far superior to any advice you can get in the forum. No serious question about that. People are already finding ways to use ChatGPT and other language AIs for sentiment. If I were P123, I would not ignore that.

Just me perhaps, but I would have already provided random with a seed, and at least listened to any other ideas from the forum about automating some of this, unless my ultimate goal was to be just an API in the end.

Much of this can be automated and P123 could be a leader. Perhaps it should be from a marketing perspective. There are some ideas in the forum that could be considered if that were to become a goal for P123.

Not a feature request, as it would not end up being implemented if it were. Rather a question: is P123 better off if I use just the API for any new models? I think a lot of present P123 members are headed that way. I have no information as to new subscribers.

I do like my present model. I do find use for what I can download, and some things I do with Python can be incorporated into a ranking system, WHICH IS VERY CONVENIENT WHEN IT CAN BE DONE! But growth in new methods, or even adoption of some basic ideas that have been around for decades (e.g., train/test split), is SLOW AT P123 unless you use the API with Python.

ChatGPT is more help for newer, state-of-the-art ideas and for implementing fully automated ideas with its coding abilities. And even for the minimal basics, like a train/test split, ChatGPT can help you with some of that (with code or ideas on how it might be incorporated).

Jim

I'm interested in what Bloomberg creates, but given that they charge ~$25k for just a terminal subscription I highly doubt it's going to be cheap (or easy).

Just curious. The only strong opinion I have is that my bag-of-worms (or words) method will not work so well at home.

cmaxmagee seems to have actually tried this and is even using it now. Have you cut and pasted anything into ChatGPT (e.g., a corporate report)?

Also I get this from Bard: "Yes, Tesla has had an improvement in sentiment over the last month on Twitter. According to a study by SentiStrength, the average sentiment score for Tesla tweets has increased from -0.12 on June 26 to 0.04 on July 26. This means that the average tweet about Tesla is now slightly positive, rather than slightly negative."

Maybe the numbers are made up, but it is well informed on the different language models, including the bag-of-words method. People can learn more about it there even if you do not want to rely on the numbers Bard gives you now (I would not).

BTW, Bard will even compare itself to ChatGPT with objective strengths and weaknesses. We can get some objective answers on some of this.

E.g., on bag-of-words: "* Bag-of-words: This is a simple method that counts the number of times each word appears in a text. The words are then weighted according to their frequency, and the overall sentiment of the text is determined by the sum of the weights...

...I think the bag-of-words method is a good starting point for sentiment analysis. It is simple to understand and implement, and it can be effective for many tasks. However, it is important to note that the bag-of-words method can be insensitive to context, and it can be difficult to get good results with small datasets.

I believe that neural networks are the most promising machine learning method for sentiment analysis."

I am sure that is a good answer based on my reading. For sure, someone would have to prove to me just counting the words is the best you are ever going to get. Do you think ChatGPT is just counting my words to get my meaning?

BTW, the bag-of-words method is often paired with the Naive Bayes algorithm, which, again as Bard suggests, does not look at the interaction (or context) of the words. For anyone actually interested in selecting a model or understanding the methods.
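
For anyone who wants to see that pairing in code, here is a minimal Scikit-Learn sketch with a toy, made-up training set. CountVectorizer is the bag-of-words step and MultinomialNB is the Naive Bayes classifier that, as noted, ignores word order and context.

```python
# Minimal sketch: bag-of-words features (CountVectorizer) paired with
# Naive Bayes (MultinomialNB) for sentiment. The training texts and labels
# below are toy examples, not a real labeled dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "record revenue and strong margin expansion",
    "guidance raised on robust demand",
    "revenue declined and margins compressed",
    "warns of weak demand and dilution risk",
]
labels = ["bullish", "bullish", "bearish", "bearish"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["record quarterly revenue, up 9% year-over-year"]))
```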

FWIW, here is what Bard said about Apple's Feb 2, 2023 corporate report. Maybe someone can comment on whether it would have been useful or is accurate:

" Financial Highlights

  • Net income of $123.9 billion, or $3.28 per diluted share, for the quarter ended December 25, 2022, compared to $111.4 billion, or $2.91 per diluted share, for the quarter ended December 26, 2021.
  • Revenue of $123.9 billion for the quarter ended December 25, 2022, compared to $111.4 billion for the quarter ended December 26, 2021.
  • Gross margin of 43.7% for the quarter ended December 25, 2022, compared to 42.4% for the quarter ended December 26, 2021.
  • Operating margin of 29.1% for the quarter ended December 25, 2022, compared to 27.3% for the quarter ended December 26, 2021.

Business Highlights

  • iPhone: We delivered record quarterly revenue of $71.6 billion and units of 82.9 million, up 9% and 2% year-over-year, respectively. We set new all-time records for iPhone revenue and units in every geographic segment.
  • Services: We delivered record quarterly revenue of $19.5 billion, up 11% year-over-year. We grew our installed base of active devices to 1.8 billion, up 10% year-over-year.
  • Wearables, Home, and Accessories: We delivered record quarterly revenue of $14.7 billion, up 10% year-over-year. We set new all-time records for wearables, Home, and accessories revenue.

Outlook

We are confident in our ability to continue to grow revenue and earnings per share in fiscal 2023. We expect revenue to grow between 5% and 9% year-over-year, and earnings per diluted share to grow between 6% and 11% year-over-year.

---

Sentiment

The sentiment of this corporate report is generally positive. The company reported record quarterly revenue and earnings per share, and it is confident in its ability to continue to grow in the future. The report also highlights the company's strong growth in its iPhone, services, and wearables businesses.

Overall, this is a good report for Apple. The company is performing well financially, and it is well-positioned for future growth.

I hope this is helpful!"

Hmm... I wonder how many times they said diluted (as in diluted shares) and if that would be a bad word in a bag-of-worms program...

Jim