ML/AI guides/resources

This post is for those looking to learn ML. There are quite a few posts on this already, so sorry for any overlap, but this one focuses on resources for learning how to implement ML algorithms, or for learning more about ML in general. It is partly meant to answer a question in another post without taking that post over!

Google actually has a pretty good course on ML that also uses Colab for hands-on examples:

I am currently doing the crash course.

Another good resource is the documentation for Python libraries such as scikit-learn ("scikit-learn: machine learning in Python", scikit-learn 1.3.2 documentation), though these resources are very Python-focused.

A more specific guide to ML for factor investing, which someone mentioned on the forum a while ago:
https://www.mlfactor.com/index.html

Finally, I think some colleges and universities make their lessons available online, but I expect those to be very math-heavy, so I have not dug into them yet.

If anyone knows of good resources that start with the basics of ML, I am curious too!


Introduction to Machine Learning with Python: A Guide for Data Scientists

Also, you can do most of the machine learning with dropdown menus in JASP; basically, no Python is needed. Email me if you need help with the limited Python requirements, and I'll ask ChatGPT for you. (Did I mention before that I don't really know how to program? That works out well, because you do not need to.)

JASP is limited to about 1,000,000 rows, but you can subsample a much larger pandas DataFrame with the code below. It loads 7 DataMiner files, combines them, and saves a random sample of 1,000,000 rows to a file on my desktop, ready to be uploaded into JASP.

Here is the only Python code you will have to use. It works unmodified if you keep the same file names; on Windows you may need to adjust the paths (or ask ChatGPT to modify it for you):

```python
import pandas as pd

# Load the seven DataMiner downloads (replace these paths with the actual locations of your files)
paths = [f"~/Desktop/DataMiner/DM{i}.csv" for i in range(1, 8)]
df = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

# JASP handles roughly 1,000,000 rows, so subsample only if the data is larger
if len(df) > 1_000_000:
    # Randomly sample 1,000,000 rows; random_state makes the sample reproducible
    subsample = df.sample(n=1_000_000, random_state=3)
else:
    print("The dataset has 1,000,000 or fewer rows; saving it unchanged.")
    subsample = df

# Save the subsample to a new CSV file on your desktop
subsample.to_csv("~/Desktop/subsample.csv", index=False)
```

Having a smaller file may seem like a weakness, but if you combine the results from several random subsamples, that is called subsampling. It duplicates what many people do when they use MOD() to create multiple universes, which is often considered a good thing at P123.
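A minimal sketch of combining results across subsamples, assuming the data is already in a pandas DataFrame: draw several random subsamples with different seeds, fit an ordinary least-squares regression to each with NumPy, and average the coefficients. The column names `factor1`, `factor2`, and `future_return`, the helper functions, and the synthetic data are all hypothetical stand-ins for a real DataMiner download.

```python
import numpy as np
import pandas as pd

def fit_ols(sample, features, target):
    """Ordinary least squares with an intercept; returns [intercept, coef1, coef2, ...]."""
    X = np.column_stack([np.ones(len(sample)), sample[features].to_numpy()])
    coefs, *_ = np.linalg.lstsq(X, sample[target].to_numpy(), rcond=None)
    return coefs

def subsample_ensemble(df, features, target, n_rows, n_subsamples=5):
    """Fit one regression per random subsample (one seed each) and average the coefficients."""
    fits = [
        fit_ols(df.sample(n=n_rows, random_state=seed), features, target)
        for seed in range(n_subsamples)
    ]
    return np.mean(fits, axis=0)

# Hypothetical synthetic data standing in for a large DataMiner download
rng = np.random.default_rng(0)
n = 50_000
df = pd.DataFrame({"factor1": rng.normal(size=n), "factor2": rng.normal(size=n)})
df["future_return"] = 0.5 * df["factor1"] - 0.2 * df["factor2"] + rng.normal(scale=1.0, size=n)

avg_coefs = subsample_ensemble(df, ["factor1", "factor2"], "future_return", n_rows=10_000)
print(avg_coefs)  # approximately [0.0, 0.5, -0.2]
```

In JASP the same idea would mean exporting several subsamples (changing `random_state` each time), running the same model on each, and averaging what you get.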

You can also do model averaging, e.g., combine the predictions of a regression with those of a random forest and a neural net. So perhaps it is not a weakness at all.
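A minimal sketch of model averaging with scikit-learn, assuming the features and target are already NumPy arrays. `VotingRegressor` fits each model and averages their predictions with equal weights by default. The synthetic data and the model settings (tree count, hidden-layer size, iterations) are illustrative assumptions, not tuned choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Hypothetical synthetic data: two factors and a noisy forward return
rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 2))
y = 0.5 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.5, size=2_000)

# Regression + random forest + neural net; predict() returns the plain average of the three
ensemble = VotingRegressor([
    ("regression", LinearRegression()),
    ("random_forest", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("neural_net", MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
])
ensemble.fit(X, y)

preds = ensemble.predict(X[:5])
print(preds)
```

Because the three models are weighted equally, adding their predictions instead of averaging them would produce the same ranking of stocks.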

It does not get easier than that!!! JASP is a free download for Windows or Mac. JASP will even do a neural net for you (with a dropdown menu) in the machine learning module.

You might start with a regression in JASP. A million rows is a large enough sample that a regression on it should give essentially the same results as a regression on the entire dataset in Python.
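A quick way to check this claim is to compare regression coefficients fitted on a full dataset and on a random subsample of it. The sketch below uses synthetic data scaled down to 500,000 and 100,000 rows so that it runs quickly; the sizes, noise level, and "true" coefficients are hypothetical. With real, noisier factor data the two fits will not match exactly, but the gap should be small relative to the standard errors.

```python
import numpy as np

# Hypothetical synthetic "full" dataset: 500,000 rows, three factors, noisy returns
rng = np.random.default_rng(42)
n_full, n_sub = 500_000, 100_000
X = rng.normal(size=(n_full, 3))
true_coefs = np.array([0.3, -0.1, 0.05])
y = X @ true_coefs + rng.normal(scale=2.0, size=n_full)

def ols(X, y):
    """Least-squares coefficients with an intercept (intercept first)."""
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

full_coefs = ols(X, y)
idx = rng.choice(n_full, size=n_sub, replace=False)  # random subsample without replacement
sub_coefs = ols(X[idx], y[idx])

print(np.round(full_coefs, 3))
print(np.round(sub_coefs, 3))
```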

For rank normalization (and a Monday rebalance), no one has to wait to see what P123 does with AI/ML, regardless of their skill level with Python.
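If your download is not already rank-normalized, or you just want to see what the step does, here is a minimal pandas sketch. It converts a raw factor into a percentile rank within each rebalance date, so every date is on the same 0 to 1 scale. The `date`, `ticker`, and `pe` columns and their values are hypothetical, and P123's own ranking conventions (for example a 0 to 100 scale, tie handling, missing values, or whether lower is better) may differ.

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, ticker) with a raw factor value
data = pd.DataFrame({
    "date": ["2024-01-01"] * 4 + ["2024-01-08"] * 4,
    "ticker": ["A", "B", "C", "D"] * 2,
    "pe": [10.0, 25.0, 15.0, 40.0, 12.0, 30.0, 8.0, 20.0],
})

# Percentile-rank the factor within each date (smallest value gets the lowest rank)
data["pe_rank"] = data.groupby("date")["pe"].rank(pct=True)
print(data)
```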

This is something I truly believe, as I think I have done it myself: you can build a good ML system with just DataMiner downloads and JASP (it is not my present port, but I would have no problem funding it).

Jim

Bayesian Modeling and Computation in Python

https://bayesiancomputationbook.com/welcome.html

If you like watching videos on the subject and need a general overview, this guy does a great job and he is entertaining…

Tony,

Really good!!! Thank you. As you know, I have discussed this with you before, but StatQuest does an excellent job with anything statistical. I believe you could search for any statistical topic plus "StatQuest" and get a near-perfect introduction to the topic. His songs are an acquired taste 🙂

Jim

Jonpaul,

This Machine Learning program from Coursera includes a certificate from the University of Washington, which may help your career once you have finished the course.

You may want to check it out.

Regards
James


James,

Thank you. I took a lot of Coursera courses back when you could audit them for free. Andrew Ng's courses on deep learning are all great!!! Coursera generally charges now, but if you are serious, the fees are not large.

I have been recently taking some MIT courses on Python which have been really good!!!

Here is a free machine learning course from MIT: Prediction: Machine Learning And Statistics. I am pretty sure it is good; it is quality instruction. I am not sure how hard it is, but it does include boosting, and XGBoost is a great addition that P123 will be offering.

Jim

Jonpaul / Jim,

Here is another Coursera program with a certificate.

This one is tailor-made for trading in financial markets.

Regards
James

More Coursera goodness: