Introduction to Machine Learning with Python: A Guide for Data Scientists
Also, you can do most of the machine learning with dropdown menus in JASP. Basically, no Python needed. Email me if you need help with the limited Python requirements. I’ll ask ChatGPT for you (did I mention that I don’t really know how to program? That works out well, because you do not need to).
JASP is limited to about 1,000,000 rows. You can subsample a much larger pandas DataFrame with this code. It loads 7 DataMiner files, then saves a random sample of 1,000,000 rows to my desktop, ready to be uploaded into JASP.
Here is the only Python code you will have to use. You can run it unmodified if you keep the file names the same, or get ChatGPT to modify it for Windows:
import pandas as pd

# Load your data (replace these paths with the actual paths to your data files)
df1 = pd.read_csv('~/Desktop/DataMiner/DM1.csv')
df2 = pd.read_csv('~/Desktop/DataMiner/DM2.csv')
df3 = pd.read_csv('~/Desktop/DataMiner/DM3.csv')
df4 = pd.read_csv('~/Desktop/DataMiner/DM4.csv')
df5 = pd.read_csv('~/Desktop/DataMiner/DM5.csv')
df6 = pd.read_csv('~/Desktop/DataMiner/DM6.csv')
df7 = pd.read_csv('~/Desktop/DataMiner/DM7.csv')
df = pd.concat([df1, df2, df3, df4, df5, df6, df7], ignore_index=True)

# Check if the data has more than 1,000,000 rows
if len(df) > 1000000:
    # Randomly sample 1,000,000 rows from the DataFrame
    subsample = df.sample(n=1000000, random_state=3)
else:
    # Nothing to trim: keep every row so the save below still works
    print("The dataset has 1,000,000 rows or fewer; saving all of it.")
    subsample = df

# Save the subsample to a new CSV file on your desktop
subsample.to_csv('~/Desktop/subsample.csv', index=False)
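If the number of DataMiner files changes, the same steps can be wrapped in a small function that picks up every matching file in a folder. This is only a sketch: the function name `load_and_subsample`, its parameters, and the `DM*.csv` pattern are my own assumptions based on the file names above, not anything P123 or JASP provides.

```python
from pathlib import Path

import pandas as pd


def load_and_subsample(folder, n_rows=1_000_000, seed=3, pattern="DM*.csv"):
    """Concatenate every CSV matching `pattern` in `folder`, then sample up to n_rows rows."""
    files = sorted(Path(folder).expanduser().glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern} in {folder}")
    df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
    if len(df) > n_rows:
        return df.sample(n=n_rows, random_state=seed)
    return df  # already small enough for JASP; keep every row


# Example usage (paths assume the same Desktop layout as the code above):
# desktop = Path.home() / "Desktop"
# subsample = load_and_subsample(desktop / "DataMiner")
# subsample.to_csv(desktop / "subsample.csv", index=False)
```

Because it builds paths with `Path.home()`, the usage lines should work on Windows or Mac without editing, as long as the DataMiner folder sits on the desktop.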
Having a smaller file may seem like a weakness, but if you draw several random samples and add the results together, it is called subsampling, and it duplicates what many do with MOD() to create multiple universes. That is often considered a good thing at P123.
You can also model-average, e.g., add the results of a regression to those of a random forest and a neural net. So perhaps it is not a weakness at all.
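One way to get several subsamples is to rerun the sampling step with a different `random_state` each time. The sketch below assumes that approach; `draw_subsamples` is a made-up helper name, and the tiny `demo` DataFrame is a stand-in so the code runs by itself.

```python
import pandas as pd


def draw_subsamples(df, n_rows, seeds):
    """Return a dict with one random subsample of up to n_rows rows per seed."""
    n = min(n_rows, len(df))
    return {seed: df.sample(n=n, random_state=seed) for seed in seeds}


# Tiny stand-in DataFrame; in practice this would be the combined DataMiner data
demo = pd.DataFrame({"id": range(100)})
samples = draw_subsamples(demo, n_rows=20, seeds=[1, 2, 3])
for seed, sample in samples.items():
    print(seed, len(sample))
```

Each subsample could then be saved to its own CSV, run through JASP separately, and the results combined.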
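As a rough illustration of model averaging, suppose each model's predictions for the same stocks sit side by side as columns. The column names and the numbers below are invented purely for the example; the sketch just takes the row-wise mean.

```python
import pandas as pd

# Made-up predictions for four stocks, for illustration only
preds = pd.DataFrame({
    "regression": [0.02, -0.01, 0.03, 0.00],
    "random_forest": [0.01, 0.00, 0.05, -0.02],
    "neural_net": [0.03, -0.02, 0.04, 0.01],
})

# Equal-weight average of the three models' predictions for each row
model_cols = ["regression", "random_forest", "neural_net"]
preds["average"] = preds[model_cols].mean(axis=1)
print(preds)
```

If the models predict on different scales, averaging each column's ranks (for example with `preds[model_cols].rank(pct=True)`) instead of the raw values is one option.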
It does not get easier than that!!! JASP is a free download for Windows or Mac. JASP will even do a neural-net for you (with a dropdown menu) in the machine learning module.
You might start with a regression in JASP. A million rows is a large enough sample that it should give you essentially the same results as doing a regression on the entire data set in Python.
For rank normalization (and a Monday rebalance), no one has to wait to see what P123 does with AI/ML, regardless of their skill level with Python.
This is something I truly believe, as I think I have done it myself: you can build a good ML system with just DataMiner downloads and JASP (it is not my present port, but I would have no problem funding it).
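To see why a large random subsample tends to reproduce the full-data regression, here is a small check on synthetic data. Everything in it is an assumption for illustration: the data are simulated with a known slope of 0.5, and the sizes are scaled down so it runs quickly. It is not DataMiner data.

```python
import numpy as np
import pandas as pd

# Simulated data: y depends on x with a true slope of 0.5, plus noise
rng = np.random.default_rng(0)
n_full = 200_000
x = rng.normal(size=n_full)
y = 0.5 * x + rng.normal(size=n_full)
full = pd.DataFrame({"x": x, "y": y})

# Random subsample, drawn the same way as in the code above
sub = full.sample(n=20_000, random_state=3)

# Ordinary least-squares line fit; polyfit returns [slope, intercept] for deg=1
slope_full, intercept_full = np.polyfit(full["x"], full["y"], deg=1)
slope_sub, intercept_sub = np.polyfit(sub["x"], sub["y"], deg=1)
print(f"full slope: {slope_full:.3f}, subsample slope: {slope_sub:.3f}")
```

The two slopes should land close to each other, and the gap shrinks as the subsample grows. How close is close enough will depend on how noisy the real data are.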
Jim