What do you use your API credits on? Ultimate subscription

I have a question for those of you who get 5,000 or 10,000 API credits each month through the “Portfolio” or “Ultimate” subscription https://www.portfolio123.com/app/register/modify

I have been using Dan’s node Google Sheet and have also started using DataMiner, so I use some API credits each month.

But I have been considering upgrading to “Ultimate,” even though it is expensive, especially for access to the optimizer and the better version of rolling tests. I also see that it comes with 10,000 credits per month.

What do you do with your credits each month, if you use them at all?

Just a short answer. Anyone doing some type of machine learning and using 150 nodes like you do would have to have an Ultimate membership now just to rebalance their systems. No alternative. And this is a direct answer to your question as to what I would be doing with an Ultimate membership, so not a hijack of this thread to make a separate point.

TL;DR: for machine learners with the number of nodes you have, it is not optional now. And that does not even include the downloads needed to train a model!

I do get that sometimes the trained model can be fit back into a ranking system. In fact, I am doing that myself now, so I get it. Call your algorithm fundamental analysis (and not gradient descent) if you wish. But you have an algorithm (whatever you choose to call it) that you can fit back into a ranking system for now, which is nice and has advantages without a doubt.

That does not always work for everyone all of the time. And why would anyone expect it to? That would be a pretty bold assumption from someone who has never run a random forest. Also, unless you have tried a non-linear model, are you sure you will not want the option in the future, with a different type of Python-requiring fundamental analysis, I should say?

FYI - you can also purchase additional API credits on the API page in account settings:

The credits you purchase are only used after you have used up your monthly allotment of credits and they do not expire.

Thank you. Yes, I know. I have bought some extra credits. I’m just wondering what other uses there are for these credits besides DataMiner and the Node Ranking Google Sheet.

I’m not sure, but it seems that a lot of people have their own Python code for some kind of mass stress testing or optimization of their own ranking systems.

As a direct answer to the question: I would do something like “mass stress testing” myself. I would perhaps call it cross-validation, a train/test split, or a holdout test set. But I would definitely do that, whatever it is called!

That is a cool use! It all has to be integrated, though, if P123 wants to market this. Someone is not going to come to P123 to do a lot of “mass stress testing” (whatever you might wish to call it) if they cannot also do the other things they need to actually use the systems they develop.

Rebalancing their strategy once it is “mass-stressed” being a clear example.

But exactly right, IMHO. Maybe I would upgrade to Ultimate AND pay for credits too (with a few simple additions to P123’s already advanced offerings). I have not done that yet, to be clear. I do need to rebalance my strategy once it is developed. That seems important to me for some reason.

With regard to systems that can be put back into the ranking system, I do not see myself ever needing DataMiner for that. Too much work and too much drama when there is an easier way (and I think a more effective way) with machine learning.

Just my answer (with my particular methods in mind) to a very good question.

Do you use your credits today with the API and some kind of machine learning?

And what kind of machine learning? Would anyone else understand it, and could they start to use it?


Yes, I have just now started using DataMiner thanks to Dan’s kind help. I am not a good Python programmer and I needed his help to get started!!!

I can say with 100% factual certitude that you are a better Python programmer than I am. I would rather put my money on that fact than bet that the sun will rise tomorrow.

So I can say with certainty that you can find something that will at least give you some ideas over at scikit-learn, and at least a few more from Bard and ChatGPT.

Let me start that for you with a (probably poor) understanding of what you appear to be interested in:

So here is a fairly detailed question to ChatGPT, with my perception being that you are a good programmer and can sort through some of this on your own:

Q to ChatGPT: “I have some data in a csv file. It has 150 ‘features’ and my target is the ‘next month’s returns for the ticker.’ The index would be the ticker and the date. Can you give me a program for a ridge regression for the features and the target being the returns? Maybe use 5-fold cross-validation for the hyper-parameters.”


```shell
pip install numpy pandas scikit-learn
```

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Load the dataset
data = pd.read_csv('your_data.csv', index_col=['ticker', 'date'])

# Separate features and target
X = data.drop('next_month_return', axis=1)
y = data['next_month_return']

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Ridge regressor object
ridge = Ridge()

# Create a dictionary of all alpha values we want to test
params_ridge = {'alpha': np.logspace(-4, 4, 20)}

# Use grid search with 5-fold cross-validation to test all alpha values
ridge_gs = GridSearchCV(ridge, params_ridge, cv=5)

# Fit model to training data
ridge_gs.fit(X_train, y_train)

# Save best model
ridge_best = ridge_gs.best_estimator_

# Check best alpha value
print("Best alpha: ", ridge_gs.best_params_)

# Predict target for the test set
y_pred = ridge_best.predict(X_test)

# Check test data accuracy
mse = mean_squared_error(y_test, y_pred)
print("MSE: ", mse)
```

ChatGPT’s comments on the code:

" Replace 'your_data.csv' with the path to your CSV file. The 'alpha' values in the params_ridge dictionary are the hyperparameters to be tested in the grid search. You can modify these values based on your requirements.

Please note that the grid search can be quite computationally expensive if you have a lot of data or if you are testing a lot of hyperparameters. The computation time will increase exponentially with the number of hyperparameters and the amount of data you have.

Additionally, remember that the effectiveness of your model will highly depend on the quality of your data and the appropriateness of the model for your data. If Ridge Regression does not provide good results, you might need to experiment with other types of models or preprocess your data differently."

Back to jrinne: You do not need me for this, and this is definitely the meat-and-potatoes of many of the papers you are linking to. It may seem a little hard at first, but it is definitely easier than wading through those papers. For me, anyway.

Hope that helps a little with finding what is right for you!




My perception is that you probably have some noise factors mixed with some great factors in your 150 nodes.

If you want to investigate whether every one of those factors is a good factor that should be included in your model using scikit-learn, I would add this to the above.

I would first do a Lasso regression (just ask ChatGPT), and please don’t “pimp me” (a very friendly medical-school term for grilling people about their diagnoses during rounds) on Lasso regression. Ask ChatGPT and decide for yourself whether what I am saying is BS. I would be happy to address any serious questions on this, but if ChatGPT cannot make it understandable after you phrase the question several different ways (and do a Google search), I doubt that I will be of much help.

But just me: if I were you, I would consider running a Lasso regression first, and use it to get rid of at least a few of your less-well-performing factors in a rigorous manner.

Then do the ridge regression on only the factors you keep after the Lasso regression. Ask ChatGPT how to do it with 5-fold cross-validation (and a grid search). Or you could probably take the Ridge regression code above and substitute Lasso for (almost) everything.
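For what it is worth, a minimal sketch of that Lasso-first step in scikit-learn might look like this. The data and column names here are made up for illustration, not your actual factors:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV

# Toy stand-in for the factor data: in practice this would be the same
# CSV used for the ridge regression above. Column names are hypothetical.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(500, 10)),
                 columns=[f"factor_{i}" for i in range(10)])
# The target depends on only the first three factors; the rest are noise.
y = (0.5 * X["factor_0"] - 0.3 * X["factor_1"] + 0.2 * X["factor_2"]
     + rng.normal(scale=0.1, size=500))

# LassoCV picks the regularization strength by 5-fold cross-validation.
lasso = LassoCV(cv=5, random_state=42).fit(X, y)

# Factors whose coefficients were shrunk exactly to zero get dropped.
kept = X.columns[lasso.coef_ != 0].tolist()
print("Kept factors:", kept)
```

The factors in `kept` would then be the only ones passed on to the ridge regression step.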

I definitely think you have the programming skills to do that if you—after your own investigation—think it may be valuable for you.




Obviously I have done none of this with your factors. And to be honest, it has even been a while since I have run a ridge regression on P123 data. So maybe none of this would apply to you, your factors, or your models.

But I want to make you aware of this: you can get a weight for each factor from the ridge regression and plug those weights into a ranking system.
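One possible way to map the coefficients to node weights (my own mapping, not a P123 feature) is to normalize their absolute values to sum to 100%, keeping track of sign so a negative-coefficient factor can be set to “lower is better” in the node:

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients from a fitted ridge model (e.g. ridge_best.coef_);
# the factor names here are made up for illustration.
coefs = pd.Series([0.8, -0.4, 0.2, 0.6],
                  index=["value", "momentum", "quality", "growth"])

# Normalize absolute magnitudes into percentage node weights, and keep
# the sign so negative-coefficient factors can be ranked "lower is better".
weights = coefs.abs() / coefs.abs().sum() * 100
signs = np.sign(coefs)

for name in coefs.index:
    direction = "higher is better" if signs[name] > 0 else "lower is better"
    print(f"{name}: {weights[name]:.1f}% ({direction})")
```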

I have been surprised with other models (other than ridge regression) in that sometimes a model with regularization (which is what ridge regression adds) will backtest better than a model without regularization! Whoa.

But in theory, your ridge regression model may not backtest as well as a fully optimized model. The trade-off is that it would be expected to perform better out-of-sample, in a funded port, with new data unseen by anyone. Which is what you really want, I hope.

Sure, someone can overfit and show a great backtest in the forum, and maybe even I might do that (hopefully with an IRONY ALERT for Kurtis). But most of us, at the end of the day, like to make money in our ports too.

Anyway, accepting at least a small decrease in backtest results as something to be expected is all that I am recommending here. If any of this interests you at all, of course.

Edit: Whycliffes,

One last serious question and I will leave you alone to investigate this further or reject the method outright.

But assuming you wanted to test this idea, and that ChatGPT is able to fill in the details that I got wrong or explained poorly, I have this question:

Was there some value in this method, and is it easier or harder for you to let your computer do the optimizing rather than use your present spreadsheet method?

Whatever your answer, my sincere apologies for not having been as articulate as ChatGPT can be about this topic in the past. None of this is my original idea, so I will not take your answer personally. Maybe I will go to Fisher’s and Pascal’s grave sites to toast what they accomplished and say: sorry guys, wherever you are now, but P123 members have decided what you accomplished during your lifetimes is BS. Your value has finally been decided by someone who really knows. I will not take your observations personally and will just keep using this myself if I continue to find it useful.

FWIW I believe we live in a rare time with a rare opportunity that P123 can help us realize.


I just started looking at the Python API last month, and I ended up burning all 10k credits trying to run optimizations on the weighting of the Core Combination and Small Factor Focus ranking systems. I would say I used 10-20% ironing out bugs in my code and then 50% running the same optimization method a few times. I would get better ranking performance, but not a better Simulated Strategy backtest result. I did not figure out why… But I can say that 10k credits is not enough to play around with a more complex optimizing system like Bayesian optimization using Python’s scikit-optimize and gp_minimize if you let it run rank_perf for each evaluation. It can burn 1,000 credits in one attempt with little backtest improvement to show for it. Which is to say, I got better Simulated Strategy backtest results from my “dumb” change-one-rank-weight-at-a-time method than from the gp_minimize method.

So going forward I am planning to think up as many “good” factors as I can and then download historical weekly ranking data for as large a universe as I can. Then I plan on writing my own code to create ranking systems from that data so I can test more ideas without using credits. I also plan on dipping my toes into machine learning, starting with XGBoost.
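As a sketch of what that local testing from downloaded data could look like with pandas — where every name, weight, and number below is made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy stand-in for downloaded weekly factor data: one row per (date, ticker),
# with factor columns and the realized next-period return.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-03", periods=52, freq="W-FRI")
tickers = [f"T{i:03d}" for i in range(200)]
idx = pd.MultiIndex.from_product([dates, tickers], names=["date", "ticker"])
df = pd.DataFrame({
    "value": rng.normal(size=len(idx)),
    "momentum": rng.normal(size=len(idx)),
}, index=idx)
# Make returns weakly driven by "value" so the top bucket should outperform.
df["next_ret"] = 0.01 * df["value"] + rng.normal(scale=0.05, size=len(idx))

# Candidate weights for a two-node ranking system (hypothetical).
weights = {"value": 0.7, "momentum": 0.3}

# Percentile-rank each factor within each date, combine with the weights,
# then split each date into 5 buckets and average next-period returns.
ranks = df.groupby("date")[list(weights)].rank(pct=True)
score = sum(w * ranks[f] for f, w in weights.items())
df["bucket"] = score.groupby("date").transform(
    lambda s: pd.qcut(s.rank(method="first"), 5, labels=False))
bucket_ret = df.groupby("bucket")["next_ret"].mean()
print(bucket_ret)
```

The point is that once the data is downloaded, trying a different set of weights costs compute time instead of credits.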

Here is a breakdown of how you might use credits for downloads and machine learning. Note it’s 25,000 data points per credit, and it seems to round up, so 25,001 data points is 2 credits!

  • Training (one-time cost): 3,000 stock universe, 100 factors, 10 years x 52 weeks per year = 6,240 credits
  • Weekly rebalance: 3,000 stock universe, 100 factors, 4 weeks = 48 credits
  • Daily rebalance: 3,000 stocks, 100 factors, 30 days = 360 credits

Total one-time cost: 6,000-12,000 credits if you do 10-20 years weekly. Multiply by 5 if you want daily data, or scale by the fraction of factors or universe size.

Running cost: 50-365 credits per month, more if you have a larger universe or more factors.
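If it is useful, the arithmetic behind that breakdown can be sketched as a small helper, assuming 25,000 data points per credit with partial credits rounded up as described:

```python
import math

def credits(universe_size, num_factors, num_periods):
    """Estimate API credits for a download: data points are
    universe_size x num_factors x num_periods, at 25,000 points per
    credit, with partial credits rounded up (25,001 points -> 2 credits)."""
    points = universe_size * num_factors * num_periods
    return math.ceil(points / 25_000)

print(credits(3000, 100, 10 * 52))  # training, 10 years weekly -> 6240
print(credits(3000, 100, 4))        # weekly rebalance, 4 weeks -> 48
print(credits(3000, 100, 30))       # daily rebalance, 30 days -> 360
print(credits(1, 1, 25_001))        # rounding example -> 2
```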

Over the next few months I plan to add more time, factors, and universe size. That being said, I do not plan on keeping the Ultimate membership long term unless I can start getting some really good alpha, as it’s hard to justify the cost otherwise.


Wow, thank you for the great reply! I agree. I have been using the rank optimizer just to see how weights can change performance, and I am seeing the same as you. A good RS is not always simple to convert into a good simulator result.

It could be that you have some input on my own method for trying, as best I can, to run the same testing techniques in the RS performance test and the simulator:

In my simulator, I have:
  • 25-stock portfolio
  • Average holding of 101 days, and turnover of 300%
  • Universe is USA and Canada
  • Volume is median (91) > 100,000, and Price > 1
  • 10-year test (and use the other 10 years as an out-of-sample test period)

In my rank performance test, I use:
  • The same volume and price rules (I use them in the universe)
  • 8-week rebalance
  • Bucket setting of 70 (so each bucket has 70 stocks)
  • 10-year test
  • Minimum price of 1
  • Same universe

The main problem is choosing a rebalancing period that gives equivalent testing between Rank performance and the simulator. I have set it to 8 weeks, but I am not sure this is the best choice.

What have you done to try to match the testing in the Rank performance test and the simulator?

When it comes to the Rank optimizer, I feed it a matrix of numbers from 1 to 50, with 100 rows and 50 columns, and force the matrix to keep at least 20% as 0.

I think to get the same rank perf and simulation results you would need to match the settings, which means the same fixed slippage, the same rebalance period, the same universe, and the same sell criteria. This means the simulation sell criterion needs to match the portfolio size: 25 stocks = sell when RankPos > 25. But if you use variable slippage you will see a lot of performance loss driven by the higher turnover the tight sell rule creates. Instead, I look for relative performance between ranking systems with more realistic slippage and sell rules.

Ranking perf vs simulation:

  • Buckets/portfolio size: 20 buckets and 20 stocks. Keeps it simple, and since I am just comparing relative performance, not absolute performance, it is OK that the 20 buckets have more stocks than the simulated strategy
  • Universe: similar rules to yours, but I split it using EvenID for training and validation so I can use the same time period for both
  • Slippage: 0.2% slippage on the ranking test and variable slippage on the strategy. This can contribute to the difference if you have a lot of trading, but with $100k average daily trading I would not expect it to be significant
  • Sell criteria: ranking perf is at the portfolio “size.” I set my simulation to 2x the portfolio size, so RankPos > 40. 2x may not be ideal since it forces the turnover up, but it makes it more comparable to the ranking results
  • Rebalance/period: same rebalance and period, weekly and 10 years. This seems like the best way to be consistent between the two
  • Better criteria: I am looking for consistently high returns, so high alpha and low downside standard deviation
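The EvenID-style split mentioned above can also be mirrored locally on downloaded data; here is a minimal sketch with a made-up universe and a hypothetical stock_id column:

```python
import pandas as pd

# Made-up universe; stock_id stands in for P123's StockID. On P123 itself
# the split would be done with an EvenID-style universe rule.
universe = pd.DataFrame({
    "stock_id": range(1, 11),
    "ticker": [f"T{i}" for i in range(1, 11)],
})

# Even ids become the training sub-universe, odd ids the validation
# sub-universe, so both cover the same time period with disjoint stocks.
train = universe[universe["stock_id"] % 2 == 0]
valid = universe[universe["stock_id"] % 2 == 1]

print(len(train), len(valid))  # 5 5
```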

The interesting thing is that for the two ranking systems I am comparing, the training sub-universe shows consistent relative returns between the ranking perf and the simulation. However, the validation sub-universe and the total universe show opposite ranking perf and simulation results. Even stranger, the system that ranked higher on the ranking perf but lower in the simulation has the lower turnover in the simulation, and turnover was part of what I expected to drive the difference.

Just look for the best optimized ranking system.

I would set the number of buckets to 50 (or 100)

If you have 3,000 stocks in the universe and the sim holds 25, the top 2% represents 60 stocks. I’m guessing the sim sells when the RankPos gets close to 50, so the top 2% should represent the model holdings.

Optimize and then select the ranking system with the highest Max value for simulation.

Generally, you don’t want to select just on Max value, but assuming you’re OK with the other characteristics of the bucket curve, it’s an OK place to start.
