TL;DR: P123 cannot expect to attract machine learners without giving access to k-fold cross-validation somehow. The best evidence of the need for it is the overfitting of the designer models, which have produced no excess returns (on average over the last 2 years). K-fold cross-validation is the most used method for preventing overfitting on the planet!
If P123 will not make it easy to download updated data and use k-fold cross-validation with Python, I wonder if P123 might make it available as a feature within its platform.
Here is the request to make it easier with Python: Data download the day of rebalance for machine learning - #13 by Jrinne
Use case: machine learning and AI are not possible without some sort of cross-validation. Machine learning is never done anywhere without it. If P123 wants to attract machine learners, this will be necessary. And it would not hurt if it were easy, I would think.
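For anyone who has not used it, here is a minimal sketch of what k-fold cross-validation does, written in plain Python so it runs without any downloads. The data is synthetic and the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are made up for illustration; nothing here is P123 code or a P123 API. With real data one would more likely use `KFold` and `cross_val_score` from Sklearn's `sklearn.model_selection`.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) pair."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return scores


# Synthetic data (an assumption for the example): a weak linear signal plus noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(100)]
ys = [0.5 * x + rng.gauss(0, 0.1) for x in xs]

scores = cross_validate(xs, ys, k=5)
print("out-of-sample MSE per fold:", [round(s, 4) for s in scores])
print("mean out-of-sample MSE:", round(sum(scores) / len(scores), 4))
```

The point is that every score comes from data the model did not see while fitting, which is what makes it a check on overfitting.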
If I am wrong about k-fold cross-validation being the most used, then some of these other methods could be made available through Sklearn and/or within the P123 platform, and be given a higher priority at P123, if we want to be serious about machine learning:
Other Techniques: There are many other methods specifically designed to prevent overfitting:
- Regularization: Techniques like L1 (Lasso) and L2 (Ridge) regularization directly address overfitting by adding penalty terms to the model’s loss function.
- Pruning: Used in decision trees to remove branches that have little power in predicting the target variable.
- Dropout: Commonly used in neural networks to prevent overfitting by randomly setting a fraction of input units to 0 at each update during training.
- Early Stopping: In iterative algorithms like gradient descent, training can be stopped early if validation performance stops improving, preventing overfitting.
- Data Augmentation: Especially in deep learning, augmenting the data by applying random transformations can prevent overfitting.
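To make the first bullet concrete, here is a small sketch of L2 (Ridge) regularization for a one-feature model with no intercept. Minimizing sum((y - b*x)^2) + penalty * b^2 gives the closed form b = sum(x*y) / (sum(x*x) + penalty), so a larger penalty shrinks the coefficient toward zero. The function name `ridge_slope` and the toy numbers are assumptions for illustration only; in practice Sklearn's `Ridge` and `Lasso` in `sklearn.linear_model` cover the general case.

```python
def ridge_slope(xs, ys, penalty):
    """Slope b that minimizes sum((y - b*x)^2) + penalty * b^2 (no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + penalty)


# Toy data (made up for the example): roughly y = 2 * x with a little noise.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.9, 0.2, 2.1, 3.9]

# penalty = 0 is plain least squares; larger penalties shrink the slope.
for penalty in (0.0, 1.0, 10.0, 100.0):
    print(penalty, round(ridge_slope(xs, ys, penalty), 3))
```

The size of the penalty is usually chosen by cross-validation, which is one reason these methods tend to be used together.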
But you will never see machine learning done anywhere without one or more of these methods. It should not be a goal of P123 to restrict access to them. That would go by the name "unforced error", "shooting oneself in the foot", etc., if the goal is attracting machine learners.
Jim