Dear All,
We're excited to announce our plans for AI Factor 2.0, an overhaul that will transform our platform into a true MLOps powerhouse for stock-picking ML models. Our goal is to address the current version's inflexibility and streamline the process. We're seeking your feedback on the high-level changes outlined below.
Current version: The Monolithic Problem
The current version of AI Factor, while functional, has proven to be inflexible and cumbersome. Its monolithic structure—where a single "AI Factor" component contains the dataset, normalizations, model experiments (validations), and predictors—results in a complex workflow with too many steps and missing key functionality.
AI Factor 2.0 Solution
We are breaking this monolith into independent, modular components. The changes are below, organized by topic.
Dataset
We will be creating a standalone “Dataset” component. It will have the following new properties:
- Datasets will contain raw values, so you will be able to run different experiments without reloading each time.
- Datasets will be able to update automatically with the latest data, so you won't have to reload a dataset to retrain your production model.
- You will be able to choose which features of your dataset are used for a particular experiment or operation.
- You will be able to specify default normalizations for your features (these are defaults only; since datasets store raw values, normalizations are applied when the data is used, not precomputed).
- Datasets will be usable by different components like:
- Experiments (a.k.a. validations for model tuning)
- Feature engineering
- Inference
- We will support easy imports of external datasets, such as a Snowflake dataset.
- You will be able to merge internal and external datasets.
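To give a feel for the merge idea, here is a minimal sketch in plain Python. The data, field names, and `merge_datasets` helper are all made up for illustration; the actual Dataset API is not yet final.

```python
# Hypothetical sketch: merging an internal dataset with an external one
# (e.g. a Snowflake import) on (ticker, date) keys. All names and data
# here are illustrative, not the real AI Factor 2.0 API.

internal = {
    ("AAPL", "2024-01-31"): {"pe_ratio": 28.1, "momentum_3m": 0.12},
    ("MSFT", "2024-01-31"): {"pe_ratio": 35.4, "momentum_3m": 0.08},
}

external = {  # imported from an external source such as Snowflake
    ("AAPL", "2024-01-31"): {"sentiment_score": 0.67},
    ("MSFT", "2024-01-31"): {"sentiment_score": 0.41},
}

def merge_datasets(a, b):
    """Combine feature dicts for every (ticker, date) key in either set."""
    merged = {}
    for key in a.keys() | b.keys():
        merged[key] = {**a.get(key, {}), **b.get(key, {})}
    return merged

dataset = merge_datasets(internal, external)

# Each row now carries both internal and external features, and a
# feature subset can be selected per experiment:
selected = ["pe_ratio", "sentiment_score"]
experiment_rows = {
    key: {f: row[f] for f in selected if f in row}
    for key, row in dataset.items()
}
```

The same subset-selection step illustrates the earlier point about choosing which features feed a particular experiment.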
Feature Engineering
The main approach to feature engineering will be purpose-built Python apps. We will also work with app developers to create an app marketplace. This will become clearer once we launch our first app, which is currently being tested.
Experiments (Model Tuning & Validation)
This component, which incorporates the former Validation and Result sections, will reference an existing Dataset and utilize custom normalizations.
We will also add features such as:
- New "fold-level" normalization, which preserves data integrity by computing normalization statistics only from past data windows. This prevents the future-data leakage that can occur with the existing "dataset-level" normalizations.
- WandB integration. If you supply your WandB API key, we will log interim results.
- Additional statistics, such as accuracy and accuracy trend.
- Advanced model introspection and explainability features (e.g., SHAP values) will be integrated.
- More efficient hyperparameter search, similar to WandB "Sweeps," which support random and Bayesian search in addition to grid search. Here's a quote I found in some post: "I've replaced entire weeks of manual hyperparameter tuning with W&B sweeps. It's legitimately one of the best features."
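To illustrate the fold-level idea, here is a small self-contained sketch (plain Python with made-up numbers, not the actual AI Factor implementation) contrasting dataset-level z-scoring, which leaks future statistics into early folds, with fold-level z-scoring that uses only data observed so far:

```python
# Sketch: dataset-level vs fold-level normalization on a time-ordered
# feature series. Data and fold sizes are hypothetical.

def stats(values):
    """Mean and (population) standard deviation, guarding against 0."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5 or 1.0

def zscore(values, mean, std):
    return [(v - mean) / std for v in values]

# A time-ordered feature series split into folds of 3 periods each.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
fold_size = 3

# Dataset-level: stats are computed over the FULL history, so early
# folds are normalized with information from the future (leakage).
full_mean, full_std = stats(series)
dataset_level = zscore(series, full_mean, full_std)

# Fold-level: each fold is normalized using only data available up to
# the end of that fold, so no future information leaks in.
fold_level = []
for start in range(0, len(series), fold_size):
    past = series[: start + fold_size]  # data observed so far
    mean, std = stats(past)
    fold_level.extend(zscore(series[start : start + fold_size], mean, std))
```

Note how the first fold's z-scores depend only on the first three observations, whereas the dataset-level version normalizes them against the mean of the entire series.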
Inference (Predictor Models)
Predictors (Models) will be decoupled from the core AI Factor, allowing for independent management, versioning, and deployment. In addition, Predictors will support automatic retraining upon Dataset updates.
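One way to picture the decoupling is a simple publish/subscribe pattern, sketched below. The class names, `auto_retrain` flag, and version counters are hypothetical illustrations, not the actual design:

```python
# Hypothetical sketch: a Predictor decoupled from its Dataset that
# retrains automatically when the Dataset updates. Names and structure
# are illustrative only.

class Dataset:
    def __init__(self):
        self.version = 1
        self.subscribers = []

    def update(self):
        """Simulate an automatic data refresh, notifying subscribers."""
        self.version += 1
        for predictor in self.subscribers:
            predictor.on_dataset_update(self)

class Predictor:
    def __init__(self, dataset, auto_retrain=True):
        self.auto_retrain = auto_retrain
        self.trained_on_version = None
        dataset.subscribers.append(self)
        self.train(dataset)

    def train(self, dataset):
        # Stand-in for a real training run.
        self.trained_on_version = dataset.version

    def on_dataset_update(self, dataset):
        if self.auto_retrain:
            self.train(dataset)

ds = Dataset()
live = Predictor(ds)                        # retrains automatically
frozen = Predictor(ds, auto_retrain=False)  # pinned to its training data
ds.update()  # new data arrives: `live` retrains, `frozen` does not
```

Because each Predictor is managed independently, versioning and deployment can likewise happen per model rather than through a single monolithic component.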
Deployment
We will strive to make model deployment and tracking as easy as 1-2-3. For example:
- When deploying, you will be able to choose either a Ranking System or a Predictor, so you won't have to create Ranking System wrappers for each Predictor.
- You will be able to launch live strategies with auto-retrained Predictors. Tracking many models out-of-sample will therefore require zero effort.
Conclusion
In hindsight, some of the decisions we made were truly strange. Fixing them will be a lot of work, of course, but we think it's worthwhile. Let us know your pain points.
Thank you.
Cheers


