FactorMiner is live! Faster factor discovery for alpha research and AI feature engineering

Dear All,

Portfolio123’s new FactorMiner is now available. FactorMiner evaluates each factor independently and reports key metrics such as annualized alpha, beta, tail-weighted information coefficient (IC), t-statistics, high/low quantile returns, missing-value coverage, and factor correlations. It can rank factors by alpha or IC and automatically detect the best sort direction for the high-minus-low (H-L) portfolio. It then creates a subset of the best factors, limited by a maximum correlation coefficient.

How to access it

NOTE: FactorMiner currently lives inside "Factor List", but it will soon be moved to a new home together with other "apps".

  1. Go to RESEARCH → TOOLS → Download Factors
  2. Create a Factor List with your superset of factors
  3. Generate the dataset (this step currently requires API credits)
  4. Click Analyze → FactorMiner Launch
  5. You will be redirected to a Streamlit app running on our servers
  6. Review the settings and click Analyze

What it does

FactorMiner helps you quickly identify high-alpha factors from large factor datasets. Instead of manually testing hundreds or thousands of factor formulas one at a time, FactorMiner automates univariate factor analysis and produces a refined list of the strongest candidates for use in Ranking Systems, Screens, AIFactor, or external research workflows.

Additional Info

Please see the Knowledge Base article here for screenshots and more details on each feature.

Testimonials

Some of the reactions from our QA testers:

I just was able for the first time since several months to substantially improve my US strategy... Thanks to FactorMiner

20 year simulation based on ML validations: Sim CAGR before 46% --> Sim CAGR now 61% at same turnover...

would never have thought that there is THAT much juice left to squeeze in simple univariate feature engineering

Anyway... I love it so far

Roadmap

  1. Datasets are coming. FactorMiner currently lives inside "Factor Lists", which were created for downloading normalized datasets. This is temporary while we create a proper "Dataset" component (extensible, updatable, re-usable, sharable, etc.).
  2. Add more flexibility with how performance is calculated. Add more stats. Auto correlation analysis. Etc.
  3. User contributions! FactorMiner is an open source project. You can download it from our GitHub repo (portfolio-123) and run it locally on your computer. Contact us if you would like to contribute to the official version that runs on our server.

Why you should use it

Factor libraries can be extremely large. A single idea—value, quality, growth, sentiment, revisions, momentum, or risk—can have many formula variations. FactorMiner turns that problem into a repeatable research workflow: generate a dataset, run the analysis, review the best factors, and export the results.

This is especially useful for univariate feature engineering. By testing one factor at a time, you can quickly answer practical questions:

  • Which formula variation has the strongest standalone predictive power?
  • Should higher or lower values be preferred?
  • Does the factor work in the top tail, bottom tail, or long/short spread?
  • Is the factor too sparse because of missing data?
  • Is it redundant with other selected factors?
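The first two questions can be illustrated with a small sketch. It computes a plain Spearman rank IC for one date and reads the sort direction from its sign. This is not FactorMiner's own code: its IC is tail-weighted, which this sketch does not attempt, and the function names and sample data here are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def rank_ic(factor, fwd_returns):
    """Spearman rank IC for one date, skipping stocks with a missing factor value."""
    pairs = [(f, r) for f, r in zip(factor, fwd_returns) if f is not None]
    if len(pairs) < 2:
        return 0.0
    f, r = zip(*pairs)
    return pearson(average_ranks(f), average_ranks(r))

def sort_direction(ic):
    """Positive IC: prefer higher factor values; negative IC: prefer lower."""
    return "higher is better" if ic >= 0 else "lower is better"

# Hypothetical one-date cross-section: a valuation ratio and next-period returns.
pe_ratio = [8.0, 12.0, None, 20.0, 35.0, 50.0]
next_ret = [0.04, 0.03, 0.01, 0.01, -0.01, -0.02]

ic = rank_ic(pe_ratio, next_ret)
print(round(ic, 3), sort_direction(ic))  # prints: -1.0 lower is better
```

In practice the IC would be averaged over many dates before deciding on a direction; a single cross-section like this one is only for illustration.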

FactorMiner for AI and AIFactor

FactorMiner is a strong pre-processing tool for AI workflows. Machine learning models often suffer when the input feature set contains too many weak, noisy, missing, or highly correlated features. FactorMiner helps reduce that problem before model training.

AI use cases

Feature engineering: Quickly generate candidate features from Portfolio123 factors or formulas.

Feature selection: Keep factors with stronger alpha, IC, and t-statistics while excluding weaker candidates.

Correlation reduction: Use the correlation threshold to avoid selecting highly redundant factors, improving diversification across model inputs.

Cleaner AI datasets: Filter out factors with excessive missing data before they reach the model.

Alpha-focused model inputs: Build AIFactor models from features that already demonstrated standalone predictive value.
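As a rough illustration of the "cleaner AI datasets" idea, the sketch below drops factors whose share of non-missing values falls under a threshold. The 70% cutoff, the factor names, and the data are assumptions made for the example, not FactorMiner defaults.

```python
def coverage(values):
    """Fraction of observations that are not missing (None)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)

def filter_by_coverage(dataset, min_coverage=0.7):
    """Keep only factors whose non-missing share meets the threshold."""
    return {
        name: vals
        for name, vals in dataset.items()
        if coverage(vals) >= min_coverage
    }

# Hypothetical factor columns; None marks a missing value.
dataset = {
    "EarnYield": [0.05, 0.07, None, 0.02, 0.04],      # 80% coverage
    "EstRevision4W": [None, None, 0.1, None, -0.2],   # 40% coverage
    "Momentum6M": [0.12, -0.03, 0.08, 0.20, 0.01],    # 100% coverage
}

kept = filter_by_coverage(dataset, min_coverage=0.7)
print(sorted(kept))  # prints: ['EarnYield', 'Momentum6M']
```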

For Ranking Systems

Use FactorMiner to identify factors with the best IC, remove highly correlated duplicates, and combine the remaining candidates into a ranking system. The result is a more disciplined alpha-generation process:

  1. Start with a broad set of candidate factors.
  2. Use FactorMiner to measure each factor’s standalone predictive power.
  3. Select factors with strong IC or alpha.
  4. Remove highly correlated features.
  5. Combine the best factors in a Ranking System.
  6. Validate the result in full simulations.
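Steps 3 and 4 can be sketched as a greedy selection: visit factors from strongest to weakest score and keep one only if it is not too correlated with anything already kept. This is a generic approach shown under stated assumptions, not necessarily how FactorMiner applies its maximum correlation coefficient. The scores, the series, and the 0.8 cap are hypothetical, and whether correlations should be measured on factor values or on factor returns is left open.

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def select_factors(scores, series, max_corr=0.8):
    """Greedy pick: visit factors from best to worst absolute score and keep
    one only if its absolute correlation with every kept factor is <= max_corr."""
    kept = []
    for name in sorted(scores, key=lambda n: abs(scores[n]), reverse=True):
        if all(abs(pearson(series[name], series[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-factor IC scores and the series used to measure correlation.
scores = {"EarnYield": 0.06, "FwdEarnYield": 0.05, "Momentum6M": 0.04}
series = {
    "EarnYield":    [1.0, 2.0, 3.0, 4.0, 5.0],
    "FwdEarnYield": [1.1, 2.0, 3.2, 3.9, 5.1],  # near-duplicate of EarnYield
    "Momentum6M":   [2.0, 5.0, 1.0, 4.0, 3.0],
}

print(select_factors(scores, series, max_corr=0.8))
# prints: ['EarnYield', 'Momentum6M']
```

The greedy order matters: the near-duplicate is dropped because the stronger factor was kept first, which is why the list is sorted by score before filtering.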

FactorMiner is designed as a pre-analysis tool, not a replacement for full strategy validation. Portfolio123 recommends validating results in full trading simulations that account for slippage, fees, liquidity, transaction prices, and factor interactions before using them in live strategies.

Bottom line: FactorMiner gives Portfolio123 users a faster path from factor ideas to usable alpha candidates. It helps researchers discover high-performing formulas, engineer better AI features, reduce correlated inputs, and build more focused models from the factors with the strongest evidence of predictive power.


How does one create a Factor List and feed it into FactorMiner?

I got it working after a few rounds in Antigravity. I'm not a coder, so understanding how to install it and how to proceed when there are database errors, etc., can be a bit challenging. However, it helps to give Antigravity access to the folder to clean up some of it for you.

I think this is very good. It has a lot in common with the software that Algoman created some time ago.

Here's what I would like to see:

  • A simple method for how to install something like this.
  • An easy way to transfer results into Portfolio123 AI-Factor, for example by creating a factor list of, say, the 100 best factors that I can paste directly into AI-Factor.
  • A simple solution for extracting the best formulas and quickly converting them into a ranking system.
  • A user manual that describes in some detail the 25 best tips for using the platform correctly.

Just suggestions, but this was a very good starting point. More solutions for those of us who are not so "technical" would be very beneficial.


Thank you very much for this new tool, which will certainly be a great help to us.

I agree with Whycliffes. I gave it a try. It’s fairly easy to get the hang of, but I got stuck for a while on what seemed to me to be the simplest part: importing the factors once the analysis was complete. A transfer tool or a similar CSV file format would be welcome.

I also think a quick-start guide with the best tips would help users get the most out of the tool. I must have messed up somewhere, because instead of getting better results with the best factors, I got exactly the opposite :slight_smile:


Interesting—thanks for sharing the tool, @marco.

I’ve been developing my own approach/tool for Portfolio123, working with large parquet datasets and 300+ factors (still limited in Portfolio123, if I understood correctly). I’ve put several months into building and refining AI-driven models (with Portfolio123 and external sources) from the ground up, starting when it was launched, and I’m starting to see some encouraging results across different markets.

This tool could be a nice complement to that work. That said, there’s still a limitation when it comes to analyzing so many factors at once, not to mention the associated API costs.

Curious to see how others are approaching this trade-off.

You want to import into a Ranking System or AI Factor?

For AI Factor it should be relatively easy, using a spreadsheet as the middleman. Using FactorMiner (FM) to improve a ranking system requires "finesse" right now, for several reasons:

  • Rank factors require a specific order
  • Some rank nodes are not supported
  • Ranking within Industry (all 4 levels) is typically done with relative nodes in a ranking system, and with formulas and transformations in AI Factor and Factor List
  • Ranking systems typically only do cross-sectional ranking for a specific date
  • No easy import/export

We'll try to address some of these difficulties soon.

API costs will go away for the most part once we introduce static Datasets, which will use Resource Units (RUs). And yes, much bigger datasets will be supported. Right now the generated datasets are automatically deleted, since they were meant for downloads of normalized factors.


ScifoSpace, here are my calculations.

I wanted to test this over the max period for 72 features and was about to click, but saw it would use 3669 API credits, or about 51 credits per feature.

I don’t know whether to call this expensive or cheap. Depends on your AUM and how many features you plan to screen monthly, I guess. But I decided not to run that myself.

In relative terms, using Claude Cowork with the API, I can download a rank performance test for 3 API credits and have Claude derive all the custom stats I need by manipulating the results of the rank performance test with Python. As a bonus, Claude automatically puts the results into a spreadsheet in a folder on my desktop, keeping track of all my runs.

I do a lot of these things with the API now, including testing new models, and am burning through my credits. But I have been doing my search for new features with less than the 1,000 API credits per month allowance.

Anyway, that is a savings of about 48 API credits per feature; put another way, running it with FactorMiner costs over 10 times as much per feature.

Did I do the math right?
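A quick way to check the arithmetic, using only the figures quoted above (3669 credits for 72 features, 3 credits for a rank performance test). It assumes one rank performance test covers one feature, which the post implies but does not state.

```python
total_credits = 3669     # quoted for the 72-feature FactorMiner run
features = 72
rank_test_credits = 3    # quoted cost of one rank performance test via the API
                         # (assumed here to cover a single feature)

per_feature = total_credits / features
savings = per_feature - rank_test_credits
ratio = per_feature / rank_test_credits

print(f"{per_feature:.1f} credits per feature")      # prints: 51.0 credits per feature
print(f"{savings:.1f} credits saved per feature")    # prints: 48.0 credits saved per feature
print(f"{ratio:.1f}x the cost")                      # prints: 17.0x the cost
```

Under that assumption the numbers hold: roughly 51 credits per feature, a difference of about 48, and a cost ratio of about 17, which is indeed over 10 times.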

That's the :old_key:
