FactorMiner Preview

Dear All,

As mentioned before, we are working on extending our tools by creating an environment where "apps" can run (created by us or by the community). @AlgoMan is working on one here: AlgoMan HFRE v2.4 – Hierarchical Factor Ranking Engine (Download). His app currently requires you to download the dataset. Our apps can also run internally, which gives them a few more advantages.

Our first app is very modest. It runs long/short portfolios for each factor and reports the best "N" factors that pass your filters (min alpha, max correlation, etc.). Here's a preview of the results for a use case I had. I think it's really useful: it runs fast and produces actionable results. The app has several other neat features. We hope to release it early next week.

Looking forward to your feedback.

Cheers.

Use case

I want the "best" 10 factors from this list:

  • Buyback Yield
  • Earnings Yield
  • Earnings Yield Incl R&D
  • EV to Sales
  • Price to Sales
  • Price to Book

With all these normalizations:

  • Cross sectionally (for each date)
  • Within the sector
  • Within the sub sector
  • Relative to previous 1Y values, then cross sectionally
  • Relative to previous 1Y values, then within the sector
  • Relative to previous 1Y values, then within the subsector

For a total of 36 different factor/normalization combinations.
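As a sketch of what these normalizations amount to, here is a group-wise z-score in plain Python. The grouping key stands in for "each date" (cross-sectional), "(date, sector)", and so on; the factor names are just the list above, and none of this is the app's actual implementation:

```python
from statistics import mean, pstdev

def zscore_within_groups(values, groups):
    """Normalize each value against the mean/std of its own group
    (group = date for cross-sectional, (date, sector) for within-sector, ...)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    stats = {g: (mean(vs), pstdev(vs) or 1.0) for g, vs in by_group.items()}
    return [(v - stats[g][0]) / stats[g][1] for v, g in zip(values, groups)]

# 6 factors x 6 normalizations = 36 combinations
factors = ["BuybackYield", "EarningsYield", "EarningsYieldRD",
           "EVtoSales", "PricetoSales", "PricetoBook"]
norms = ["universe", "sector", "subsector",
         "1Y_then_universe", "1Y_then_sector", "1Y_then_subsector"]
combos = [(f, n) for f in factors for n in norms]
print(len(combos))  # 36
```

The "relative to previous 1Y values, then ..." variants would simply apply a time-series normalization per stock first, and feed the result through the same group-wise step.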

Results

I fed the dataset to our new app for the past 20 years, every 4 weeks, for the Russell 3000. I set up some restrictions and produced these results in a matter of seconds.

As you can see, the best factor is "Earnings Yield" normalized cross sectionally (in a ranking system this would be "vs. the Universe").

Factors 2-8 are skipped due to the restrictions I imposed. The next best factor is "Price to Book", also normalized cross sectionally.

And so on. The fourth "best factor" is the first one normalized vs. the SubSector. Earnings Yield appears several times with different normalizations, including relative to previous historical values (for example, is the stock trading at a high earnings yield vs. its 1Y history?).

It's good to see that the alpha signs appear correct. Only one of the factors, Buyback Yield, has a positive alpha, since higher values are usually better for that one. For all the others, lower is better, so the short portfolio does better.

Results: All Factors

The app also shows you all the factors and the reasons why they were excluded.

7 Likes

Lovely, I’m doing something similar via the REST API (although not in a matter of seconds!). Ideally this can be integrated into a pipeline, i.e. if we like some factors, let's push them directly into the (upcoming) AI workflow or a ranking system.

Pretty sure I will incorporate this into my workflow. Claude Cowork set me up to do some of this with P123's API, but not as complete as what you have done, I think. Claude is not fully integrated, for sure.

1 Like

Seems like a great way to find normalizations.

Some feedback:

Does it run both long/short and long-only? The long/short winners might not be the same as the long-only ones, which matters for factor choice and design in long-only strategies.

Is the size of the L/S buckets a setting?

Eventually, an option to split the overall universe into a set number of sub-universes (say 3), either randomly or, perhaps better to avoid questions about replicability, deterministically.
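One way to make such a split deterministic (and hence replicable across runs and machines) is to bucket tickers by a stable hash. This is just an illustrative sketch of the idea, not a proposed P123 feature:

```python
import hashlib

def deterministic_split(tickers, n_splits=3):
    """Assign each ticker to one of n_splits sub-universes using a
    stable hash, so the same universe always splits the same way."""
    buckets = [[] for _ in range(n_splits)]
    for t in tickers:
        h = int(hashlib.md5(t.encode()).hexdigest(), 16)
        buckets[h % n_splits].append(t)
    return buckets

# hypothetical 3000-name universe
universe = [f"TICK{i:04d}" for i in range(3000)]
splits = deterministic_split(universe)
print([len(s) for s in splits])
```

Unlike a random split with a seed, this stays stable even when names enter or leave the universe, since each ticker's bucket depends only on the ticker itself.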

Thanks for sharing. Looking forward to trying it

Looks great. I have to ask how you managed to calculate IC so fast. Do you use Spearman or Pearson?

I have been playing around quite a bit with a similar normalization app, but focused more on features for tree-based models (yours explicitly checks linear measures). I reached a dead end when I realized I could not get the raw data. For tree-based features, I have found that transformations can be very important for certain features, but it is really hard to measure the impact with a single metric.

For this FactorMiner, which focuses on linear measurements, I would add IR (IC/stddev); it shows the quality of the IC. I have also tested a Conditional IC that is quite effective: it measures IC only where the factor signal is strongest, e.g., at the extreme ends of the distribution.
I posted two examples before where I tested this method.

With traditional IC, this feature scores a really high IC but a mediocre Conditional IC.

With traditional IC, this feature scores a really low IC but a great Conditional IC.
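For what it's worth, the metrics discussed here (Spearman IC, IR, and a Conditional IC restricted to the tails) could be sketched in plain Python like this. The `tail=0.2` cutoff is my own assumption, and the ranking ignores ties (a real Spearman uses average ranks for tied values):

```python
def _ranks(xs):
    """Rank values 0..n-1 by sort order (no tie handling in this sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_ic(factor, fwd_returns):
    """Spearman IC = Pearson correlation of the two rank vectors."""
    return pearson(_ranks(factor), _ranks(fwd_returns))

def conditional_ic(factor, fwd_returns, tail=0.2):
    """IC measured only at the extreme ends of the factor distribution."""
    n = len(factor)
    order = sorted(range(n), key=lambda i: factor[i])
    k = max(2, int(n * tail))
    keep = order[:k] + order[-k:]
    return spearman_ic([factor[i] for i in keep],
                       [fwd_returns[i] for i in keep])

def ir(ics):
    """IR of a series of per-period ICs: mean / stddev."""
    m = sum(ics) / len(ics)
    var = sum((x - m) ** 2 for x in ics) / len(ics)
    return m / var ** 0.5
```

A factor with a high plain IC but a weak Conditional IC is one whose predictive power comes mostly from the middle of the distribution, which matters less if you only trade the extremes.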

By the way, when do you think you will have some kind of "Developer Documentation" or SDK/API Documentation for the coming Streamlit apps?

1 Like

Here's another screenshot that should answer more questions. The flow is rudimentary:

  1. L/S factor portfolios are created
  2. Factors that don't pass performance filters are eliminated
  3. Rest of the factors are sorted based on absolute alpha
  4. The best one is automatically picked
  5. The next best "N" factors have to pass a correlation threshold with all the previously picked factors
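The five steps above could be sketched in Python roughly like this. All names, thresholds, and numbers here are hypothetical, just to show the mechanics, not the app's actual defaults:

```python
def select_factors(alphas, corr, n=10, min_abs_alpha=0.02, max_corr=0.7):
    """Greedy selection sketch:
    1) drop factors whose |alpha| misses the performance filter,
    2) sort survivors by absolute alpha,
    3) pick the best, then accept each next factor only if its
       correlation with every factor already picked stays under max_corr.
    `alphas` maps factor -> L/S portfolio alpha; `corr` maps
    frozenset({a, b}) -> correlation of the two L/S portfolios."""
    survivors = [f for f, a in alphas.items() if abs(a) >= min_abs_alpha]
    ranked = sorted(survivors, key=lambda f: abs(alphas[f]), reverse=True)
    picked = []
    for f in ranked:
        if all(abs(corr.get(frozenset((f, p)), 0.0)) < max_corr for p in picked):
            picked.append(f)
        if len(picked) == n:
            break
    return picked

# hypothetical alphas/correlations, just to show the mechanics
alphas = {"EarningsYield": -0.09, "PriceToBook": -0.06,
          "PriceToSales": -0.055, "BuybackYield": 0.04, "EVtoSales": -0.01}
corr = {frozenset(("PriceToBook", "PriceToSales")): 0.9}
print(select_factors(alphas, corr, n=3))
```

In this toy example EVtoSales fails the alpha filter, and PriceToSales is rejected because it is too correlated with the already-picked PriceToBook.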

I didn't code it. We'll publish the source code soon in our repo.

This validates our approach (a framework to run apps internally) to get around the license problem.

We hope that these apps get improved by the community. We will then release them inside the platform.

Our API will be the main way apps interact with your private data. Apps have access to storage on our network store. We have already integrated authentication so a user cannot access someone else's datasets and factors.

1 Like

Thanks. Yes, I understand the current state better now. I assume that if we set the short % to zero, it performs a long-only analysis.

I don't get this. Earnings yield is a higher-is-better factor, while price to book and price to sales are lower-is-better. Earnings yield should have a positive alpha, unless your formula is upside down somehow.

1 Like

Earnings Yield should be treated similarly to Buyback Yield.

Looks promising.
When evaluating a factor, I typically assess its incremental contribution relative to a baseline ranking (e.g., my existing ranking system). It would be useful to allow users to specify a baseline ranking model and then measure the marginal change in IC and other performance statistics after incorporating the candidate factor into that baseline.

This would be a nice addition (perhaps as a future feature) to help quantify the factor’s incremental predictive power rather than evaluating it in isolation.
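A minimal sketch of that marginal-contribution idea, assuming rank scores for the baseline system and the candidate factor are already computed, and using a hypothetical blend weight (this is not an existing P123 API):

```python
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def marginal_ic(baseline_ranks, factor_ranks, fwd_returns, weight=0.1):
    """IC of the baseline ranking before and after blending in the
    candidate factor at `weight`; returns (base_ic, new_ic, delta).
    Since the inputs are ranks, the Pearson here behaves like a
    Spearman IC on the original scores."""
    blended = [(1 - weight) * b + weight * f
               for b, f in zip(baseline_ranks, factor_ranks)]
    base_ic = pearson(baseline_ranks, fwd_returns)
    new_ic = pearson(blended, fwd_returns)
    return base_ic, new_ic, new_ic - base_ic
```

A positive delta over many rebalance dates would indicate the candidate factor adds predictive power on top of the baseline rather than merely duplicating it.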

2 Likes

Will we be allowed to generate AI predictors and AI Factor Validations to be used in the rest of the P123 environment with the "apps"?

1 Like

This sounds like a great development. I can envision an ideal workflow where you get a cloud compute environment that has access to raw data via SQL, the ability to install custom software, and SSH access. This would let us run AI agents in the cloud environment or locally.

Key features I'd love to see:

  • Persistent storage for saving research artifacts across sessions
  • Full environment control: install any packages, libraries, or tools needed
  • Standardized API to export rankings back to the P123 platform for simulation and live trading

Ideally, I'd build the entire ranking solution in this custom environment and simply push the final rankings to P123 via the API. A flexible research environment plus a clean integration layer would be incredibly powerful.

Hi Marco, any update on when this might be released? Keen to try it out. Thanks!

@marco Has P123 considered implementing OAuth to allow third parties (like us) to develop apps that use the P123 API for general consumption?

Or is this already part of the plan?

Tony