I agree to some extent in that it is definitely not bullish. This behavior is so pervasive, though, that I predict it would create a lot of bearish transactions/signals that are unlikely to be statistically great. I do dislike this behavior personally and wish more executives were serious about being long term shareholders with skin in the game.
I am happy with it either way myself because I will always check each one manually, but I do feel these will create a lot of noise. You saw this yourself: it happens quite a lot. I agree that it is common and undesirable, but that is also a worry for finding signals, as it could “dilute” the ones with declines.
It would be great if we could set custom limits in the factor, something like
InsiderTrans(weeks,offset,SigPur=0.5,ConExDisp=3.0,PlannedAll=5.0,PlannedDec=5.0)
With a custom formula SZ could get his chart by setting PlannedAll to 100, so the state would never be true…
I used my favorite LLM tools (ChatGPT, Gemini - Deep Research, and Claude) and also compared with other platforms. This is the improved version of the insider transaction signal.
Yes. Check the amount. It is window dressing. I see it a lot in very troubled firms. A CEO will come in and buy 20k. That is part of why the % of holdings will be useful, although here that would not have worked since he owned almost nothing.
Super stuff @sraby you have to show me how you generated the flowchart & graphic, we need it for our docs/marketing! I tried and tried and could not get an LLM to draw the correct flowchart with my pseudo code. So I resorted to mermaid charts.
A few quick comments:
I like the introduction of "data quality" since it is a problem. Sometimes the #shares owned is wrong for example, and FactSet doesn't try too hard (doesn't read footnotes, etc).
The cluster buying will have to be a post-rating adjustment; otherwise the logic has to be moved to the engine instead of where it is now (in a database function at the extraction point).
I think "Open market trade" should just be simplified to mean buys, which would require revisions in some of the paths (dispositions are almost always open market, with the exception of things like gifts, which should just be ignored).
In any event, it's obvious that we will have to support different versions of this logic, or allow for customization.
Thanks for the feedback. We'll continue to brainstorm this.
I use AI/LLM tools in my management consulting practice. I also deliver AI workshops. Happy to prepare an AI workshop for you and your team / All P123 members!
I generally agree with this except the last point made is a bit underdeveloped. One should not buy just because of the insider buys but a spinoff or such is not required. There are many forms a “theme” can take.
We might have to look into these % stake figures or their calculation a bit more and see how reliable they are on average. In these examples the amount is too low to really be 50% stakes. It might be just an outlier though:
I have been really liking the feature so far, just wanted to point this out for awareness. I guess at the end of the day as long as they are above a small % the rating would not change anyways
Great thread everyone. As an aside, Google NotebookLM also creates slide decks, which seem to have more information than the infographics. The paid version, at least, will also create a nice video. The flash cards are in depth and one can learn from these.
So here is an example of what AlgoMan means using the Titanic dataset from Kaggle. I could not find the P123 dataset for insider information.
Maybe @AlgoMan and others (advanced ML members too numerous to mention now) could have a go at this and other challenges. Why should Kaggle have all the fun?
I also want to mention a related ML/LLM tree method mentioned previously by @pitmaster that may be less prone to overfitting. As everyone probably already knows the reason we have random forests with many aggregated decision trees is because single decision trees tend to overfit. Pitmaster addresses that by using more than one tree and also having the LLM put the aggregate tree results into words. Content by the Judge on X - #10 by pitmaster
So ideally we could construct short trees that are great for interactions and create an AI factor with multiple short AI factors in the ranking system.
At present XGBoost and other models allow you to specify interactions with arrays. This is very clunky as it turns off interactions (lots to turn off). An LLM could speed that up possibly. But I think we may not have access to that in P123’s AI at the present time in any case.
The other problem is that a random forest will find thousands of interactions. Many of them will be spurious, finding interactions from past market regimes that no longer exist. With so many interactions there will be many false interactions.
A solution might be focused interactions as P123 has already done here. Great idea @marco that could be expanded using @AlgoMan’s suggestion!!!
Here is a Sklearn generated tree telling you who was likely to survive on the Titanic (classification decision tree):
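In case it helps anyone reproduce something similar, here is a minimal sketch of fitting a shallow classification tree and printing it as text. The rows are a tiny hand-made toy sample, not the Kaggle Titanic data, and the column names (`is_female`, `pclass`, `age`) are placeholders chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy, hand-made rows (NOT the Kaggle data). Columns: [is_female, pclass, age]
X = np.array([
    [1, 1, 29], [1, 2, 35], [1, 3, 22], [1, 1, 48],
    [0, 1, 40], [0, 3, 25], [0, 2, 31], [0, 3, 19],
    [0, 3, 4],  [1, 3, 30], [0, 1, 60], [1, 2, 8],
])
# 1 = survived, 0 = did not survive (made-up labels for illustration)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1])

# A depth limit keeps the tree short enough to read as a handful of rules.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

rules = export_text(clf, feature_names=["is_female", "pclass", "age"])
print(rules)
```

Swapping in the real Kaggle CSV only changes how `X` and `y` are loaded; the fit and `export_text` calls stay the same.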
For those interested in Google NotebookLM’s ability to create videos, here’s a short visual explainer on how decision trees are used conceptually in finance (NotebookLM video) Click on “AI Trees in Finance” to the right:
Building on this: Standard RF/ET models often generate "deep" trees that capture idiosyncratic noise (overfitting) and create impossible-to-audit black boxes.
The solution is Tree-Factor Ranking (TFR) —literally turning Trees into Factors.
The process would be as follows:
Train: Fit a shallow RF or ET model (~1000 estimators), strictly limiting max_depth to 3 or 4.
Extract: Convert each tree into a P123 factor using Eval().
Curate: Filter out factors with low predictive power or high correlation (@AlgoMan's toolbox would be useful)
Select: Quantitatively select the top 20–30 factors to build your ranking system.
(Optional): Feed the rules to an LLM to flag any "stupid" economic logic.
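The Train and Extract steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic, the feature names (`RoeRank`, `SalesGrRank`, `PsRank`) are hypothetical placeholders rather than real P123 factors, the helper `tree_to_eval` is made up for this post, and the output assumes `Eval()` takes its arguments as `Eval(condition, value_if_true, value_if_false)`. It uses 10 trees instead of ~1000 just to keep the run short.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

LEAF = -1  # sklearn marks "no child" with -1 in children_left/children_right


def tree_to_eval(estimator, feature_names):
    """Render one fitted sklearn tree as a nested Eval(cond, if_true, if_false) string.

    Hypothetical helper; the Eval argument order is an assumption.
    """
    t = estimator.tree_

    def walk(node):
        if t.children_left[node] == LEAF:
            # Leaf: the prediction is the mean target of the training rows here.
            return f"{t.value[node][0][0]:.4f}"
        name = feature_names[t.feature[node]]
        threshold = t.threshold[node]
        if_true = walk(t.children_left[node])    # rows with feature <= threshold
        if_false = walk(t.children_right[node])  # rows with feature > threshold
        return f"Eval({name} <= {threshold:.2f}, {if_true}, {if_false})"

    return walk(0)


# Synthetic rank-like inputs in [0, 100]; NOT real P123 data.
rng = np.random.default_rng(0)
feature_names = ["RoeRank", "SalesGrRank", "PsRank"]  # placeholder names
X = rng.uniform(0, 100, size=(400, 3))
y = 0.03 * X[:, 0] + 0.02 * X[:, 1] - 0.02 * X[:, 2] + rng.normal(0, 1, 400)

# Train: shallow forest, depth strictly limited.
forest = RandomForestRegressor(n_estimators=10, max_depth=3, random_state=0)
forest.fit(X, y)

# Extract: one formula string per tree.
factors = [tree_to_eval(est, feature_names) for est in forest.estimators_]
print(factors[0])
```

The Curate and Select steps would then operate on the values these formulas produce, for example by dropping formulas whose outputs are highly correlated with one another.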
While TFR accepts slightly higher bias (lower raw power), it offers significantly lower variance and full transparency—crucial for explaining "Alpha" to clients.
Technical Caveat: Python trees route NaN automatically and can handle categorical variables. Using P123’s Eval() is a simplistic approach that handles only numeric values.
Interestingly, TFR is also applicable to boosting (LGBM, additive models). You can read the logic as a sum of corrections (thresholds are ranks):
Tree_1 (Base Logic):
IF ROE > 80 & SalesGr > 60 THEN Score += 8
Tree_2 (The Correction):
IF P/S > 90 THEN Score += -5
// Corrects overvalued "traps" found in T1
Tree_3 (The Context):
IF Sentiment > 75 & GrossMargin > 90 THEN Score += 3
Total Rank = 8 + (-5) + 3 = 6
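The same sum-of-corrections reading can be written as a few lines of Python. This is just a restatement of the three rules above for one stock; the input values are hypothetical ranks picked so that all three rules fire, and the dictionary keys are placeholder names.

```python
def additive_score(stock):
    """Sum the per-tree corrections from the three rules above."""
    total = 0
    # Tree_1 (base logic)
    if stock["ROE"] > 80 and stock["SalesGr"] > 60:
        total += 8
    # Tree_2 (the correction): penalize overvalued "traps" that pass Tree_1
    if stock["P/S"] > 90:
        total += -5
    # Tree_3 (the context)
    if stock["Sentiment"] > 75 and stock["GrossMargin"] > 90:
        total += 3
    return total


# Hypothetical rank values (0-100) for a single stock.
example = {"ROE": 85, "SalesGr": 70, "P/S": 95, "Sentiment": 80, "GrossMargin": 92}
total = additive_score(example)
print(total)  # 8 + (-5) + 3 = 6
```

If the P/S rank were below 90, the correction would not fire and the same stock would score 11.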
Looks like Mr. Rickersten is loading up again on MSTR. He sold everything at $371, a very good exit! While he's had some mis-timed exits in 2024 (around $50 and $70), he seems pretty good at finding the entry points.
Full disclosure: I bought quite a bit too around the same price last week
How did you pick up the P123 AI factor so quickly, and build some pretty decent designer models in such a short time?
What is your background if you don’t mind sharing?
I was a trader/PM before and find it extremely difficult to catch up without any coding background. I probably need to hire some coaching help or invest in someone else's models (like what I did with a few hedge funds).
Not fast at all, just a lot of hours since day one. I kept asking questions, learning from advanced users, and experimenting nonstop, going from features to ranking systems (not vice versa). My background is Telecommunications Engineering, which trained me to break complex problems into smaller pieces (signal processing, pattern recognition, early ML). Later, I applied machine learning to tennis big data in Matlab (I should have focused on Python then, but I did not), learning the hard realities of data quality, volatility, modeling and the big issue: liquidity! In parallel, I worked in finance/accounting and studied valuation, options, and structured products. Portfolio construction and systematic research eventually tied everything together. It’s really just long-term effort, not magic... well, the magic is the P123 platform itself: incredible tools and features.