Estimize data as an add-on?

Jrinne · February 7, 2023, 2:20pm

Werner,

Edit: Extreme overfitting is possible with a lot of different methods. I agree that machine learning can DEFINITELY BE MISUSED. I have also been able to overfit some models using the P123 optimizer in the past. There are ways to mitigate these problems.

My present models are performing well, and if I were to detail them, one would have to agree that they are resistant to overfitting despite relying heavily on Python programs and other platforms.

We have heard little about what P123 will do about cross-validation and similar safeguards. But the tool I suggested could actually be helpful if used properly.

Or it could be misused, as you accurately point out.

My point was different. Assuming it is possible, one might want to see what they are paying for, in as objective a way as possible. And if someone suggests using machine learning, then I do not think it is too radical to suggest a machine learning method for that.

Jim

feldy · March 4, 2023, 12:22am

I have a real-world example from the past couple of months that has caused me to shift my priorities for third-party datasets. News sentiment would now be at the top of my list.

On Jan 10, my short model recommended shorting CDZI, which at the time was a 150 MM market cap stock with no analyst coverage. From what the ranking system could see, the stock was overbought, but ironically a news story on the P123 timeline feed announced a development partnership that was clearly driving the price movement.

The problem is that my P123 ranking system is completely oblivious to the news: all it can see is a large upward price/volume move. With no analyst coverage on the stock, I couldn’t rely on any analyst-related factors in P123 to explain the price move.

My next thought was to check Estimize, but there’s no crowd coverage for CDZI, so no luck there either. As mentioned above, it looks like the universe coverage tends to skew toward larger caps, and a lot of folks here tend to play more in the small-cap arena, so the overlap may not be great.
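One way to put a number on that overlap before paying for an add-on is to measure coverage by market-cap bucket. This is only a rough sketch: the tickers, market caps, bucket boundaries, and the `covered` set are all made up for illustration and do not come from Estimize or P123.

```python
from bisect import bisect_right

# Assumed bucket boundaries in MM of market cap; adjust to taste.
EDGES = (300, 2000, 10000)
LABELS = ("micro", "small", "mid", "large")


def coverage_by_bucket(universe, covered):
    """Return the fraction of tickers covered in each market-cap bucket.

    universe: dict of ticker -> market cap (MM)
    covered:  set of tickers the add-on dataset has data for
    Buckets with no tickers are left out of the result.
    """
    totals = {label: 0 for label in LABELS}
    hits = {label: 0 for label in LABELS}
    for ticker, cap in universe.items():
        label = LABELS[bisect_right(EDGES, cap)]
        totals[label] += 1
        if ticker in covered:
            hits[label] += 1
    return {
        label: hits[label] / totals[label]
        for label in LABELS
        if totals[label]
    }


# Hypothetical universe and coverage list, for illustration only.
universe = {"AAA": 90, "BBB": 150, "CCC": 800, "DDD": 5000, "EEE": 40000}
covered = {"CCC", "DDD", "EEE"}

for bucket, fraction in coverage_by_bucket(universe, covered).items():
    print(f"{bucket:>5}: {fraction:.0%} covered")
```

Run against your own trading universe, a table like this would show quickly whether a dataset has anything to say about the stocks you actually hold.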

A news sentiment data source could be immensely valuable here, especially for biopharmaceuticals, where price movements are often driven by FDA approvals and rejections. These news sentiment data sources likely have wide universe coverage and may be informative, timely, and orthogonal to existing factors.
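A quick check on the "orthogonal" part is the rank correlation between a sentiment score and a factor you already use, measured across the same stocks on the same date. The sketch below assumes you can export both as plain lists; the numbers are invented, and the function names are mine, not anything from P123 or a sentiment vendor.

```python
def ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical scores for five stocks on one date.
momentum = [0.9, 0.7, 0.5, 0.3, 0.1]
sentiment = [0.2, 0.8, 0.1, 0.9, 0.5]

print(f"rank correlation: {spearman(momentum, sentiment):+.2f}")
```

A value near zero, repeated over many dates, would suggest the sentiment score carries information the existing factor does not. It says nothing about whether that information predicts returns, which would need its own test.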

Some news providers listed on QuantConnect, like Tiingo or Benzinga, provide raw news, which is not really feasible to consume on P123, but others like Brain build their own NLP models and provide sentiment scores that could integrate nicely. I’ve seen others on FactSet Marketplace, like Sentifi, but have not tried any of these out personally.

There are other providers of sentiment scores based on Twitter (e.g. S-Factor) or StockTwits, etc., but I worry about susceptibility to manipulation and lack of history compared to news feeds, so they’re not as intriguing to me.