Nice ML integration on FactSet

Hi Marco, this is a little off-topic I guess, but FactSet (FDS) is showing up on one of my screens (S&P Global (SPGI) does sometimes too) and I was wondering if you had any general statements about the financial data vendors. Do they really have tremendous pricing leverage over customers, or is it becoming increasingly competitive? I notice they’re buying up data competitors, so it seems like there may be a sort of market-share/monopoly positioning effort going on, but I kind of wonder how you see the competitive environment. If you feel free to comment, I’d appreciate any perspective you can share, either company or industry specific.

I wish it were more competitive. But when you need fundamentals + estimates + industry classification, it’s either Reuters, S&P, or FDS. And if you need dead companies, keeping everything aligned with prices is a nightmare if you use different vendors. Probably a good trade with a flat stock price and earnings coming up. I should have bought their stock long ago, for the same reason I bought Comcast: can’t get away from them.

Would be interested!

For anyone wanting to explore this further, I just signed up for it and got my “$500 credit.”

It did not require a credit card.

So I am going to have to give STRONG SUPPORT TO STEVE ON THIS. At least with the data I have so far.

As you may recall, I ran some data on JASP, which gives me some control over the hyperparameters, and that model was predictive of stock returns. The hyperparameters and the results can be found here: Boosting your returns

So I ran the same data on DataRobot and got the image below: boosting was worthless with their hyperparameters. This is their implementation of boosting (XGBoost in particular).

DataRobot runs a lot of models very quickly. It looks like they may optimize the hyperparameters for XGBoost to some extent.

BUT not like Steve does in his Colab program, or even what I did with JASP, it seems. I think DataRobot MAY not optimize the hyperparameters the way you would want if you were going to put money into a system. It may not spend the computer time fully optimizing all of the hyperparameters for the multitude of models it runs so quickly. And some human art can be involved in finding the best hyperparameters.
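To make "optimizing the hyperparameters yourself" concrete, here is a minimal sketch of an explicit XGBoost grid search in Python. The file name, column names, and parameter grid are illustrative placeholders only; they are not the settings used in JASP, DataRobot, or Steve's Colab notebook.

```python
# Minimal sketch of explicit XGBoost hyperparameter tuning (placeholder data/grid).
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

df = pd.read_csv("factor_data.csv")       # hypothetical factor/return download
X = df.drop(columns=["future_return"])    # factor ranks as features
y = df["future_return"]                   # target: forward returns

param_grid = {
    "n_estimators": [200, 500, 1000],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.7, 1.0],
}

search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),       # respect time order for financial data
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

The point is only that the grid (and the time the search is allowed to take) is under your control here, which is exactly the "human art" an automated platform may or may not spend on each of its many models.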

And some of the ML/AI models cannot be expected to work with rank data (e.g., ridge regression). Ridge regression could work well with Z-scores when P123 makes those available. But keep in mind ridge regression is about 3 lines of code in Python (once you have downloaded the libraries): 3 lines for the ridge regression itself and a few additional lines of Scikit-Learn for the cross-validation. I am not claiming every newbie could do it without a little guidance from Steve (and other members), P123, or a combination of both.
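For anyone curious what "a few lines" actually looks like, here is a sketch in Scikit-Learn. The data here is random placeholder data standing in for z-scored factors and forward returns; the alpha value is just the default, not a recommendation.

```python
# Ridge regression plus cross-validation in a handful of scikit-learn lines.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))   # stand-in for z-scored factor data
y = rng.standard_normal(500)         # stand-in for forward returns

model = Ridge(alpha=1.0).fit(X, y)   # the ridge regression itself

# A few more lines handle the cross-validation.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
print(scores.mean())
```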

I need more data on this. I am not done and I do not have a firm conclusion on any of this. This is just one piece of data for me.

But score a point for JASP and some human intervention for now. And a point for what Steve is doing over at Colab.

I will look at this with XGBoost. Take a good long look with better data and a hold-out sample that has no potential for data leakage whatsoever. Take my time getting it right. And also continue to look at this over at DataRobot (as long as my $500 credit holds out). No conclusions for now, but I thought I would share some of what I have at this point.
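One simple way to get a leakage-free hold-out, sketched below, is to split strictly by date so the hold-out period starts only after the training period ends and is never touched during tuning. The file name, column names, and cutoff date are placeholders, not the actual data set discussed above.

```python
# Chronological train/hold-out split (placeholder file, columns, and cutoff).
import pandas as pd

df = pd.read_csv("factor_data.csv", parse_dates=["date"])

cutoff = pd.Timestamp("2018-01-01")
train = df[df["date"] < cutoff]       # used for all tuning and validation
holdout = df[df["date"] >= cutoff]    # scored once, with the final model only
```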

Jim


I have had a chance to play with DataRobot a little.

I just wanted to say I have gotten some results that are better than the above, although not necessarily anything I would invest in or recommend for investing based on the results below. In fact, the results below, as well as what I get in Python (with the same data), are making me inclined not to invest in the strategy. Or, if I do, to use P123 classic (perhaps with a bit of factor analysis) for this data and these factors.

This is a holdout test set (not validation data).

There is some capacity to change the hyperparameters in the program which I had not found when I wrote the above.

I like the program at this point. I am suspicious about the pricing, as I cannot find it anywhere and they have not contacted me yet. Although the fact that I have not burned through my $500 credit yet is a plus, I guess.

Marco, I do not know if you are still looking at DataRobot, but I did not want to leave an unfair opinion as my last post on this. I am still not an expert on DataRobot by any means.

Jim


Marco,

In addition to DataRobot, you have mentioned Azure. Steve Auger introduced the forum to Colab. This is in addition to the obvious possible use of Python on a personal computer (e.g., Jupyter notebooks).

Now, IBM Watson Studio Cloud is offering some free services. It is perhaps similar to Colab, at least in the sense that one can run XGBoost and TensorFlow for free on it (they say). However, I have yet to find a command prompt on the site after signing up.

For comparison, it was easy for me to find (and use) Colab’s command prompt.
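For anyone who has not tried it, what I mean by Colab's "command prompt" is that a notebook cell prefixed with "!" runs in the shell. A tiny example (the specific commands are just illustrations):

```python
# In a Colab notebook cell, "!" runs the rest of the line in the shell.
!pip install xgboost   # install a package into the Colab environment
!nvidia-smi            # check whether a GPU is attached
```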

I became aware of this because the new version of Anaconda has this integrated into it (somehow).

FWIW, if you are still looking at ways to market P123’s API and its possible use for ML/AI applications.

Jim

See my post for an ML update