Python code for calling the P123 API

This is what we are doing; it should be ready very soon.

We are re-working two API endpoints: /data/universe and /rank/ranks. Both will be able to download technical data (past & future) and the ZScore or FRank of a formula, so they should be well suited for ML/AI. With a data license you will be able to download anything. One API call will only download the universe's data for one date. For now we will enhance the DataMiner operations that use them to do multiple dates at once (like 10 years weekly).
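Until multi-date support lands in DataMiner, a client has to loop one call per date. Below is a minimal Python sketch of what that loop might look like. Note the URL, parameter names, and payload shape are assumptions for illustration only, not the documented API; check the official docs before using them.

```python
# Hedged sketch: one request payload per date, since each /rank/ranks call
# covers a single date. URL and parameter names below are hypothetical.
from datetime import date, timedelta

RANK_URL = "https://api.portfolio123.com/rank/ranks"  # hypothetical URL

def weekly_dates(start, end):
    """Yield one date per week from start through end."""
    d = start
    while d <= end:
        yield d
        d += timedelta(weeks=1)

def build_requests(start, end, ranking_system, universe):
    """Build one payload per weekly date; each would be sent as its own call."""
    return [
        {"url": RANK_URL,
         "params": {"asOfDate": d.isoformat(),        # hypothetical name
                    "rankingSystem": ranking_system,  # hypothetical name
                    "universe": universe}}            # hypothetical name
        for d in weekly_dates(start, end)
    ]

reqs = build_requests(date(2020, 1, 3), date(2020, 12, 25), "Core: Value", "SP500")
print(len(reqs))  # 52 weekly payloads, i.e. 52 API calls for one year
```

Each payload would then be sent individually (e.g. via `requests.get`), which is exactly why the one-call-per-date cost model adds up so quickly for multi-year pulls.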

The /data endpoint will not be suitable for AI/ML. It's intended for small lists of stocks, and at the moment requires a data license no matter what you try to download.

The future performance functions are live now. I will post some documentation. We're also working on samples, the knowledge base, etc. Hang in there.

Thanks

OK thanks - for the Rank API, is it one resource unit per date? And additional limitations depending on the number of data points?

Both the /data/universe and /rank/ranks endpoints will have a variable cost depending on the number of data points retrieved. The minimum cost is 1 API request, of course. For data points we're changing the cost: instead of 5K data points costing 1 API request, we're upping it to 20K data points per API request.

Marco - this is pretty much a non-starter. Just as an example, I have a project that requires 5 years of weekly data. But I am looking into the future 1 year. So I need to separate the training/validation/test datasets by 1 year. This adds two more years. Then I have to split off the last year which becomes prediction dataset. That brings me up to 8 years of data. Then, I want to throw out the pandemic year because it isn’t representative of normal markets. So I am looking at retrieving 9 years of weekly data. This amounts to 9 x 52 API calls or 468 API calls just to capture one set of data that may or may not be even close to a final dataset for the project. It is just one stab at a solution and I am going to have to repeat the data collection many times as I evaluate inputs and targets.
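The call count in the example above, spelled out as arithmetic:

```python
# Years of weekly data needed for the project described above.
weeks_per_year = 52
years = 5 + 2 + 1 + 1  # base window + two separation years + prediction year
                       # + one extra year to replace the pandemic year
api_calls = years * weeks_per_year
print(api_calls)  # 468 API calls for a single pull of the dataset
```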

So the fact is that most people here won't be able to finish one project without buying more resource credits. And because of that, your Big Data attempt is going to fail. Adding insult to injury, you are allowing deep-pocketed customers with a data license to retrieve data without the one-date = one-resource limitation, meaning that you are providing preferred treatment to deep pockets. Maybe that is the intent or maybe you just don't realize it.

In any case, when people taste big data they will want more. But if they run out of resources before they taste it, then it ain’t going to happen.

Steve,

we’re still tweaking things.

Forget the API requests for a second. From a pure data-quantity perspective, we're thinking of a cost of around $200 for 1 billion data points.

Can you show me the math for a use case that uses way more than 1B data points?

And for a one-time cost of $500 you can get 500K API requests that can be used to download 10B data points. Doesn't seem like big pockets to me!

Marco - you seem to think I am commenting on the price per data point. I'm NOT. I couldn't care less what you charge for pulling data. I am saying that if you want a grassroots movement then get rid of the resource unit per API call. That's all.

Resource units and API calls are two different, separate things. Resource units are the storage space of your account.

Did you see them mixed up somewhere?

OK, I don't care what they are called. My understanding is as follows: the Rank API collects data for one date only. Therefore, 9 years of data requires close to 500 API calls and hence ~500 resource units. Depending on account type, users may have 5,000 units per month. And somewhere in the subscription plans, users are restricted to 100 or 200 per hour. So if I am wrong, then please correct me. Somewhere along the line, someone called them resource units.

** EDIT ** I am probably confusing API limits with data-point limits above. Let me know how things are processed/charged. Thanks.

OK - I just looked it up. I have a legacy screener membership so we will ignore my situation (I'll likely have to upgrade). But for a Portfolio membership, for example, the subscriber is allowed 100 quant engine requests per hour. I assume that one API call is 1 quant engine request? Then Portfolio members are allowed 1,000 API calls per month. This is enough for people to get a bad taste in their mouth, not a taste that makes them want more. Anyways, this is the heart of the problem. People want to try before they buy, and you will have difficulty convincing people to buy more API calls without their having already developed a basic application. You need an API that accepts a date range in order for this to be practical.

So, Steve, Philip and Marco are on top of this as far as training goes.

But do not forget the predictions part of this.

For a brief time I was predicting daily.

I was pulling just 5 factors and/or node ranks from the screener. Philip is talking about maybe 20 factors or nodes. Could be 252 times a year for some. I was doing daily—again for a short while.

Sorting and concatenating 5 factors was no big deal. 20 would be a little bit of a deal. A little bit of a deal 252 times a year could be meaningful for some. You may want to think about this too, if you have not already.

Ideally, you could pull all of the factors for a ranking system at once with no concatenating.
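Until that exists, the concatenation step being described (merging per-factor pulls on a date-and-ticker key) can be sketched with nothing but the standard library. The factor names and values below are invented for illustration:

```python
# Hedged sketch: merging separately downloaded factor tables into one wide
# table keyed by (date, ticker). Factor names and values are placeholders.
def merge_factor_tables(tables):
    """tables: {factor_name: [(date, ticker, value), ...]} -> list of wide rows."""
    wide = {}
    for factor, rows in tables.items():
        for d, tkr, val in rows:
            wide.setdefault((d, tkr), {})[factor] = val
    return [
        {"date": d, "ticker": tkr, **vals}
        for (d, tkr), vals in sorted(wide.items())
    ]

tables = {
    "value_rank":    [("2024-01-05", "AAPL", 61.2), ("2024-01-05", "MSFT", 55.0)],
    "momentum_rank": [("2024-01-05", "AAPL", 72.9), ("2024-01-05", "MSFT", 80.1)],
}
rows = merge_factor_tables(tables)
print(rows[0])
# {'date': '2024-01-05', 'ticker': 'AAPL', 'value_rank': 61.2, 'momentum_rank': 72.9}
```

With 5 factors this is trivial; with 20 factors pulled separately, 252 times a year, the bookkeeping (and the per-call cost) is what starts to hurt.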

And this would be a consideration as far as how many calls would be used by members interested in AI/Machine learning, of course.

I can't get it to work properly without blowing up my requests. This will have to wait until next month, sorry guys.

edit: nvm I’m back in business

Is this more complicated than it needs to be?

I note that Steve and Philip, who are highly skilled programmers, aren't using this yet. Philip has been working on this for a while and has been unable to do anything with it yet. I have not bothered with this and feel rewarded that I have used my time elsewhere. Not wasted my time, in other words.

Steve may be making some progress but he is an extreme with regard to motivation and abilities. An elite club with one member. Steve is a great programmer and he has the most extreme ability to learn, adapt and focus that I have ever encountered—even in the upper echelons of elite university medical centers. To be fair to medicine, R. Doyle Stulting, M.D., Ph.D. is in Steve’s league.

Steve has worked with machine learning professionally. Steve is not too happy yet I might add.

I get that Steve probably needs some options and some complexity is probably helpful to him. Cool that he will have something complex to work with.

I also get that Philip will have the ability to work with and work around the complexity when P123 provides a range of dates in a download. So he remains engaged. But does he really need or even want all of the complexity? Maybe.

But I might just ask Philip myself (since P123 won't) whether he has not been spending a lot of time to get one VERY SIMPLE ARRAY. Likewise, P123 is spending a lot of resources on this one array that 99.9% of machine learners will use:

Column heads: date, ticker, factor1, factor2, …, factorN, excess returns relative to the mean of the universe over the rebalance period. That is it. That is all that 99.9% of machine learners will ever need.
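For what it's worth, once the data is in hand this array is trivial to represent. A standard-library sketch, with made-up factor names and numbers standing in for real P123 data:

```python
# The 'one simple array' described above, written out as CSV with stdlib only.
# All factor values and returns here are placeholders, not real P123 data.
import csv, io

n_factors = 3
header = (["date", "ticker"]
          + [f"factor{i}" for i in range(1, n_factors + 1)]
          + ["excess_return"])  # stock return minus universe mean over rebalance

rows = [
    ["2024-01-05", "AAPL", 61.2, 72.9, 18.4, 0.012],
    ["2024-01-05", "MSFT", 55.0, 80.1, 44.7, -0.004],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue().splitlines()[0])
# date,ticker,factor1,factor2,factor3,excess_return
```

A file in this shape drops straight into any ML tool, from scikit-learn to SPSS or JASP, which is the point being made above.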

Granted, one way or another, they will need to get some metrics to see how their Boosting/TensorFlow model has done. This is something anyone with any training in machine learning has already done. Also, this is where P123's time and resources would be better spent, IMHO. This is where P123 has excelled in the past, e.g., the equity curves of sims and the buckets for rank performance.

P123 should make it so an undergraduate in Finance can get this array easily—even if they are using SPSS or JASP. Right now I am wondering if even Philip will ever get this array.

But if people like complexity, give them that too. I have nothing against complexity for complexity's sake. If a programmer is having fun with this: enjoy!

Just me? I would look at what ETFOptimize says in another post before I would make that argument: [url=https://www.portfolio123.com/mvnforum/viewthread_thread,12642#!#76370]https://www.portfolio123.com/mvnforum/viewthread_thread,12642#!#76370[/url]

ETFOptimize is a professional. He is extremely intelligent. Maximally savvy with P123 classic. But he will never use this or even get it, I would bet. Not a reflection on him, I think. But in any case, not just me.

I see that as a marketing problem for P123. One that can probably be addressed and turned to a benefit over any competition.

I wish P123 the best whatever path you ultimately take,

Jim

My problem is that I can't make any mistakes when pulling the data; otherwise I run out of API requests. I made a mistake the first time pulling ranks and had to redo it, and oops, now I am out of API requests, so I am waiting until January.

As far as I can tell the API works really well in Python, and I have been able to get a simple array done… it's just the capacity issues that need to be sorted.

Oh also I tried XGBoost and some SciKitLearn stuff and it spit out garbage :slight_smile:

I am going to make one more attempt here at getting through to P123 in the course that is being set.

P123 was founded early in the century, 2001 I think, and started by giving away the service for a year before it even thought about charging money. Since then P123 has grown into a research-intensive platform for a flat fee. And a reasonable fee at that. I am pretty sure that many subscribers have hundreds of sims and also many portfolios. That is what P123 has been up until now. But the business model is changing. And I get it. Marco wants or needs to start making more money off of the platform. And there is an opportunity here with big data and machine learning. So the business model is changing from flat fee to consumption-based. When we run out of resources, whether it be datapoint downloads or API calls, we can purchase more at a “reasonable” price.

So this is all great. P123 has a new business model to propel it into the future. But there is a problem here. As I see it, the majority of subscribers are used to the flat-fee model. I have been doing research for a long time and you can trust me when I tell you that success doesn't come from choosing one set of factors, downloading them and voila! you get great results. It comes from many hours of blood, sweat, and tears, a lot of data downloads, a lot of trial and terror. By my estimation, building a model based on machine learning with P123's fees is going to cost at least $1K, and that is on top of the current membership fee. And for subscribers to do the intensive research which I believe is necessary to come up with a handful of decent investment strategies is going to run (easily) into six figures.

What it really boils down to is: what customer is P123 trying to serve? If it is big business, then continue on the current path, but be prepared to hire a direct salesforce and cold call every large business in America. That is the only way to succeed here. There is one alternative, which software companies are employing with success: a freemium model. Get developers and users going for free, and then some progress into paying services. I'm not suggesting that P123 give everything away. But giving existing subscribers some minimum set of resources that allows basic work to be done, without paying into a bottomless pit while not understanding what they are getting out of it, is not too much to ask.

To start, I have made the suggestion that P123 needs to either get rid of the API call limitations or make them more practical. By more practical, I mean allowing a date range for one API call, with the data assembled and returned in a single call. That is palatable. What P123 is offering currently IS NOT.

Once the API resource limitation problem is dealt with, then it is all about the cost of downloading data points. P123 should keep in mind that its forte is not data redistribution but portfolios and research, an area in which it does an excellent job. The data points that are downloaded are not raw data but processed data. I can get raw data from other sources like Quandl or Xignite at a similar or lower cost. And then, what do I need P123 for? I'm not suggesting that P123 should offer free data-point downloads, but it needs to understand its niche, and that niche is not data redistribution.

Marco needs to remember his roots and how he built the business. Continue on the same path or what is being done will likely be DOA.

I would be happy to share what I have done with XGBoost. There is some art to it.

For example, you will be surprised by how large a min_child_weight you will want to use, probably around 500. You will not read that anywhere, but then again, de Prado writes about using boosting and shares none of his hyperparameters.

Jim

How many trees and what depth would you recommend?

Steve, I get it, sort of. But I think it's not going to be an issue. You've yet to show me a simple use case that uses more than 50K API calls/mo. We'll see; we can always add more.

And if it’s a problem for you I’m sure we can barter.

Going from memory, I use a depth of 6. But you use a lot more nodes, so maybe more.

The number of trees is controlled by early stopping and eta. So make sure to set up early stopping and validation.

If you used the default eta (0.1 maybe) I would expect garbage. You need to go down to 0.01 and maybe even 0.001. This will increase the number of trees but XGBoost is pretty fast and your processor (with parallel processing) can probably handle that. You will probably end up with close to 1,000 trees.

Eventually, you will want to use subsampling, column sampling and monotone_constraints. Email me when you are ready and I will share my ideas on monotone_constraints.

But you could probably turn subsampling on at 0.5 right away and set column sampling to a low number. I have used 1/(number of columns) with some models.

If you are worried about outliers, switch the metric to mae (rather than rmse). But when the child_weight is large this may not make much difference.

To start: depth 6, eta 0.01, early stopping, and a huge min_child_weight starting around 500. I think gamma defaults to 0; probably keep it there.

Hope this helps some as a start. But you will develop your own hyperparameters for your own systems (depending on the number of nodes you tend to use, etc.).
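The advice in this and the preceding posts can be collected into one starting configuration. This is a sketch of that reading, not a tested recipe: the parameter names are the standard XGBoost sklearn-wrapper ones, and the values simply restate the suggestions above.

```python
# Starting hyperparameters, restating the thread's suggestions in one place.
# Tune these for your own node counts and universes.
params = {
    "max_depth": 6,            # "depth of 6"; maybe deeper with many nodes
    "learning_rate": 0.01,     # eta; try 0.001 if results still look like noise
    "n_estimators": 2000,      # upper bound; early stopping picks the real count
    "min_child_weight": 500,   # unusually large, per the advice above
    "subsample": 0.5,          # row subsampling, once the basics work
    "colsample_bytree": 0.2,   # e.g. roughly 1/(number of columns) on some models
    "objective": "reg:squarederror",  # switch eval to "mae" if outliers worry you
}

# With xgboost installed, this would be used roughly like:
#   import xgboost as xgb
#   model = xgb.XGBRegressor(**params, early_stopping_rounds=50)
#   model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)])
print(params["min_child_weight"])  # 500
```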

Jim

“You’ve yet to show me a simple use-case that uses more than 50K api’s/mo. We’ll see , we can always add more.”

Marco - The actual limitations right now are 1,000 API calls per month for a Portfolio membership. I can't remember what it is for an Ultimate membership. And then there is the limit on quant engine requests per hour, 100/hr or 200/hr. My guesstimate is that 25K-50K API calls will be sufficient to develop an application, and maybe 5% of applications will work out in real life. Also keep in mind that I am (and so are a lot of others) a P123 junkie. This is a lifestyle for me, a geeky lifestyle. The consumption model puts a damper on that :slight_smile:

“And if it’s a problem for you I’m sure we can barter.”

I am an advocate for P123. And I don't mind preaching to the choir, but I do mind if I am preaching to an empty church with the electricity turned off. What I am saying is that it isn't just about my situation. I want to make sure this takes hold and everybody is walking, talking ML/AI. Then you will make lots of money!

I will very shortly be starting a ML/AI group. You will see more then.