Python code for calling the P123 API

Thanks Steve,

I am particularly impressed with what you have done with what is called a “grid search” for optimizing hyperparameters in your Colab programs.

You have moved beyond anything I have ever done with that.

I will take you up on your offer of assistance with this and other aspects of XGBoost.

Jim - I know you don’t understand everything I am doing with my project. Part of it has to do with my Seeking Alpha activities; the rest is knocking down big-data barriers here on P123. There are still roadblocks, like trying to get data back into P123. But we just work one problem at a time, and eventually we will have something that you can apply to your project. You may not see the benefit right now, but you just need to be patient.

Steve,

I like what you are doing. From what I see, I think there is a time-series aspect to your project, which is not my forte.

Cross-sectional analysis is so easy for a simple mind like mine: 1) put a bunch of factors in columns, 2) spend a couple of seconds agonizing about what a good target might be (like how much money will I make if I buy this ticker?), and 3) push Enter.

XGBoost is magical in that it automatically selects the important factors. And if you feel a need to understand how the program did that, you can find it in the feature-importance printout. Double-good for a simple mind like mine.

Let me repeat that so I can remember it… Uh, factors in columns, figure out a target, push Enter.

Uh, wait. It is Shift+Enter in Python notebooks. Okay, got it now.

I think that is what you are seeing in my posts. For me personally, there is just a world of difference in understanding and ability between cross-sectional analysis and anything resembling time-series analysis.

Best,

Jim

There is no real time-series stuff in my application. There are some filters on the data, but I am not feeding time-series-dependent data into the model. I can shuffle the rows of data without any problem. What I meant to say is that I think maybe you are not seeing the value in my project because there is no direct relationship to profits/losses.

In any case, the important thing is that you need to make sure that Marco/P123 are giving you the tool that you need so you don’t spend days splicing spreadsheets of data together. I know that you expressed a need to have excess returns brought out. I agree and hopefully Marco is listening. What else do you need in the API to make your project work that hasn’t been acknowledged?

You are so right about this being part of my simple-minded approach as I mention above.

I take that as an argument for this not being time-series data. And an argument could be made that I am so bad at time-series data that I cannot even recognize when something is (or isn’t) time-series data.

I can get something together along the lines of what I did in “Boosting Your Profits” in time. Probably.

If Marco wants the definitive study before committing to this, he can send me the three CSV files this will take. It will take three, as I have come to learn that the maximum number of rows per Excel spreadsheet is 1,048,576.

I.e., 3,000 stocks (Russell 3000 universe) × 52 weeks × 15 years = 2,340,000 rows

Yep. 3 Excel spreadsheets. Yea. Marco not helping is not an option if he wants the definitive study.
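The arithmetic above, combined with Excel’s worksheet row limit, pins down the file count; a quick sketch:

```python
import math

EXCEL_MAX_ROWS = 1_048_576  # hard row limit per Excel worksheet

# Russell 3000 universe, weekly data, 15 years (numbers from the post above).
stocks, weeks_per_year, years = 3000, 52, 15
total_rows = stocks * weeks_per_year * years
files_needed = math.ceil(total_rows / EXCEL_MAX_ROWS)

print(total_rows)    # 2,340,000 rows
print(files_needed)  # 3 files
```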

But I can do with less for now. Probably. After a while.

Jim

Jim - set the “definitive study” aside for the moment. What data do you need to collect to make your effort easier? Don’t worry about defending what you are doing. What data do you need?

Thanks Steve. I can get it. I am sure Marco would help, but I do not need any help.

Jim

Has the functionality to make the label column using return data been added yet?

?

philjoe - they are still working on the details of the new API. Patience!

ok thx for the update.

We did add the future functions for performance and relative performance (in addition to allowing negatives in FHist) which I will document soon.

There will be two ways to generate the ML data for features & labels. Working on it. Maybe tomorrow or Wednesday. Thanks

Marco - will there be an update on calculation of resources used? You mentioned that 5000 data points will be considered as one resource unit regardless of API call. But will the API still be restricted to one date or will there be a date range for one API call? Makes a big difference in cost of resource units for me at least.

See https://help.portfolio123.com/hc/en-us/articles/360053182312-Portfolio123-API and scroll down to the bottom. These limits are now in place.

Thanks Yuval. It is as clear as mud to me. It looks like if one uses the “data” mode, then 5000 data points = 1 credit. “Data” requires a license, or one can access certain factors for certain S&P 500 stocks back a maximum of 5 years? Otherwise, if one uses the “rank” API, then 1 date = 1 credit up to 5000 data points.

Is that how I should be interpreting it? If that is the case then I guess I’ll probably have to drop what I was working on, or try to make do with the original rank API that I was using before and scrimping on accessing data.

So we can use the “data” operation to make the label?

Hi Steve,
Data and DataUniverse will require a data license IF you want to return the actual data values. That part about the SP500 and 5 years is going to be removed soon. But soon you will be able to return FRank and ZScores for factors without having the data license. To determine the number of data points used by a these Data and DataUniverse operations, you multiply #dates x #tickers x #formulas then divide that number by 5000 to get the cost in API credits. So for example: using DataUniverse with a universe of 2000 stocks and returning FRank values for 20 factors, the cost will be 8 API credits per date. 2000201/5000

"That part about the SP500 and 5 years is going to be removed soon. "

Thanks Dan. But if I’m only retrieving 1,000 data points per date for one year, that’s 12,000 data points (if monthly). So would that be 3 credits, i.e. 12,000/5,000 rounded up, or would that be 12 credits because there are 12 dates? It is probably a stupid question, but I just want to make sure I understand.

I can’t get my python to work with the “Data” to make labels… anyone else had any luck?
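While the API’s label support gets sorted out, a forward-return label column can be computed locally from a price table. This is a generic pandas sketch with made-up tickers and prices, not the P123 “Data” operation itself:

```python
import pandas as pd

# Toy weekly closing prices, one column per ticker (illustrative data only).
prices = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.1, 11.0], "BBB": [20.0, 19.0, 19.5, 21.0]},
    index=pd.date_range("2020-01-03", periods=4, freq="W-FRI"),
)

# Label: next-period return, shifted back so it sits on the date
# the prediction would be made (the last row has no future return).
fwd_return = prices.pct_change().shift(-1)
print(fwd_return)
```

The `shift(-1)` is what makes it a label rather than a feature: each row’s value is information from the following period.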

“I can’t get my python to work with ‘Data’ to make labels”

I have not tried the API yet, as I thought it was still a work in progress. I’m not sure what you mean by “labels,” because the Silicon Valley potheads decided that labels were the targets of neural nets. If you are talking about column headers, then I would assume that you want to create a pandas DataFrame, and then you can attach headers of your choice…
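The column-header suggestion above looks like this in practice; the ticker symbols, column names, and values are all made up for illustration:

```python
import pandas as pd

# Raw values as they might come back from an API call (rows = stocks).
raw = [[0.95, 0.10], [0.42, -0.03], [0.77, 0.21]]

# Wrap them in a DataFrame and attach headers of your choice.
df = pd.DataFrame(raw, columns=["FRank_Value", "Label"],
                  index=["AAA", "BBB", "CCC"])
print(df)
```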