Thanks Marco. Can you explain what you mean by endpoints? And is this something that might happen soon?
An API endpoint is just a fancy name for an API function. So we would add an API function/endpoint to get the stats for a model. To get the stats for multiple models you would need to call it multiple times.
Steve, our APIs are not meant to return 5 years of data every 4 weeks for example. They are more basic operations that return json structures. That’s why we created the DataMiner. It does more complex operations that involve multiple API calls. It also returns the data in a more readable “2D” csv format. But each of the DataMiner operations has to be specifically coded up. It’s open source and we’re hoping others will enhance it.
Thanks Marco.
I am pretty sure I can learn this or someone can help me (Steve for example).
Probably the main reason I am not as up to speed on this as I could be is that some of the downloads you already provide are excellent. I am thinking of the download method I described in the post “Boosting for Profits.” I hope that is not a burden on your system when I download this way.
There seems to be a lot of interest. I know Steve is making good use of this. I have been using the downloads already available.
Jim
Thinking about it some more, it probably makes sense to make our APIs a bit more flexible (like getting multiple periods) so they can be more readily used in Colab. I’ll discuss with the team.
“Would you be requesting ranks on mixed dates with somewhat random gaps between them?
Or could we create a ranks endpoint that is like the RanksPeriod operation in DataMiner, which takes a start date, end date, and frequency (1 wk, 4 wk, etc.)?”
For my use, I would be gathering data for 5 years on a weekly basis. For now, monthly. So essentially I am making 260 (or 60) API calls, one for each week. Then I am concatenating the files together to make one big file, and I attach a date column so I know the date for each row.
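That one-call-per-period loop can be sketched as follows. This is only a sketch: `fetch_ranks` is a hypothetical stand-in for the real rank API request, and its response shape is an assumption, not the actual P123 API.

```python
from datetime import date, timedelta

def fetch_ranks(as_of):
    # Stand-in for the real API call -- replace with the actual
    # request; endpoint and response shape are assumptions here.
    return [{"ticker": "AAA", "rank": 98.5}, {"ticker": "BBB", "rank": 42.1}]

def collect_history(start, n_calls, step_weeks=1):
    """One API call per period; tag every row with its as-of date,
    then concatenate everything into a single table."""
    rows = []
    for i in range(n_calls):
        as_of = start + timedelta(weeks=i * step_weeks)
        for row in fetch_ranks(as_of):
            rows.append({"date": as_of.isoformat(), **row})
    return rows

table = collect_history(date(2020, 1, 6), n_calls=3, step_weeks=4)
```

With weekly data over 5 years this loop makes the 260 calls described above; a 4-week frequency cuts it to roughly 65.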
“Steve, our APIs are not meant to return 5 years of data every 4 weeks for example. They are more basic operations that return json structures. That’s why we created the DataMiner. It does more complex operations that involve multiple API calls. It also returns the data in a more readable “2D” csv format. But each of the DataMiner operations has to be specifically coded up. It’s open source and we’re hoping others will enhance it.”
I have Python code that I use now, but it assumes a flat RS and it makes 60 or 260 API calls. I can use this as is, but if you want the general community to use ML then you need to decide whether it is better to have 260 API calls with small amounts of data or one API call returning a large amount of data. I don’t know your setup. I can tell you that I will be posting about this in a couple of weeks, so there is the potential that a lot of people will be trying out my Python code. Be prepared for a lot of API calls!
As for the JSON, if someone can provide code that takes the JSON and outputs a 2D array, then that is fine. My solution is not the general case and I suspect others will struggle if they don’t have a universal solution. i.e. if special code is required depending on the RS structure.
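As one possible universal approach, a generic flattener that turns any nested JSON row into a flat dict with dotted keys (and therefore into one CSV row) could look like this. The response shape shown is hypothetical; the point is that the code does not depend on the RS structure.

```python
def flatten(obj, prefix=""):
    """Recursively flatten nested dicts/lists into one flat dict with
    dotted keys, so any JSON row becomes a single 2D table row."""
    flat = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            flat.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            flat.update(flatten(v, f"{prefix}{i}."))
    else:
        flat[prefix[:-1]] = obj  # strip the trailing dot
    return flat

# Hypothetical response shape -- adjust to the actual API output.
resp = {"ticker": "AAA", "nodes": [{"name": "Value", "rank": 80}]}
row = flatten(resp)
# row has keys "ticker", "nodes.0.name", "nodes.0.rank"
```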
How are you doing the target column? Isn’t it pulling the performance for the prior week and not the next week?
No Phil - I am using inputs delayed by 52 weeks and the target is current. I have a request with Marco to give us factors that accept a negative offset, such as Sales(TTM, -1), which will make things easier in the long run. Historically, you will be able to look forward, and in the present you will get NA because you don’t know the future.
And you are creating the “delay” in your python code? I did the same thing, just wondering if there is a better way.
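For what it’s worth, the delay done in Python amounts to pairing the feature row observed at week t with the label that only becomes known `delay` weeks later; a minimal sketch (the function name is mine):

```python
def make_training_rows(features, labels, delay=52):
    """Pair each feature row at week t with the label from week
    t + delay, so inputs are effectively delayed by `delay` weeks
    relative to the target. Both lists are aligned by week."""
    return [(features[t], labels[t + delay])
            for t in range(len(features) - delay)]
```

The last `delay` weeks of features get no row because their label is still in the future, which matches the "NA in the present" behavior described above.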
Philip,
And how do you get excess returns easily? I encourage you to use excess returns if you are not doing this already.
There is too much noise in the market from FED meetings, Trump tweets, Goldman Sachs’ (usually wrong) predictions etc. without it.
Maybe your experience will be different but I think you will want to do that (if you are not already)…
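A minimal sketch of the excess-return calculation, using the equal-weight universe average as the benchmark (one common choice; substitute whichever benchmark you prefer):

```python
def excess_returns(returns):
    """Excess return vs the equal-weight average of the whole
    universe for the same period. `returns` holds one period's
    return for every stock in the universe."""
    avg = sum(returns) / len(returns)
    return [r - avg for r in returns]
```

By construction these excess returns sum to zero across the universe each period, which strips out market-wide moves.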
Best,
Jim
Phil - I am using FHist() in the P123 ranking system. I am still studying the possibilities with it. For example, what happens if I use FHist(“Sales(TTM,0)”,52). Is that a correct formula for looking back 1 year? I’m not sure and perhaps P123 can tell me. I seem to recall P123 saying that it won’t give me the results that I am looking for.
Steve, that should be fine. FHist uses a weeks offset. BTW, hats off to you for finding ways around the current limitations.
We should get to the proposed enhancements soon, which is to allow specifying additional “label” data on the rank API (again, purely based on technical data).
We have not decided yet if we will allow negative bars or offsets to pull future stats (which could cause unforeseen problems elsewhere and could be confusing), or just create some additional “Future Statistics” functions like “FutureReturn(bars)” , which is also clearer to new users.
To decide, we need to know the scope of the types of label data.
So what are the minimum requirements for label data?
Stock total return in the future (example future 1Mo return)
Stock total return relative to the benchmark in the future (example future 1Mo relative return)
Stock future Sharpe, Sortino, or StdDev
If these are most, if not all, of the types of label data, I would steer towards adding some future functions rather than using negative offsets/bars.
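A sketch of what such future functions might compute, with None standing in for NA when the window extends past known history (the function names follow the hypothetical FutureReturn(bars) proposed above; this is not an existing API):

```python
def future_return(prices, t, bars):
    """FutureReturn(bars)-style label: total return from bar t to
    bar t+bars. Returns None (NA) when the future is not yet known."""
    if t + bars >= len(prices):
        return None  # in the present the future return is NA
    return prices[t + bars] / prices[t] - 1.0

def future_relative_return(prices, bench, t, bars):
    """Future return in excess of the benchmark over the same window."""
    r = future_return(prices, t, bars)
    b = future_return(bench, t, bars)
    return None if r is None or b is None else r - b
```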
Thanks
Excellent!!!
In my experience if one cannot get EXCESS RETURNS their time would be better spent cleaning their closets or memorizing the US presidents in order.
There may be some exceptions (that I have not seen) but I can guarantee it will come in handy for many.
Some benchmarks will be better than others. I have always used the average returns of the universe itself. I cannot imagine that a cap-weighted benchmark could work very well. That would be the definition of “bias.”
Jim
Marco - As Jim says, we need to have excess returns as part of the new label.
As for negative offsets, I view it as a critical piece if you want to be a Big Data player. As it stands right now, I have to write some very convoluted virtually unreadable formulae to get the data that I need. I am using filters and custom moving averages, all in the ranking system node. And it is made much more complex by having to use FHist() everywhere. It is all prone to human error and difficult to maintain. Then it is a further problem because I have to create a different ranking system for model predictions. And who knows if I have duplicated everything correctly from the other ranking system.
It is not a case of supporting some technicals with negative offsets in the label data. If something is to be implemented, then it needs to be for fundamentals as well and usable in the ranking nodes.
Thanks for listening.
Marco - I was under the impression that FHist(“Sales(TTM,0)”,52) would take the current Sales(TTM) and attempt to retrieve the figure from 52 weeks ago (which is not what I want). What I want is the Sales(TTM,0) as it was 52 weeks in the past. Can I take your word that the latter is correct? Because I think you stated otherwise a couple of years ago.
Also, if I use FHist() in a custom formula, can I then apply FHist() to that formula? It translates into FHist( FHist() ).
Thanks
Steve
Steve,
I still don’t see the need for -ve offsets everywhere. We cannot output fundamental data, so can you give me a concrete example of a label not supported by the ones I proposed?
Marco - If you can’t do it then you can’t do it. I’m not asking for something using technical-based labels. I am just saying that 99% of people won’t be able to follow what I am going to present without having negative offsets. It’s not a problem for me.
Here is an example. The target is actual Sales Growth TTM % - estimated Sales Growth TTM %
The target formula is written for training data, so imagine that the inputs all have FHist(“”,52) applied to them. As a general rule, I apply a 1 week delay to all fundamentals. (No offense, I just don’t trust the PIT).
Custom Formulae:
$FiltNTMSalesMean (10*FHist(“NTMSalesMean”,53)+6*FHist(“NTMSalesMean”,54)+3*FHist(“NTMSalesMean”,55)+1*FHist(“NTMSalesMean”,56))/20
$D1WSalesTTM FHist(“Sales(0,TTM)”,1)
$D1WSalesPTM FHist(“Sales(0,TTM)”,53)
Target Node: 100*($D1WSalesTTM-$D1WSalesPTM)/$D1WSalesPTM - 100*($FiltNTMSalesMean-$D1WSalesPTM)/$D1WSalesPTM
This is for the target node. There will be similar equations for the inputs, some using loopsum(), loopstd(), etc. If the above works as I intend it (and that is a big IF), then I have to rewrite everything for prediction data sets and take out the FHist( ,52) everywhere.
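For comparison, the same label written outside the ranking system in plain Python (a sketch; the function names are mine, and the formulas simply mirror the custom formulae and Target Node above):

```python
def filtered_estimate(weekly_means):
    """Weighted filter over four weekly NTM-sales-estimate snapshots,
    newest first, weights 10/6/3/1 (sum 20) -- mirrors $FiltNTMSalesMean."""
    weights = [10, 6, 3, 1]
    return sum(w * m for w, m in zip(weights, weekly_means)) / 20

def sales_surprise_label(sales_ttm_now, sales_ttm_prior, est_ntm_mean):
    """Actual TTM sales growth % minus estimated growth %, both
    measured against the prior-year TTM base -- mirrors the Target Node."""
    actual = 100 * (sales_ttm_now - sales_ttm_prior) / sales_ttm_prior
    estimated = 100 * (est_ntm_mean - sales_ttm_prior) / sales_ttm_prior
    return actual - estimated
```

For example, sales_surprise_label(110, 100, 105) returns 5.0: 10% actual growth minus 5% estimated growth.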
So yes I believe that it can be done with your existing setup, but honestly only the real geeks will be using AI here.
So I would not have done this. And it looks (falsely) complex.
But what is the problem? Steve just uses a rank as a target. He is predicting a future rank. Nothing wrong with that really (whether I would have done that or not).
Can’t it be downloaded as one of the factors or inputs? Python can take that as a label.
Isn’t the only question whether FHist is behaving in the desired way? Maybe I am missing something, but maybe it also looks more complex than it really is.
The target node would look something like this if we had negative offsets:
Custom Formulae:
$FiltNTMSalesMean (10*FHist(“NTMSalesMean”,1)+6*FHist(“NTMSalesMean”,2)+3*FHist(“NTMSalesMean”,3)+1*FHist(“NTMSalesMean”,4))/20
$D1WSalesTTM FHist(“Sales(0,TTM)”,-51)
$D1WSalesPTM FHist(“Sales(0,TTM)”,1)
Target Node: 100*($D1WSalesTTM-$D1WSalesPTM)/$D1WSalesPTM - 100*($FiltNTMSalesMean-$D1WSalesPTM)/$D1WSalesPTM
There would be one set of custom formulae and one ranking system that would apply to both training data and prediction data. For prediction data, the target would be NA. Note: it could be simplified even further if FHist( FHist() ) works. But I am probably pushing things too much!
