Thank You Dan!

Jrinne · July 28, 2023, 9:40am

All,

TL;DR: Just one csv file download would change everything for P123 forever. It starts and ends with the csv file. Multiple ways to use it could and should be made available. It does not have to be, and should not be made, harder than that.

I recently requested that ranks be downloadable in the screener with more than 500 ranks available.

Dan informed me that I can get that in the DataMiner without any programming skills and I very much appreciate that.

I have mentioned before that I would like to run XGBoost on data from P123 in the future (and have done so in the past). I think the API was actually started when I informed P123 that the fundamentals are not necessary for XGBoost, Random Forests or neural nets.

It is debatable whether fundamentals would help regression. Maybe for some but I think one might try a regression with the ranks as the independent variable and get better answers than if you use the fundamentals.

Or you could just use the z-score instead of fundamentals and get the same answer. The potential for P123 is Ginormous, HUGE, AWESOME, Unlimited!!! There is no limit to the machine learning abilities at P123. I do not know what other adjectives to use.

I do not care if someone prefers spreadsheets. Please use spreadsheets, an abacus or whatever you like. You do you. Whatever floats your boat. I am not causing anyone any harm if I ask for some data in a format that facilitates machine learning.

TL;DR: P123 has absolutely incredible potential for machine learning. No fundamental data required. I thought that was already understood.

Dan was very kind and responded to my post. When I clarified that I only need ranks he responded (paraphrasing) that I should check out the DataMiner—which I am doing now.

ANY MACHINE LEARNING PROGRAM JUST NEEDS COLUMN HEADS OF FACTORS OR FEATURES (e.g., price to free cash flow) WITH THE RANKS IN THE COLUMN AND THE TARGET (OFTEN THE RETURNS FOR THE NEXT WEEK OR THE NEXT MONTH). And an index I guess which might as well be ticker and date.
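For readers following along in Python, the layout described above can be sketched with pandas. This is only an illustration: the column names (rank_pr2fcf, rank_earn_yield, fut_ret_4w) and every value are made up, and are not actual P123 output headers or data.

```python
import pandas as pd

# Hypothetical rows: one per (date, ticker), factor ranks plus the target.
rows = [
    ("2020-01-04", "AAPL", 91.2, 34.5, 1.8),
    ("2020-01-04", "MSFT", 75.0, 60.1, -0.4),
    ("2020-01-11", "AAPL", 89.7, 36.0, 0.9),
    ("2020-01-11", "MSFT", 77.3, 58.8, 2.1),
]
columns = ["date", "ticker", "rank_pr2fcf", "rank_earn_yield", "fut_ret_4w"]
df = pd.DataFrame(rows, columns=columns)
df["date"] = pd.to_datetime(df["date"])

# Date and ticker together form the index; everything else is a feature or the target.
df = df.set_index(["date", "ticker"]).sort_index()

X = df.drop(columns="fut_ret_4w")  # features: one column of ranks per factor
y = df["fut_ret_4w"]               # target: the future return
print(df)
```

With the table in this shape, X and y can be handed to most machine learning libraries without further reshaping.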

I get now that P123 is never going to answer any questions about collinearity, how Ridge Regression could help with that or whether I want to use out-of-bag validation for a random forest, monotonic constraints for XGBoost or subsampling of the columns and that I will have to set up my own methods for early stopping. I get that now.

So again, I will do it myself one way or the other.

Please check if I am alone in this, but it would be nice to have a download that has a column for each of the features (rank or z-score) and the next week or month’s returns.

All lined up, without having to merge returns and ranks with a sort (when I do, should I do an external merge or an internal merge, and what if I want to sort on more than one thing?).

If this is available, would you please point me to it? If I am not alone in thinking that would be nice, would you consider doing it? But I think that would be useful for ANYONE doing machine learning AND anyone wanting to use SPSS or JASP for that matter.

Newbies will use this. It should be FRONT AND CENTER ON THE HOME PAGE if you want to attract undergrads looking to manipulate data with Python, R, SPSS, JASP or whatever their professors like.

If P123 wants to buy new servers, hire someone who has some experience doing machine learning and take years to develop this, that is fine. Keep this all mysterious and out of the control of the P123 members, with little ability to get requests answered (except when Dan responds). I just have one question: is that the best business model?

People, for now, will get immediate answers from ChatGPT as well as some help with any code.

Please point me to that DataMiner download or consider providing it if it is not available. I do not think you will regret it from a business perspective (regardless of what I may want).

I will run it myself in Colab, paying a fee for faster runs if necessary. I would prefer to avoid sorting separate, individual rank downloads and return downloads, merging them (external merge or internal merge?) and hoping it all lines up with no errors.
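On the external versus internal merge question: in pandas these are spelled how="outer" and how="inner". A minimal sketch, assuming two separate downloads keyed by date and ticker (the frames, column names and values below are hypothetical), shows one way to merge them and check that nothing silently fails to line up.

```python
import pandas as pd

# Hypothetical rank download.
ranks = pd.DataFrame({
    "date": ["2020-01-04", "2020-01-04", "2020-01-11"],
    "ticker": ["AAPL", "MSFT", "AAPL"],
    "rank_earn_yield": [34.5, 60.1, 36.0],
})
# Hypothetical return download; note it covers slightly different rows.
returns = pd.DataFrame({
    "date": ["2020-01-04", "2020-01-11", "2020-01-11"],
    "ticker": ["AAPL", "AAPL", "MSFT"],
    "fut_ret_4w": [1.8, 0.9, 2.1],
})

keys = ["date", "ticker"]

# Outer ("external") merge keeps every row from both files.
# validate raises an error if a (date, ticker) pair appears twice in either file;
# indicator adds a _merge column saying where each row came from.
outer = ranks.merge(returns, on=keys, how="outer",
                    validate="one_to_one", indicator=True)
unmatched = outer[outer["_merge"] != "both"]

# Inner ("internal") merge keeps only rows present in both files.
inner = ranks.merge(returns, on=keys, how="inner", validate="one_to_one")

print(unmatched[keys + ["_merge"]])
print(inner)
```

An outer merge followed by a look at the unmatched rows shows what would be lost; an inner merge is the simpler choice once those rows are known not to matter.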

And again, thank you Dan for responding to a forum request. The answer does not always have to be yes, and you did respond. You understood when I clarified about ranks and responded again. Interacted.

If P123 were to help newbies for now…how do I make the date the index?…How do I double sort rank and date?…internal merge or external merge?…Which sort is best?

ChatGPT is better for any machine learning for now. P123 could at least help people who are not expert programmers do some of the munging and data wrangling!

So with no irony. Multiple downloads. Switch the indexes to date AND ticker. External merge. Then double index. Make sure to set how you handle NAs. Everyone can do that, right? Plus it is fun.
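The indexing and NA steps in that list can be sketched as follows, starting from an already merged table. Everything here is an assumption made for illustration: the column names, the values, and especially the choice to fill missing ranks with a neutral 50, which is just one possible policy and not a P123 recommendation.

```python
import pandas as pd

nan = float("nan")

# Hypothetical result of an outer merge: unsorted, with gaps.
merged = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-11", "2020-01-04", "2020-01-04", "2020-01-11"]),
    "ticker": ["MSFT", "MSFT", "AAPL", "AAPL"],
    "rank_earn_yield": [nan, 60.1, 34.5, 36.0],
    "fut_ret_4w": [2.1, nan, 1.8, 0.9],
})

# Double index on date AND ticker, sorted by date first and ticker second.
panel = merged.set_index(["date", "ticker"]).sort_index()

# Rows without a target cannot be used for training, so drop them.
panel = panel.dropna(subset=["fut_ret_4w"])

# Assumed policy: treat a missing rank as neutral (50 on a 0-100 scale).
feature_cols = [c for c in panel.columns if c != "fut_ret_4w"]
panel = panel.fillna({col: 50.0 for col in feature_cols})

print(panel)
```

Whatever NA policy is chosen, making it an explicit line of code keeps it visible and easy to change later.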

Just one csv file download would change everything forever for P123 I think (in a good way). More effective and I would guess less expensive than anything done with the AI/ML to date.

Jim

danp · July 28, 2023, 5:34pm

Hi Jim,

it would be nice to have a download that has a column of the features (rank or z-score) and the next week or month’s returns.

The DataUniverse operation in the DataMiner does what you are asking except that the .csv file is sorted only by Date. Sounds like you want a secondary sort done on Ticker, correct? Please look at the examples below and let us know if there is any other functionality missing.

Example:

Main:
    Operation: DataUniverse
    On Error:  Stop
    Precision: 4

Default Settings:
    PIT Method: Prelim
    Start Date: 2020-01-01
    End Date: 2020-03-01
    Frequency: 1Week # ( [ 1Week ] | 2Weeks | 3Weeks | 4Weeks | 6Weeks | 8Weeks | 13Weeks | 26Weeks | 52Weeks )
    Universe: DJIA
    Include Names: false
    Formulas:
        #Target ie future excess returns
        - FutRelRet_SPY: FutureRel%Chg(20,GetSeries("SPY"))  #4 week future total return relative to the SPY ETF
        - FutRelRet_Ind: FutureRel%Chg(20,#Industry)  #4 week future total return relative to its industry
        #Other 'future' return functions: FutureRel%Chg_D() FutureRel%Chg_W() Future%Chg() Future%Chg_D() Future%Chg_W()

        #Factors. Up to 100 formulas.
        - FRank("EarnYield",#ALL,#DESC)
        - ZScore("Pr2SalesQ",#All)

The output of the above example is a csv file. Here it is opened in Excel:

The Ranks operation can be used also if you wanted to get the ranks from an existing ranking system and then include the future return factors in the Additional Data section. Example:

Main:
    Operation: Ranks
    On Error:  Stop # ( [Stop] | Continue )
    Precision: 4 # ( [ 2 ] | 3 | 4 )

Default Settings:
    PIT Method: Prelim # ( [Complete] | Prelim )
    Ranking System: 'Core: Quality'
    Ranking Method: NAsNegative
    Start Date: 2020-01-01
    End Date: 2020-06-01
    Frequency: 4Weeks
    Universe: DJIA
    Columns: factor #( [ranks] | composite | factor )
    Include Names: true #( true | [false] )
    Additional Data:
        - Future 1MoRet: Future%Chg(20)
        - FutureRel%Chg(20,GetSeries("$SP500EQ")) #Relative return vs SP500 equal weight
        - PS Rank: FRank("Pr2SalesTTM",#all,#asc)
        #Up to 100 formulas

The output from this one is too wide to paste here, so I narrowed columns L thru W:

Jrinne · July 28, 2023, 5:42pm

Thank you Dan.

I assume each time I do this I can just concatenate each new factor (either in Python or a spreadsheet) and this will line up automatically (no double sorts)? Probably, I assume, and I can check this on my own without you answering this now.

Actually I can even do a double-sort (e.g., date then ticker) if it ends up lining up.
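Whether separate downloads line up can be checked rather than assumed. A minimal pandas sketch, with a hypothetical load_run helper and made-up values standing in for two DataMiner runs, indexes each run by date and ticker so that concatenation aligns on those keys instead of on row order.

```python
import pandas as pd

def load_run(rows, value_col):
    """Hypothetical helper: build one run as a frame indexed by (date, ticker)."""
    df = pd.DataFrame(rows, columns=["date", "ticker", value_col])
    df["date"] = pd.to_datetime(df["date"])
    # Double sort: date first, then ticker.
    return df.set_index(["date", "ticker"]).sort_index()

# Two made-up runs whose rows arrive in different orders.
run_a = load_run(
    [("2020-01-04", "MSFT", 60.1), ("2020-01-04", "AAPL", 34.5), ("2020-01-11", "AAPL", 36.0)],
    "rank_earn_yield",
)
run_b = load_run(
    [("2020-01-11", "AAPL", 12.0), ("2020-01-04", "AAPL", 15.5), ("2020-01-04", "MSFT", 40.2)],
    "rank_pr2sales",
)

# True only if both runs cover exactly the same (date, ticker) rows in the same order.
same_rows = run_a.index.equals(run_b.index)

# Column-wise concatenation matches rows by index label, not by position.
combined = pd.concat([run_a, run_b], axis=1)

print(same_rows)
print(combined)
```

Pasting columns side by side in a spreadsheet relies on row order being identical, whereas this approach matches each value by its date and ticker.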

Thank you Dan. I will look at that closely and only repost with real questions that I have trouble answering on my own.

Awesome!!!

Jrinne · July 29, 2023, 12:18pm

Thank you again Dan.

Not sure why I could not get it before but I appreciate your help and your example. Also thank you for the Mac OS implementation!!!

The only question I would have is whether FutureRel%Chg(20,GetSeries("$SP500EQ")) is adequate. In my experience one has to de-noise and de-tone (de-toning is a concept developed by de Prado, or at least he made it common) by taking excess returns. Without it the results are meaningless, or simply without statistical or practical significance, with machine learning.

I have taken excess returns relative to the universe in the past. Maybe I can create my own series or something. Maybe I do not need to do that. Maybe excess returns to a highly correlated universe would be adequate (maybe even better, I am not sure). I will try a few things with this.
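One reading of "excess returns relative to the universe" is each stock's future return minus the equal-weighted average future return of all stocks in the universe on the same date. That definition is an assumption made for this sketch, as are the column names and values; it can be computed after download with pandas.

```python
import pandas as pd

# Hypothetical panel of raw future returns for a three-stock universe.
panel = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-04"] * 3 + ["2020-01-11"] * 3),
    "ticker": ["AAPL", "MSFT", "IBM"] * 2,
    "fut_ret_4w": [3.0, 1.0, -1.0, -2.0, -4.0, -6.0],
}).set_index(["date", "ticker"])

# Equal-weighted universe return per date, broadcast back to every row.
universe_mean = panel.groupby(level="date")["fut_ret_4w"].transform("mean")

# Assumed definition of excess return: stock return minus the universe average.
panel["excess_ret_4w"] = panel["fut_ret_4w"] - universe_mean

print(panel)
```

By construction the excess returns average to zero on each date, so the target no longer carries the overall market move for that period.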

It feels kind of like programming to me, and I am not the best programmer, but it gives a good amount of control, which is obviously a good thing. I continue to explore it and will take advantage of it, I am sure.

In any case, pretty incredible what you have done with DataMiner and I definitely appreciate your help.

So P123 is now for sure the best machine learning site available to retail investors. I wish more people knew that. Maybe they do. The most advanced machine learners do not post much, I think. Only those who have much to learn yet (like me) still post, I believe. So there may be more machine learners at P123 than I am aware of. I’m pretty sure there are, in no small part due to what you have done with DataMiner and the API.

Jim

Jrinne · July 30, 2023, 12:42pm

I moved this post to a new thread. Dan has already provided a perfect answer to my question. There is not much to add to his answer here.