Automatically test and optimize ranking systems?

Why do I care what that paper says?

Jump on the trend and find the best factors for that trend. It is a simple concept that doesn’t need PhD mumbo jumbo to make me believe in it.

It’s great that you are getting good results with cross-universe testing.

If I had tried to make my Cloud Computing strategy work on Oil & Gas or even one of the main indices, then I would have abandoned it long before deployment.

Go with whatever works for you. The cross-validation doesn’t work for me. I stopped using that a long time ago.

There is nothing inherently evil about over-optimization, provided you understand what you are doing and how to use the results. It sounds like your mind is made up, and that is cool. Go with what works for you.

You are too smart for me; much of what you say is way over my head :slight_smile: I can understand that there is great danger in overoptimization, but wouldn’t some of the problem be solved by testing the strategy in sub-universes, or by optimising the ranking on the last 10 years and letting the first 10 years be your out-of-sample?

Where can I find this? Is there also a thread that is discussing this?

I want to test - in some way - what you and testuser have described as your processes for optimising the node weights in a ranking system.

What I understood:

  • create a large ranking system with nodes you believe in
  • give each node a random weight of 2-12%, and set many of them to 0%
  • test the ranking system in 5 sub-universes
  • each node will be given a weight in at least 20 different tests
  • and so on…

This should give some hints about which nodes drive performance, and also which combinations of nodes perform best. (A rough sketch of the idea follows below.)
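In rough Python terms, the loop might look something like this. It is only a minimal sketch: the node names, the 2-12% weights, and the score_ranking function are placeholders; in practice the score would come from a rank-performance test on each sub-universe, not a random number.

import random
from collections import defaultdict

# Placeholder node list -- substitute the nodes you believe in.
nodes = [f"Node{i+1}" for i in range(100)]
sub_universes = [f"eval universe {i+1}" for i in range(5)]

def random_system(nodes, active=25):
    """Give a random subset of nodes a weight of 2-12%; every other node gets 0%."""
    chosen = set(random.sample(nodes, active))
    return {n: (random.randint(1, 6) * 2 if n in chosen else 0) for n in nodes}

def score_ranking(system, universe):
    """Placeholder: in practice this would be the top-bucket return from a
    rank-performance test of `system` on `universe`."""
    return random.random()

# Run enough random systems that every node appears in plenty of tests,
# then tally the average score of the systems each node appeared in.
tally = defaultdict(list)
for _ in range(200):
    system = random_system(nodes)
    avg_score = sum(score_ranking(system, u) for u in sub_universes) / len(sub_universes)
    for node, weight in system.items():
        if weight > 0:
            tally[node].append(avg_score)

best_nodes = sorted(tally, key=lambda n: sum(tally[n]) / len(tally[n]), reverse=True)[:10]
print(best_nodes)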

My main goal is actually very simple: I want to overoptimize the weighting of nodes in a ranking system without manually creating 1,000 ranking systems.

I tried looking at DataMiner, but didn’t see any function for testing a massive combination of node weights in a ranking system? (DataMiner Operations - Help Center)

In my subscription I don’t have access to the Optimizer, but maybe I should upgrade.

This discussion really took off. Thank you for all your replies!

The main request from me was a simple one: I want to find the best way to (over)optimize the weights of the 100 nodes I believe in, but without the manual work of creating 1,000 ranking systems with the different weight combinations.

Whycliffes,

I agree these are good solutions. Testing on sub-universes is called subsampling, and it is common in machine learning.

We are in agreement that this is a good idea. You wanted it automated at P123? I agree: P123 needs to automate this if it is going to do machine learning (including neural nets), unless it wants to reinvent the entire field of machine learning.

You also seem to agree that cross-validation is a good idea, because this is cross-validation: “or by optimising the ranking on the last 10 years and letting the first 10 years be your out-of-sample?”

I agree this is a good idea too, and it is something P123 would also need to automate, for the same reason.

I think we may be in complete agreement as near as I can tell.

In addition, I also agreed that your algorithm was a good idea and could be automated, especially since the computer time will be used one way or another (whether you run it manually or it is fully automated), and since it aligns somewhat with what Yuval is recommending (perhaps you got some of it from him).

I did support your algorithm as something that is being done elsewhere and is mainstream, calling it an evolutionary algorithm and a type of gradient descent.

There was some jargon in support of your ideas and methods. I agree on that too :slight_smile:

BTW, good thing all of those papers you link to in the forum do not have any complex concepts. :thinking: In any case, thank you for those links. I think they belong in the forum. I would only add that we should occasionally use some of the ideas and methods in those links, even if that occasionally involves discussing the methods from the papers in the forum.

Here is what I mean about sincerely liking your papers. You cited this paper in your post: Useful factors are fewer than you think

  1. This paper is mainly about the false discovery rate (FDR).

  2. It refers to “instrumented PCA (IPCA),” which Duckruck has discussed. So you may be the first person to introduce this to the forum!

  3. It also discusses the “Benjamini-Hochberg procedure,” which is used to control the false discovery rate (see #1 above).

I liked your paper and discussed the Benjamini-Hochberg (BH) procedure, which can be done in a spreadsheet. Good paper.

I actually think someone could take all of Dan’s factors and then use the BH procedure to select the best ones if they did not want to use all of them. It would be only one way to do it, but a good one, I think.
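To make the spreadsheet claim concrete, here is a minimal sketch of the BH procedure in Python. The factor names and p-values are made up purely for illustration; you would supply one p-value per factor from your own tests.

# Benjamini-Hochberg procedure: keep the factors whose p-values pass an FDR cut.
# Factor names and p-values here are made up purely for illustration.
p_values = {"FactorA": 0.001, "FactorB": 0.012, "FactorC": 0.030,
            "FactorD": 0.045, "FactorE": 0.300}
fdr = 0.05  # target false discovery rate

ranked = sorted(p_values.items(), key=lambda kv: kv[1])
m = len(ranked)

# Find the largest k with p_(k) <= (k/m) * fdr, then keep that factor
# and every factor with a smaller p-value.
cutoff = 0
for k, (name, p) in enumerate(ranked, start=1):
    if p <= k / m * fdr:
        cutoff = k

selected = [name for name, _ in ranked[:cutoff]]
print(selected)  # ['FactorA', 'FactorB', 'FactorC'] with these made-up numbers

Run on real p-values, the selected list is simply the set of factors that survive the 5% FDR cut.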

But almost the entire paper was about the use of the BH procedure and the conclusions they arrived at using that procedure. So here is what I truly do not understand: were we supposed to just read the abstract and accept the conclusions without any understanding of how the authors reached them?

Do you endorse the conclusions with no need to understand the method? Or maybe you like the method a lot, which is why you presented the paper, but see no need to ever consider using it, even though it could be applied to Dan’s factors in a spreadsheet? I truly do not quite get it.

Anyway, good paper, without any irony. In fact, let me say it is a great paper, with a method that could easily be made useful for some. I learned something. Thank you for that.

TL;DR: You have some great ideas, IMHO. Some of them could be improved or streamlined, I think, and perhaps you should not be against that progress. They could all be automated by P123, and some will have to be automated if P123 wants to be respected and endorsed by the machine learning community (and to market to the machine learning community). I do not believe that everything machine learners do contradicts what you find useful.

Not every good idea will be trivial to understand or implement.

Jim

OK, here’s what I do, and I think you can do it too. Use Excel to generate a bunch of ranking system weights. There are countless different ways to do this. I would suggest starting with, say, 100 random systems with each node getting either 0% or a multiple of 4%. You could, for instance, generate 100 columns, 25 rows long, of random numbers between 1 and the number of nodes you want to test, and then use the Countif command in 100 additional columns. After you test those you could test variations by increasing some nodes by 2% or 4% and decreasing others by 2% or 4%. You can create .xml files for all of these using the method you suggest in your initial post. You can then create 100 ranking systems by simply copying one ranking system and substituting the .xml files. That’s a bit of grunt work but not so bad.
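If you are more comfortable in Python than Excel, a rough equivalent of the COUNTIF trick might look like this. The node names are placeholders; 25 draws of 4% each always sum to 100%.

import random

# Placeholder node names -- substitute the nodes from your own ranking system.
nodes = [f"Node{i+1}" for i in range(50)]

def random_weights(nodes, draws=25, step=4):
    """Mimic the Excel COUNTIF approach: draw `draws` node names at random
    (with replacement) and give each node step% per time it was drawn.
    With draws=25 and step=4 the weights always sum to 100%."""
    counts = {name: 0 for name in nodes}
    for _ in range(draws):
        counts[random.choice(nodes)] += 1
    return {name: count * step for name, count in counts.items()}

systems = [random_weights(nodes) for _ in range(100)]
print(systems[0])  # e.g. {'Node1': 0, 'Node2': 8, 'Node3': 4, ...}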

The DataMiner code would then look something like this:

Main:
    Operation: RankPerformance
    On Error:  Stop
Default Settings:
    PIT Method: Prelim
    Buckets: 10
    Rebalance Frequency: 4Weeks
    Benchmark: IWM
    Start Date: 2013-09-04
    End Date: 2023-01-18
Iterations:
    -  
        Name: 1 US A
        Ranking: "all f - Copy"        
        Universe: eval universe 1
    -
        Name: 1 US B
        Ranking: "all f - Copy(2)"        
        Universe: eval universe 1
    -
        Name: 1 US C
        Ranking: "all f - Copy(3)"        
        Universe: eval universe 1
    -
        Name: 1 US D
        Ranking: "all f - Copy(4)"        
        Universe: eval universe 1
    -
        Name: 1 US E
        Ranking: "all f - Copy(5)"        
        Universe: eval universe 1
    -
        Name: 1 US F
        Ranking: "all f - Copy(6)"        
        Universe: eval universe 1

And so on. After you go through Copy(100) you switch to eval universe 2, then eval universe 3, and so on.
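If typing 500 of those iteration blocks by hand gets tedious, a short script can print the Iterations section for you. A minimal sketch, assuming the same ranking-system copy names and universe names as in the example above (it numbers the iterations instead of lettering them, since 100 copies outrun the alphabet):

# Print the Iterations section of the DataMiner script above for 100 ranking-system
# copies across 5 evaluation universes. The names are placeholders -- edit to match yours.
lines = ["Iterations:"]
for uni in range(1, 6):                 # eval universe 1 .. eval universe 5
    for i in range(1, 101):             # "all f - Copy" .. "all f - Copy(100)"
        ranking = '"all f - Copy"' if i == 1 else f'"all f - Copy({i})"'
        lines += [
            "    -",
            f"        Name: {uni} US {i}",
            f"        Ranking: {ranking}",
            f"        Universe: eval universe {uni}",
        ]
print("\n".join(lines))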

Try it with just a few ranking systems and universes and see what you think. I’ve found it pretty easy to use.

If you don’t want to create your own ranking systems with new XML files you could actually embed the XML files into the DataMiner instructions. See Dropbox - setting_ranking.yaml - Simplify your life

Try it and let me know what you think! You can certainly vary the parameters quite a bit, like if you want 20 buckets instead of 10, and so on.


Thank you for the last post.

Very practical, but I’m having some difficulties understanding how to best produce 100 ranking systems with different rank weights in Excel.

I have made an example and randomly sorted some of my nodes into separate rows and let the column with weight be open for editing. Is there a simpler and better way of doing this?

What I then usually do is finish the “code” in Excel, copy it to Notepad to remove the spacing, and paste it into the ranking system.

A better way would be to write a program in Python that will do this for you. But if you can’t do that, then this seems like a very good way to generate ranking systems.
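For anyone who wants to try the Python route, here is a minimal sketch. It assumes your exported ranking system stores each node’s weight in a Weight="..." attribute; that attribute name and the file names are assumptions based on a typical export, not a documented P123 schema, so check them against your own file.

import random
import re

# Hypothetical export of your ranking system -- adjust the file name and check
# how weights actually appear in your own XML before running this.
TEMPLATE = "my_ranking_system.xml"

with open(TEMPLATE) as f:
    template_xml = f.read()

n_nodes = len(re.findall(r'Weight="[\d.]+"', template_xml))

def random_weights(n, draws=25, step=4):
    """Each of `draws` picks adds step% to one node; the weights sum to draws*step."""
    weights = [0] * n
    for _ in range(draws):
        weights[random.randrange(n)] += step
    return weights

for system_no in range(1, 101):
    new_weights = iter(random_weights(n_nodes))
    new_xml = re.sub(r'Weight="[\d.]+"',
                     lambda m: f'Weight="{next(new_weights)}"',
                     template_xml)
    with open(f"ranking_{system_no:03d}.xml", "w") as f:
        f.write(new_xml)

Each output file is a copy of the template with new random weights, ready to paste or substitute into a ranking system as described above.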

Yuval,

Thank you for the method. It has a lot of good ideas, and it seems to work for you, and perhaps for Whycliffes if he has been doing this for a while.

Also I am getting the data I need now. I don’t get paid to promote P123 and I don’t sell my systems so maybe I can be forgiven for not going into details. But P123 gives me what I need and this is meant as a compliment to P123 and its usefulness.

I will say I use a lot of Python, but my poor programming skills are what limit me. Bard and ChatGPT help. Do you have someone who helps you write Python code, since you are in the business and in an office full of programmers?

Whycliffes does wish for something simple. Notably, he seems to program well, maybe extremely well. So even someone who seems excellent at Python programming wishes for something a little easier.

It probably is not really in my interest to make this easier for everyone else as I have what I need now.

But FWIW, if someone knows no Python at all, they cannot even get the data they need much of the time here at P123. They cannot do what you are suggesting here AT ALL (because of the data downloads, if nothing else).

I don’t know who you are really trying to market to. Your marketing seems contradictory at times in the forum. I guess those using the API for deep learning can ignore questions about the techniques that are essential to what they are doing with machine learning and the API. I think they generally do. Clearly, most have no need for the forum, and one becomes aware of them only when a technical issue arises.

I guess my practical question is this: Does the contract with FactSet allow for any more direct Excel downloads from P123 that would be useful to people who are not experts in Python and who do not come here just for the API? And if so, have you asked what they might want in that regard?

Jim

I’m not sure what will be the best and most effective method to create the 100 or more systems by “substituting the .xml files.”

Also, I used this: Generate Random Matrices – Online Math Tools, to create something like this weighting.

Is that what you meant?

Then it would look like this: Nodes Ranking YT - Google Sheets