Single factor testing tool developed using the API

Hello Dan, I will test it out, since I have been using your Colab Python code for a while now. I had to make some changes because authentication was broken. I am also looking into converting from Colab to local Python on my Mac (it would save me the Colab Pro subscription).

InmanRoshi
Thanks for sharing the auth fix. I made that change to the shared Colab script and it ran with no issues.

port123i2022
The shared Google Drive folder mentioned earlier in this thread has a “ScoredFactors_Template.xls” file. Click that file and select Download. It contains the single-factor results from 2021, covering 10 years of data.
The link to that shared folder is: https://drive.google.com/drive/folders/10P3ZnGVOFQeCjpXx_oFuNHhAprQ8X3Df?usp=sharing

Thanks for this. It’s really one of the better things I’ve seen, incredibly helpful and with some surprising results. I hope such “studies” are published regularly on the forum :blush:

I understand that it has been tested over 10 years, but which universe was it tested on?

The starting universe was United States Incl Foreign Primary. The additional universe rules are shown on the ‘variables’ tab in the Excel file.

I don't know if I will post results like these in the future because we should soon have an AI tool available that will give better information. Users may want to learn to use this Colab script to run the tests themselves so that they can change the universe and rank performance settings to meet their specific needs.

I’ve now tested some of the nodes in the spreadsheet (top 200 score). Thanks for the information danp!! I hope you and others continue to publish such overviews, even if we get ML on P123. This spreadsheet and post have been immensely helpful.

I also suspect that for many of us, myself included, ML will be very difficult and hard to learn. :frowning:


I have used Danp’s tool extensively; it has been incredibly useful and informative, not only for building a ranking system, but also for seeing and studying how different the results can be between “good” nodes in the EU vs. US vs. Canada, among other things.

Would anyone like to share their spreadsheets? I am happy to share mine as well. I am interested in as much information and as many spreadsheets as possible, particularly to see (1) which markets you are testing against, (2) which nodes you are testing, and (3) how rebalance frequency affects the results (I have mostly used 1 week).

By the way, I have created this spreadsheet to quickly compile different results from the spreadsheet into a ranking system using the “text editor”: https://docs.google.com/spreadsheets/d/113PgHttct83sSLlPa7SGd-RjMlmRb-2B4NDv-EuJhi4/edit?usp=sharing This way I can quickly test combined ranking systems, from “factor theme” portfolios and large systems with several hundred nodes down to small ranking systems.

Hello Whycliffes, I will share my “input” spreadsheet; I have been adding more factors over time.

I wrote a Python script that basically allows one to extract every factor from any public ranking system, remove the duplicates, and spit out a new XML ranking system. It has several hundred nodes. It's not perfect, though. It simply grabs each formula/factor from the XML, removes all the white space, and excludes exact matches. I can share the script and/or the XML ranking system if anyone is interested.
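
A minimal sketch of that comparison logic, for anyone curious before the full script appears later in this thread (the factor strings here are just examples; the real script walks the XML tree):

def dedupe(formulas):
    """Keep the first occurrence of each formula, comparing normalized copies."""
    seen = set()
    unique = []
    for f in formulas:
        # normalize: remove all white space, then drop any trailing // comment
        key = "".join(f.split()).split("//", 1)[0]
        if key not in seen:
            seen.add(key)
            unique.append(f)  # the original text is kept untouched
    return unique

print(dedupe(["Pr2SalesTTM", "Pr2Sales TTM // same thing", "ROE%TTM"]))
# -> ['Pr2SalesTTM', 'ROE%TTM']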

Tony

I’m quite interested!!


I am also very interested.

I would like to see that, too. I need to catalog the factors used in my ranking systems.

I would love that. I have used this time-consuming solution to filter things out: Extract Text Between Two Characters - PhraseFix.
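
(If you'd rather do that extraction locally, a few lines of Python handle the same “between two markers” job; the delimiters below are placeholders, adjust to whatever your text uses:)

import re

def between(text, start, end):
    # Non-greedy match of everything between each start/end marker pair
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return re.findall(pattern, text, flags=re.DOTALL)

print(between("<f>Pr2SalesTTM</f><f>ROE%TTM</f>", "<f>", "</f>"))
# -> ['Pr2SalesTTM', 'ROE%TTM']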

Here ya go.

Copy this Python code to a file called dups.py (or whatever you choose; the name makes no difference).

Start with a ranking system you like.

Make sure to copy the RS XML from the “raw editor (no ajax)” section of the ranking system screen, or the XML will not be formatted correctly.

Save the RS XML as a file called “in.xml” in the same directory as the Python program “dups.py”.

Many of the public RSes have old deprecated factors that will give an error when you try to paste them back into an RS on the website.

There is a text file called “invalidFactors.txt”. The program will check each factor against the list in that file and remove the bad factors.

If you come across any more bad factors, you can add them to this file to save yourself future grief.

Copy and paste a bunch of new factors from some RS into “in.xml” without the beginning/ending <RankingSystem RankType="Higher"> </RankingSystem> tags.

You cannot have those tags more than once in an RS.

Run dups.py

It will remove all the duplicate factors and save the output to “out.xml”.

Copy “out.xml” to “in.xml” (or rename the files) so that “in.xml” now contains all your unique factors.

Repeat: copy another RS to the end of “in.xml” (minus the <RankingSystem> tags) and run dups.py again.

After each running of dups.py, replace in.xml with out.xml.

Make sure any XML you copy to in.xml stays within the
<RankingSystem RankType="Higher"> </RankingSystem> tags.

Those should be the first and last tags in every RS XML file and only appear once.

When you are happy with your giant library, you can copy your XML file back into a blank RS on the website using the “text editor” button on the RS page.

If you have no idea how XML files are constructed you may want to read up on it. You don’t need to know much about XML to use this.
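
If it helps, a minimal in.xml looks roughly like this. The attributes shown are an approximation from memory, so it's safest to copy a real node from the raw editor rather than typing one by hand:

<RankingSystem RankType="Higher">
  <StockFormula Name="ExampleA" Weight="0" RankType="Higher">
    <Formula>Pr2SalesTTM</Formula>
  </StockFormula>
  <StockFormula Name="ExampleB" Weight="0" RankType="Higher">
    <Formula>ROE%TTM</Formula>
  </StockFormula>
</RankingSystem>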

The best place I know to find lots of public factors is in the website Search box → Search for Systems and Strategies.

The program strips out white space and comments only when doing the duplicate comparison; it writes the original factor unmolested to out.xml.

I have not yet converted Dan’s list of factors in his Excel file to an RS.

If someone has converted it to an RS, please send it to me.

If you think of any interesting additions you would like to see, please let me know.

If enough people show interest I may add requested features.

# dups.py

# Deletes duplicate factors, using different criteria depending on the factor type:
# - Duplicate Composites and Conditionals with the same Name are deleted.
# - For all other factors and formulas, the actual factor/formula text is compared,
#   regardless of whether the names match.

import lxml.etree as ET
import pprint

source_XML      = 'in.xml'
destination_XML = 'out.xml'

tree = ET.parse(source_XML)
root = tree.getroot()

# Known-bad (deprecated) factors, one per line
with open('invalidFactors.txt') as f:
    BadFactors = f.read().splitlines()

# First pass: delete any StockFormula/StockFactor whose text is on the bad list
for elem in list(tree.iter()):
    if elem.tag in ("StockFormula", "StockFactor"):
        for e in elem:
            if e.text in BadFactors:
                print("Deleting Bad Factor:", elem.tag, e.text)
                elem.getparent().remove(elem)
                break  # elem is gone; don't keep inspecting its children


def find_in_list_of_list(name):
    """Return (row, col) of name in factorList, or None if it hasn't been seen."""
    for sub_list in factorList:
        if name in sub_list:
            return (factorList.index(sub_list), sub_list.index(name))


def childTextsList(node):
    """Record node's comparison key in factorList.

    Returns the new [tag, key, count] entry on a first sighting, or False
    for a duplicate. Nodes with unrecognized tags also return False and
    will therefore be removed by the caller.
    """
    global factorList
    sep = '//'
    new = False

    if any(factor in node.tag for factor in ["Composite", "Conditional"]):
        # Composites and Conditionals are compared by their Name attribute
        s = node.attrib['Name']
        factName = "".join(s.split())             # strip out white space
        if sep in factName:
            factName = factName.split(sep, 1)[0]  # strip out comments

        found = find_in_list_of_list(factName)
        if not found:
            new = [node.tag, factName, 1]
            factorList.append(new)
        else:
            factorList[found[0]][2] += 1
            new = False

    elif any(factor in node.tag for factor in
             ["StockFactor", "IndFactor", "StockFormula", "IndFormula", "SecFormula"]):
        # Factors and formulas are compared by their actual text
        for subchild in list(node):
            s = subchild.text or ""
            factName = "".join(s.split())             # strip out white space
            if sep in factName:
                factName = factName.split(sep, 1)[0]  # strip out comments

            found = find_in_list_of_list(factName)
            if not found:
                new = [node.tag, factName, 1]
                factorList.append(new)
            else:
                factorList[found[0]][2] += 1
                new = False
    return new


nodes = root.xpath('//RankingSystem/*')

StartCount = len(nodes)
print("*****************************************")
print("Beginning Total Count: ", StartCount)
print("*****************************************")
totalDeleted = 0

factorList = list()

# Second pass: keep the first occurrence of each factor, delete the rest
for child in nodes:
    newFactor = childTextsList(child)
    if not newFactor:
        child.getparent().remove(child)
        totalDeleted += 1

pprint.pprint(factorList)
print("******************")
print("Duplicates deleted:", totalDeleted)
EndCount = len(root.xpath("//RankingSystem/*"))

print("******************")
print("Start Count:\t", StartCount)
print("End Count:\t", EndCount)
print(StartCount - EndCount, "factors removed.")

tree.write(destination_XML)

invalidFactors.txt (contents):

BV5YCGr%
Sales3YCGr%
PEG
Prc2SalesIncDebt
InsOwnerSh%
EarnYieldOld
ShsOutAvgTTM
Beta
CF5YCGr%
NI%ChgPQ
NI5YCGr%
Sales5YCGr%
SGRP
SSGA
SOPI
RTLR
PEGInclXor
LTGrthRtLow

Does the API not return each credit transaction with the number of credits used and remaining?

Whenever I update APIuniverse or APIrankingsystem, I get an XML string returned with that info.

Tony

Hi Whycliffes - I replied to the same question in the chat you sent before I saw this post. The tests you were running are still using 2 credits per test (i.e., each factor tested). I provided more detail from the log files in the chat. If you have questions, it would be better to discuss them in that chat instead of this forum thread.

I use a program on my Mac called “EasyTransformData” to do the conversion in both directions.

Danp, I really appreciate what you did here. I’m having some trouble, so I’m posting in the hope of finding out what I’m doing wrong. The code seems to work until it gets to factor 85, as seen in the first screenshot. I added the second screenshot in case it shows a useful error message.


The Colab script has been fixed. I also replaced some of the factors in the factor list that were disabled back in 2021. Since the list was created in 2021, there are probably other factors in it that no longer work, but the script should not fail if any bad formulas are encountered.
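
For anyone maintaining their own copy of the script, the usual way to get that resilience is to wrap each single-factor test so that one bad formula logs a message instead of aborting the whole run. A sketch, where test_fn stands in for whatever call your script makes per factor:

def run_all(factors, test_fn):
    """Run test_fn for each factor; skip (and report) any that raise."""
    results = {}
    for factor in factors:
        try:
            results[factor] = test_fn(factor)
        except Exception as exc:
            print(f"Skipping {factor!r}: {exc}")  # log and continue
    return results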


For someone like me this is amazing, thank you. Just so I’m clear: as of now there are no other tools that extend factor/ranking analysis under different assumptions, at least not as directly as this, correct? And I assume the ML implementation is not directly related? Previously, I was spending a lot of time changing one thing at a time and re-running.

This is currently the only tool that automates running rank performance tests on a large number of different factors. We are discussing the possibility of creating a new tool for this that would not require the user to deal with Colab since that is confusing for some users.

This single factor testing tool is not directly related to the AI project but it could be useful to create a list of factors to use as features in an AI model.

We also have the Optimizer, which lets you define a set of tests varying the weights assigned to each factor in the ranking system, runs all the iterations, and returns the results.
