Single factor testing tool developed using the API

I converted the Colab code to PyCharm (Python) so it can be run locally. I hardcoded everything (I did not have much time to make changes), but the script works. Maybe a GitHub project would be nice to keep track of the changes.

import p123api
import pandas as pd
import time
import sys
import datetime

def WriteToResultsFile(df_results):
    # Get current datetime to use in file name to make it unique.
    ts = datetime.datetime.now()
    strFileName = 'SingleFactorResults ' + ts.strftime("%m-%d_%H-%M")

    # Include the directory path
    directory_path = '/Users/sraby/Investing/Output/'
    full_file_path = directory_path + strFileName + '.xlsx'

    # Write the settings dataframe and results dataframe to different sheets in the same Excel file
#    with pd.ExcelWriter(strFileName + '.xlsx') as writer:
    with pd.ExcelWriter(full_file_path) as writer:
#        df_settings.to_excel(writer, sheet_name='Settings')
        df_results.to_excel(writer, sheet_name='Results')

    print("Results file created!")


try:
    ##############################
    # HARD CODED INPUT PARAMETERS
    ##############################
    apiId = "xxx"
    apiKey = "xxxxxxxxxxxxxxxxxxxxxxx"
    RankingSystem = "ApiRankingSystem"
    RankingMethod = "2"
    Bench = "SPY"
    Universe = "CAD: Unicorn Universe"
    StartDate = "2014-02-12"
    EndDate = "2024-02-12"
    Freq = "Every Week"
    NumBuckets = 20
    MinPrice = "1"
    PitMethod = "Prelim"
    TransType = "Long"
    Slippage = "0.25"

    #df_settings = pd.read_excel(
    #           '/Users/sraby/Investing/Input/FactorsList_RUN.xlsx',
    #            header=0,
    #            usecols='A:L',
    #            sheet_name='Settings',
    #            engine='openpyxl')
    #   print(df_settings)
    #   df_settings.columns = df_settings.iloc[0]
    #   df_settings = df_settings.iloc[1:]
    #   apiId = str(df_settings.iloc[0]["ID"])
    #    print("API ID:", apiId)
    #    apiKey = str(df_settings.iloc[0]["KEY"])
    #    print("API KEY:", apiKey)
    # RankingSystem = str(df_settings.iloc[0]["RankingSystem"])
    #    RankingSystem = "ApiRankingSystem"  # Hard code for now so users don't accidentally wipe out one of their other ranking systems.
    #    RankingMethod = str(df_settings.iloc[0]["RankingMethod"])
    #    Bench = str(df_settings.iloc[0]["Bench"])
    #    Universe = str(df_settings.iloc[0]["Universe"])
    # Sector = str(df_settings.iloc[0]["Sector"])  #Sector is not working. Have request to Dev to look into it.
    #    StartDate = str(df_settings.iloc[0]["StartDate"])
    #    print("Start Date:", StartDate)
    #    EndDate = str(df_settings.iloc[0]["EndDate"])
    #    print("End Date:", EndDate)
    #    Freq = str(df_settings.iloc[0]["Freq"])
    #    NumBuckets = 20  # str(df_settings.iloc[0]["NumBuckets"])  #Current code below only supports 20 buckets.
    #    MinPrice = str(df_settings.iloc[0]["MinPrice"])
    #    PitMethod = str(df_settings.iloc[0]["PitMethod"])
    #    TransType = str(df_settings.iloc[0]["TransType"])
    #    Slippage = str(df_settings.iloc[0]["Slippage"])

    # Read the entire Excel sheet of factor inputs into a dataframe.
    df_factors = pd.read_excel('/Users/sraby/Investing/Input/FactorsList_RUN.xlsx', sheet_name='FactorList')
    testRows = df_factors.shape[0]  # Get the count from the dataframe
    print("Shape of df_factors:", df_factors.shape)

    # Connect
    client = p123api.Client(api_id=apiId, api_key=apiKey)

    # Create the dataframe and column names for the df that will hold the results.
    df_results = pd.DataFrame(
        columns=['FactNum', 'Category', 'Formula', 'VsInd', 'HighLow', 'Description', 'Bench', '1', '2', '3', '4', '5',
                 '6',
                 '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20'])

    # Start the main loop which calls to API to run the perf tests and capture the results for each factor in the input file.
    for FactorCount in range(0, testRows):
        # Read the next factor/function from the xls file with formula list
        factNum = df_factors.loc[FactorCount, "FactNum"]
        category = df_factors.loc[FactorCount, "Category"]
        formula = df_factors.loc[FactorCount, "Factor/Formula"]
        vsInd = df_factors.loc[FactorCount, "VsIndustry"]
        highLow = df_factors.loc[FactorCount, "HigherOrLower"]
        description = df_factors.loc[FactorCount, "Description"]

        print(str(FactorCount + 1) + " of " + str(testRows) + ". Current factor is " + formula)

        # Future enhancement: I could code this to accept up to a certain number of additional factors (and weight, vsInd, etc) from the spreadsheet. Worthwhile?
        # If do it, create a tab for the additional factors in the input file. Those factors would be added to the ranking system in every run.
        # For example, if we wanted to see what factors complement the FCFGr%TTM and FCFYield factors, I would add the text below:
        #        baseFactors = "<StockFactor Weight=\"0%\" RankType=\"Higher\" Scope=\"Universe\"> \
        #                    <Factor>FCFGr%TTM</Factor> \
        #                     </StockFactor> \
        #                     <StockFactor Weight=\"0%\" RankType=\"Higher\" Scope=\"Universe\"> \
        #                    <Factor>FCFYield</Factor> \
        #                     </StockFactor>"
        baseFactors = ""  # For now, just use this script to test single factors.

        # Change the ApiRankingSystem ranking system to use that factor/function
        strNodes = (
            '<RankingSystem RankType="Higher">'
            '<StockFormula Weight="0%" RankType="' + highLow + '" Name="TestFactor" Description="" Scope="' + vsInd + '">'
            '<Formula>' + formula + '</Formula>'
            '</StockFormula>'
            + baseFactors +
            '</RankingSystem>'
        )
        data = {
            "type": "stock",
            "rankingMethod": RankingMethod,
            "nodes": strNodes
        }
        client.rank_update(data)  # Return value is not needed; avoids shadowing the built-in dict

        data = {
            "rankingSystem": RankingSystem, "rankingMethod": RankingMethod, "startDt": StartDate, "endDt": EndDate,
            "rebalFreq": Freq,
            "benchmark": Bench, "universe": Universe, "numBuckets": NumBuckets, "minPrice": MinPrice,
            # bug. Sector not working so removed it "sector": Sector,
            "slippage": Slippage, "pitMethod": PitMethod, "outputType": "ann", "transType": TransType
        }

        try:
            # API call to run the RankPerf
            result = client.rank_perf(data)
            # Colab was not waiting for rank_perf to complete, which caused the same results
            # to be saved for multiple factors. Sleep for now until a proper fix is found.
            time.sleep(3)

            # Append the results from the API call to the dataframe holding the results.
            # The bucket column names ('1'..'20') are built from NumBuckets and must match
            # the columns defined for df_results above.
            bucket_cols = {str(i + 1): result["bucketAnnRet"][i] for i in range(NumBuckets)}
            new_row = pd.DataFrame([{'FactNum': factNum, 'Category': category, 'Formula': formula,
                                     'VsInd': vsInd, 'HighLow': highLow, 'Description': description,
                                     'Bench': result["benchmarkAnnRet"], **bucket_cols}])
            df_results = pd.concat([df_results, new_row], ignore_index=True)

            FactorCount += 1
            # Hopefully the code below is temporary. There is a random issue where authentication
            # fails at the end, when the script tries to write the results to the spreadsheet.
            # Saving the results to a file after every 100 factors means no more than 100 API
            # calls are wasted if that auth error comes up.
            # Only the final file is needed, so the user can delete the rest.
            if FactorCount % 100 == 0:
                WriteToResultsFile(df_results)


        # Handle the case where the formula being ranked has errors.
        except p123api.ClientException as e:
            print(e)
            s = str(e)
            if 'Invalid command' in s or 'Ranking failed' in s:
                # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
                failed_row = pd.DataFrame([{'FactNum': factNum, 'Category': category, 'Formula': formula,
                                            'VsInd': vsInd, 'HighLow': highLow, 'Description': description,
                                            'Bench': "NA", '1': "Failed - Bad formula",
                                            **{str(i): "NA" for i in range(2, 21)}}])
                df_results = pd.concat([df_results, failed_row], ignore_index=True)
                FactorCount += 1
                # Let it continue with the other factors from the input file
            else:
                print('Got an error other than a bad formula, and it is not handled. '
                      'Dumping results to the results file and quitting.')
                WriteToResultsFile(df_results)
                sys.exit(1)  # Not tested because no errors triggered this path.
            continue

    WriteToResultsFile(df_results)

except p123api.ClientException as e:
    print(e)
    print('Got an unhandled error; dumping results to the results file so they are not lost.')
    # noinspection PyUnboundLocalVariable
    WriteToResultsFile(df_results)
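Regarding the random authentication failure mentioned in the comments: besides checkpointing every 100 factors, a small retry wrapper could paper over transient failures. This is only a sketch; the function and parameter names are made up, and whether the auth error is transient enough to be worth retrying is an assumption.

```python
import time

def call_with_retry(fn, data, retries=3, backoff=5):
    """Call an API method, retrying on failure with a growing delay.

    fn is a bound API method (e.g. client.rank_perf) and data is its
    request payload; all names here are illustrative.
    """
    for attempt in range(retries):
        try:
            return fn(data)
        except Exception:  # narrow to p123api.ClientException in real use
            if attempt == retries - 1:
                raise  # out of retries; let the caller handle it
            time.sleep(backoff * (attempt + 1))
```

Usage would then be `result = call_with_retry(client.rank_perf, data)` in place of the direct call.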

Thank you both. I have a 2024 goal of learning some programming so I can be more capable/useful on this sort of thing.

Considering the valuable insights provided by single factor testing and statistics for both beginner and advanced users, it would be awesome to build a dedicated section within the P123 platform.

This section would allow users to easily access and analyze statistics for each factor and would be refreshed by the weekend data update. Additionally, centralizing this information could reduce the load on the P123 system when multiple users run the same single-factor queries. A win-win.

P123 team, could you consider adding this to your shorter-term roadmap?


Thanks @sraby, this type of research tool is something I use very often.

However, for ML you may want to calculate these statistics every period before training, so it needs to be fast enough to handle thousands of factors across millions of rows.

Do you have any estimate of how many milliseconds are needed per factor?

@pitmaster, you will not be able to use that tool for machine learning. It is only good for testing each factor (i.e., each ranking) sequentially. For ML, it is still best to download the data using the new P123 AI download and then run your own Python ML code.
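That download-then-model workflow could be sketched roughly as below. Everything here is illustrative: the synthetic data, the `factor_*` column names, and the model choice are assumptions, not P123 specifics; in practice the DataFrame would come from the AI download (e.g. a CSV) rather than being fabricated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# In practice, load the P123 AI factor download here, e.g.:
#   df = pd.read_csv("factors_download.csv")   # hypothetical file name
# For a self-contained sketch, fabricate a small dataset instead.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(2000, 5)),
                  columns=[f"factor_{i}" for i in range(5)])
df["fwd_ret"] = df["factor_0"] * 0.5 + rng.normal(scale=0.5, size=2000)

# Fit a model on the factor columns to predict forward returns.
X, y = df.drop(columns="fwd_ret"), df["fwd_ret"]
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("Holdout R^2:", round(model.score(X_test, y_test), 2))
```

Splitting without shuffling mimics a chronological train/holdout split, which matters once real dated data replaces the synthetic rows.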


This is correct.
For users who want to perform monotonicity testing for ML, I would recommend implementing this tool in Polars rather than pandas. My implementation, which calculates stats similar to P123's, takes 240 ms per factor, which is 8x faster than pandas. I have not yet tested a NumPy implementation, which should be even faster.

Edit: I made a mistake in my calculations. It actually takes 62 ms (milliseconds) per factor, or in other words it calculates stats for 16 factors per second (my data has 555,000 rows).
