Data download the day of rebalance for machine learning

You would get the same data you would get if you ran a screen on the website, as long as you call rank_ranks with the as-of date set to the current calendar date (or the calendar date minus one).

I attached a spreadsheet where I tested that scenario today.
verify API can return TODAYS data.xlsx (68.3 KB)

Thanks! I think that makes sense. I will have to try it out next week to make sure it still works on other days of the week.

While this means you cannot download historical daily ranking data, I think it does mean that folks can rebalance using yesterday's data on any day of the week!
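A minimal sketch of building that as-of-date string (a fixed date is used here for illustration; in practice you would use `date.today()`):

```python
from datetime import date, timedelta

# Build the as-of-date string for a rebalance-day download.
# A fixed date is used for illustration; in practice use date.today().
today = date(2023, 9, 1)
as_of = today.strftime('%Y-%m-%d')                             # the current calendar date
as_of_prev = (today - timedelta(days=1)).strftime('%Y-%m-%d')  # or calendar day - 1
print(as_of, as_of_prev)  # 2023-09-01 2023-08-31
```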

I posted it before, but here is my download code again. You can paste it into Google Colab or run it locally on your machine. It is a lot of code, but it should have clear explanations to make each function's inputs easier to understand.

Import and install (you may need to install another way if you are not using Colab):

!pip install --upgrade p123api # This is only for google colab
import p123api
import pandas as pd
import numpy as np
from datetime import timedelta, datetime

Class for getting the rank downloads

class Portfolio123API:
    def __init__(self, api_id, api_key):
        self.client = p123api.Client(api_id=api_id, api_key=api_key)

    def download_rank(self, universe, ranking_name, date, add_data, pit='Prelim', precision=4, method=2, names=False,
                      NACnt=True, finalStmt=True, node_type='factor', currency='USD'):
        """
        Generate a ranking pandas DataFrame based on the provided inputs. Uses 1 API credit per 25,000 data points as of Aug 2023.

        Parameters:
        -----------
        universe : str
            Name of the universe on Portfolio123.
        ranking_name : str
            Name of the ranking system on Portfolio123.
        date : str
            Date for which the ranking is being generated. Format is 'YYYY-MM-DD'
        add_data : list
            Additional data to include with the ranking, e.g. 'Close(0)', which is Friday's close. Note that some items may require the PIT license.
        pit : str, optional
            Point-in-time method, e.g., 'Prelim' or 'Complete', default is 'Prelim'.
        precision : int, optional
            Number of decimal places for ranking scores, default is 4.
        method : int, optional
            Numeric code representing the ranking method; default is 2 (Percentile, NAs Negative), 4 is Percentile, NAs Neutral.
        names : bool, optional
            Flag indicating whether to include ticker names in the output, default is False.
        NACnt : bool, optional
            Flag indicating whether to include the count of missing values, default is True.
        finalStmt : bool, optional
            Flag indicating whether to include if the stock has a final statement, default is True.
        node_type : str, optional
            Type of node for ranking, e.g., 'factor' or 'composite', default is 'factor'.
        currency : str, optional
            Currency for monetary values, default is 'USD'. 'USD' | 'CAD' | 'EUR' | 'GBP' | 'CHF'

        Returns:
        --------
        pandas.DataFrame
            A DataFrame containing the generated ranking data, with the date added as a column.
        """

        ranking = self.client.rank_ranks({
            'rankingSystem': ranking_name,
            'asOfDt': date,  # Formatted as 'yyyy-mm-dd'
            'universe': universe,
            'pitMethod': pit,
            'precision': precision,
            'rankingMethod': method,  # 2-Percentile NAs Negative, 4-Percentile NAs Neutral
            'includeNames': names,
            'includeNaCnt': NACnt,
            'includeFinalStmt': finalStmt,
            'nodeDetails': node_type,  # 'factor', 'composite'
            'currency': currency,
            'additionalData': add_data  # Example: ['Close(0)', 'mktcap', "ZScore(`Pr2SalesQ`,#All)"]
        }, True)  # True - output to Pandas DataFrame | [False] to JSON.

        dates = pd.to_datetime([date] * len(ranking))
        ranking.insert(0, 'date', dates)
        return ranking

    def download_universe(self, universe, date, formulas, pit='Prelim', precision=4, names=False, currency='USD'):
        """
        Generate a pandas DataFrame based on the provided inputs. Uses 1 API credit per 25,000 data points as of Aug 2023.

        Parameters:
        -----------
        universe : str
            Name of the universe on Portfolio123.
        date : str
            Date for which the ranking is being generated. Format is 'YYYY-MM-DD'
        formulas : list
            Additional data to download, e.g. 'Close(0)', which is Friday's close. Note that some items may require the PIT license.
        pit : str, optional
            Point-in-time method, e.g., 'Prelim' or 'Complete', default is 'Prelim'.
        precision : int, optional
            Number of decimal places for ranking scores, default is 4.
        names : bool, optional
            Flag indicating whether to include ticker names in the output, default is False.
        currency : str, optional
            Currency for monetary values, default is 'USD'. 'USD' | 'CAD' | 'EUR' | 'GBP' | 'CHF'

        Returns:
        --------
        pandas.DataFrame
            A DataFrame containing the generated universe data, with the date added as a column.
        """
        ranking = self.client.data_universe({
            'universe': universe,
            'asOfDt': date,  # 'yyyy-mm-dd'
            'formulas': formulas,  # ['Close(0)', 'mktcap']
            'pitMethod': pit,
            'precision': precision,
            'includeNames': names,
            'currency': currency
        }, True)  # True - output to Pandas DataFrame | [False] to JSON.

        dates = pd.to_datetime([date] * len(ranking))
        ranking.insert(0, 'Date', dates)
        return ranking

    def download_weekly_ranks(self, universe, ranking_name, start_date, end_date, add_data, pit='Prelim', precision=4,
                              method=2, names=False, NACnt=True, finalStmt=True, node_type='factor', currency='USD'):
        """
        Download rankings from multiple dates. Note that to calculate some additional stats, like alpha relative to the universe, some additional data is required!
        Uses 1 API credit per 25,000 data points as of Aug 2023.

        Parameters:
        -----------
        universe : str
            Name of the universe on Portfolio123.
        ranking_name : str
            Name of the ranking system on Portfolio123.
        start_date : str
            Start date to get data. Format is 'YYYY-MM-DD'. Note that the resulting DataFrame will use the previous Sunday as the date.
        end_date : str
            End date to get data. Format is 'YYYY-MM-DD'. Note that the resulting DataFrame will use the previous Sunday as the date.
        add_data : list
            Additional data to include with the ranking, e.g. 'Close(0)', which is Friday's close. Note that some items may require the PIT license.
        pit : str, optional
            Point-in-time method, e.g., 'Prelim' or 'Complete', default is 'Prelim'.
        precision : int, optional
            Number of decimal places for ranking scores, default is 4.
        method : int, optional
            Numeric code representing the ranking method; default is 2 (Percentile, NAs Negative), 4 is Percentile, NAs Neutral.
        names : bool, optional
            Flag indicating whether to include ticker names in the output, default is False.
        NACnt : bool, optional
            Flag indicating whether to include the count of missing values, default is True.
        finalStmt : bool, optional
            Flag indicating whether to include if the stock has a final statement, default is True.
        node_type : str, optional
            Type of node for ranking, e.g., 'factor' or 'composite', default is 'factor'.
        currency : str, optional
            Currency for monetary values, default is 'USD'. 'USD' | 'CAD' | 'EUR' | 'GBP' | 'CHF'

        Returns:
        --------
        pandas.DataFrame
            A DataFrame containing the generated ranking data from one date to another
        """

        current_date = datetime.strptime(start_date, '%Y-%m-%d')
        end_date = datetime.strptime(end_date, '%Y-%m-%d')
        combined_dataframe = pd.DataFrame()
        required_data = ['Open(-6)/Open(-1)-1', 'Open_W(-4)/Open(-1)-1',
                         'Open_W(-12)/Open(-1)-1']  # This gives a Monday to Monday open gain which is what I trade. Change if you trade another time

        while current_date <= end_date:
            previous_sunday = current_date - timedelta(
                days=(current_date.weekday() + 1) % 7)  # This calculates a more accurate asofDate
            previous_sunday_str = previous_sunday.strftime('%Y-%m-%d')
            dataframe = self.download_rank(universe, ranking_name, previous_sunday_str, required_data + add_data, pit,
                                           precision, method, names, NACnt, finalStmt, node_type, currency)
            dataframe.rename(columns={'formula1': 'nw_change',
                                      'formula2': 'nm_change',
                                      'formula3': 'n3m_change'}, inplace=True)

            # Calculate the universe gain and then each stock's alpha!
            nw_univ_gain_percentage = dataframe['nw_change'].mean()  # Universe return
            dataframe['nw_alpha'] = dataframe['nw_change'] - nw_univ_gain_percentage  # Alpha vs. the universe

            nm_univ_gain_percentage = dataframe['nm_change'].mean()
            dataframe['nm_alpha'] = dataframe['nm_change'] - nm_univ_gain_percentage

            n3m_univ_gain_percentage = dataframe['n3m_change'].mean()
            dataframe['n3m_alpha'] = dataframe['n3m_change'] - n3m_univ_gain_percentage

            combined_dataframe = pd.concat([combined_dataframe, dataframe],
                                           ignore_index=True)  # Add it to the dataframe

            current_date += timedelta(weeks=1)

        combined_dataframe.columns = combined_dataframe.columns.str.replace(r' \(\d+\.\d+%\)', '', regex=True)
        return combined_dataframe
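As a quick sanity check, the previous-Sunday arithmetic used in `download_weekly_ranks` can be verified on its own (the dates below are arbitrary examples):

```python
from datetime import datetime, timedelta

def previous_sunday(d):
    # Same arithmetic as in download_weekly_ranks: step back
    # (weekday + 1) % 7 days, which always lands on a Sunday
    # (or leaves d unchanged if d is already a Sunday).
    return d - timedelta(days=(d.weekday() + 1) % 7)

print(previous_sunday(datetime(2023, 8, 30)))  # Wednesday -> 2023-08-27 (a Sunday)
print(previous_sunday(datetime(2023, 8, 27)))  # Sunday -> unchanged
```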

Finally, examples of how to run the above class:

# Initialize the api class
api_id = 'Your api id'
api_key = 'Your api key'
api = Portfolio123API(api_id, api_key)

#-------------- Examples for each function below -------------------------------

# Download ranks from a ranking system  ------------------------------------
ranks = api.download_rank('Easy to Trade US', 'All-Stars: Greenblatt', '2023-08-25', ['Close(0)'])
print("Ranks from ranking system are:\n")
print(ranks)

# Download data from a universe  ------------------------------------
universe_data = api.download_universe('Easy to Trade US', '2023-08-25', ['Close(0)'])
print("Universe data is as follows:\n")
print(universe_data)

# Download ranks over multiple dates!
# Note that this function adds future universe-alpha columns and future 1w, 1m, and 3m changes based on Monday open to Monday open.
# Change this if you want another period, or to do something fancy like open to close; it is defined in the function in the class.
start_date = '2017-01-15'
end_date ='2017-12-24'
universe = 'Large Cap'
ranking_name = 'All-Stars: Greenblatt'
extra_formulas = ['Close(0)']
weekly_ranks = api.download_weekly_ranks(universe, ranking_name, start_date, end_date, extra_formulas)

# Save to a pickle which is very fast to load, but not human readable
weekly_ranks.to_pickle('data.pkl') # This saves the data as a pickle. You can load it using: weekly_ranks = pd.read_pickle('data.pkl')

# Save to a csv, slow to load, but human readable
weekly_ranks.to_csv('data.csv', index=False) # To load it again use: weekly_ranks = pd.read_csv('data.csv')
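Once saved, the data can be turned into features and a target for training. This is only a sketch with made-up factor columns; `nw_alpha` is the next-week alpha column that `download_weekly_ranks` adds, and for returns data the train/test split should be by date, not a random shuffle:

```python
import pandas as pd

# Toy stand-in for the saved weekly ranks; in practice use
# weekly_ranks = pd.read_pickle('data.pkl'). The factor column
# names here are made up for illustration.
weekly_ranks = pd.DataFrame({
    'date': pd.to_datetime(['2017-01-15', '2017-01-15', '2017-01-22']),
    'Value': [80.1, 20.5, 79.0],
    'Momentum': [55.0, 90.2, 60.3],
    'nw_alpha': [0.8, -0.8, 0.0],  # target: next-week alpha vs. the universe
})

feature_cols = ['Value', 'Momentum']
X = weekly_ranks[feature_cols]
y = weekly_ranks['nw_alpha']

# Time-based split: train on early dates, test on later ones, so the
# overlapping future-return windows don't leak into the test set.
cutoff = pd.Timestamp('2017-01-20')
train = weekly_ranks['date'] <= cutoff
X_train, y_train = X[train], y[train]
X_test, y_test = X[~train], y[~train]
print(len(X_train), len(X_test))  # 2 1
```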

Dan, Jonpaul, Walter, Aaron, Marco and others,

Thank you and WOW!!! And just to be clear, Jonpaul should be a target audience. He has an Ultimate membership, BTW. I want to be like him when I grow up. More specifically, I want to learn better Python skills, catch up on Bayesian optimization, etc., continue to compare notes in the forum with him and others, and be able to contribute. I think this is what machine learning at P123 looks like, BTW: machine learning that will attract the Kaggle crowd, undergraduates in any scientific field, aerospace engineers, etc. Maybe a bit in the weeds, but: awesome, P123!!! And thank you.

So, I could probably figure out the API. But for now I use DataMiner to create a CSV file and work with it in Jupyter notebooks.
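(Loading that CSV in a notebook is straightforward; the header below is made up for illustration, so check the actual column names your DataMiner run produces:)

```python
import io
import pandas as pd

# Sketch of loading a DataMiner Ranks CSV. A small inline CSV stands in
# for the real file; in practice use pd.read_csv('your_dataminer_output.csv').
csv_text = """Date,Ticker,Rank,1WkTradedRet,Future 1wkRet
2005-01-02,AAPL,98.5,1.2,0.7
2005-01-02,MSFT,75.0,-0.4,1.1
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
print(df.shape)  # (2, 5)
```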

So, a simple question, just to be sure: does the same thing apply to DataMiner?

The focus of my question, as with Jonpaul's on the API: will it be the overnight update of the ranks if I do this on Friday morning (after the updates)?

For clarity, the code I will use:

Main:
Operation: Ranks
On Error: Stop # ( [Stop] | Continue )
Precision: 4 # ( [ 2 ] | 3 | 4 )

Default Settings:
PIT Method: Prelim # ( [Complete] | Prelim )
Ranking System: 'M3DM'
Ranking Method: NAsNeutral
Start Date: 2005-01-02
End Date: 2010-01-01
Frequency: 1Week
Universe: Easy to Trade US
Columns: factor #( [ranks] | composite | factor )
Include Names: true #( true | [false] )
Additional Data:

    - 1WkTradedRet: 100*(Close(-6)/Close(-1)-1) 
    - Future 1wkRet: Future%Chg(5)
    - FutureRel%Chg(5,GetSeries("$SPALLPV:USA")) #Relative return vs $SPALLPV:USA

Jim

Based on this post by Dan, DataMiner cannot currently return the daily data.

Dan, Marco, and other folks at P123:
Maybe P123 can provide a Colab file for the downloads, like they do for the screener? Or a tutorial or something on how to set it up? The example could include how to write the data to a CSV; my code above shows how to do this. Feel free to use it if you make a tutorial or Colab file. ChatGPT wrote 90% of it for me anyway…


Jonpaul,

Thank you. So I can probably train my data, which will take a while, knowing I can use the API for daily rebalances when the time comes.

I am slow but I can usually figure it out. And as you say, ChatGPT can help me nowadays.

Very helpful. Thank you.

Jim