Single factor testing tool developed using the API

Those of you who have been P123 members for a while may remember the spreadsheets I posted with 20-bucket rank performance test results for hundreds of factors and formulas. I used scripts to run those tests, and I have now rewritten those scripts using the P123 API and Colab. Colaboratory, or “Colab” for short, is a Google product that lets anybody write and execute Python code in the browser without having to install Python on their machine.

I am going to share this tool with the community, but first I would like a volunteer from the user community to test it and make sure the instructions are clear and there are no access issues. No experience with the API or Colab is needed, since the goal is for anybody to be able to understand the instructions and run the tests. The only requirement is that you have a Google account. If things go smoothly, it should take less than 30 minutes of your time. Who would like to volunteer today?


Dan,

Can I load the Google spreadsheets directly into Colab, without using the P123 API on Windows or Linux? I have a Mac at home.

Best,

Jim

I have a Google account and will volunteer to try it out if you like.

Thanks Stephen. I sent you an email with more information.

Hi Jim, I don't have access to a Mac to test it, but I think this will work since everything runs inside Colab. However, the scoring spreadsheet is an Excel file with some VBA code, and I don't know if there is a way to use that on a Mac.

Thanks to Stephen for doing the test run.

What this script does:
Most users are familiar with the Ranking System Performance test feature, which outputs the returns by ‘bucket’ for a ranking system over a specified time period for a selected universe of stocks. This script reads a spreadsheet with hundreds of factors or formulas, automatically runs a ranking system performance test for each of them, and writes the results to a file. Those results can then be pasted into a ‘scoring’ sheet, which also has graphing capabilities. Each factor is scored based on the slope and correlation between bucket number and average bucket return, plus a few other measures. Users can add other scoring methods if needed. The Excel scoring spreadsheet can be downloaded from the project folder.
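As a rough illustration of the slope-and-correlation idea, the sketch below regresses average return on bucket number (1 = lowest-ranked bucket) and reports the least-squares slope and the Pearson correlation. It is not the formula used in the Excel scoring sheet, whose exact calculation and extra measures are not described here; the function name and the 5-bucket sample returns are hypothetical (a real test would have 20 buckets).

```python
from math import sqrt


def slope_and_correlation(bucket_returns):
    """Regress average bucket return on bucket number 1..N.

    Returns (slope, pearson_r). A strong factor typically shows a positive
    slope (higher-ranked buckets earn more) and a correlation close to 1.
    """
    n = len(bucket_returns)
    if n < 2:
        raise ValueError("need at least two buckets")
    xs = range(1, n + 1)
    mean_x = (n + 1) / 2
    mean_y = sum(bucket_returns) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in bucket_returns)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, bucket_returns))
    slope = sxy / sxx
    # Flat returns have no variance, so the correlation is undefined; report 0.
    r = sxy / sqrt(sxx * syy) if syy > 0 else 0.0
    return slope, r


# Hypothetical annualized returns (%) for a 5-bucket test, lowest to highest rank
slope, r = slope_and_correlation([2.0, 4.5, 6.0, 9.0, 11.5])
print(f"slope={slope:.2f}% per bucket, r={r:.3f}")
```

Combining the two into one score (and adding other measures) is left to the scoring sheet.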

The key setting is the universe. Some possibilities are to run tests with the universe set to:

  • The entire investable universe. Keep in mind that many factors perform very differently for small caps vs large caps.
  • The general universe that you plan to invest in. For example, small caps.
  • Split your general universe into 2 or more universes. Then run the factor tests for each universe. Look for factors that did well in both as that is an indicator that the factor is robust.
  • Use a narrow universe to test factors for a certain sector or industry. For example, you can see which factors work well for banking stocks. But keep in mind that a very small universe is likely to give unreliable results.
  • Create a universe with filters so that it contains only growth stocks. Then run the factor tests to see which factors are good complements for a universe of growth stocks.
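For the split-universe approach, the “did well in both” check can be automated once each universe's results have been scored. A minimal sketch, assuming you already have one score per factor per universe; the universe names, factor names, scores, and threshold below are made up for illustration.

```python
def robust_factors(scores_by_universe, threshold):
    """Return factors whose score meets `threshold` in every universe tested.

    scores_by_universe maps a universe name to a {factor: score} dict.
    Factors missing from any universe are excluded.
    """
    score_dicts = list(scores_by_universe.values())
    if not score_dicts:
        return []
    common = set(score_dicts[0])
    for scores in score_dicts[1:]:
        common &= set(scores)
    return sorted(f for f in common if all(s[f] >= threshold for s in score_dicts))


# Hypothetical scores from two halves of a small-cap universe
scores = {
    "SmallCaps_A": {"Factor 1": 0.92, "Factor 2": 0.85, "Factor 3": 0.40},
    "SmallCaps_B": {"Factor 1": 0.88, "Factor 2": 0.35, "Factor 3": 0.90},
}
print(robust_factors(scores, threshold=0.8))  # ['Factor 1']
```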

Dates are also important. You may want to run the factor tests over several date ranges and then see which factors have done well in all periods, or which are doing well in the current period. For example, most value factors did very well in the 2000s but not very well in the last 5 years.
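One way to set up those date ranges is to cut the full test period into equal sub-periods and run the same factor list on each. The helper below is a hypothetical sketch, not part of the shared notebook. Adjacent windows share a boundary date here; shift each start by a day if you prefer non-overlapping ranges.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def split_periods(start, end, years):
    """Split [start, end) into consecutive windows of `years` years.

    The last window may be shorter than the others.
    """
    if years < 1 or start >= end:
        raise ValueError("need years >= 1 and start < end")
    periods = []
    current = start
    while current < end:
        nxt = add_years(current, years)
        periods.append((current, min(nxt, end)))
        current = nxt
    return periods


for p_start, p_end in split_periods(date(2000, 1, 1), date(2021, 1, 1), 5):
    print(p_start, "to", p_end)
```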

How to use it:
Create a Google account if you do not already have one.

Go to Account Settings, DataMiner & API on the P123 site to get your API Id and Key. If you do not have one yet, then click Create Key on that page.
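Rather than pasting the API Id and Key directly into a notebook cell (which is easy to share by accident), you could read them from environment variables. A minimal sketch; the variable names P123_API_ID and P123_API_KEY are a hypothetical convention, not something the notebook or the P123 API requires.

```python
import os


def load_p123_credentials(env=None):
    """Return (api_id, api_key) read from environment variables.

    P123_API_ID / P123_API_KEY are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_id = env.get("P123_API_ID", "").strip()
    api_key = env.get("P123_API_KEY", "").strip()
    if not api_id or not api_key:
        raise RuntimeError("Set P123_API_ID and P123_API_KEY before running the tests.")
    return api_id, api_key
```

For example, set `os.environ["P123_API_ID"]` and `os.environ["P123_API_KEY"]` in a cell you do not share, then call `load_p123_credentials()`.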

Open the shared project folder: https://drive.google.com/drive/folders/10P3ZnGVOFQeCjpXx_oFuNHhAprQ8X3Df?usp=sharing
Follow the instructions in the Introduction section of the SingleFactorTests.ipynb file.

Choose the factors/formulas you want to test from the spreadsheet provided or add your own factors. Be aware that your account is allocated a certain number of API credits depending on your subscription level. But you can purchase additional credits for a very reasonable price. Each factor/formula test will cost 2 API credits. Be aware of your API credit balance before running a set of tests because the script does not currently handle the case where you run out of credits while a set of tests is running. If that happens, the script will stop and not write the results to a file and you will have wasted some credits.
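Since each test costs 2 API credits, the total for a run is 2 × the number of factors (300 factors would need 600 credits). The sketch below checks that total against your balance before starting and appends each result to a CSV file as soon as it arrives, so a run that stops partway keeps the tests already paid for. This is not how the shared script works: `run_test` is a hypothetical stand-in for whatever function actually calls the API, and `balance` is a number you look up yourself.

```python
import csv
from pathlib import Path

CREDITS_PER_TEST = 2  # cost per factor/formula test, as stated above


def credits_needed(n_tests):
    return n_tests * CREDITS_PER_TEST


def run_factor_tests(formulas, run_test, balance, out_path):
    """Run one test per formula, appending each result row to out_path.

    run_test is a hypothetical callable: formula -> dict of result columns
    (it must return the same keys every time). Returns the number of
    credits the run was expected to use.
    """
    needed = credits_needed(len(formulas))
    if needed > balance:
        raise RuntimeError(
            f"{len(formulas)} tests need {needed} credits but the balance is {balance}")
    out = Path(out_path)
    write_header = not out.exists() or out.stat().st_size == 0
    with out.open("a", newline="") as fh:
        writer = None
        for formula in formulas:
            row = {"formula": formula, **run_test(formula)}
            if writer is None:
                writer = csv.DictWriter(fh, fieldnames=list(row))
                if write_header:
                    writer.writeheader()
            writer.writerow(row)
            fh.flush()  # push each finished test to disk before starting the next
    return needed
```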

Determine the settings you want to use for the universe, date range, frequency, etc.
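It can help to collect those choices in one place and sanity-check them before spending credits. The container below is a hypothetical sketch: the field names, the weekly rebalance unit, the universe name, and the 20-bucket default (matching the tests described at the top of this thread) are assumptions, not the notebook's actual variables or the API's parameter names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FactorTestSettings:
    """Hypothetical bundle of run settings, validated on creation."""
    universe: str
    start: date
    end: date
    rebalance_weeks: int = 4
    buckets: int = 20

    def __post_init__(self):
        if not self.universe.strip():
            raise ValueError("universe must not be empty")
        if self.start >= self.end:
            raise ValueError("start date must be before end date")
        if self.rebalance_weeks < 1:
            raise ValueError("rebalance_weeks must be at least 1")
        if self.buckets < 2:
            raise ValueError("need at least 2 buckets")


settings = FactorTestSettings("My Small Caps", date(2004, 1, 1), date(2021, 1, 1))
```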

Run the steps in the Initialization section of the Colab Notebook.

Run the code in the Single Factor Script section.

Locate the results file on your Google Drive. Open the file and copy the results data.

Download the Excel scoring spreadsheet (ScoredFactors_Template.xls) from the project folder to your hard drive. Delete the existing data and paste your results into the Data tab.


My goodness this is amazing!!! What a treasure of information. Thank you so much.

I’m currently working on a (long-running) project with the P123 API that attempts to automate the “bootstrap” method of determining the optimal weight of each node in a ranking system, which Yuval previously described in his methodology. You drop each factor from the ranking system one at a time, run a screen backtest to see the effect on performance, grade how much each factor affects the overall performance, and then assign a weight to each factor accordingly. Unfortunately I’m new to Python and I’m a father of young children, so free time is scarce and it’s slow going, but I’m eager to complete it and share it with the community. I’ll warn in advance: it won’t be nearly as elegant as Dan’s solution.
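The drop-one-factor idea can be sketched as a leave-one-out loop: backtest the full system, backtest it again with each factor removed, and weight each factor by how much performance falls without it. This is one reading of the description above, not Yuval's exact procedure or Tony's code. `backtest` is a hypothetical callable standing in for an API screen backtest, and giving zero weight to factors whose removal helps is one design choice among several.

```python
def leave_one_out_weights(factors, backtest):
    """Weight each factor by the performance lost when it is removed.

    backtest is hypothetical: it takes a list of factors and returns a single
    performance number where higher is better (e.g. annualized return).
    Note: this makes len(factors) + 1 backtest calls.
    """
    if len(factors) < 2:
        raise ValueError("need at least two factors")
    full = backtest(factors)
    drops = {}
    for factor in factors:
        reduced = [f for f in factors if f != factor]
        # Positive drop = the system does worse without this factor
        drops[factor] = max(full - backtest(reduced), 0.0)
    total = sum(drops.values())
    if total == 0:
        # No factor clearly helps; fall back to equal weights
        return {f: 1 / len(factors) for f in factors}
    return {f: d / total for f, d in drops.items()}


# Toy stand-in for a real backtest: each factor adds a fixed amount of return
contrib = {"Value": 3.0, "Momentum": 1.0, "Quality": 0.0}
weights = leave_one_out_weights(list(contrib), lambda fs: sum(contrib[f] for f in fs))
print(weights)  # {'Value': 0.75, 'Momentum': 0.25, 'Quality': 0.0}
```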

If you need any more guinea pigs, I will try it and give you feedback.

Thanks
Tony

Thanks Tony. It should be ready to use, but let me know if you have any issues or if the instructions are not clear.

Dan - just wanted to say thank you for this amazing data set. Already seeing some very interesting results.

Jumping in to say thanks as well. This landed at an opportune time for me as I was deep into exploring a ranking concept that was showing some promise. Using this new factor data has helped break through into uncharted territory (for me)!

After playing around with your long list of sample factors, I’ve now moved on to trying to use your colab notebook, but I’m getting an error.

When I try to run the Single Factor script code block, I don’t get any prompt to enter a verification code and the process fails. Any ideas?

It gives these errors:


```
TypeError                                 Traceback (most recent call last)
in
     10 import gspread
     11 from oauth2client.client import GoogleCredentials
---> 12 gc = gspread.authorize(GoogleCredentials.get_application_default())
     13
     14 def WriteToResultsFile():

/usr/local/lib/python3.7/dist-packages/gspread/utils.py in convert_credentials(credentials)
     59
     60     raise TypeError(
---> 61         'Credentials need to be from either oauth2client or from google-auth.'
     62     )
     63

TypeError: Credentials need to be from either oauth2client or from google-auth.
```

Sorry that the Colab version isn’t working anymore. I can try to fix it, but it will be a while before I can get to it. I am not a developer and the authentication needed for Google Sheets gave me problems and was unstable. The version I personally use is a python file that I run in PyCharm which uses Excel for the input and the output. This version is stable and I have never had any issues with it. I can send you that Python file if you want me to.

When using these ‘single factor’ results to create strategies, please be very aware that you are looking back to see which factors performed well in the past and then using them in your strategy - of course the backtest results are going to be excellent! But there is no guarantee that those same factors will work as well in the future. If you run a simulation based on a system created solely with this ‘single factor’ data, I would say to expect the live system to return about 60-80% of what the simulation returned. I am basing this on my systems created using this method since 2005. That is a big range and the difference will be in how well you do at picking from the top performing factors and choosing factors that complement each other.
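To put the 60-80% rule of thumb into numbers: a simulation that returned 30% would suggest expecting roughly 18% to 24% live. A trivial helper, assuming the ratio is applied to the annualized return (the post does not say which return measure it refers to):

```python
def expected_live_range(sim_return, low=0.60, high=0.80):
    """Scale a simulated return by the rough 60-80% live-to-sim ratio."""
    if not 0 < low <= high:
        raise ValueError("need 0 < low <= high")
    return sim_return * low, sim_return * high


lo, hi = expected_live_range(30.0)  # simulated 30% annualized return
print(f"expect roughly {lo:.0f}% to {hi:.0f}% live")
```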

Understood re: the Google authentication.

And good points about these factors. What was exciting for me was to come across some factor concepts I hadn’t thought of before that are nicely complementary to some ranking systems I’ve been working on.

Is there a way to run this with a modified universe? I am not too savvy on how to run this thing.

Dan and Max,

I am not sure I fully understand the problems you are having with Colab, but it seems like platforms that let you successfully upload a CSV file from Excel are usable for you. I do not have any problems uploading a file into Colab, although I found figuring it out less than trivial. With that in mind, I wondered whether you had tried this code:

```python
from google.colab import files

# Opens a file picker in the cell output; returns a dict of {filename: bytes}
uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))
```

I then have to click the Choose Files button (it appears below the code cell, not unlike the P123 image upload button), which opens my computer's file directory, where I choose the file. It worked just now.

That uploads the file. To load it into a pandas DataFrame, use this:

```python
import io

import pandas as pd

# Parse the uploaded bytes as CSV; the key must match the uploaded file's name
df2 = pd.read_csv(io.BytesIO(uploaded['JASPTrim2.csv']))
df2
```

I don’t know if that helps so I will stop here.

BTW, I use a Mac. Downloading a file from Colab onto my computer was harder than it should be, too. If I recall correctly, I had to temporarily place it in Google Docs and download it from there, but I think it is possible to do it within the code on a Windows machine.
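For downloading from within the code, Colab's `google.colab.files` module also has a `download()` function that sends a file from the notebook to the browser, so it should behave the same on a Mac as on Windows. A minimal sketch that writes rows to a CSV file and then triggers the download when running inside Colab; the function name, file name, and sample rows are placeholders.

```python
import csv


def save_and_download(rows, path="results.csv"):
    """Write rows to a CSV file, then trigger a browser download if in Colab."""
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    try:
        from google.colab import files  # only importable inside Colab
    except ImportError:
        print(f"Not running in Colab; results saved to {path}")
        return path
    files.download(path)
    return path


# Example: save_and_download([["formula", "score"], ["Factor 1", 0.92]])
```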

Dan, thank you for the tool and making it as usable as possible.

Jim


Dan, I would be interested in seeing your excel / python scripts.

Thanks
Tony

Really great!!

I expect a lot of people here are testing the same nodes. Is anyone willing to publish the findings from the tests they have run?

To fix the Google authorization problem and use your Google Sheets, replace these lines (this works for me):

Old:
```python
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
```

New:
```python
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
```

Hi, Danparquette. Have you ever tried to recreate something equivalent to this: note to dan - update top123 factors spreadsheet - #3 by danparquette, but with a longer testing period and a newer date?