AI Factor Parquet file download

I’m struggling to download the AI Factor Dataset parquet files; even the small ones fail after a while. How long do the download tokens live?

Is anyone able to download the parquet files?

Post a screenshot or the path of what you are trying and I will test it.

It’s the Dataset I’m trying to download.

It seems fine. It’s downloading one random dataset of 503 MB now. In my case the download just completed successfully.

My downloads get interrupted no matter how small I try to make them. I can’t resume an interrupted download, and download managers can’t handle them either :persevering_face:
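For anyone scripting the download instead of using the browser, here is a minimal sketch of a retry loop that resumes from the bytes already on disk. It assumes the server honors HTTP `Range` requests and that the download URL stays valid between attempts; nothing in this thread confirms either for the parquet endpoint. The function name `download_with_resume` and its parameters are made up for illustration.

```python
import http.client
import os
import urllib.error
import urllib.request


def download_with_resume(url, dest, max_attempts=5, chunk_size=1 << 20,
                         opener=urllib.request.urlopen):
    """Download url to dest, continuing from the bytes already on disk.

    Hypothetical helper: assumes the server supports Range requests.
    """
    for attempt in range(1, max_attempts + 1):
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if have:
            # Ask only for the part that is still missing.
            request.add_header("Range", f"bytes={have}-")
        try:
            with opener(request) as response:
                # 206 means the Range header was honored; any other status
                # means the server is sending the whole file from byte 0.
                resumed = have > 0 and response.status == 206
                length = response.headers.get("Content-Length")
                expected = None
                if length is not None:
                    expected = int(length) + (have if resumed else 0)
                with open(dest, "ab" if resumed else "wb") as out:
                    for chunk in iter(lambda: response.read(chunk_size), b""):
                        out.write(chunk)
            size = os.path.getsize(dest)
            if expected is None or size >= expected:
                return size
            # Short read: loop again and continue from the new offset.
        except urllib.error.HTTPError as err:
            if err.code == 416:
                # Range not satisfiable: assumed to mean the file is complete.
                return have
            # Other HTTP errors (e.g. an expired token) are not retried.
            raise
        except (OSError, http.client.HTTPException):
            if attempt == max_attempts:
                raise
    raise OSError(f"download incomplete after {max_attempts} attempts")
```

If the server ignores `Range` and answers with a plain 200, the sketch simply starts over from byte 0, so it cannot do better than the browser in that case. HTTP errors other than 416 are raised immediately rather than retried, which at least makes an expired token visible.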

It’s only the parquet datasets that I can’t download, I can download all other files on the page. Downloaded a 3GB Factor List file, no problem… frustrating

Did you try on a different PC / laptop / mobile?

Two PCs, the same issue.

Random question, but are you using a VPN by chance? I’ve had connection stability issues (not specifically from P123) when using a VPN before.

Might be an API credit issue. Your download says it requires over 7000 API credits, but your Portfolio membership only has 1000 API credits/mo. (we need to revisit this since the download API costs seem too high)

Nonetheless, if API credits were the issue, you should have gotten a clear error message. So it's still a bit strange.

Still investigating...

On a side note...

What's the reason to download data from AI Factor instead of Factor List?

We are making steps to make Factor List the de-facto way to download normalized data. It will support parquet format and other normalizations that are currently missing (ex relative to industry and sector).

Please let us know the exact reasons why you are using AI Factor so we can make sure all that you need will be included in Factor List

@AlgoMan btw, I added some API credits to your account to test the download (which was successful)

Thanks, not sure what the issue is. It starts to download (super slow), then restarts the download over and over until it fails. But if no one else has the problem, it must be something on my end.

The reason I wanted to download the parquet file is that I’m making a tool to analyze the normalized data we use for AI Factors. Right now it supports CSV files downloaded from Factor List, and I want to add support for the parquet files as well. I have collected all the methods I have tested in Python into one comprehensive analysis tool to get a good overview.

Will write more about it later this week. Below are some teaser screenshots.

It's just amazing!! If you need a beta tester let me know :laughing:

@AlgoMan Thanks for sharing. Looks like you are always making useful advanced tools. What is your background?

Factor Lists will soon support parquet files. They will become the main way to create and download normalized data. But they will be much more than that: we're getting ready to launch a Streamlit server where we host apps to analyze datasets from Factor Lists (and potentially a lot more with our API).

Our first "app" is a simple "Factor Miner" to test factors independently for alpha (not pairs for example), then pick the best N that are not over a correlation threshold. It's a closed-form, single pass deterministic algorithm. It's a proof-of-concept for our vision for "Factor Engineering", which is currently the missing link in the workflow.
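To make the idea concrete, here is one possible reading of "score each factor on its own, then keep the best N that are not too correlated", written as a greedy selection in plain Python. This is only a sketch, not the actual Factor Miner implementation: the names `select_factors`, `alpha`, and `max_corr` are invented, the alpha scores are assumed to be precomputed per factor, and Pearson correlation is an assumed choice of correlation measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_factors(values, alpha, top_n, max_corr):
    """Pick up to top_n factors by alpha, skipping near-duplicates.

    values:   factor name -> list of factor values (same length for all)
    alpha:    factor name -> precomputed standalone score (assumed given)
    max_corr: largest absolute correlation allowed with any kept factor
    """
    # Best score first; ties broken by name so the result is deterministic.
    ranked = sorted(alpha, key=lambda name: (-alpha[name], name))
    selected = []
    for name in ranked:
        if len(selected) == top_n:
            break
        if all(abs(pearson(values[name], values[kept])) <= max_corr
               for kept in selected):
            selected.append(name)
    return selected


# Toy data: "value_dup" is almost a copy of "value" and gets skipped.
values = {
    "value": [1, 2, 3, 4, 5],
    "value_dup": [2, 4, 6, 8, 10.5],
    "momentum": [5, 1, 4, 2, 3],
}
alpha = {"value": 0.9, "value_dup": 0.8, "momentum": 0.5}
print(select_factors(values, alpha, top_n=2, max_corr=0.8))
# prints ['value', 'momentum']
```

Each factor is visited once in score order, so the same inputs always give the same selection. The trade-off is that a factor is only compared against factors already kept, which makes the result depend on the ranking.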

This first app is not very fancy, but it should be able to support hundreds, maybe thousands of factors. It is also the MVP for our new "app infrastructure" which will allow us to expand our toolset in a much faster and more customizable way. We chose Streamlit for its ease of use and its ability to produce clean UI/UX.

There are several advantages to running the app on our servers:

  • No need to download datasets
  • Ability to use normalized or raw data
  • Should generally be faster than using your own PC/laptop
  • Access to potentially many servers, each with lots of RAM, and GPUs
  • Ability to license your apps and/or get paid for developing apps

It will all be much clearer when we launch the first app. Let us know what you think.

Sounds like a great idea. Like a P123 App Store.
Would we be able to store analyzed data on the server?

The Streamlit framework is probably a great choice. I tried to build an online app for data analysis using React and run it on the host; it was close to impossible, too slow.

Looks like Tkinter. Very nice :slight_smile:

This reminds me of my time at university. I spent hours developing simple tools. I hated it back then. Now, with AI help, it is far more pleasant.

Yes, but I regret committing to Tkinter for this project; it is very limited for visual presentation of data.

Can you elaborate? You mean the results of the analysis (stats, charts, etc)?

FactorMiner, our first app (technically it's our second if you count DataMiner which uses Tkinter) allows you to test thousands of factors and select, for example, the top 50 based on alpha and correlation parameters. It's a factor engineering tool.

The workflow is like this:

  • Factor List is used to generate the dataset in a shared location inside our LAN. The dataset contains all the metadata (factor formulas, names, settings, etc) used to generate it.
  • The App can access the dataset and generate whatever results it's designed to do. The results are saved, and can be named and annotated.
  • The App can run multiple analyses per dataset with different parameters.
  • Results can be compared and downloaded.
  • Future versions will have quick ways to create, for example, a ranking system from the best 50 factors.

Apps will have access to storage and API endpoints, so they run relatively sandboxed. Many details have not been worked out. For example, what if the app requires massive resources? How do we manage the storage? Development will depend on the interest from the community in developing third-party apps, or on whether they are all centrally controlled by us.

Thanks