Hi all,
I am looking for feedback on what data to download so I can explore both linear ranking system optimizations and nonlinear systems without burning through all of my API credits.
Here is my plan so far:
- Download method: the Python API rank_ranks call, pulling weekly data as of each Friday from my ranking system. This gives me every factor rank in the system plus the extras listed below. I will call it once per date to build up the history.
- Ranking system: include the factors of interest plus Sector/Industry. A sector rank does not tell me which specific sector a stock is in, but for a large universe each sector's rank should stay relatively constant, so I can use it to split my data into sub-universes if desired.
- Open_D(-1): this gives next Monday’s open price, which is when I will rebalance. Using this value from consecutive weeks, I can also calculate the total universe return for the next week, and from that the “next week alpha” and “previous week alpha”. Future%Chg is Friday to Friday, but I cannot buy at Friday's close (it is already in the past when I rebalance), so the weekend gap would create a LOT of error in my returns. Hence the need for the Monday open price.
- Market cap: this is mostly just for breaking out sub-universes if I want
- Volume(-1): this will help me calculate slippage
- Spread last week average: this will help me calculate slippage
- I may also pull other benchmark data from Yahoo Finance, not sure yet
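
As a rough sketch of the alpha calculation described in the Open_D(-1) bullet, assuming the downloaded value is stored in a column I am calling next_open (a made-up name) and that the universe return is a simple equal-weighted mean, something like this should work. The tickers and prices are toy numbers, and it assumes the opens are adjusted consistently for splits and dividends:

```python
import pandas as pd

# Toy data: one row per (Friday date, ticker). next_open stands for the
# Open_D(-1) value, i.e. the following Monday's open. All numbers are made up.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-05"] * 2 + ["2024-01-12"] * 2 + ["2024-01-19"] * 2
    ),
    "ticker": ["AAA", "BBB"] * 3,
    "next_open": [10.0, 20.0, 11.0, 19.0, 12.1, 19.0],
})

df = df.sort_values(["ticker", "date"]).reset_index(drop=True)

# Monday open to the following Monday open, for the week after each Friday.
# The last week per ticker has no later open yet, so it stays NaN.
next_open_following = df.groupby("ticker")["next_open"].shift(-1)
df["next_week_ret"] = next_open_following / df["next_open"] - 1.0

# Assumption: equal-weighted universe return per date.
df["universe_ret"] = df.groupby("date")["next_week_ret"].transform("mean")
df["next_week_alpha"] = df["next_week_ret"] - df["universe_ret"]

# Previous week alpha is the same series lagged one week per ticker.
# Note it ends at the Monday open being traded, so it is only known at
# trade time, not on the Friday the ranks are computed.
df["prev_week_alpha"] = df.groupby("ticker")["next_week_alpha"].shift(1)

print(df)
```

The next_week_alpha column would be the training target, and the rows where it is NaN (the most recent week) have to be dropped before fitting anything.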
My resulting dataframe (or table for those not familiar with Pandas) will have the following columns:
Date, ticker/ID, factor1, factor2, factor…, open_D(-1), mktcap, vol(-1), spread last week avg
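
To make the table layout concrete, here is a minimal sketch of stacking one download per Friday into a single long dataframe. fetch_week is a hypothetical placeholder that returns made-up rows, not the real API call, and the column names are just my own shorthand for the fields listed above:

```python
import pandas as pd


def fetch_week(as_of):
    """Hypothetical stand-in for one weekly ranks download (made-up rows)."""
    return pd.DataFrame({
        "ticker": ["AAA", "BBB"],
        "factor1": [90.0, 10.0],
        "factor2": [35.0, 65.0],
        "next_open": [10.0, 20.0],    # Open_D(-1)
        "mktcap": [500.0, 1500.0],
        "vol_next": [1.0e6, 2.0e6],   # Volume(-1)
        "spread_avg": [0.10, 0.05],   # last week's average spread
    })


# Three consecutive Fridays, starting on Friday 2024-01-05.
fridays = pd.date_range("2024-01-05", periods=3, freq="W-FRI")

frames = []
for as_of in fridays:
    week = fetch_week(as_of)
    week.insert(0, "date", as_of)  # tag every row with its as-of date
    frames.append(week)

panel = pd.concat(frames, ignore_index=True)

# Each (date, ticker) pair should appear exactly once.
assert not panel.duplicated(["date", "ticker"]).any()

panel = panel.set_index(["date", "ticker"]).sort_index()
print(panel.head())
```

Writing each week's frame to disk as soon as it is downloaded would mean a rerun does not spend credits on dates already fetched.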
Does this sound sufficient to get started with random forest or PCA?
Thanks,
Jonpaul