The P123/Python Array

Jrinne · November 6, 2019, 2:44pm

I think that answers my question. I appreciate the definitive response.

Best,

-Jim

philjoe · November 6, 2019, 2:47pm

Lol, good luck with learning Python, man.

Jrinne · November 6, 2019, 2:50pm

Okay, man.

-Jim

philjoe · November 6, 2019, 2:53pm

FYI, it's polite to say thank you when someone helps you. “Not bad” and “I’m impressed” are just condescending.

Jrinne · November 6, 2019, 2:59pm

My apologies. It was entirely sincere or meant to be.

Including when I implied that you must be doing some coding.

Thanks again for the encouragement in my struggles to learn Python.

-Jim

philjoe · November 6, 2019, 3:06pm

OK no problem, let me know if you have any other obstacles in getting the data.

Jrinne · November 6, 2019, 3:41pm

How many screen results can be downloaded per day?

I thought 5. I am not sure how many columns but maybe not 20. So you hit your limit before you complete one rebalance in a day? Maybe? Maybe you can get 2 rebalances in a day?

I can check myself and I am probably wrong. But 5 per day is not fun—and not that useful, assuming I am close on the limits.

Probably wrong all around. But one thing I know with absolute certainty: I cannot code worth $&*#@!

Thank you for giving me an idea; I will definitely check it out.

Best,

-Jim

philjoe · November 6, 2019, 4:19pm

You can do 5 columns plus you get the label column for “free”. Not sure how many you can do in a day… just start trying it and see what happens I guess.

piard2 · November 6, 2019, 7:59pm

@Philip, thank you on behalf of the 500+ stealth readers. Showvar is a great feature; it had slipped my mind in this context.
@Jim, coding the ranking system in a flexible environment should save time on a data-intensive study and be partly reusable for other studies. Personally, I will go this way. The Factor class in the Quantopian research environment has built-in methods that take DataFrames as inputs and outputs to build ranks and filters on a stock universe at any point in time: zscore, winsorize, top(n), bottom(n), quantile, etc. They have tutorials with very basic examples as a starting point. But as I said, if you want to do backtests, forget it: they take way too long.
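
As a rough illustration of the rank-and-filter operations listed above, here is a minimal sketch in plain Python. It does not use Quantopian's Factor class or its API. The names `zscore`, `winsorize`, and `top` are borrowed only as labels, the fixed clipping bounds are a simplifying assumption, and the tickers and values are made up.

```python
from statistics import mean, pstdev

def winsorize(values, lower, upper):
    """Clip every value into [lower, upper] to tame outliers (simplified: fixed bounds)."""
    return {t: min(max(v, lower), upper) for t, v in values.items()}

def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    vals = list(values.values())
    mu, sigma = mean(vals), pstdev(vals)  # assumes the values are not all identical
    return {t: (v - mu) / sigma for t, v in values.items()}

def top(values, n):
    """Return the n tickers with the highest values, best first."""
    return sorted(values, key=values.get, reverse=True)[:n]

# Hypothetical factor values for a made-up five-stock universe.
factor = {"AAA": 1.0, "BBB": 2.0, "CCC": 3.0, "DDD": 4.0, "EEE": 50.0}

clipped = winsorize(factor, lower=1.0, upper=10.0)
scores = zscore(clipped)
best_two = top(scores, 2)
```

The same three steps map naturally onto DataFrame columns if you prefer pandas; the plain-dict version is only meant to show the logic.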

Jrinne · November 6, 2019, 9:16pm

Frederic,

If there is one truth in all of this it is that I do not have much coding experience. But let’s assume I could get beyond that.

Also let’s assume you find a good classification model, perhaps a Random Forest, or your method of choice for discussion. Could you use it?

BTW, I see no Decision Trees, no Random Forest, no CART in general, no Bagging of any kind, no Multiple Regression, no Polynomial Regression, no Robust Regression, no LOESS, no MARS, no Earth, no C4.5, no Kernel Regression of any kind, no KNN, no SVM (i.e., some of the things I have tried elsewhere). Am I missing it? I think we may have something different in mind for classification (or regression) models. I do wonder if I, personally, belong at Quantopian.

Anyway, assuming I am just missing it, would you be able to download fresh data in the morning (or pull it into a DataFrame), generate predictions from your model for all of the stocks in your universe, and sort the predictions (i.e., find the 5 stocks predicted by your model to perform the best over the next rebalance period)?
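
To make the question concrete, the predict-then-sort step might look roughly like the sketch below. Everything in it is an assumption for illustration: `toy_model` is a hypothetical stand-in for a fitted model, the feature names and numbers are made up, and nothing here touches Quantopian or P123.

```python
def toy_model(features):
    """Hypothetical stand-in for a trained model's prediction for one stock."""
    return 0.6 * features["momentum"] + 0.4 * features["value"]

def top_picks(universe, predict, n=5):
    """Score every stock, sort by predicted performance, and keep the best n."""
    scored = [(ticker, predict(feats)) for ticker, feats in universe.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Made-up "fresh data" for a seven-stock universe.
universe = {
    "AAA": {"momentum": 0.9, "value": 0.1},
    "BBB": {"momentum": 0.2, "value": 0.8},
    "CCC": {"momentum": 0.5, "value": 0.5},
    "DDD": {"momentum": 0.1, "value": 0.2},
    "EEE": {"momentum": 0.8, "value": 0.9},
    "FFF": {"momentum": 0.3, "value": 0.3},
    "GGG": {"momentum": 0.7, "value": 0.6},
}

picks = top_picks(universe, toy_model, n=5)
pick_tickers = [ticker for ticker, _ in picks]
```

Any model that can produce one number per stock could be passed in place of `toy_model`; the sorting step does not care where the predictions come from.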

Cool if you can do that, and if so I might give this data provider a try. I only ask because if you do not think you could do that with Quantopian, then I will not try, as I would be pretty sure I couldn’t if you couldn’t.

I am being serious. To illustrate, it is not rare that I write some Python DataFrames to an Excel file, manipulate the data in Excel, and reload it to Python for more manipulation there. Usually I can progress beyond that, with more head-pounding than there should be. But I am serious that I am not that good.

I am not that good with Excel either, BTW. But at least I notice when I mess up, usually. And I can understand the (fewer) error reports.

BUT I CAN MOVE A FILE FROM EXCEL TO PYTHON WITH ONE LINE OF CODE. So my original question does not seem inappropriate to me at this time. It is an honest question/recommendation.

Thanks.

-Jim

sthorson · November 7, 2019, 5:00am

In my experience, five downloads of 500 rows are allowed per day. The number of columns runs about 50 or so if you use “screen factors” as the report. The trick with that is to bundle your desired outputs into showvar functions.
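
As a back-of-the-envelope check on those limits, here is a small sketch. It assumes the figures above mean five downloads of up to 500 rows each (so 2,500 rows per day); that reading, the function name, and the 3,000-stock universe are assumptions, not confirmed P123 limits.

```python
import math

def days_to_download(n_stocks, rows_per_download=500, downloads_per_day=5):
    """Days needed to pull one row per stock under the assumed daily limits."""
    rows_per_day = rows_per_download * downloads_per_day
    return math.ceil(n_stocks / rows_per_day)

rows_per_day = 500 * 5                   # 2500 rows per day under these assumptions
days_for_3000 = days_to_download(3000)   # a hypothetical 3,000-stock universe
```

Under those assumptions, a universe larger than 2,500 stocks could not be refreshed in a single day.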

As to this thread: if I understand correctly, you want to take, say, 1000 factors for each stock, then randomly “bag” them over multiple random iterations. I’m guessing that is to try and find some combination of the 1000 factors that can predict returns.

In other words, you expect some combination of factors to repeat their performance in the future?

Never going to happen…

Jrinne · November 7, 2019, 5:25am

This thread happens to be about using Python to allow people to use new and diverse techniques. I think you misunderstand bagging. But I am not sure that we need to focus on bagging if it is not a topic of interest for you.
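
For anyone following the bagging exchange, here is a minimal sketch of what bagging (bootstrap aggregating) usually means: resample the observations with replacement, fit one model per resample, and average the predictions. It is not a description of anyone's actual workflow in this thread; the straight-line base learner, the function names, and the toy data are all assumptions chosen to keep the example short.

```python
import random
from statistics import mean

def fit_line(points):
    """Least-squares straight line through (x, y) points; returns a predictor."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    # If a resample happens to contain a single x value, fall back to a flat line.
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def bagged_predictor(points, n_models=25, seed=0):
    """Fit one line per bootstrap resample of the rows and average their predictions."""
    rng = random.Random(seed)
    models = [fit_line(rng.choices(points, k=len(points))) for _ in range(n_models)]
    return lambda x: mean(m(x) for m in models)

# Toy data lying exactly on y = 2x + 1, so every resampled fit should agree.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
predict = bagged_predictor(data)
prediction_at_10 = predict(10.0)
```

Note that the resampling is over rows (observations), not over factors; choosing random subsets of factors is a separate idea that random forests add on top.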

Python has a lot of libraries. I would not want to be limited to using just bagging or be forced to use it myself.

I want members to be able to use their own ideas. The institutions are not limiting new ideas to what can be done on spreadsheets.

Is there nothing in Python that could be used to work on new ideas? They do have a lot of programmers writing for the language. Some of them are pretty good, from what I hear.

Do you believe that there can be no new ideas, or that everything new can be done efficiently on spreadsheets?

If you have one thing that can bring more ideas and more abilities to more members with less use of P123 resources then you should let us know.

No one wants to stop you from doing what you are doing now. I wish you the best with that.

-Jim

piard2 · November 7, 2019, 2:36pm

@Jim,
The Quantopian research environment is in Python 2.7 and includes sklearn, based on tutorials and code I have read. I do not yet have practical experience with it. I have read another tutorial elsewhere with 2.7 and sklearn using a few classification algos. I don’t think I need the whole ML catalog.