Can we have a page just about P123 functionality and tips on coding (including DataMiner)? A complete page with no need to search the rest of the forum, please?

P123,

You don’t like machine learning; that much is clear. P123 staff encourage and provide methods for continual optimization on the site. That is the preferred method, and posting about it is fine.

P123 is even working on adding parallel processing of optimizations, making it clear that this is the method they want to facilitate. Maybe we can get 24/7 continuous parallel optimizations on a bank of GPUs soon. That is something I know Marc Gerstein would have been proud of.

It is discouraging to see repeated questions in the forum about why overfitting occurs, and very serious questions about how much performance degradation one should expect out-of-sample, as if we did not already know the answer.

We know how much decline in performance the average P123 member and designer can expect out-of-sample: excess returns have fallen back below the benchmark over the last two years, and that answer comes from a good sample size of experienced members. We already know the answer to this.

I guess we have Mod() as a solution, but not much more, and any other discussion is discouraged. Even a random seed, which would augment this method, has not been encouraged.

I find it sad if nothing else. Not a good way to start my morning during my rebalance.

If P123 does not want machine learning in the forum, that is fine. I will not post about machine learning; not a problem. But I am troubled by seeing recommendations for endless optimization with little said about preventing overfitting, and by P123 now discouraging discussion of well-established practices for addressing overfitting, all while we see the results of this in the designer models.

It is just sad to watch. And honestly, after seeing what is happening with the designer models, I do not think it is even ethical to silence discussions about methods that prevent overfitting. It disgusts me to see it happening in real time.

I wonder if I could have a page where I can find just the changes in P123 functionality, where I could look for an answer to the question below, for example, and then see if I can find something more uplifting in the news.

Seriously, just give us P123 functionality changes if you are going to start selecting content, please.

I haven’t seen any of this, but maybe I haven’t been paying enough attention. As far as I know, nobody on the P123 staff wants to discourage methods that address overfitting. Perhaps you can give an example?

I think ML looks really interesting and would like to test it, but the learning curve looks really steep. So far it seems easiest to wait until P123 releases its own ML tools before I start working on it.

Do you have any suggestions for relevant introductory material, books or websites?

Jim, we’re trying to launch our AI/ML factors this month in beta. It’s a huge deal for us and a very big, costly project. Not sure why you say that…


Yuval,

So I got a message while I was talking about min_sample_leaf_size that said my message was not on topic.

Walter had made a nice joke, and I responded to it and put my comment into that context. It was a little long for that reason, but I was just trying to be nice and generally positive.

My post dealt with “resolution,” a concept I use to understand overfitting, or more specifically to spot when a model is fitting details that really just are not there. With random forests this can be visualized.

But the pertinent point, if P123 wants to do random forests with its AI, is that in my experience you need really large (huge, really) minimum leaf sizes for random forests on stock market data. A reasonable value can be estimated from the number of stocks in your port (or the resolution you need).
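For what it is worth, here is a minimal sketch of the idea using scikit-learn's RandomForestRegressor. The universe size, port size, and the rule of thumb tying min_samples_leaf to them are only my illustrative assumptions, not anything P123 has published:

```python
# A minimal sketch: force large leaves so the forest cannot fit "resolution"
# (detail) that is not really there. The rule of thumb below, sizing the leaf
# from the universe size and the number of stocks you intend to hold, is only
# an illustrative assumption.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

universe_size = 3000      # stocks ranked each week (assumption)
port_size = 25            # stocks you actually hold (assumption)

# If you only need enough resolution to separate the top 25 names out of 3000,
# each leaf can be very large, on the order of universe / port.
min_leaf = max(universe_size // port_size, 100)   # ~120 here, never tiny

model = RandomForestRegressor(
    n_estimators=500,
    min_samples_leaf=min_leaf,   # huge leaves = lower variance, less overfitting
    max_features=0.3,
    n_jobs=-1,
    random_state=0,
)

# X: one row per stock per date (features = ranks), y: forward returns.
X = np.random.rand(10000, 20)    # placeholder data just so the sketch runs
y = np.random.rand(10000)
model.fit(X, y)
```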

I have truly been trying to support machine learning at P123. I can back that up with recent quotes.

This happened again recently when Duckruck suggested that principal component regression was useful. You also suggested at the time that his post was not pertinent to the thread.

I do not think you personally have much use for regressions or machine learning in general. Correct me if I am wrong on this.

I think your beliefs are sincere, and there is nothing wrong with what you do. And it does not matter what I think.

But features involving machine learning, even improvements in Mod(), will not be coming from you, I believe. Correct me if I am wrong on that.

But backing up and taking the view from 30,000 feet, I very much appreciate Dan taking the time, after 10 years, to find out which downloads are needed for machine learning, and rediscovering that ranks are sufficient. I mean that with all sincerity. Machine learning skills are probably not in his job description; he took the time anyway.

Finally, the fact that the train/test split was first mentioned by Judith and not taken up as something potentially useful suggests that P123 is just not machine-learning friendly, even if it wants to try to develop machine learning. The knowledge base and understanding are just not there.

I thought Marco wanted to do some machine learning. Maybe he still does. Maybe whoever he has hired has all of the answers that will work on day one.

Even if she does, some things will have to be discussed in the forum with some level of understanding.

Yuval and others, I like your methods. Personally, I would add a train/test split, as Judith suggested. That is not my concern in the end; we should all have more latitude to do as we wish going forward.
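For anyone curious what that looks like in practice, here is a minimal sketch of a chronological train/test split. The file name, column names, and cutoff date are assumptions for illustration only:

```python
# A minimal sketch of a chronological train/test split for ranked stock data.
# The CSV, its columns, and the cutoff date are placeholders, not a P123 format.
import pandas as pd

df = pd.read_csv("ranks_and_returns.csv", parse_dates=["date"])  # hypothetical file

cutoff = pd.Timestamp("2018-01-01")       # everything before: train; after: test
train = df[df["date"] < cutoff]
test = df[df["date"] >= cutoff]

feature_cols = [c for c in df.columns if c.startswith("rank_")]  # assumed naming
X_train, y_train = train[feature_cols], train["fwd_return"]
X_test, y_test = test[feature_cols], test["fwd_return"]

# Fit on the training window only; evaluate (once) on the held-out window.
```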

Even if everyone at P123 is in fact neutral about machine learning and I am just being sensitive about being told my comments were not on topic, some things will need to change if Marco wants it to work.

Also, however anyone perceives this, I will take a holiday from posting machine learning techniques. I want it to work, and I will not nitpick.

I would appreciate the download Dan is looking into, if it turns out to be feasible. I can do advanced machine learning exactly the way I like it, and that will keep me occupied.

I hope the project is a success.

Jim

Jim,
“Can we have a page just about P123 functionality and tips on coding (including DataMiner)?”
The Knowledgebase has articles on each DataMiner operation and the API endpoints here.
The examples I gave last week for the data_universe and ranks operations are in those Knowledgebase articles.

I would be happy to write a Knowledgebase article that gives the details of how to gather the data from P123 to use in ML, so anybody who is interested can find all the information in one place. I actually already had this on my list of things to do after realizing that a very experienced user such as yourself was unaware of how to use DataMiner to gather data for ML.

Other than the data_universe and ranks operations in DataMiner, what other things have you had difficulty with when trying to gather data from Portfolio123 to use for machine learning? It would be very helpful if I had examples of the things that gave you problems or things you were not able to figure out. I can cover the process of gathering the data, but not anything related to how to use the data in the various ML tools because I have no experience with ML and because the Knowledgebase is specifically for P123 functionality. I appreciate that you are willing to spend your time to help educate others on these subjects.

“…a page where I can find just changes in the functionality of P123.”
The bottom right corner of the Dashboard has a link to “Recent Feature Releases”. Any significant changes we make are listed there. If we end up enhancing the API to have the capability to return daily data, then there would certainly be an announcement in the Recent Feature Releases section. I know you are very interested in this enhancement, so I will try to remember to send you updates on it.


Dan, I would certainly be interested in whatever you can offer regarding DataMiner. I’ve used the API quite a bit but never DataMiner. I’m curious whether it offers something I cannot get from the API. I’ve heard DataMiner is built on top of the API, so I never bothered to look at it, but it sounds like it may be useful for getting data out of P123 for offsite processing.

Thanks
Tony

I’m still confused about which operation to use. I want to specify one of my ranking systems and my universe and get historical weekly ranks that occurred on Mondays.

The Rank* operations seem to use an asofdate of the prior Saturday.

What is that good for?

Tony,

You and Dan are way more expert than I am. But as near as I can tell DataMiner might be just for people not that great at manipulating data in Python (like me).

It is for people who actually like to see the data in a spreadsheet and maybe even do some of the manipulation there: sort the data, make sure it lines up, etc.

This is stuff you probably do naturally in Python. I do understand that Python has a nice-looking Pandas DataFrame output for inspecting things visually, and I use it. But I like to see the data in a spreadsheet, and that reflects a lack of confidence in my munging (is that a word commonly used by programmers?) more than anything.

I suspect you may continue to get most of what you need with the API and save a few steps. And obviously, you could get a file with the API and write it to your desktop without DataMiner.
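For example, something as simple as this covers most of what I do before moving to a spreadsheet; the file and column names below are just placeholders, not an actual P123 export format:

```python
# A minimal sketch of the kind of inspection I mean: load a downloaded CSV,
# eyeball it, then write it back out to open in a spreadsheet.
# File name and columns are placeholders for illustration only.
import pandas as pd

df = pd.read_csv("ranks_download.csv")

print(df.head())        # quick visual check that the data lines up
print(df.describe())    # ranges, missing values, obvious problems

df.sort_values(["date", "rank"], ascending=[True, False], inplace=True)
df.to_csv("ranks_for_spreadsheet.csv", index=False)   # open in Excel/Sheets
```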

When I start doing daily rebalances I will probably want to figure out the API (which you have already done).

Obviously, Dan can add to this.

Jim

Tony,
As you said, DataMiner is built on top of the API, so anything you can do in DataMiner could be done with the API if you have coding skills.

But even if you do have coding skills, DataMiner could be useful for things you don’t need to do very often and save you time versus writing API scripts. For example, the data_universe operation in DataMiner has start date and end date parameters, so it makes it easy to retrieve the data for a date range and automatically write it to a CSV file. The data_universe endpoint in the API has an AsOfDt parameter, so your API script has to generate the dates, loop through them, and write the results to a file. That is not difficult for a programmer, but why spend the time writing it unless you were going to use it often or needed full automation of your process?
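For anyone who wants to go the API route anyway, the loop might look roughly like this sketch. The fetch_data_universe() function is a hypothetical placeholder for however you actually call the data_universe endpoint (the P123 API client or a raw HTTP request), and the dates and file name are just examples:

```python
# A minimal sketch of the loop described above: generate weekly as-of dates,
# call the data_universe endpoint once per date, append the rows to a CSV.
# fetch_data_universe() is a hypothetical placeholder; replace it with your
# actual API call.
import csv
from datetime import date, timedelta

def fetch_data_universe(as_of_date: date) -> list[dict]:
    """Placeholder: return one dict per stock for the given as-of date."""
    raise NotImplementedError("Replace with your actual API call.")

def saturdays(start: date, end: date):
    """Yield every Saturday between start and end, inclusive."""
    d = start + timedelta(days=(5 - start.weekday()) % 7)  # first Saturday
    while d <= end:
        yield d
        d += timedelta(weeks=1)

with open("data_universe.csv", "w", newline="") as f:
    writer = None
    for as_of in saturdays(date(2023, 1, 1), date(2023, 8, 5)):
        for row in fetch_data_universe(as_of):
            row["as_of_date"] = as_of.isoformat()
            if writer is None:  # write the header once, from the first row's keys
                writer = csv.DictWriter(f, fieldnames=list(row.keys()))
                writer.writeheader()
            writer.writerow(row)
```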

The best way to see what DataMiner has to offer is to look at the articles in the Knowledgebase.

Walter,
Since you are using an existing ranking system, the ranks operation is the one you want to use.

DataMiner uses ‘Saturday’ As Of Dates as input, for example: Start Date: 2023-08-05.
‘Saturday’ just means it is using the weekend data. This is the same data you would get if you rebalanced on the website on Monday morning.

On the Ranks page on the website, set the As Of Dt to Monday 8/7/23 and run it. Notice that when it is done, it shows ‘As Of is 8/5/2023’.

But this is specifically for the case you described, where you want ‘historical weekly ranks that occurred on Mondays’. If instead you wanted current weekday rank data as of today, 8/11, then the API would not match running it on the site, because the API has only weekly data (for now).
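As a tiny illustration of that convention, here is a sketch (pure Python, no API calls) that maps a rebalance date to the preceding weekend as-of date; the dates are the ones from this thread:

```python
# A tiny sketch of the weekend as-of-date convention: a Monday rebalance date
# maps to the preceding Saturday's as-of date.
from datetime import date, timedelta

def weekend_as_of(rebalance_day: date) -> date:
    """Return the most recent Saturday on or before the given date."""
    # Monday=0 ... Saturday=5, Sunday=6
    return rebalance_day - timedelta(days=(rebalance_day.weekday() - 5) % 7)

print(weekend_as_of(date(2023, 8, 7)))   # Monday 8/7/23   -> 2023-08-05
print(weekend_as_of(date(2023, 8, 11)))  # Friday 8/11/23  -> 2023-08-05
print(weekend_as_of(date(2023, 8, 5)))   # Saturday        -> 2023-08-05
```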

Thanks! That was the source of my confusion. I read the asofdate literally.

I do trust, but I will verify!

Update!

Dan, I do see the behavior you described regarding rank data and the DataMiner download.

Thanks!