Is Azure ML Studio perhaps the best for P123 users?

As far as I can see, these are the main ML cloud platforms where one can train/run models and share them with others:

SageMaker - Amazon
Colab - Google
Kaggle - Google
Azure ML Studio - Microsoft

Azure ML might be just what the doctor ordered. It’s a drag & drop interface for novices with some really cool things like “After being satisfied with the accuracy of the model, Studio makes it easy to publish the model for others to use with a Set up Web Service button.”

And for the more advanced, you can still use Jupyter notebooks.

Now, I don’t think drag&drop interfaces are the manna that makes anybody an expert. Far from it. But maybe they are onto something. See image below . Certainly more appealing than a jupyter notebook.

This is the blog where I read about it. I will play with it and see.

What do you think?


Marco - my main objection is that it is not free beyond a certain amount of use. And, like P123, it is consumption-based. So good luck estimating how much it is eventually going to cost you. Plus I believe you will be operating in the weeds, unlike what I am doing with XGBoost.

These are all great ideas, all of which will be used by multiple members at some point.

Colab is kind of like Jupyter notebooks, no?

Also, one could go a long way specializing in boosting (classification and/or regression).

So P123 could provide a simple template with default hyperparameters for XGBoost, with an optional grid search over some of those hyperparameters, to be used wherever.
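
To make that concrete, here is a hedged sketch of the kind of template I have in mind, not anything P123 has actually provided. The file name, factor columns, and target column are all made up for illustration; the defaults are just reasonable starting values.

```python
# Hypothetical sketch of a copy-and-paste XGBoost template with optional grid search.
# The CSV name, feature columns, and "forward_return" target are assumptions.
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV

df = pd.read_csv("p123_factor_export.csv")        # assumed export: one row per stock per date
features = ["value_rank", "momentum_rank", "quality_rank"]   # placeholder factor columns
X, y = df[features], df["forward_return"]          # placeholder target column

# Default hyperparameters; shallow trees and a low learning rate help with noisy data.
model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05,
                     subsample=0.8, colsample_bytree=0.8)

RUN_GRID_SEARCH = False
if RUN_GRID_SEARCH:
    # Optional grid search over a couple of the hyperparameters.
    param_grid = {"max_depth": [2, 3, 4], "learning_rate": [0.01, 0.05, 0.1]}
    search = GridSearchCV(model, param_grid, cv=3, scoring="neg_mean_squared_error")
    search.fit(X, y)
    model = search.best_estimator_
else:
    model.fit(X, y)
```

The idea is that a non-coder could paste this in, swap in their own column names, and flip the grid-search flag on if they want it.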

I am not a good coder, but I know how to copy and paste good code, like what Steve has done, or the k-nearest neighbors example from a textbook (in a different thread).

Ultimately I might end up using Spyder on my MacBook Pro—probably with some of Steve’s code in there somewhere.

I do not know if that helps. But maybe don’t put all of your eggs into just one platform basket, and don’t exclude any of them. Provide simple templates/examples to be copied and pasted.

Jim

Marco - I am not that familiar with Azure, but in the block diagram that you presented, there is a block “split data 75% / 25%”. You are going to need to understand how that works, or whether there are more suitable alternatives. For starters, there is Fred Piard’s concern regarding data leakage. The split data operation really needs to understand that there are multiple stock symbols per date, and that you need to provide a separation between the training and validation data. If your lookahead period is one year, for example, then you need to toss one year’s worth of data between the training data and the validation data. So you are faced with two issues right from the start: dealing with cross-sectional data, and discarding data between the training and validation sets.
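
To illustrate what that kind of split might look like (this is only a sketch, not Azure’s Split Data module), assume a pandas DataFrame with a “date” column, many stocks per date, and a one-year forward-return target. Splitting is then done by date, and the rows whose lookahead window crosses the boundary are simply thrown away:

```python
# Minimal sketch of a date-based split with a gap, assuming a DataFrame `df`
# with a "date" column and a one-year lookahead target. Names are illustrative.
import pandas as pd

def time_split_with_gap(df, split_date, gap=pd.DateOffset(years=1)):
    """Rows dated before split_date go to training; rows on/after split_date + gap
    go to validation; everything in between (one lookahead period) is discarded."""
    dates = pd.to_datetime(df["date"])
    train = df[dates < pd.Timestamp(split_date)]
    valid = df[dates >= pd.Timestamp(split_date) + gap]
    return train, valid

# Example usage with a one-year forward return, so a one-year gap:
# train_df, valid_df = time_split_with_gap(df, "2015-01-01")
```

The point is that the split is by date, not by randomly sampling rows, so no training label window overlaps the validation period.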

Next is the fact that you are operating “in the weeds.” You are choosing a data split configuration and have no understanding of how your model will perform with a different data configuration. How your model performs will be just random luck, because you have no ability to easily change your data split.

Also, I am not seeing anything in the block diagram that relates to the ML topology, and this is something that you really need to be wary of. Again, it is pot luck. You have chosen, either explicitly or by letting Azure decide for you, a model topology that may not be appropriate for what you are trying to achieve. I have already said in the past not to use a DNN configuration, for example, because you will simply be memorizing the historical data, not extracting useful predictive information.

You are going to spend a lot of time designing and testing one configuration of one model with one data configuration. To do this properly, you are going to have to repeat the entire development multiple times with various configurations to determine whether you tripped over a good design or lucked into a specific configuration that gave you good results. This is why I am saying that you will be operating in the weeds.
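
For what it’s worth, the “repeat the development with various configurations” point can be as simple as looping the same fit/evaluate step over several split dates and looking at the spread of results. This sketch reuses the hypothetical time_split_with_gap() helper, DataFrame, and column names from the earlier example:

```python
# Rough sketch: evaluate the same model over several data split configurations.
# Assumes `df`, `features`, and time_split_with_gap() from the earlier sketches.
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

split_dates = ["2010-01-01", "2012-01-01", "2014-01-01", "2016-01-01"]
scores = []
for split_date in split_dates:
    train_df, valid_df = time_split_with_gap(df, split_date)
    model = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
    model.fit(train_df[features], train_df["forward_return"])
    preds = model.predict(valid_df[features])
    scores.append(mean_squared_error(valid_df["forward_return"], preds))

print(scores)
```

If the validation scores swing wildly across split dates, the good result from any one split was probably luck rather than a robust model.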