As indicated here, we're experiencing intermittent performance issues with AI training. We believe the cause is a misconfiguration in linear model training that parallelizes operations without limit, i.e., a single job can effectively take over the entire server.
We believe we have a fix, but it needs further testing. In the meantime, we're disabling linear model validations and predictor training, since these jobs affect every other job on the server: a single user training linear models can cause other users' jobs to take longer and cost more.
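For illustration only, the sketch below shows the general shape of the kind of cap we're testing: bounding the number of threads a single linear model fit can use. It assumes a Python training job built on scikit-learn and threadpoolctl; our actual stack and the final fix may differ.

```python
# Minimal sketch, assuming a Python/scikit-learn training job (hypothetical;
# not our production code). The idea is to bound BLAS/OpenMP parallelism so
# one linear model fit cannot fan out across every core on the server.
import os

# Set thread limits before numerical libraries are imported.
os.environ.setdefault("OMP_NUM_THREADS", "4")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "4")

from threadpoolctl import threadpool_limits
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=10_000, n_features=50, random_state=0)

# Explicit per-fit limit as a second guard, independent of the env vars.
with threadpool_limits(limits=4):
    model = LinearRegression().fit(X, y)

print(model.score(X, y))
```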
Existing linear predictor models are not affected, and you can continue to use them for inference.
Sorry for the inconvenience. We should have a fix soon.