As indicated here, we're experiencing intermittent performance issues with AI training. We believe the cause is a misconfiguration in linear model training that parallelizes operations without limit, i.e., a single job can effectively take over the entire server.
We believe we have a fix, but it needs further testing. In the meantime, we're disabling linear model validations and predictor training, since these jobs affect every other job on the server: a single user training linear models can cause other users' jobs to take longer and cost more.
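For illustration only, the sketch below shows the general shape of the kind of cap we're testing: bounding the number of threads a single linear model fit can use. It assumes a Python training job built on scikit-learn and threadpoolctl; our actual stack and the final fix may differ.

```python
# Minimal sketch, assuming a Python/scikit-learn training job (hypothetical;
# not our production code). The idea is to bound BLAS/OpenMP parallelism so
# one linear model fit cannot fan out across every core on the server.
import os

# Set thread limits before numerical libraries are imported.
os.environ.setdefault("OMP_NUM_THREADS", "4")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "4")

from threadpoolctl import threadpool_limits
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=10_000, n_features=50, random_state=0)

# Explicit per-fit limit as a second guard, independent of the env vars.
with threadpool_limits(limits=4):
    model = LinearRegression().fit(X, y)

print(model.score(X, y))
```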
Existing linear predictor models are not affected, and you can continue to use them for inference.
Sorry for the inconvenience. We should have a fix soon.