I don't know if this is a coincidence, but this is now the third day in a row that I've started running validations in the European morning and they haven't worked. I've tried running a simple linear model that would normally take less than a minute to complete, and LightGBM models that usually take 10-15 minutes, but after 45 minutes I have to cancel because they have made very little progress (finishing only 2 out of 4 splits). When I restart in the afternoon, everything works as expected.
Do the AI servers go to sleep during US nighttime? Is it only me, or do others have problems in the European morning too?
We may have found the culprit. Linear models have a different way of controlling parallelism which we did not account for. We're going to disable linear models until we patch the backend since they affect everything else that runs on the same server. I'll post something specific shortly.
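For anyone curious what this kind of problem can look like, here is a minimal sketch. It assumes the linear models sit on top of a BLAS/OpenMP-backed numeric library whose thread count comes from environment variables rather than from an explicit thread parameter; that is an assumption for illustration only, not a description of the actual backend or the eventual patch. The helper name `cap_native_threads` is hypothetical.

```python
import os

# Environment variables commonly read by OpenMP, OpenBLAS, and MKL thread pools.
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def cap_native_threads(max_threads, env=None):
    """Set the native thread-pool limits to max_threads (hypothetical helper).

    Writes to os.environ by default; pass a dict as `env` to try it out
    without touching the real process environment.
    """
    if max_threads < 1:
        raise ValueError("max_threads must be at least 1")
    env = os.environ if env is None else env
    for name in THREAD_ENV_VARS:
        env[name] = str(max_threads)
    # Return the values that were set so the caller can log them.
    return {name: env[name] for name in THREAD_ENV_VARS}


# Demonstration against a throwaway dict instead of the real environment.
limits = cap_native_threads(2, env={})
print(limits)
```

These variables are generally read when the numeric library is first loaded, so a cap like this would need to be applied before importing it. Without some cap, one job can claim every core on a shared machine, which would fit the slowdown described above.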