We just had about an hour of very slow site performance. I want to clarify why things are still a bit unstable. We are tuning our distributed network storage, Ceph, and some operations are unexpectedly generating excessive network traffic, which slows down the website and services.
Rest assured it's only temporary and all data is safe. After our recent outage, we want to get this right and learn from it.
Sorry again. We will try to do this work during off-peak hours as much as possible.
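For anyone curious what "excessive network traffic" looks like on the storage side: when Ceph rebalances or recovers, its backfill traffic competes with normal client I/O on the same network. Below is a minimal sketch of how one might watch client versus recovery throughput, assuming the `ceph` CLI and an admin keyring are available on the node; the exact JSON field names inside `pgmap` can vary between Ceph releases, so this is illustrative rather than our actual tooling.

```python
#!/usr/bin/env python3
"""Rough sketch: compare Ceph client I/O against recovery/backfill traffic.

Assumes the `ceph` CLI is installed and the node has an admin keyring.
Field names inside `pgmap` differ between Ceph releases, so every lookup
falls back to 0 instead of failing.
"""
import json
import subprocess
import time


def cluster_io_snapshot():
    # `ceph -s -f json` is the standard machine-readable status command.
    raw = subprocess.check_output(["ceph", "-s", "-f", "json"])
    pgmap = json.loads(raw).get("pgmap", {})
    return {
        "client_read_Bps": pgmap.get("read_bytes_sec", 0),
        "client_write_Bps": pgmap.get("write_bytes_sec", 0),
        "recovery_Bps": pgmap.get("recovering_bytes_per_sec", 0),
    }


if __name__ == "__main__":
    # Poll every 10 seconds; if recovery throughput dwarfs client I/O,
    # rebalancing is likely what is saturating the network.
    while True:
        snap = cluster_io_snapshot()
        print(f"client r/w: {snap['client_read_Bps']}/{snap['client_write_Bps']} B/s, "
              f"recovery: {snap['recovery_Bps']} B/s")
        time.sleep(10)
```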
Marco, hi! I ran into an issue after the recent instability. Screens with more than two AI predictors no longer run — it seems like there’s some kind of time-limit restriction kicking in. Before the recent outages, my screen with 9 AI predictors worked perfectly and executed very quickly.
This is probably related to what you described above. I understand you’re working on it, but I decided to leave a report just in case. I’m a big fan of Portfolio123 and everything you guys are doing. Thank you for your product! I hope these issues get resolved soon — good luck with the work!
Can also confirm I'm getting timeouts for every AI predictor I've tried to load today, and I can't view validation results either (it either freezes on loading, or, if the initial results do load, it fails when I try to see results for different quantiles).
I’m still getting timeouts on every AI factor I try to load predictions on.
Admittedly my models have always taken a while as they’re pretty resource intensive, but I’ve not had this problem before. I’d get a timeout every once in a while but it would quickly correct itself on a subsequent load.
Also still getting weird behaviour with regard to validation results. If I have too many models (like 5+) in an AI factor, it won't let me change the quantiles; it just gets stuck on loading.
EDIT: Seems like predictions are loading again. Still seeing some weird things with the validation results, but improving.
I know it’s after hours, so maybe you’re doing some backend work…
I’m still having some issues with AI Factor modelling. I performed validation on a new model, and when I attempt to change the quantiles in Results from 10 to 100, the request times out repeatedly. I had similar issues yesterday evening, which resolved today, then returned this evening.
It looks like the file system (CephFS) backing our AI worker nodes is unstable. Sometimes a dataset is ingested in 4 seconds, sometimes in 15 minutes. We are still investigating the root cause.
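For context, that variance is easy to quantify from a worker node itself: repeatedly time a full read of a representative dataset file on the CephFS mount and look at the spread. A minimal sketch is below; the mount point and file path are hypothetical placeholders, not our actual layout, and repeated reads of the same file can be flattered by the OS page cache or the Ceph client cache, so rotating across files gives a more honest picture.

```python
#!/usr/bin/env python3
"""Time repeated full reads of a dataset file on a CephFS mount.

The path below is a hypothetical placeholder; point it at any large,
representative file on the mount being tested. Note that the OS page
cache and the Ceph client cache can make repeated reads of the same
file unrealistically fast.
"""
import time

DATASET = "/mnt/cephfs/datasets/sample.parquet"  # placeholder path
CHUNK = 8 * 1024 * 1024  # read in 8 MiB chunks


def timed_read(path: str) -> float:
    """Read the whole file and return elapsed wall-clock seconds."""
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while f.read(CHUNK):
            pass
    return time.monotonic() - start


if __name__ == "__main__":
    samples = [timed_read(DATASET) for _ in range(10)]
    samples.sort()
    print("min/median/max seconds:",
          samples[0], samples[len(samples) // 2], samples[-1])
```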
Sorry about this. This week's network crash has surfaced lots of things, so, in the end, it will be a blessing in disguise.