Reason for some continued website instability

We just had about an hour of very slow site performance, and I want to clarify why things are still a bit unstable. We are tuning our distributed network storage system, Ceph. Some operations are unexpectedly causing excessive network traffic, which is slowing down the website and services.

Rest assured, this is only temporary and all the data is safe. Given our recent outage, we want to get this right and learn from it.

Sorry again. We will try to do this work during off-peak hours as much as possible.


Marco, hi! I ran into an issue after the recent instability. Screens with more than two AI predictors no longer run — it seems like there’s some kind of time-limit restriction kicking in. Before the recent outages, my screen with 9 AI predictors worked perfectly and executed very quickly.

This is probably related to what you described above. I understand you’re working on it, but I decided to leave a report just in case. I’m a big fan of Portfolio123 and everything you guys are doing. Thank you for your product! I hope these issues get resolved soon — good luck with the work!

AI predictions are experiencing a different problem caused by some upgrades we released recently. We'll investigate shortly. Sorry about that.


@NikGold can you try now? Which screen is it?

FYI, we're testing with your screen right now.

I can also confirm I'm getting timeouts for every AI predictor I've tried to load today, and I can't view validation results either. It freezes on loading, or, if the initial results do load, it fails when I try to see results for a different number of quantiles.

Can you please try it now? We might have fixed it.

I’m still getting timeouts on every AI factor I try to load predictions on.

Admittedly my models have always taken a while as they’re pretty resource intensive, but I’ve not had this problem before. I’d get a timeout every once in a while but it would quickly correct itself on a subsequent load.

I'm also still getting weird behaviour with validation results. If I have too many models (5 or more) in an AI factor, it won't let me change the number of quantiles; it just stays stuck on loading.

EDIT: Seems like predictions are loading again. Still seeing some weird things with the validation results, but improving.

Yes, everything is working now, thank you, Marco!


I know it's after hours, so maybe you're doing some backend work...

I'm still having some issues with AI Factor modelling. I ran validation on a new model, and when I try to change the number of quantiles in Results from 10 to 100, the request repeatedly times out. I had similar issues yesterday evening; they resolved today, then returned this evening.

I am also having issues with AI Factor. Validating models and training predictors are not working.

When I click on the validation status, the progress gets stuck on "retrieving data."

I have tried the Basic, Premium, and Extra30 workers. None of them seemed to work.

Yes, confirmed. AI Factor training often hangs or generates errors.

Investigating.

Thanks

Looks like the file system (a CephFS system) for our AI worker nodes is unstable. Sometimes the dataset is ingested in 4 seconds, sometimes 15 minutes. We are still investigating the root cause.

Sorry about this. This week's network crash has surfaced a lot of issues, so in the end it will be a blessing in disguise.

Does this instability affect the rankings of trained AIFactors?


Important question, by the way.

Depends what you mean by "instability".

It's a network/infrastructure issue, so either it 1) works, 2) works slowly, or 3) doesn't work.

Thanks