Yes, Gemini 2.0 Flash Thinking Experimental with apps is useful
TL;DR: It is built to search for and use recent data. Is constant updating affecting its responses?
Thank you, ZGWZ. Flash Thinking Experimental with apps (specifically Google) is a good candidate LLM for making predictions.
Some models have no access to the web at all. The rest at least complain about having to search the web, except for Flash Thinking Experimental with apps (specifically Google).
And hallucinations are not the only problem. Most models will sneak "simulated data" into an analysis. If you question them enough, they will admit to knowingly making it up without ever attempting a search for real data! I think they are particularly prone to this with financial data for at least two reasons. First, to be "safe," they do not want to provide financial advice. Second, they may be trained on old data and like to default to it for any in-depth analysis.
Some models have trouble keeping track of what the real date is, defaulting to their last training date! Some are better than others, but it is a problem to some extent with all of them.
Gemini 2.0 Flash Thinking Experimental with apps is built to search the web and generally has no trouble finding the latest financial statement. It seems to understand real dates (as we humans use them), which makes sense if it is meant to find recent information. And a nice surprise: it shows you the full chain-of-thought reasoning, as I am sure you know.
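If you want to poke at this from code rather than the Gemini app, here is a minimal sketch using Google's google-generativeai Python SDK. Assumptions to flag: the model id "gemini-2.0-flash-thinking-exp" is my guess at the API name for this model, the "with apps" integration may be app-only rather than reachable through the API, and injecting today's date into the prompt is just a common workaround for the date-confusion problem above, not anything official.

```python
# Minimal sketch, not a definitive recipe. Assumptions: the model id below is
# a guess, and the "with apps" tool use may not be available via the API.
from datetime import date

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; use your own key

model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

# Injecting the real date is a common workaround for models that default to
# their training cut-off when answering date-sensitive questions.
prompt = (
    f"Today's date is {date.today():%B %d, %Y}. "
    "Using only data you actually retrieve (no simulated numbers), "
    "summarize the most recent quarterly financial statement you can find "
    "for Alphabet."
)

response = model.generate_content(prompt)
print(response.text)
```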
BTW, it is pertinent that it knew how many r's there are in strawberry. How it may have known this is also pertinent: it found, or at least linked to, a YouTube discussion of this weakness in LLMs. So it is willing to search the web for answers, as its name suggests. And if it is constantly trained on new data, it probably already memorized the answer, ready for me to ask. How Many R's Are In 'Strawberry'? | The Brainstorm EP 61
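For contrast, the counting itself is trivial in code. The usual explanation for why LLMs stumble here is tokenization: models see subword tokens rather than individual letters, so a one-line string operation outdoes a model that cannot look inside its own tokens.

```python
# Trivial for code, hard for a tokenized LLM, which sees chunks like
# "straw" + "berry" instead of individual characters.
print("strawberry".count("r"))  # prints 3
```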
Anyway, I would not have tried it without your recommendation, and I appreciate it.
Edit: So I thought I would ask Gemini about different cut-off dates for different versions. Here is where the chain-of-thought helps in understanding an answer:
"I need to be careful not to reveal specific model details that are not public knowledge. I should focus on what is generally known about model updates and training data cut-off dates for large language models like Gemini."
My chain-of-thought: I did not know for sure about different cut-off dates until I read Gemini's chain-of-thought.
BTW, ChatGPT's o1 was codenamed "Strawberry" and was meant to address the problem of not being able to count the number of r's in strawberry by using chain-of-thought (CoT) for this and similar issues, as well as reducing compute. The link above goes into this, but it may be old news. Which is not to say there are not other issues, such as "simulated data" without transparency!