P123's Dedicated In-House Supercomputer? LLM for $3,000?

Jrinne · January 10, 2025, 12:29pm

Announced at CES and possibly available in May: Nvidia’s tiny $3,000 computer for AI developers steals the show atCES.

My understanding is that smaller versions of open-source models like Llama (Meta's language model) and FinGPT (designed for financial tasks) could run on this device. I’m sure there are other open-source models worth considering as well.

This seems like an affordable alternative to cloud-based services like ChatGPT or Anthropic's APIs, which can get expensive due to their cost per token.

Could this make tools like financial text summarization or sentiment analysis more accessible for P123 users?

Maybe there are other AI uses for this inexpensive "super-computer" that I haven’t thought of yet.

marco · January 10, 2025, 2:40pm

Few use cases for LLM: headlines only, full-text of the news, and filing. For the first case with few tokens, chatgpt would be cost effective.

Not familiar with LLM training vs inference. Our current AI algorithms are very fast with inference, including NN which requires GPUs for training.

I'm guessing that to train our own LLM we would require much more than a 3K desktop. But if we use a pretrained open source LLM do we need any special hardware?

Thx

bobmc · January 12, 2025, 4:59pm

Marco “Few use cases for LLM: headlines only, full-text of the news, and filing. For the first case with few tokens, chatgpt would be cost effective.”

I’m not so confident that the way LLMs are progressing that within the next couple of years they aren’t going to take over all of our hands on AI/Machine Learning efforts.

Here’s a paper “Financial Statement Analysis with Large Language Models” that makes the claim.

Conclusion:

-Our results suggest that GPT’s analysis yields useful insights about the company, which enable the model to outperform professional human analysts in predicting the direction of future earnings.

-Furthermore and surprisingly, GPT’s performance is on par (or even better in some cases) with that of the most sophisticated narrowly specialized machine learning models, namely, an ANN trained on earnings prediction tasks.

Financial Statement Analysis with Large Language Models | Becker Friedman Institute

marco · January 13, 2025, 1:04am

Interesting paper with a novel way to use LLMs, and that is to feed them tabular data. In other words to assume the role of an analyst doing Financial Statement Analysis (FSA). The experiment is rather simple with only 59 "predictors" (features) from income and balance sheet that exclude price ratios. This tabular data is then used to train different models to predict the direction of earnings.

They then compare the results against 1) human analysts 2) some kind of stepwise logistic regression 3) two neural net algorithm (they call them ANNs).
Results are below. In essence the ChatGPT LLM performed about as well as a NN method that you can do on P123 relatively easily (using deeptables for example).

It's an interesting paper because it's using an LLM in a novel way. They conclude that LLMs now have have "human-like capabilities in the financial domain", and that "LLM's ability to perform tasks across domains points towards the emergence of Artificial General Intelligence"

Not sure I would go that far.

Beating the human analysts is no big feat. All you need is a mechanical, unemotional system to beat a human. And everybody knows that just following, for example, analysts price targets, will not give you an edge. Perhaps it's because of conflict of interest or because they focus too much on the past. And there's also survivorship bias: the good analysts will get a job in a private hedge fund.

So the bar for LLMs to prove themselves at doing FSA should be against the best ML and NN numerical models. But they only used a couple, and looks like a tie to me. Did they do any tuning? And how come they did not use tree algorithms that seem to be better than NN for tabular data? We've all seen better results with these than NNs, have we not?

Still, it's an interesting paper. The data they use is interesting. It's only ANNUAL data. It's a balance sheet with around 30 factors, and the factor's previous value. And for the income statement it's around 30 factors with two previous values.

In other words the LLM is making inferences based on either:

Y over Y changes for 60 factors
Three year trends for 60 factors
Complex relationships between them

Guess what? With our new normalization techniques, and our breadth of formulas, you can do that, and much more. My conclusion is that we can train a model that is far better than theirs, with traditional ML and NNs.

But we do want to start using LLMs somehow. Not sure how to get started yet. I doubt it will be trying to use LLMs for tabular data. We also lack the expertise at the moment. Perhaps I'll start reaching out to paper's authors...

Cheers

bobmc · January 13, 2025, 2:49am

Did you notice they have an App to:

Analyze annual report (10-K in PDF or DOC)

Analyze quarterly report (10-Q in PDF or DOC)

Analyze press release (PDF or DOC)

Analyze Financial Statements directly (PDF, DOC, TXT)

ChatGPT - Financial Statement Analyzer

Didn’t try it out but might be an option to see the LLMs opinion on your potential stock purchases.

Jrinne · January 13, 2025, 9:28am

TL;DR: An LLM is particularly good at creating structured data from unstructured sources like annual reports. That structured data generated by the LLM could then be used as input features in a separate machine learning (ML) model to enhance predictions.

You might want to consider exploring stacking, where the output of a large language model (LLM) is used as an input feature for one of your machine learning models.

Example: Predicting Next Quarter's Earnings

For instance, you could have the LLM predict next quarter's earnings for a set of stocks based on unstructured data, such as:

Annual reports or earnings calls: Extract tone, sentiment, and specific disclosures.
News articles or press releases: Identify risks, opportunities, or sector trends.
Social media activity: Aggregate public sentiment and detect early signals.

Once predictions are made, incorporate them into your workflow by calculating metrics like:

Price-to-predicted-earnings (P/PE): A valuation metric derived from LLM forecasts.
Sentiment scores: Features that reflect positivity, caution, or risks inferred from text.

These LLM-derived insights would become new features in your broader stock-selection framework, augmenting structured data like price-to-book ratios or momentum indicators.

—

Scalable Implementation (Getting Back to the Original Post)

This ties directly to Nvidia’s $3,000 AI computer, which makes deploying such tools feasible without relying on expensive cloud services. Here’s how it could work:

Task-Specific Models:

Deploy smaller versions of open-source models like Llama or FinGPT on these devices.
Assign focused tasks to each machine, such as earnings predictions or sentiment analysis.

Efficient Training and Feedback:

Train each model on curated datasets and refine them iteratively using real-world results (e.g., actual stock performance).

Parallel Processing:

A small cluster of these devices could handle large-scale data tasks, like summarizing financial reports for multiple stocks or sectors, cost-effectively.

—

Benefits for P123 Users

Accessibility: Reduce reliance on cloud-based APIs with per-token costs.
Customizability: Tailor open-source models to specific needs, like small-cap analysis or sector trends.
Affordability: A one-time hardware investment offsets recurring API expenses.

—

Conclusion

The Nvidia $3,000 AI computer opens exciting possibilities for integrating AI into stock analysis workflows. By leveraging affordable, high-performance devices, P123 users could implement task-specific LLMs for earnings predictions, sentiment analysis, or financial text summarization. These outputs would then enhance traditional ML models, creating a powerful hybrid system for stock selection.

This approach seems obvious and highly practical. I would be extremely surprised if it isn’t already in widespread private use. Stacking, however, requires a specific cross-validation method, which should be accessible to any consultant.

marco · January 13, 2025, 2:32pm

Yes, leaning towards this

Right

Which model? Training an LLM model is not realistic for us at this point , no? Even with a bunch of $3K AI computer.

In fact to run pure LLM inference, using a prebuilt LLM, do I even need specialized hardware? A NN model trained using GPUs runs just fine on a CPU.

Thx

Jrinne · January 13, 2025, 3:04pm

TL;DR: I’m not suggesting training a large language model (LLM) from scratch—that’s unrealistic. However, starting with an open-source model like Llama (or another similar one) and fine-tuning it for specific tasks is much more feasible. While you might not achieve this with your current setup, a $3K Nvidia AI computer or a comparable system could handle focused fine-tuning tasks. Broadcom's success highlights how critical architecture is for supporting these models.

—

Fine-Tuning vs. Training From Scratch

You’re absolutely right—training a large language model from scratch is far beyond the scope of most setups. It requires enormous datasets, specialized hardware (e.g., TPU pods or high-end GPU clusters), and months of effort.

However, fine-tuning a pre-trained LLM like Llama for specific tasks—such as financial text summarization or earnings prediction—is much less resource-intensive. This can often be done on high-end consumer hardware, like Nvidia's $3K AI computer. Fine-tuning essentially adapts the pre-trained model to your specific use case by training only a subset of its parameters with focused datasets.

—

Do You Even Need Specialized Hardware for Inference?

For pure inference, you’re correct that it can run on CPUs, especially for smaller models or low-demand tasks. However:

GPUs are far more efficient for inference, especially with larger models, due to their ability to handle parallel operations.
For fine-tuning or larger-scale tasks, GPUs are almost essential because they significantly accelerate matrix multiplications and reduce training time.

—

The Role of Architecture: Why It’s Not Just About the Number of GPUs

I asked ChatGPT: “It’s not just the number of GPUs but the architecture, correct?”

Here’s the response:

"You're absolutely correct—it’s not just the number of GPUs, but also the architecture of the GPUs and the overall system design. Modern deep learning tasks, particularly fine-tuning or inference with LLMs, benefit greatly from advanced GPU architectures optimized for these workloads."

Key Points About Architecture:

GPU Design Matters:

Modern GPUs (e.g., Nvidia A100, H100) are specifically designed for AI workloads. They include:
- Tensor Cores for fast matrix multiplications.
- Large VRAM (40-80GB) to handle large models and batch sizes.
- Support for mixed-precision formats (FP16, BF16) to accelerate training/inference while conserving memory.

System Connectivity:

NVLink enables high-speed communication between GPUs in a multi-GPU setup, reducing bottlenecks.
High-speed NVMe storage ensures fast data loading into GPU memory.

CPU and RAM Considerations:

A powerful CPU is needed to handle data preprocessing and coordinate the training pipeline.
64GB+ of system RAM is recommended for large models, and 128GB+ is ideal for multi-GPU systems.

—

Parallelization: Why It’s Critical

I also asked ChatGPT about parallelization:

Q: “So you need a lot of GPUs and the architecture to ensure high parallelization, correct? Can you quantify the parallelization ChatGPT achieves for an average response?”

A: "Yes, high parallelization is a key factor in the architecture of GPUs and systems used for running or fine-tuning LLMs like ChatGPT. The core operations in transformers—matrix multiplications—are highly parallelizable, leveraging thousands of GPU cores simultaneously."

Highlights:

ChatGPT achieves massive parallelization during training and inference:
- Each forward pass involves tens of thousands of parallel operations across GPU cores.
- High-end GPUs like the Nvidia A100 feature 6,912 CUDA cores and 432 Tensor Cores for processing these operations.
- Multi-GPU systems further expand this parallelization, enabling millions of operations per second.
During inference, sequential dependencies (token-by-token generation) reduce parallelization, but batching requests or using multi-GPU setups helps mitigate this.

—

Conclusion

So, while a bunch of GPUs can help, architecture is critical for efficient LLM use. Tensor Cores, large VRAM, and high-speed interconnects like NVLink are game-changers for both fine-tuning and inference.

For your setup, starting with a pre-trained LLM like Llama and exploring fine-tuning is realistic on high-end hardware like Nvidia’s $3K AI computer. If fine-tuning isn’t feasible, you could still use prebuilt models for inference, leveraging prompt engineering or lightweight adapters (e.g., LoRA) for customization.

-ChatGPT (with the help of a SLOW carbon-based life-form who lacks parallel processing but occasionally has a good idea)

marco · January 13, 2025, 7:09pm

Seems all correct, the LoRA thing too. Looks like we have the beginning of the next P123 AI evolution. Well, first we need to add feature engineering tools to current setup and add lots more documentation.

Thanks!