Gemini Copilot, Perplexity, and ChatGPT

I actively use AI to understand and get explanations for various P123 ranking systems and codes. I typically use Gemini Copilot, Perplexity, and ChatGPT. ChatGPT 4o has particularly impressed me. It has successfully created simple ranking systems and provided detailed explanations for individual nodes in more complex ranking systems.

Is there a place online or in a document that contains all the "codes" for P123?

There is a good overview here: Portfolio123 Documentation and P123 Codes, but I would like a comprehensive overview collected on a single page or in a document.

In other words, I want both what is listed here P123 Vocab and P123 Codes and everything in the "Full Description."

If I have that, I can train an LLM on my own personal document described above so that it better understands the P123 language.

3 Likes

Thanks Whycliffes,

I have been impressed with ChatGPT 4o too. I also signed up for Claude 3 Opus because of its larger context window for big file uploads.

ChatGPT is limited to about 4,096 tokens (about 8 pages) for conversations and uploaded documents.

Claude 3 has a limit of 8,192 tokens for conversations, double that, but it still might not serve your needs if uploads were capped at the same size. It is highly praised for its ability to keep context across large uploads. They say you can upload all of "The Great Gatsby" and it will keep context, for example. Coders love it because it maintains context across long code.
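A common rule of thumb is roughly four characters of English text per token, so you can sanity-check whether a document will fit before uploading it. Here is a minimal sketch; note that 4 characters per token is only an approximation, not the models' actual tokenizers:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the common ~4-characters-per-token heuristic."""
    return int(len(text) / chars_per_token)

def fits_in_context(text: str, context_limit: int) -> bool:
    """Check whether the estimated token count fits a model's context window."""
    return estimate_tokens(text) <= context_limit

sample = "word " * 2000  # ~10,000 characters
print(estimate_tokens(sample))  # roughly 2,500 tokens by this heuristic
```

For a real count you would use the model provider's own tokenizer, but this is close enough to decide whether a file is obviously too large.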

Here is what Claude 3 says about context for uploaded documents:

So it is still a little unclear to this non-programmer. I do know that I can upload multiple papers on the same topic into Claude 3 and get a single summary of the salient points across all of them.

So if P123 does give you a long PDF document to upload, it may (or may not) work better in Claude 3.

Jim

1 Like

Deleted. I just realized I posted a link you already reference in the OP. Sorry.

I was able to download the sites described above, as well as the "Full description," and merge them into one file using the command prompt.

It's a raw file, not sorted, and the loss of the CSS styling makes it less beautiful. :slight_smile:

Here it is, if anyone can use it: merged_output.html - Google Drive (16MB)
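For anyone who wants to reproduce the merge without the command prompt, a short Python script does the same job. This is just a sketch: the filename pattern and output path are hypothetical, so adjust them to wherever you saved the downloaded pages:

```python
import glob

def merge_html_files(pattern: str, output_path: str) -> int:
    """Concatenate every file matching `pattern` into one output file.

    Returns the number of files merged.
    """
    paths = sorted(glob.glob(pattern))
    with open(output_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                out.write(f.read())
            out.write("\n")  # separate the documents with a newline
    return len(paths)

# Hypothetical usage:
# merge_html_files("p123_docs/*.html", "merged_output.html")
```

Sorting the matched paths keeps the merged order deterministic, which matters if you later want to diff a re-downloaded version against the old one.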

3 Likes

I would love this too. I unsuccessfully tried to screen-scrape all of that documentation into a single PDF for offline viewing last year, but I didn't spend much time on it.

Now we have an even better reason to consolidate it... AI training

2 Likes

I agree, such a tool would be great! @Whycliffes : Thanks for the raw export. I tried to clean it up a bit using the BeautifulSoup Python library :slight_smile: Here is the code if anyone is interested:

from bs4 import BeautifulSoup

def extract_text_from_html_file(multi_html_file, output_txt_file):
    with open(multi_html_file, 'r', encoding='utf-8') as file:
        content = file.read()

    # Split the content by the closing HTML tag
    html_documents = content.split('</html>')

    # Initialize an empty string to hold all the text
    all_text = ""

    for html_doc in html_documents:
        if '<html' in html_doc:  # Check if there is an HTML tag in the segment
            html_doc += '</html>'  # Add the closing HTML tag back
            # 'lxml' requires the lxml package; 'html.parser' is a stdlib fallback
            soup = BeautifulSoup(html_doc, 'lxml')
            text = soup.get_text(separator='\n')

            # Split the text into lines and filter out empty lines
            filtered_lines = [line for line in text.split('\n') if line.strip()]
            all_text += '\n'.join(filtered_lines) + '\n\n'  # Add a newline between different documents

    # Write the combined text to the output file
    with open(output_txt_file, 'w', encoding='utf-8') as file:
        file.write(all_text)


if __name__ == "__main__":
    multi_html_file_path = '/merged_output.html'
    output_txt_file_path = '/merged_output.txt'
    extract_text_from_html_file(multi_html_file_path, output_txt_file_path)

In general, I think a good and simple way to build such an LLM knowledge base for Portfolio123 would be a RAG (Retrieval-Augmented Generation) architecture. It is particularly well suited to question-answering (Q&A) chatbots. Maybe I will give it a try :slight_smile:

2 Likes

Here is the beautified version: beautified_output.html - Google Drive (25MB)


Thanks, I was not aware of this difference. I see limitations right away when I start with very large files, but isn't it a problem that it does not have internet access?

I am also trying my best to learn Python, currently just to retrieve some options data from Yahoo, but what do you think is the best and easiest LLM to assist with coding in Python?

1 Like

Hi,

I have gone back and forth a little, but I really like ChatGPT 4o a lot and maybe it is my overall favorite now. It feels more natural and faster, and maybe its Python is better. It recently gave me a pretty esoteric program for determining how fat the tails of stock returns are, with no errors. I just pasted it into a Jupyter notebook and it ran. Actually, ChatGPT ran it internally and gave me the answer first. Claude 3 has been making some Python errors recently. While I am not 100% sure which is best for the shorter Python programs we use, where context is not important, I think you could not go wrong sticking with ChatGPT 4o for learning and for most Python related to P123 data.
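For anyone curious, that kind of fat-tail measurement can be sketched in a few lines of plain Python. This is my own illustrative version, not the program ChatGPT produced: excess kurtosis above zero indicates fatter tails than a normal distribution.

```python
def excess_kurtosis(returns: list[float]) -> float:
    """Population excess kurtosis: > 0 means fatter tails than a normal distribution."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((x - mean) ** 2 for x in returns) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in returns) / n    # fourth central moment
    return m4 / var**2 - 3.0  # subtract 3 so a normal distribution scores 0
```

In practice you would feed it a series of daily returns; a strongly positive result is the "fat tails" signature typical of stock returns.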

Also, on the topic of context, I note that ChatGPT 4o has added "Memory" across conversations and within conversations. This is, in essence, an attempt to keep better context. It will now remember what it thinks are the salient points throughout a conversation, not forgetting what it said at the beginning, which it used to do.

So maybe it is not a problem if you provide the files; ChatGPT 4o does a good job of searching and then understanding what it has searched within a conversation, I think, not just giving links at the end. That is pretty important, as you note, and ChatGPT 4o does it well.

Context does seem to matter for large amounts of data, though. I find it nice to upload a bunch of files on a topic and have Claude 3 integrate all of them into the discussion. I uploaded a bunch of papers on selecting stock factors from SSRN. It distilled the ideas from each paper and seemed to know which ideas were important for each question I asked. I use some of what came from that discussion with P123 now. It was impressive.

My thinking in this thread was only that if P123 wanted to upload several large files, or if we had access to P123's PDFs, the large context capabilities of Claude 3 Opus might be helpful for that specific purpose.

But if I had to use just one LLM, it would be ChatGPT 4o for now. Sometimes I like a second opinion when one of the LLMs doesn't perform well, so I might not end my Claude 3 subscription just yet.

Jim

Hi there,

I'd like to add my two cents.

I can fully support Jrinne's statement that ChatGPT 4o is a very good choice for help with programming tasks. This impression is also backed up by the current ranking on the Chatbot Arena leaderboard (https://chat.lmsys.org/?leaderboard): in the "Coding" category, GPT-4o is currently ranked #1. Its knowledge cutoff is 2023/10, but you have the option to let ChatGPT search the web for more up-to-date information.

Personally, I use Visual Studio Code with the AI assistant Cody (Cody | AI coding assistant) for my Python programming. The great thing about Cody is that it lets you test different LLMs on your programming tasks. Supported LLMs include GPT-4 Turbo, GPT-4o, Claude Opus, Claude Sonnet, and more.

Hope this helps.

2 Likes

It seems the free version only allows access to Claude, is that true? I'm using Visual Studio Code.


Yes, it's true. For full access to all LLMs you need the paid Pro version. It costs $9 per month. I think that's a fair price for what you get.

1 Like

ChatGPT 4o does perform searches on the Internet to assist with gathering information.

Yes, I know. I meant Claude.

Seems like there is some published data that supports this impression: Financial Statement Analysis with Large Language Models