A Journey with GPT-4 to Glean Pension Data (Part II)

Focus on mass GPT use

Mass data processing

The newly developed custom GPTs greatly improved the efficiency with which annual reports can be located and analysed. However, for companies like Financial Canvas, the volume of data required for meaningful insights means that, even with AI tools, considerable user effort and time are still needed, particularly for tasks like trimming PDF documents.

Python and API Calls

Python Script Development:

Given its strong data-manipulation capabilities and compatibility with the ChatGPT API, Python was selected as the scripting language for developing the PDF-processing pipeline.


The creation of several Python functions designed to identify clusters of keywords and analyse frequencies within table-like data significantly improved the speed of extracting relevant pages from PDF documents and, most importantly, removed the need for human intervention. This enhancement made it possible to efficiently process a large volume of annual reports.
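The original functions aren't shown, but the keyword-cluster idea can be sketched roughly as follows. The keyword list, scoring threshold and function names here are illustrative assumptions; the page text is assumed to have already been extracted per page by a PDF library such as pypdf.

```python
import re

# Keywords whose co-occurrence suggests a pension-related page.
# The actual keyword list and threshold used are not shown in the
# post; these values are illustrative guesses.
PENSION_KEYWORDS = [
    "pension", "defined benefit", "retirement", "scheme",
    "actuarial", "discount rate",
]

def score_page(text: str, keywords=PENSION_KEYWORDS) -> int:
    """Count keyword hits on one page of extracted PDF text."""
    lower = text.lower()
    return sum(len(re.findall(re.escape(kw), lower)) for kw in keywords)

def select_pension_pages(pages: list[str], min_hits: int = 3) -> list[int]:
    """Return indices of pages whose keyword count clears the threshold.

    `pages` is a list of per-page text, e.g. from pypdf's
    `page.extract_text()` (the extraction step itself is omitted here).
    """
    return [i for i, text in enumerate(pages) if score_page(text) >= min_hits]
```

Pages passing the threshold would then be copied into a trimmed output PDF, leaving only the pension-relevant sections.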

API Calls:

After automating the trimming of PDFs and removing the need for user input, the next step was to integrate ChatGPT. Manually uploading the PDFs to a custom GPT would be impractical for handling large datasets, such as the FTSE 100.

A more efficient solution that could seamlessly integrate with our current codebase was necessary, specifically through the use of API calls.

We initiated the process by generating an API key, which allowed us to establish a connection with ChatGPT. Following this, a series of tests were conducted to evaluate the functionality and reliability of ChatGPT's response to prompts sent via the API.
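As a rough illustration of that connection, the sketch below builds a Chat Completions request and sends it with the API key. The model name and helper functions are assumptions; only the endpoint, headers and request shape follow OpenAI's documented API.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-4") -> dict:
    """Assemble a Chat Completions request body for a single prompt."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_chatgpt(prompt: str) -> str:
    """Send one prompt to the API and return the reply text.

    Requires the OPENAI_API_KEY environment variable to be set.
    """
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

In practice the official `openai` Python package wraps these details, but the request structure is the same either way.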

The main takeaway from these tests was that prompts had to be short enough, in token terms, to leave sufficient tokens available for processing and generating a response.

To address this, a further Python script was developed to split prompts into chunks of suitable token length, keeping enough of the token budget free for ChatGPT's processing and output.
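A splitter along those lines can be sketched as below. The token count is approximated here with a words-per-token heuristic; the actual script may instead use an exact tokenizer such as tiktoken (an assumption, since the original code is not shown).

```python
def split_prompt(text: str, max_tokens: int = 2000) -> list[str]:
    """Split `text` into chunks that each fit within `max_tokens`.

    Token counts are approximated as ~0.75 words per token; a real
    pipeline would more likely count tokens exactly (e.g. tiktoken).
    """
    max_words = max(1, int(max_tokens * 0.75))
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk is then sent as its own API call, leaving the rest of the model's context window free for the response.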

Following the development of the new code, the next step involved creating a directory containing various annual reports, and then systematically processing these reports using our newly implemented functions.

The output folder held the successfully trimmed PDFs, which now contained only pension-relevant data. On manual inspection, all PDFs were accurately trimmed. Furthermore, token-allocation issues no longer arose, meaning every PDF was accurately analysed and the desired data extracted as intended.

Interrogating Data with GPT

To improve the processing of large datasets and prepare the data for use in Financial Canvas's models, an additional Python function was developed and incorporated into the existing code pipeline. This function formats ChatGPT responses into Excel, building on Financial Canvas's pre-existing format-to-Excel code. The output data is then easy to view, use and feed into upcoming products such as AllPensionsUK.
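The response format and field names below are assumptions, as is the use of the stdlib csv module as a stand-in for Financial Canvas's own format-to-Excel code (which in practice would target .xlsx, e.g. via openpyxl or pandas).

```python
import csv

def parse_response(response: str) -> list[list[str]]:
    """Parse a ChatGPT reply into [field, value] rows.

    Assumes the prompt asked for one 'field: value' pair per line;
    the exact response format requested is not shown in the post.
    """
    rows = []
    for line in response.splitlines():
        if ":" in line:
            field, value = line.split(":", 1)
            rows.append([field.strip(), value.strip()])
    return rows

def write_table(rows: list[list[str]], path: str) -> None:
    """Write rows to a spreadsheet-readable file (csv as a stand-in)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Field", "Value"])
        writer.writerows(rows)
```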

To Conclude

The scope of possibilities in automation, from data collection to analysis using AI, is truly remarkable and offers huge benefits to Financial Canvas, priming it for the release of AllPensionsUK.

Data collection by AI, Data analysis by AI, AllPensionsUK coming soon…
