Find AI is an AI-powered search engine for companies and people. Their app makes millions of requests to OpenAI every week, and warehouses every single request using Velvet. In this post, we'll explore how the Find AI engineering team uses LLM request logs to optimize accuracy, manage costs, and compare models.
Find AI aims to be the source of truth for all people and companies online. Imagine Perplexity, but focused on data you might find on LinkedIn, Pitchbook, and CB Insights. Run a natural language query like "biotech startups focused on sustainability" and it returns a list of accurate results, each with a photo, a summary, and links to learn more.
Find AI leverages OpenAI to build its knowledge base and answer questions.
Ahead of their public launch, the team sought to store all their requests, responses, and metadata from OpenAI. They wanted to analyze usage and costs, evaluate how different models and prompts performed, and eventually fine-tune some of their own models.
Find AI had an impressive launch. Thousands of new users visited the site to ask questions, triggering millions of requests to OpenAI. Free users who were tired of antiquated search tools like LinkedIn converted into paying subscribers.
Find AI uses four OpenAI endpoints—chat, batch, moderations, and embeddings. On launch day, their system scaled quickly. At peak times, they were sending 1,500 requests to OpenAI per second.
OpenAI costs quickly became a major concern inside the company, exceeding even their cloud hosting bills. OpenAI's invoices contain minimal information, so the company needed to analyze request logs to decode their OpenAI bill.
After launch, Find AI had stored millions of requests in their PostgreSQL database. They set out to analyze and optimize their system.
Find AI uses OpenAI to power a variety of features, including natural language search, data analysis, and text summarization. The team wanted to optimize results as much as possible, with three post-launch goals:
Improve search result accuracy: AI apps commonly feature "👍 / 👎" buttons on results to gather user feedback, and Find AI collects these ratings across its app. When a user gives a negative rating, the engineering team treats it like a bug in their prompt.
"We use Velvet to trace back the inaccurate result to the OpenAI LLM request log, tweak the prompt and parameters to get an accurate answer, and then deploy a fix. It's not unlike how we handle errors in our code," reported Find AI CTO Philip Thomas.
Improve summary quality: Find AI provides a summary explaining why each result is a good fit for the user's query. The effectiveness of this feature is less about accuracy and more about the quality of generated text.
When the team makes changes to prompts, they use Velvet logs to replay past requests with the new prompts, then compare the outputs head-to-head. Reviewing generated text is a qualitative process, so sampling from live requests helps the team deploy changes with confidence.
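A replay run could start by sampling recent production requests for the prompt under revision. Here's a rough sketch against the same illustrative llm_logs schema, assuming prompts are tagged with a service name in the metadata:

```sql
-- Sample 50 recent summary requests to replay against a revised prompt.
-- The 'result_summary' service tag is a hypothetical example.
SELECT
  id,
  request -> 'messages'                                 AS original_messages,
  response -> 'choices' -> 0 -> 'message' ->> 'content' AS original_summary
FROM llm_logs
WHERE metadata ->> 'service' = 'result_summary'
  AND created_at > now() - interval '7 days'
ORDER BY random()
LIMIT 50;
```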
Trace errors and evaluate model performance: On launch day, Find AI saw occasional errors in their logs. With logs warehoused and queryable, they were able to trace those events back to internal server errors at OpenAI.
Moving forward, the team can query their OpenAI usage granularly and uncover insights. For example, though OpenAI's batch endpoint promises a 24-hour turnaround, most of Find AI's batch jobs completed within three hours in production.
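That kind of insight falls out of a simple aggregate. A sketch, assuming each batch log row stores both submission and completion timestamps (completed_at is an illustrative column):

```sql
-- Actual Batch API turnaround: median and 95th percentile.
SELECT
  percentile_cont(0.5)  WITHIN GROUP (ORDER BY completed_at - created_at) AS p50_turnaround,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY completed_at - created_at) AS p95_turnaround
FROM llm_logs
WHERE endpoint = 'batch'
  AND completed_at IS NOT NULL;
```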
The Find AI engineering team has a robust set of structured inputs and outputs to evaluate prompts, improve quality, and monitor usage over time.
Find AI needs to measure and manage the cost per query while maintaining accuracy and speed. With logs, they can run granular cost analysis using search ID metadata tags and other system-specific parameters.
We'll walk through a few example queries from Find AI's launch day.
Average cost per query: Each user search in the Find AI app makes a variety of calls to OpenAI. The team measures average costs and identifies outlier high-cost searches.
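A sketch of that rollup, assuming each log row carries a precomputed cost_usd column and a search_id metadata tag (both illustrative):

```sql
-- Average cost per user search, plus the most expensive outlier.
WITH per_search AS (
  SELECT metadata ->> 'search_id' AS search_id,
         sum(cost_usd)            AS search_cost
  FROM llm_logs
  GROUP BY 1
)
SELECT
  avg(search_cost) AS avg_cost_per_search,
  max(search_cost) AS max_cost_per_search
FROM per_search;
```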
Cost per service: Find AI divides its prompts into 'services', which are labeled as parameters on each call to OpenAI. They want to identify the highest cost services and then optimize prompts to reduce spend.
"Shortening prompts can decrease costs dramatically, but it's only worth it on high-frequency services," said Philip.
Cost per model: The team wants to understand the difference between running the same service on different models. How does each model impact speed and cost?
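A sketch comparing models on a single service, assuming a duration_ms column captured at log time (the service name is hypothetical):

```sql
-- Speed and cost per model for one service.
SELECT
  model,
  count(*)         AS requests,
  avg(duration_ms) AS avg_latency_ms,
  avg(cost_usd)    AS avg_cost_usd
FROM llm_logs
WHERE metadata ->> 'service' = 'company_enrichment'  -- hypothetical service
GROUP BY model
ORDER BY avg_cost_usd;
```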
With this data on hand, the team can run experiments to optimize costs. They can modify inputs, implement batching and caching, evaluate different models, and prompt end users to interact with the system differently.
Find AI wants the flexibility to switch between models and to fine-tune their own.
"We use LLMs to make decisions or generate text. For models used to make decisions, we can replay the Velvet request logs across different models or vendors to evaluate their comparative accuracy," said Philip.
As Find AI's logs grow, they're building a data set they can use for fine-tuning.
"We can bootstrap a data set from OpenAI, then pull that data from Velvet to fine-tune a foundation model like BERT. These self-hosted models end up having about a $0 marginal cost, which can improve margin a lot," said Philip.
In the days after launch, the Find AI engineering team credited Velvet as an important part of their launch strategy. Warehousing their OpenAI calls has given them a robust data set for growing and optimizing their AI production infrastructure.
"Velvet enables us to turn our OpenAI calls into valuable data sets. It turns OpenAI from throwaway calls into a cornerstone of a sophisticated AI program with in-house models." - Philip Thomas, Find AI co-founder
Want to try Find AI yourself? Search vetted data on people and tech startups. For example, type “technical founders building AI companies who care about ethical AI”.
Use code “VELVET” for a free month of Find AI Premium. Check it out →
Use our data copilot to query your AI request logs with SQL.
Use Velvet to observe, analyze, and optimize your AI features.