
Reinforcement Learning From Human Feedback Took Travel AI Tool To Near-Perfect Accuracy


By Stefan Klopp

In late 2022, just before the ChatGPT launch kicked off the current AI frenzy, our development team had the opportunity to experiment with the model. We were a travel publisher with no plans on becoming a tech company. But it was obvious to us that this technology could be used to plan and book trips in a much more efficient, enjoyable way.

Stefan Klopp of Matador Network

Within months we launched an AI tool that travelers could message via WhatsApp. Its accuracy was about 85%. That might not sound terrible, but when roughly one of every six conversations includes miscommunication or hallucinations, what you have is a fun gadget, not game-changing tech.

Thanks to our travel media platform, we were able to attract a critical mass of users, which allowed us to improve performance through reinforcement learning from human feedback. Over the next 15 months, we were able to increase accuracy to 98%, which has enabled us to strike partnerships with major travel brands, win awards and draw in more than a million users. Here’s how we did it.

A helping human hand

The simplest form of reinforcement learning happens when users tell the AI an answer is wrong. If someone asks for restaurant recommendations in the Pearl District in Portland and the AI includes one in the Hawthorne District, the user may point out the inaccuracy. But relying on direct user feedback isn’t enough.

We hired five people, most of whom speak multiple languages, to put reinforcement learning into high gear. To date they’ve monitored 1.5 million conversations between users and the AI. These agents catch subtle miscommunications. If a user asks for recommendations of the best kid-friendly resorts in Mexico, the AI might ask them to specify a city, assuming they want hotel rates. But the user doesn’t know a city yet; they’re just looking for general information.

At this point the agent is able to intervene, manually taking over the conversation and getting it back on track. Then the agent flags and categorizes the issue for a backend fix, which improves the system for an entire category of questions.
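The flag-and-categorize step is what turns one agent intervention into a fix for a whole class of questions. A minimal sketch of how flagged conversations might be recorded and rolled up by category is below; all the names and categories are illustrative, not from the production system.

```python
# Sketch: record agent-flagged issues and surface the most common
# categories so backend fixes can target whole classes of questions.
# Field names and category labels are hypothetical.
from collections import Counter
from dataclasses import dataclass


@dataclass
class FlaggedIssue:
    conversation_id: str
    category: str  # e.g. "premature-clarification", "stale-event-data"
    note: str      # agent's description of what went wrong


def top_categories(issues, n=3):
    """Return the n most common issue categories, in priority order."""
    return Counter(i.category for i in issues).most_common(n)


issues = [
    FlaggedIssue("c1", "premature-clarification", "asked for city too early"),
    FlaggedIssue("c2", "stale-event-data", "surfaced a two-year-old event page"),
    FlaggedIssue("c3", "premature-clarification", "asked for dates before intent was clear"),
]
print(top_categories(issues))
# → [('premature-clarification', 2), ('stale-event-data', 1)]
```

A rollup like this is what lets a single backend fix, say, stop the AI from asking clarifying questions too early, across every conversation in that category.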

Reframing the question

Sometimes inaccuracies are a result of the way the question is asked. To improve outcomes, we needed to improve the quality of the questions. We developed a system that categorizes and reframes questions before they are fed into the large language model. This process ensures that we get the most from our extensive site indexing.

Questions about live events initially posed a challenge. A query like, “What are some events going on in Estes Park, Colorado, this weekend?” might find a page about events from two years ago that includes the phrase “this weekend,” causing a hallucination. But what is the user really asking? The timing of the question needs to be translated into a specific date, where “this weekend” becomes “Jan. 25-26, 2025.”
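That date translation can be done deterministically before the query ever reaches the language model. Here is a minimal sketch, assuming a simple string substitution; the function names are illustrative, and a production version would handle more phrases, time zones and month boundaries.

```python
# Sketch: resolve the relative phrase "this weekend" into concrete
# dates before the query is fed to the LLM. Illustrative only.
from datetime import date, timedelta


def resolve_this_weekend(today: date) -> tuple[date, date]:
    """Return the (Saturday, Sunday) of the upcoming weekend."""
    days_until_sat = (5 - today.weekday()) % 7  # Monday=0 ... Saturday=5
    saturday = today + timedelta(days=days_until_sat)
    return saturday, saturday + timedelta(days=1)


def reframe(query: str, today: date) -> str:
    """Replace 'this weekend' with explicit dates, if present."""
    if "this weekend" in query:
        sat, sun = resolve_this_weekend(today)
        span = f"{sat:%b}. {sat.day}-{sun.day}, {sat:%Y}"
        return query.replace("this weekend", span)
    return query


print(reframe(
    "What are some events going on in Estes Park, Colorado, this weekend?",
    date(2025, 1, 24),
))
# → "What are some events going on in Estes Park, Colorado, Jan. 25-26, 2025?"
```

With explicit dates in the query, a two-year-old page that happens to contain the phrase "this weekend" no longer looks like a match.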

Another challenge is combining questions across multiple messages. Someone might ask for Airbnb recommendations in Vancouver, then follow up with “close to Yaletown.” The underlying question needs to roll in new elements as they are added — “Recommend Airbnbs in the Yaletown area of Vancouver.”
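One way to implement this is to keep a running context of slots parsed from earlier messages and fold each follow-up into it before rendering a single standing query. The sketch below assumes a toy keyword-based merge; slot names and the parsing are stand-ins for whatever the real system does.

```python
# Sketch: accumulate follow-up messages into one standing query.
# Slot names ("what", "city", "neighborhood") are hypothetical.
def merge_followup(context: dict, message: str) -> dict:
    """Fold a follow-up message into the standing query context."""
    updated = dict(context)
    if message.lower().startswith("close to "):
        updated["neighborhood"] = message[len("close to "):].strip()
    else:
        updated["request"] = message
    return updated


def render_query(context: dict) -> str:
    """Render the accumulated context as a single explicit question."""
    if "neighborhood" in context:
        return (f"Recommend {context['what']} in the "
                f"{context['neighborhood']} area of {context['city']}")
    return f"Recommend {context['what']} in {context['city']}"


context = {"what": "Airbnbs", "city": "Vancouver"}  # parsed from the first message
context = merge_followup(context, "close to Yaletown")
print(render_query(context))
# → "Recommend Airbnbs in the Yaletown area of Vancouver"
```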

Ping the partner

Site indexing is essential. For in-depth knowledge and real-time information, you need partners and data sources you can ping behind the scenes. Once we improved the ability to accurately identify the intent of the user, we needed a network of plugins to get the data they were seeking for flight times, hotel pricing and exchange rates.

When a user asks a question, our AI categorizes it as a particular intent, sources the appropriate data, and feeds the result into the LLM to deliver the information in coherent, consistent and conversational language. There’s a lot more going on behind the scenes than with baseline ChatGPT, but the user experience is just as simple, and the responses are noticeably richer and more accurate.
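The flow above, including the graceful "I can't do that yet" path, can be sketched as a simple dispatcher. Everything here is a stand-in: the keyword classifier, the plugin names, and the stubbed data sources; in production the classification would be far more robust and the plugins would call real partner APIs.

```python
# Sketch: classify a question into an intent, route it to a data
# plugin, and compose an answer. All names are illustrative.
def classify_intent(question: str) -> str:
    """Toy keyword classifier; a real system would be more robust."""
    q = question.lower()
    if "flight" in q:
        return "flight_times"
    if "hotel" in q or "resort" in q:
        return "hotel_pricing"
    if "exchange rate" in q:
        return "exchange_rates"
    return "unknown"


# Plugins would ping partner APIs behind the scenes; stubbed here.
PLUGINS = {
    "flight_times": lambda q: {"source": "flight-api", "query": q},
    "hotel_pricing": lambda q: {"source": "hotel-api", "query": q},
    "exchange_rates": lambda q: {"source": "fx-api", "query": q},
}


def answer(question: str) -> str:
    intent = classify_intent(question)
    plugin = PLUGINS.get(intent)
    if plugin is None:
        # An honest fallback beats a hallucination.
        return "I don't yet have that capability."
    data = plugin(question)
    # In production, `data` would be handed to the LLM along with the
    # question to produce a conversational reply; summarized here.
    return f"Answer composed from {data['source']}"


print(answer("What's the exchange rate for euros?"))
print(answer("Can you book me a rental car?"))
```

Note that the unknown-intent branch is the fallback described below: a clear "I don't yet have that capability" keeps accuracy intact while the plugin network is still being built out.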

Creating a plugin for every type of intent is intensive. As you work through it, it’s important to communicate to the user in a friendly way what your AI can’t yet do. A response from the AI saying, “I don’t yet have that capability,” provides a better user experience than a hallucination — and it’s a great way of maintaining accuracy while building out your product.


Stefan Klopp is the chief technology officer at Matador Network, a leading travel publisher and creator of the award-winning AI travel genius GuideGeek.

Illustration: Dom Guzman


