LLM-Enhanced Order-Taking System for Quick Service Restaurants
Our partner is a technology company serving quick service restaurants (QSRs). Their primary goal is to streamline order-taking processes, improve accuracy, and scale across different brands and menu structures.
They approached us to explore whether Large Language Models (LLMs) could replace their existing custom-trained transformer models, which required significant adaptation effort for each new brand.
Solution
Unidatalab developed an LLM-powered solution that uses advanced language understanding to streamline order processing in real-time environments.
How it works
Instead of relying on brand-specific transformers, the system uses a fine-tuned GPT-4o-mini model hosted on Azure to process customer orders, extracting key entities such as menu items, modifiers, and quantities with high accuracy.
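As a minimal sketch of the extraction step, the snippet below parses a structured JSON response into order entities. It assumes the model is prompted to return JSON with `items`, `quantity`, and `modifiers` fields; the field names and the `OrderItem` type are illustrative, not the partner's actual schema.

```python
import json
from dataclasses import dataclass, field

@dataclass
class OrderItem:
    name: str
    quantity: int = 1
    modifiers: list = field(default_factory=list)

def parse_order(raw_json: str) -> list:
    """Parse the model's JSON output into structured order items."""
    data = json.loads(raw_json)
    return [
        OrderItem(
            name=item["name"],
            quantity=int(item.get("quantity", 1)),
            modifiers=item.get("modifiers", []),
        )
        for item in data.get("items", [])
    ]

# Example model output for "two large fries, no salt"
response = '{"items": [{"name": "fries", "quantity": 2, "modifiers": ["large", "no salt"]}]}'
order = parse_order(response)
```

Keeping the model's output constrained to a fixed JSON schema makes downstream validation against the brand's menu straightforward.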
Our team developed a robust prompt engineering framework to guide the LLM in interpreting diverse customer inputs and order patterns.
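A prompt framework along these lines might assemble a brand-specific system message plus the customer's utterance. The template wording, brand name, and menu below are hypothetical placeholders, not the production prompts.

```python
SYSTEM_TEMPLATE = """You are an order-taking assistant for {brand}.
Valid menu items: {menu}.
Extract items, quantities, and modifiers from the customer's request.
Respond with JSON only: {{"items": [{{"name": ..., "quantity": ..., "modifiers": [...]}}]}}"""

def build_prompt(brand: str, menu: list, utterance: str) -> list:
    """Assemble a chat-style prompt: brand-specific system message + customer turn."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(brand=brand, menu=", ".join(menu))},
        {"role": "user", "content": utterance},
    ]

messages = build_prompt(
    "ExampleBurger",
    ["cheeseburger", "fries", "cola"],
    "one cheeseburger with extra pickles",
)
```

Because only the system message varies by brand, onboarding a new brand becomes a configuration change rather than a model-training effort.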
We implemented multiple methods for calculating confidence scores for each recognized order state and extracted entity. Based on these scores, the system can trigger fallback mechanisms involving human staff for low-confidence cases.
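One common way to score confidence for LLM output (not necessarily the exact method used here) is to derive it from the token log-probabilities the API can return alongside a completion. The sketch below uses the geometric mean of token probabilities and a hypothetical threshold; the numbers are illustrative only.

```python
import math

FALLBACK_THRESHOLD = 0.85  # illustrative; in practice tuned on validation data

def sequence_confidence(token_logprobs: list) -> float:
    """Geometric-mean token probability of the generated sequence,
    used as a confidence proxy in [0, 1]."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def needs_human(token_logprobs: list) -> bool:
    """Route the order to a human agent when confidence falls below threshold."""
    return sequence_confidence(token_logprobs) < FALLBACK_THRESHOLD

confident_tokens = [-0.01, -0.02, -0.05]  # near-certain generation
uncertain_tokens = [-0.9, -1.2, -0.4]     # ambiguous customer utterance
```

A low score routes the conversation to staff before a wrong order is confirmed, which is cheaper than correcting it afterward.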
Recognizing the importance of real-time performance, our team identified bottlenecks in the order-taking pipeline and applied optimizations to reduce inference times without sacrificing accuracy.
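Identifying pipeline bottlenecks can be as simple as timing each stage and comparing; the sketch below shows one such approach with hypothetical stage names (the real pipeline stages are not detailed in this write-up).

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage: str):
    """Record wall-clock duration of one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start

# Hypothetical stages; the real pipeline would call speech recognition,
# the LLM, menu validation, etc.
with timed("preprocess"):
    time.sleep(0.01)
with timed("llm_inference"):
    time.sleep(0.05)

bottleneck = max(timings, key=timings.get)
```

Once the dominant stage is known, effort can be focused there (for example, on prompt length or model choice for the inference stage) instead of micro-optimizing stages that barely contribute to end-to-end latency.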
Our challenges
Inconsistent accuracy across brands
The partner’s existing order-taking system relied on custom transformer models trained on brand-specific data. While these models performed well for established brands, onboarding each new brand required significant model-preparation effort.
High latency in real-time environments
In QSR environments, quick response times are crucial. Our challenge was to design a pipeline that maintained acceptable response times without sacrificing accuracy.
Lack of reliable confidence scoring
Another challenge was ensuring reliable confidence scoring for intent recognition and entity extraction. The existing system already included confidence scoring, which was essential for deferring to human agents in ambiguous or uncertain situations. Our task was to develop a similar approach for LLM-based models: one that matched or exceeded the accuracy of the current system, maintained acceptable performance speeds, and provided confidence scores to determine when to involve a human fallback.
Project stages
In the first stage, we conducted feasibility studies to assess the LLM’s performance in recognizing order state and extracting entities. We designed a data pipeline to process orders, experimented with various prompt structures, and benchmarked results to validate accuracy improvements over the existing system.
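Benchmarking against the existing system needs an agreed metric; one simple option (the partner's actual evaluation criteria are not specified here) is exact-match accuracy over labeled orders, as in this sketch with made-up data.

```python
def exact_match_accuracy(predictions: list, references: list) -> float:
    """Fraction of orders where every extracted field matches the label."""
    matches = sum(1 for p, r in zip(predictions, references) if p == r)
    return matches / len(references)

# Hypothetical benchmark set: model output vs. human-labeled ground truth
preds = [{"item": "fries", "qty": 2}, {"item": "cola", "qty": 1}]
refs  = [{"item": "fries", "qty": 2}, {"item": "cola", "qty": 2}]
accuracy = exact_match_accuracy(preds, refs)
```

Running both the legacy transformer and the LLM through the same scorer on the same held-out orders gives a like-for-like comparison.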
The second stage focused on integrating robust confidence scoring. We developed methods to estimate prediction confidence and defined thresholds for triggering human fallback in low-confidence scenarios. This ensured reliable and consistent service quality.
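Besides logprob-based scoring, another widely used confidence signal (offered here as an illustration of the kind of method that could complement the above, not as the partner's confirmed approach) is self-consistency: sample the model several times and measure how often the answers agree.

```python
from collections import Counter

def agreement_confidence(samples: list) -> tuple:
    """Confidence from agreement across repeated samples of the same request:
    returns the majority answer and the fraction of samples producing it."""
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)

# Five hypothetical samples for one ambiguous utterance
samples = ["2x fries", "2x fries", "2x fries", "3x fries", "2x fries"]
answer, confidence = agreement_confidence(samples)
```

High agreement suggests the request was unambiguous; a split vote is a natural trigger for the human-fallback path described above.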
In the final stage, we optimized system latency, fine-tuned configurations, and deployed the solution in a real-world QSR environment. Comprehensive testing confirmed that the latency improvements didn’t compromise accuracy or confidence scoring.