Personalized speech recognition and document processing in healthcare

Speech AI is a technology mainly used for automatic speech recognition (ASR), also known as speech-to-text. It converts audio recordings into text. While this may seem like an easy task in a lab setting, real-life use is much more complicated. An ASR system has to process human speech despite noisy settings and poor recording equipment. It also has to cope with variations in people’s voices, accents and dialects, and distinguish between different speakers in a conversation.

Natural Language Processing (NLP) is another AI technology, closely related to ASR. Simply put, its purpose is to make sense of the text transcribed by speech AI. It transforms unstructured data in several stages that include extracting information, recognizing named entities (identifying classes of proper names) and classifying intent (determining the purpose of each part of the conversation).

A combination of the two technologies is used in many industries, and healthcare is no exception. Their ability to quickly transcribe, analyze and interpret large amounts of information provided by a patient makes them well suited to automating medical documentation processing and insurance billing procedures. Such AI solutions also have clear business value, including significant cost reduction, a faster billing process and increased team capacity for claim coding.

Our client is a fast-growing U.S. startup that offers technical solutions to common problems in the healthcare industry. The company wanted to create software that would ease the physician’s paperwork burden and increase the accuracy and speed of medical documentation processing. The client asked us to design a solution that would automate the transcription and interpretation of recorded conversations between doctors and patients, whether in the examination room or on telemedicine platforms, and optimize the insurance billing process. It had to be simple and accurate, but also effective at detecting fraudulent activity such as upcoding or other unauthorized procedures.

Solution

An API integration module built on speech-to-text and NLP that automates medical documentation processing and insurance billing.

How it works

A user uploads audio data via the user interface.

The audio data is processed and sent to the speech recognition module.

The speech recognition module processes the audio and returns the transcript, which the user can review and modify if necessary.

The transcript is processed and sent to the NLP module, which extracts the relevant data and generates the documentation according to a predefined template (insurance claim forms, EHR/EMR samples, etc.).

The user receives the transcript, the relevant documentation and an invoice.
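The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names, the result structure and the stand-in components are hypothetical and are not taken from the delivered system, and invoice generation is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class PipelineResult:
    transcript: str
    document: Dict[str, str] = field(default_factory=dict)


def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    extract_fields: Callable[[str], Dict[str, str]],
    review: Optional[Callable[[str], str]] = None,
) -> PipelineResult:
    """Run audio through speech recognition, optional review, and NLP."""
    if not audio:
        raise ValueError("no audio data uploaded")
    transcript = transcribe(audio)          # speech recognition module
    if review is not None:                  # optional user correction step
        transcript = review(transcript)
    document = extract_fields(transcript)   # NLP module fills the template
    return PipelineResult(transcript=transcript, document=document)


# Stand-in components for illustration only (hypothetical); a real system
# would call the speech recognition and NLP services here.
def fake_transcribe(audio: bytes) -> str:
    return "patient reports mild headache for three days"


def fake_extract(transcript: str) -> Dict[str, str]:
    return {"chief_complaint": "headache"} if "headache" in transcript else {}


result = run_pipeline(b"\x00\x01", fake_transcribe, fake_extract)
print(result.transcript)
print(result.document)
```

Passing the recognizer, the extractor and the review step as callables keeps each module replaceable, which matches the modular flow described in the steps.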

Our challenges:

Domain-specific lexicon

Healthcare vocabulary is rarely used in everyday speech, so a general-purpose model handles it poorly. We therefore had to adapt the AI model to medical terminology and test it on data relevant to the domain.

Lack of audio data for the healthcare domain

There were not enough high-quality audio files containing healthcare-related speech to test the model properly. That is why we increased the amount of data artificially by using text-to-speech technology.

Regular speech accuracy

While adapting the neural network model to the healthcare domain, our AI team had to monitor its accuracy on regular speech and design the adaptation so that this accuracy did not degrade.

Project stages

Data preparation

Testing

Development

Delivery

Data preparation

The client team provided the existing audio and textual data. Our team analyzed the data and selected the most representative metrics to correctly evaluate the test results. Additionally, we found other open textual datasets with medical terms and used them to generate more audio data with our text-to-speech service in a number of voices. We needed this additional data for the testing and development stages of the project.
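Generating audio from text in several voices amounts to pairing every sentence with every voice. The sketch below only builds the list of synthesis jobs; the voice names, file layout and job fields are hypothetical, and the call to the text-to-speech service itself is deliberately left out because its interface is not described here.

```python
from itertools import product
from typing import Dict, List


def build_synthesis_jobs(sentences: List[str], voices: List[str]) -> List[Dict[str, str]]:
    """Pair every sentence with every voice to plan synthetic recordings."""
    jobs = []
    for index, (sentence, voice) in enumerate(product(sentences, voices)):
        jobs.append({
            "id": f"utt_{index:05d}",
            "voice": voice,
            "text": sentence,  # doubles as the reference transcript
            "audio_path": f"synthetic/{voice}/utt_{index:05d}.wav",
        })
    return jobs


# Example input (made up for illustration).
sentences = [
    "the patient was prescribed amoxicillin",
    "blood pressure is within the normal range",
]
voices = ["voice_a", "voice_b", "voice_c"]

jobs = build_synthesis_jobs(sentences, voices)
print(len(jobs), "synthesis jobs planned")
```

Because each synthetic recording is created from known text, the text can serve as the reference transcript when the recognizer is evaluated later.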

Testing

Our AI team tested existing services that provide similar solutions and compared them with ours. We wrote several scripts to automate this process. After analyzing the results of the tests and experiments, we developed recommendations for the next steps in improving the solution.
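A comparison script of this kind could look like the sketch below. The metric is an assumption: the text does not name the metrics that were chosen, and word error rate (WER) is used here only because it is a common measure for speech recognition. The service names and transcripts are made up.

```python
from typing import Dict, List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def compare_services(reference: str, outputs: Dict[str, str]) -> Dict[str, float]:
    """Score each service's transcript against the same reference."""
    return {name: word_error_rate(reference, text) for name, text in outputs.items()}


reference = "patient was prescribed metformin for type two diabetes"
outputs = {
    "service_a": "patient was prescribed metformin for type two diabetes",
    "service_b": "patient was prescribed met forming for type two diabetes",
}
scores = compare_services(reference, outputs)
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: WER = {score:.2f}")
```

Running the same scoring on a medical test set and on a general-speech test set would show whether domain adaptation has cost any accuracy on regular speech.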

Development

Our AI team prepared and fine-tuned the transformer-based neural network model to better suit the client’s needs. Our software engineers created an API that combines speech recognition with an NLP module: it accepts audio data and returns the transcribed text along with the extracted data needed to fill in the corresponding fields of the insurance claim form.
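To make the shape of such a response concrete, here is a minimal sketch that returns a transcript together with extracted claim form fields. Everything in it is an assumption for illustration: the field names are hypothetical, and simple regular expressions stand in for the NLP model, which in the real solution is a trained component rather than pattern matching.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical claim form fields and the patterns used to find them.
FIELD_PATTERNS: Dict[str, str] = {
    "patient_name": r"patient name is ([a-z]+ [a-z]+)",
    "date_of_birth": r"born on (\d{4}-\d{2}-\d{2})",
    "diagnosis": r"diagnosed with ([a-z ]+?)(?:\.|,|$)",
}


def extract_claim_fields(transcript: str) -> Dict[str, Optional[str]]:
    """Return one value per form field, or None when nothing was found."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, transcript, flags=re.IGNORECASE)
        fields[name] = match.group(1).strip() if match else None
    return fields


def build_response(transcript: str) -> str:
    """Bundle the transcript and extracted fields into a JSON response."""
    fields = extract_claim_fields(transcript)
    return json.dumps(
        {
            "transcript": transcript,
            "claim_fields": fields,
            "missing_fields": sorted(k for k, v in fields.items() if v is None),
        },
        indent=2,
    )


transcript = (
    "The patient name is Jane Doe, born on 1980-04-12. "
    "She was diagnosed with seasonal allergies."
)
print(build_response(transcript))
```

Listing the fields that could not be filled gives the reviewer an explicit prompt to complete the form by hand instead of silently submitting an incomplete claim.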

Delivery

We delivered the source code of the web service built on the speech-to-text and NLP pipelines, the API documentation and the neural network models to the client as a separate component, together with documentation and test reports on the speech recognition and NLP models. We also provided recommendations for optimizing the AI pipeline.

Summary

The speech and text recognition application is now part of the customer’s pipeline for medical documentation processing. The solution automates much of the transcription and interpretation of patient data, as well as the generation of the required forms and invoices. As a result, physicians can be relieved of a major part of their administrative duties.
