Extractive Question Answering


Extractive Question Answering is a product based on Natural Language Processing (NLP) technology, in which an AI model is trained to answer questions from a given context. NLP improves the product-user interaction by "helping" the product understand the user's request and produce the best-matching answer.
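The core idea can be illustrated with a toy sketch: pick the span of the context that best matches the question. This is only a keyword-overlap stand-in for the trained ML models the product actually uses.

```python
# Toy illustration of extractive QA: return the context sentence whose words
# overlap most with the question. The production system uses trained models;
# this only demonstrates the "extract an answer from a given context" idea.
import re

def extract_answer(question: str, context: str) -> str:
    """Return the context sentence that best matches the question's keywords."""
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", context)
    # Score each sentence by how many question words it shares.
    best = max(sentences,
               key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))
    return best.strip()

context = ("Binance was founded in 2017. "
           "It is one of the largest cryptocurrency exchanges by trading volume.")
print(extract_answer("When was Binance founded?", context))
# prints "Binance was founded in 2017."
```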

Problem

Our client is a US-based FinTech company that helps customers choose where to invest money. The company provides its customers with weekly and monthly reports, via email or other convenient channels, on the domains and topics they are interested in. Each team of specialists who prepare reports for investors is responsible for a particular domain. Some of the work is done manually and would be resource-intensive, or nearly impossible, to automate. The client asked us to identify all time-consuming, routine tasks and leverage AI to automate them without harming quality. For instance, specialists spend much time filling in standardized fields in reports: every time, they need to find answers to the same questions in a regularly updated database of documents. Automating this process therefore speeds up report creation.

Solution

The AI-based Extractive Question Answering module helps specialists with routine tasks, such as finding answers to predefined questions.

How it works

The company has a regularly updated database of documents covering various domains. These documents contain information that needs to be extracted and presented in a particular format (e.g., answers to FAQs). Previously, specialists searched for the information manually and turned it into answers; now the AI module finds the requested information and forms sentences on its own. In addition, based on in-house metrics, the AI determines the validity of its findings.

01

The specialists choose their domain (e.g., money investment) and the topic they are working on (e.g., Binance).

02

Then they see a list of predefined questions (which vary by topic). Specialists can change them if needed, add new questions, or delete obsolete ones. After establishing the required questions, they press the START button.

03

The documents are filtered by topic, and the AI module starts processing these documents.

04

After answers to all the questions are generated, they are gathered in a table. Every answer is classified by quality (trustworthy, doubtful, or harmful), so the specialist can evaluate the answers and trace them back to the texts where they were found to check whether they are valid.


05

The specialists analyze the AI-generated data and, based on the results, create the final reports for investors.
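The steps above can be sketched in code. The names, confidence thresholds, and quality cutoffs below are illustrative assumptions, not the client's in-house metrics.

```python
# Sketch of the workflow: each predefined question gets an answer with a model
# confidence score, every answer is labelled trustworthy / doubtful / harmful,
# and a traceback to the source document is kept for the specialist to review.
# Thresholds are illustrative, not the client's actual in-house metrics.
from dataclasses import dataclass

@dataclass
class Answer:
    question: str
    text: str
    score: float      # model confidence in [0, 1]
    source_doc: str   # traceback: which document the answer came from

def classify(score: float) -> str:
    if score >= 0.8:
        return "trustworthy"
    if score >= 0.5:
        return "doubtful"
    return "harmful"

def build_report_table(answers: list[Answer]) -> list[dict]:
    return [
        {"question": a.question, "answer": a.text,
         "quality": classify(a.score), "source": a.source_doc}
        for a in answers
    ]

rows = build_report_table([
    Answer("When was Binance founded?", "2017", 0.93, "binance_overview.pdf"),
    Answer("What is the trading fee?", "0.1%", 0.62, "fees_2023.pdf"),
])
for row in rows:
    print(row["question"], "->", row["quality"])
```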

Our challenges:

Question formats

Some questions that were supposed to be processed automatically were too "human" for the machine-learning model to understand, so we experimented with paraphrasing. As a result, most of the questions were adapted for ML, and some received unique processing logic to handle edge cases.
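A minimal sketch of the paraphrasing step, assuming a simple lookup: "human" phrasings are mapped to forms the model handles better. The mapping below is hypothetical; in the project, adapted questions were validated against the model's actual behavior.

```python
# Hypothetical paraphrase table: "human" question phrasings are normalized
# into ML-friendly forms before being sent to the QA model.
PARAPHRASES = {
    "So, how long have these guys been around?": "When was the company founded?",
    "What's the damage on fees?": "What fees does the company charge?",
}

def adapt_question(question: str) -> str:
    """Return an ML-friendly paraphrase if one exists, else the original question."""
    return PARAPHRASES.get(question, question)

print(adapt_question("So, how long have these guys been around?"))
# prints "When was the company founded?"
```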

Tables in dataset

When PDF documents were converted to plain text and loaded into the database, all table formatting was lost. So we had to detect such cases and leave notes telling specialists which PDF file to check for the information.
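One way to detect such cases is a simple heuristic: lines that are mostly numbers usually come from a table whose layout was lost in conversion. The function names and the 0.6 cutoff below are assumptions for illustration, not the detector we shipped.

```python
# Heuristic sketch for spotting table residue in PDF-to-text output: lines that
# are mostly numeric tokens likely came from a table whose formatting was lost,
# so the specialist is pointed back to the source PDF.
import re

def looks_like_table_residue(line: str) -> bool:
    tokens = line.split()
    if len(tokens) < 3:
        return False
    numeric = sum(bool(re.fullmatch(r"[\d.,%$-]+", t)) for t in tokens)
    return numeric / len(tokens) > 0.6  # illustrative cutoff

def flag_pdfs(pages: dict[str, str]) -> list[str]:
    """Return names of source PDFs whose text likely contains broken tables."""
    return [name for name, text in pages.items()
            if any(looks_like_table_residue(l) for l in text.splitlines())]

pages = {
    "q3_report.pdf": "Revenue 12.4 13.1 14.0\nCosts 8.2 8.5 9.1",
    "overview.pdf": "The company operates in three regions.",
}
print(flag_pdfs(pages))
# prints "['q3_report.pdf']"
```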

Creating an easy-to-use traceback system for specialists to check answers

Because the customer already had a working system and we were integrating through an API, it was tricky to implement a "debug mode" that lets specialists check doubtful answers.
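The traceback idea itself is straightforward to sketch: alongside each extracted answer, keep the document id and character offsets so a debug view can show the answer in its original surroundings. The names below are assumptions, not the client's API.

```python
# Minimal traceback sketch: record where in the source text an answer was
# found, plus a surrounding snippet for the specialist's debug view.
def make_traceback(doc_id: str, text: str, answer: str, window: int = 20) -> dict:
    start = text.find(answer)
    if start == -1:
        raise ValueError("answer not found in document text")
    end = start + len(answer)
    return {
        "doc": doc_id,
        "start": start,
        "end": end,
        # Snippet the specialist sees when checking a doubtful answer.
        "context": text[max(0, start - window):end + window],
    }

doc = "Binance was founded in 2017 and grew quickly."
tb = make_traceback("binance_overview.txt", doc, "2017")
print(tb["doc"], tb["start"], tb["end"])
```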

Project stages

Stage 1

In the first stage, we ran the project with no AI at all, using only a human analyst without domain expertise. The idea was to gauge how difficult it is for a non-specialist to find the information required for report preparation (because the AI model is also not a specialist). At this stage, we also checked the questions to verify they weren't too general, too specific, or too complicated and ambiguous.

Stage 2

This stage included data quality assessment and initial AI experiments to build a simple model that supports a limited number of questions.

Stage 3

We experimented with different AI models and approaches to text extraction. As the outcome of this stage, three models with complementary behavior were optimized and combined into an ensemble.
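A toy sketch of the ensemble idea, under the assumption of simple score pooling: each model proposes an answer with a confidence, identical answers pool their scores, and the highest total wins. The combination rule and scores are illustrative, not the actual optimized ensemble.

```python
# Sketch of combining complementary QA models: identical answers pool their
# confidence scores, and the answer with the highest total is kept.
from collections import defaultdict

def ensemble_answer(candidates: list[tuple[str, float]]) -> str:
    """candidates: (answer_text, confidence) pairs, one per model."""
    totals: dict[str, float] = defaultdict(float)
    for text, score in candidates:
        totals[text.strip().lower()] += score
    return max(totals, key=totals.get)

# Models A and C agree, so their pooled score (1.3) beats model B's 0.9.
print(ensemble_answer([("2017", 0.7), ("2018", 0.9), ("2017", 0.6)]))
# prints "2017"
```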

Stage 4

The client allocated staff to test the new approach before integrating the developed solution into the business process. After the AI module proved useful, the client proceeded to use it more extensively.

Stage 5

The ready-to-use system was provided to our customer as a separate service with corresponding documentation.

Summary

Extractive Question Answering is now part of the customer's report-making pipeline. On average, more than 85% of the generated answers are classified as trustworthy. Thanks to AI, most of the specialists' manual work on such questions is automated, and because report preparation is faster, our client is able to grow its customer base.