Civil engineering documents categorization

The purpose of the civil engineering documents categorization system is to ensure that all documents related to a construction project are properly categorized and filed in a way that makes them easy to find and reference. This helps to avoid confusion and delays during the construction process. Moreover, it ensures that everyone involved in the project has the information they need to do their job. There are many different types of civil engineering documents, which can be categorized in several ways. A common way to categorize such documents is by the type of project they are associated with. For example, construction documents might include plans, specifications, and other documents related to the construction of a new building or other structure. Maintenance and repair documents might include manuals, guides, and other documents related to the maintenance and repair of existing structures.

Our client is a European company in the construction industry. They built software for managing engineering documentation in various data formats, e.g. text, graphics, and binaries. Because they need to manage huge amounts of data, they decided to automate parts of their pipeline. Their first priority was categorization, a task for which AI was a natural fit.

Solution

Remove the need for manual categorization in the existing infrastructure by adding AI-powered categorization functionality.

How it works

Our mission was to provide the client with a new, AI-assisted way of categorizing documents that increases work efficiency and saves time.

01

The user uploads documents via the user interface.

02

The bot relies on a natural language understanding (NLU) engine to understand the message.

03

The AI module predicts categories for each document in the background, without affecting the overall user experience.

04

After the documents are processed, the user gets a notification in the UI.

05

The user can review the predicted categories and change them if needed.
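
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the client's implementation: the names Document, CategorizationService, and predict_category are hypothetical, and the keyword rule stands in for the real AI module. It only shows the shape of the flow, where the upload returns immediately, categorization runs in a background thread, a notification is recorded when the batch is done, and the user can confirm or override each prediction.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    name: str
    text: str
    predicted: Optional[str] = None  # category proposed by the AI module
    confirmed: Optional[str] = None  # category accepted or corrected by the user


def predict_category(doc: Document) -> str:
    """Hypothetical stand-in for the AI module: a trivial keyword rule."""
    return "maintenance" if "repair" in doc.text.lower() else "construction"


class CategorizationService:
    """Assumed service layer between the UI and the AI module."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self.notifications: list[str] = []

    def upload(self, docs: list[Document]) -> Future:
        # Step 01: the upload returns at once; the work is queued for later.
        return self._pool.submit(self._categorize_batch, docs)

    def _categorize_batch(self, docs: list[Document]) -> list[Document]:
        # Step 03: runs in a worker thread, so the UI thread is not blocked.
        for doc in docs:
            doc.predicted = predict_category(doc)
        # Step 04: record a notification once the whole batch is processed.
        self.notifications.append(f"{len(docs)} document(s) categorized")
        return docs

    def review(self, doc: Document, category: Optional[str] = None) -> None:
        # Step 05: the user accepts the prediction or replaces it.
        doc.confirmed = category if category is not None else doc.predicted

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    service = CategorizationService()
    batch = [
        Document("bearings.pdf", "Repair manual for bridge bearings"),
        Document("block_a.pdf", "Floor plans and specifications"),
    ]
    pending = service.upload(batch)   # returns immediately
    for doc in pending.result():      # wait only when the results are needed
        service.review(doc)
        print(doc.name, doc.confirmed)
    print(service.notifications)
    service.close()
```

Keeping the prediction and the confirmed category as separate fields means every user correction is preserved and can later serve as feedback for the model.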

Our challenges:

The dataset was large

The volume of data required a powerful machine to train the classifier, and because each training session was time-consuming, there was little room for experimentation.

The dataset for machine learning was chaotic and inconsistent

The dataset contained data in different formats: besides PDFs, there were graphic files and binaries. Moreover, samples were poorly mapped to categories, and because of the size of the dataset, manual mapping was not an option, so we did the mapping analytically.

Support of user-defined categories

Unlike a typical classification model with a predefined list of classes, the system in this project was expected to be flexible enough to support new categories with only minor fine-tuning.
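
One way to picture this requirement is a classifier that keeps a separate profile per category, so that a new category can be added without touching the existing ones. The sketch below is an assumption made for illustration only: the case study does not say which model was used, and the nearest-centroid approach over word counts, the CentroidClassifier class, and the category names are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


class CentroidClassifier:
    """Hypothetical nearest-centroid classifier over bag-of-words counts."""

    def __init__(self) -> None:
        # One word-count profile per category.
        self.centroids: defaultdict = defaultdict(Counter)

    def add_examples(self, category: str, texts: list[str]) -> None:
        # Adding a category (or more examples) updates only that profile,
        # so existing categories do not need to be retrained.
        for text in texts:
            self.centroids[category].update(tokenize(text))

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(count * b[word] for word, count in a.items() if word in b)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def predict(self, text: str) -> str:
        if not self.centroids:
            raise ValueError("no categories have been defined yet")
        words = Counter(tokenize(text))
        # Pick the category whose profile is most similar to the document.
        return max(
            self.centroids,
            key=lambda category: self._cosine(words, self.centroids[category]),
        )


if __name__ == "__main__":
    clf = CentroidClassifier()
    clf.add_examples("drawings", ["floor plan drawing elevation",
                                  "site plan drawing section"])
    clf.add_examples("specifications", ["concrete strength specification",
                                        "steel grade specification"])
    print(clf.predict("elevation drawing of the north wall"))

    # A user-defined category is added later from a handful of examples.
    clf.add_examples("permits", ["building permit application",
                                 "permit approval letter"])
    print(clf.predict("permit application for the new bridge"))
```

A production system would more likely compare learned document embeddings than raw word counts, but the structural idea is the same: categories are data, not a fixed output layer.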

Project stages

Description:

Through communication with the customer, we established that the ideas were feasible. We explained to the customer what type and amount of data the AI training process requires and how it could be collected and organized given the existing capabilities on the client’s side.

Description:

This stage included data quality assessment, exploratory data analysis, and initial AI experiments to build a simple model that supported a limited number of classes. The client used this model for internal testing to validate the product concept and the AI capabilities with a limited user group.

Description:

Our AI engineers upgraded the system by adding more predefined categories, improving overall recognition accuracy, and increasing the number of supported file formats. Together with the client, we designed a continuous learning flow based on the Human-In-The-Loop concept.
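
A Human-In-The-Loop flow of this kind can be sketched as a feedback log: every prediction the user validates or corrects is recorded, accuracy is measured on that feedback, and the corrected samples become training data for the next round. The code below is a simplified illustration under stated assumptions. The Feedback and FeedbackLog names and the retrain_threshold parameter are hypothetical and do not describe the delivered system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    document_id: str
    predicted: str   # category proposed by the model
    confirmed: str   # category the user accepted or chose instead

    @property
    def is_correct(self) -> bool:
        return self.predicted == self.confirmed


class FeedbackLog:
    """Hypothetical store for user feedback on model predictions."""

    def __init__(self, retrain_threshold: int = 3) -> None:
        self.items: list[Feedback] = []
        # Assumed policy: retrain once this many corrections have accumulated.
        self.retrain_threshold = retrain_threshold

    def record(self, document_id: str, predicted: str,
               confirmed: Optional[str] = None) -> Feedback:
        # No explicit category means the user accepted the prediction as is.
        feedback = Feedback(document_id, predicted, confirmed or predicted)
        self.items.append(feedback)
        return feedback

    def accuracy(self) -> Optional[float]:
        """Share of predictions the users left unchanged."""
        if not self.items:
            return None
        return sum(fb.is_correct for fb in self.items) / len(self.items)

    def corrections(self) -> list[tuple[str, str]]:
        """(document_id, confirmed category) pairs to add to the training set."""
        return [(fb.document_id, fb.confirmed)
                for fb in self.items if not fb.is_correct]

    def should_retrain(self) -> bool:
        return len(self.corrections()) >= self.retrain_threshold


if __name__ == "__main__":
    log = FeedbackLog(retrain_threshold=1)
    log.record("doc-1", "drawings")                    # accepted
    log.record("doc-2", "drawings", "specifications")  # corrected by the user
    print(log.accuracy(), log.corrections(), log.should_retrain())
```

Measuring accuracy on user feedback rather than on a fixed test set keeps the metric tied to the documents users actually upload.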

Description:

The system was delivered to the customer as a separate plugin with corresponding documentation.

Summary

Civil engineering document categorization is now part of the customer's pipeline for managing construction documentation. The final overall accuracy exceeded 90% and is expected to improve further through continuous learning based on user feedback (validating and correcting AI predictions).