What is intelligent document processing

IDV (intelligent document processing): the gateway to hyper-automation

For a number of years, banks and insurance companies have been pursuing the digitization and automation of their value chains with varying degrees of determination. In many cases, however, initiatives and projects focused primarily on customer-facing interfaces and processes (e-banking, claims reporting). In the last two to three years, banks in particular, but also insurers, have begun to automate more and more of the repetitive core business processes on the non-customer-facing side of their value chain.

by Dr. Dennis Imer and Frederike Sturm, Synpulse

Simple, fast automation of repetitive core business processes, e.g. through Robotic Process Automation (RPA), often reaches its limit when artificial intelligence (AI) components are required to automate the process completely. This is often the case when the input for automated further processing consists of unstructured data. Intelligent document processing (IDV, from the German "intelligente Dokumentenverarbeitung"), as an important element of the hyper-automated value chain, helps to expand the range of applications for automation.

What is intelligent document processing and how does it help?

Intelligent document processing enables banks and insurers to process their forms, letters, e-mails, contracts and other written correspondence in a controlled and automated manner and thus to minimize the manual effort in the areas concerned.

IDV solutions recognize defined and trained document and form categories and extract the relevant information from these unstructured and structured formats in order to make it available for further processing in the desired format.

How does an IDV solution work?

IDV solutions essentially perform four work steps that would otherwise be laborious.

  1. Character recognition: In the first step, scanned PDF documents and forms are made machine-readable, i.e. converted into text files so that they can be processed in the following steps. This step is called Optical Character Recognition (OCR) and works, as the name suggests, through the optical analysis and recognition of characters.
  2. Categorization: In the next step, the contents of the resulting text files are loaded and categorized. Depending on the IDV solution, the documents are "read" by an AI or categorized using optical features. Combining both approaches can further improve the result.
  3. Data extraction: In the third step, the information relevant for further processing is extracted from the categorized documents. Here too, AI components, structural components or a combination of both are used.
  4. Further processing: In the last step, the extracted information is stored in a database or on a network drive for further processing or transferred directly to another system (e.g. the existing system of an insurance company) via interfaces.
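
The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the OCR step is a placeholder that receives ready-made text, categorization uses hypothetical keyword rules where a real IDV solution would typically use a trained model, and all names (`Document`, `CATEGORY_KEYWORDS`, `CUSTOMER_NUMBER`, `process`) are invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """One scanned item moving through the four steps."""
    raw_text: str                      # stands in for the output of step 1 (OCR)
    category: str = "unknown"
    fields: dict = field(default_factory=dict)


# Hypothetical keyword rules; a real solution would use a trained model here.
CATEGORY_KEYWORDS = {
    "change_of_address": ("new address", "change of address"),
    "direct_debit_mandate": ("sepa", "direct debit"),
}

# Hypothetical pattern for a customer number such as "Customer no. 12345678".
CUSTOMER_NUMBER = re.compile(r"customer\s+no\.?\s*(\d{6,10})", re.IGNORECASE)


def recognize_characters(scan: str) -> Document:
    """Step 1 (placeholder): pretend the scan has already been converted to text."""
    return Document(raw_text=scan)


def categorize(doc: Document) -> Document:
    """Step 2: assign the first category whose keywords appear in the text."""
    text = doc.raw_text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            doc.category = category
            break
    return doc


def extract(doc: Document) -> Document:
    """Step 3: pull out the fields needed downstream."""
    match = CUSTOMER_NUMBER.search(doc.raw_text)
    if match:
        doc.fields["customer_number"] = match.group(1)
    return doc


def hand_over(doc: Document) -> dict:
    """Step 4: build a record that a database or target system could accept."""
    return {"category": doc.category, **doc.fields}


def process(scan: str) -> dict:
    """Run all four steps in order."""
    return hand_over(extract(categorize(recognize_characters(scan))))
```

Calling `process("Change of address\nCustomer no. 12345678")` yields a record with the category `change_of_address` and the extracted customer number, which could then be stored or passed on via an interface.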

Should errors occur during intelligent document processing, it must be ensured that they do not propagate through the entire process and that incorrect data is not ultimately transferred to a target system.

To prevent this, IDV solutions are equipped with mechanisms to detect errors and route the affected cases out of the automated flow. These cases are checked by people, who supplement or correct the data and send it back to automated further processing with just a few clicks.

Reasons for routing a case to manual review can be, for example, an uncertain character recognition result due to poor scan quality, uncertainty of the AI when assigning a document to a category, or uncertainty about whether an extracted value belongs to a certain type of information (e.g. insurance number).
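
One common way to implement such a check is to compare confidence scores against thresholds. The following is a minimal sketch, assuming the solution reports a confidence value for character recognition, for the category and for each extracted field; the threshold values and all names are hypothetical and not taken from any specific product.

```python
from dataclasses import dataclass, field

# Assumed thresholds; real values would be tuned per use case.
MIN_OCR_CONFIDENCE = 0.90
MIN_CATEGORY_CONFIDENCE = 0.85
MIN_FIELD_CONFIDENCE = 0.80


@dataclass
class ProcessingResult:
    """Hypothetical confidence scores reported for one document."""
    ocr_confidence: float
    category_confidence: float
    field_confidences: dict = field(default_factory=dict)  # field name -> score


def review_reasons(result: ProcessingResult) -> list:
    """Collect every reason why a person should look at this document."""
    reasons = []
    if result.ocr_confidence < MIN_OCR_CONFIDENCE:
        reasons.append("uncertain character recognition")
    if result.category_confidence < MIN_CATEGORY_CONFIDENCE:
        reasons.append("uncertain category")
    for name, score in result.field_confidences.items():
        if score < MIN_FIELD_CONFIDENCE:
            reasons.append(f"uncertain field: {name}")
    return reasons


def route(result: ProcessingResult) -> str:
    """Send clean results straight through; everything else goes to a reviewer."""
    return "manual_review" if review_reasons(result) else "straight_through"
```

Returning the list of reasons rather than a bare flag lets the reviewer see at a glance which value needs attention.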

Below is a list of various use cases in which IDV solutions ensure end-to-end automation:

  • Incoming mail sorting and data extraction for manual further processing
  • Extraction of ID data
  • Reading of forms (e.g. SEPA direct debit mandate, change of address)
  • Reading of invoices
  • Data extraction from insurance contracts
  • Reading of content from free text e-mails
  • Data extraction from garnishment and transfer orders



The advantages of intelligent document processing

Using an IDV solution has several direct and indirect advantages.

Reduced processing time

With IDV, banks and insurers can automate the entry point of their sometimes complex, manual processes, such as incoming mail processing. Incoming documents, forms and customer inquiries, whether received by post or digitally, can thus be captured in real time and prepared for further processing. In a further expansion step, IDV solutions combine very well with RPA.

Employees only have to intervene in individual cases to verify results or catch exceptions. As a result, the processes become faster and, at the same time, less prone to errors.

As a side effect, internal capacities are freed up that can be used profitably elsewhere, for example for processing more complex applications.

The IDV solution has a clear advantage, especially during peak times, as it is in operation around the clock. In the event of a permanent increase in the volume to be processed (new use cases, growth, etc.), the implemented solution can be scaled. This in turn means that employees are relieved and there are fewer manual processing errors.

Artificial intelligence creates a head start

Depending on the application, artificial intelligence achieves better results in categorization and extraction than structural or rule-based solutions. For example, a rule-based solution that keys on the German word "Nachlass", which can mean both "estate" and "discount", may categorize a document as a special-conditions document because the tool associates the word with a discount. Through training, however, the AI can recognize that an inheritance case is being processed. The AI learns to categorize and extract the relevant data on the basis of in-house documents and forms and is thus tailored to the individual requirements. As a result, more reliable results are achieved, and they are retained when new forms or document types are added, simply by adjusting the training set. This eliminates the need for time-consuming adjustments to rule sets and their documentation.
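
The pitfall of purely keyword-based rules can be illustrated with a short sketch. It assumes that the ambiguous keyword is the German word "Nachlass", which can mean "estate" or "discount"; the rule table, category names and sample sentences are hypothetical.

```python
# Hypothetical rule set: keyword -> category.
KEYWORD_RULES = {
    "nachlass": "special_conditions",   # the rule author had "discount" in mind
    "umzug": "change_of_address",
}


def categorize_by_rules(text: str) -> str:
    """Return the category of the first keyword found in the text."""
    lowered = text.lower()
    for keyword, category in KEYWORD_RULES.items():
        if keyword in lowered:
            return category
    return "unknown"


# A letter about a discount is categorized as intended ...
discount_letter = "Wir bitten um einen Nachlass von 5 % auf die Gebuehren."
# ... but a letter about an inheritance case triggers the same rule.
estate_letter = "Wir bitten um Regelung des Nachlasses unseres Vaters."

discount_category = categorize_by_rules(discount_letter)
estate_category = categorize_by_rules(estate_letter)
```

Both letters end up in `special_conditions`, although only the first one belongs there. A model trained on labeled examples of both kinds of letters can use the surrounding words to tell them apart.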

Low implementation hurdle

The implementation of IDV solutions is usually a relatively quick and inexpensive undertaking. Typical use cases can be implemented within a few months regardless of the infrastructure in place. Given the typical project, license and operating costs, business cases become profitable very quickly.


Project example: mail returns

In the following, we use a project example to show how the introduction of an IDV tool works.

Initial situation:

At a German retail bank, the task was first to digitize mail returns, i.e. returned bank forms and standard letters, using an in-house scanning process and then to extract the relevant information from the documents for further processing. Because further processing is automated with Robotic Process Automation and different data had to be extracted for each type of document, the documents were first divided into categories (for an overview of the individual categories, see Table 1). The IDV tool was to be integrated as seamlessly as possible into the existing process, from scanning to document archiving.

Author Dr. Dennis Imer, Synpulse
Dr. Dennis Imer is Global Head of Advanced Analytics & Data Management at Synpulse. In this role, he develops topics related to intelligent document processing and data management strategies and oversees projects in these areas. Before that, Dr. Imer was responsible for data processing at the Swiss private bank Bank J. Safra Sarasin as Group Data Governance Manager and Group Data Protection Officer. During his doctorate in biogeochemistry at ETH Zurich, he worked on the modeling of greenhouse gas emissions.

Phase A and B: Potential analysis and tool selection

IDV solutions can be used in almost all processes in which written correspondence has to be categorized and/or content has to be extracted from documents. Before deciding on an IDV solution, however, the first step should always be a potential analysis.

Several use cases are considered here and their relevance for automation is examined. The process volume and the manual processing time saved are important factors in making a decision. The use cases are also used to assess which IDV solution can be used to implement the largest range of existing document types. In order to get the most comprehensive picture possible, the following questions, among others, are clarified:

  • Which processes should be automated with an IDV solution?
  • Which types of documents need to be processed (post, e-mails, contracts, forms, invoices, ID cards)?
  • How high are the volumes to be processed per process (in general: the higher the volume, the sooner an IDV solution pays off)?
  • In what time frame do the documents have to be processed?

By answering these questions, it is generally possible to quickly determine whether an IDV solution is worthwhile and, if so, which specific IDV solution should be used. For example, while one tool has strengths in categorizing forms and ID cards, another tool is better suited for free-text documents.

Since the expected volume of mail returns contains a significant proportion of semi-structured and unstructured documents, the AI platform from ITyX was selected as the IDV solution. By using AI, better results in categorization and data extraction can be achieved than with a solution that works on the basis of structure or rules. In addition, an AI solution is generally more flexible when further use cases are added, because training sets can be adjusted quickly.

Author Frederike Sturm, Synpulse
Frederike Sturm is an Associate Partner at Synpulse. A trained bank clerk, she and her team at Synpulse Germany advise financial service providers and insurance companies on the introduction of Robotic Process Automation (RPA) and its expansion into intelligent automation, from the selection of suitable tools and training to the implementation of automated processes. In addition, Ms. Sturm looks after Synpulse's technology partners in this area.

Phase C: Proof of concept

In the proof of concept, the process was successfully automated, with good recognition and extraction rates. To this end, a workflow was first set up in the IDV solution. This workflow ensures that the scanned documents are sent from an input folder to character recognition and then categorized, that the relevant data is extracted and that, at the end, the imported PDF files and XML files with the extracted data are written to the desired output folder. From there, the data is picked up by robots for further processing.
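
The last stage of such a workflow, writing the extracted data as an XML file into an output folder, might look roughly as follows. This is a sketch using Python's standard library; the element and attribute names (`document`, `field`, `name`) and the file naming scheme are assumptions, since the article does not describe the actual XML layout.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_result_xml(document_id: str, category: str, fields: dict) -> str:
    """Serialize the extracted data; element names are illustrative only."""
    root = ET.Element("document", id=document_id, category=category)
    for name, value in fields.items():
        field = ET.SubElement(root, "field", name=name)
        field.text = value
    return ET.tostring(root, encoding="unicode")


def write_result(output_dir: Path, document_id: str, category: str, fields: dict) -> Path:
    """Write one XML file per document into the output folder and return its path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / f"{document_id}.xml"
    target.write_text(build_result_xml(document_id, category, fields), encoding="utf-8")
    return target
```

A downstream robot would then only need to watch the output folder and parse each file, for example with `ET.parse`.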

Between 100 and 200 documents were used to train the categorization and extraction models. For extraction, trained models were combined with regular expressions (RegEx), i.e. AI with existing rule sets, in order to further increase the extraction rate.
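
One plausible way to combine a model with regular expressions is to accept the model's value when it is confident and to fall back to a rule otherwise. The sketch below does this for a German IBAN; the model prediction is represented by a hypothetical `(value, confidence)` tuple, the threshold is an assumption, and the project's actual combination logic is not described in the article.

```python
import re

# German IBAN: "DE", 2 check digits, 8-digit bank code, 10-digit account number,
# written either without spaces or in groups of four.
GERMAN_IBAN = re.compile(r"\bDE\d{2}(?:\s?\d{4}){4}\s?\d{2}\b")


def iban_from_rules(text: str):
    """Rule-based extraction: find a German IBAN and strip the spaces."""
    match = GERMAN_IBAN.search(text)
    return re.sub(r"\s", "", match.group(0)) if match else None


def extract_iban(text: str, model_prediction, min_confidence: float = 0.9):
    """Prefer a confident model prediction, otherwise fall back to the rule.

    model_prediction is a hypothetical (value, confidence) tuple or None.
    """
    if model_prediction is not None:
        value, confidence = model_prediction
        if confidence >= min_confidence:
            return re.sub(r"\s", "", value)
    return iban_from_rules(text)


def account_number_from_iban(iban: str) -> str:
    """The last 10 digits of a German IBAN are the account number."""
    return iban[-10:]
```

With this arrangement the rule only steps in where the model is unsure, which is one way rules can raise the overall extraction rate.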

Phase D: Project

After the proof of concept, further processes are implemented. In parallel, internal employees are usually trained in how to use the tool. In this way, the most comprehensive possible knowledge transfer can take place.

As a result, simple use cases can often be implemented independently by internal employees. Over the course of the project, users are able to take on more and more activities within the IDV solution themselves.

Another important project goal is to embed the IDV solution in a corresponding operating model in order to meet the requirements of external auditors and internal audit for documentation, IT security and clear organizational and process structures.

Category | Extracted data | Number of training documents | STP rate | Possible documents / h | Possible documents / week
Bank cards | Date; account number (from IBAN); card number | | | |
Online banking | Date; online banking key; bank account number; customer number; bank account number | | | |
Remaining documents | Date; customer number; bank account number; (credit) card number; online banking key | | | |

Figure 4: IDV solution results. Source: Synpulse.

Results and conclusion

Good to very good straight-through processing (STP) rates were achieved across all categories. A rate of 80% was achieved in the most complex case, and up to 95% in the less complex cases.

With the current volume of documents to be processed (approx. 400 per week), the bank still has headroom to scale, as between 12 and 20 documents can be processed per hour, depending on the category.
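
The headroom can be estimated with a short calculation. The sketch below uses the figures from the text (about 400 documents per week, 12 to 20 documents per hour) and assumes, purely for illustration, that the solution runs around the clock, i.e. 168 hours per week.

```python
HOURS_PER_WEEK = 24 * 7  # assumption: the solution runs around the clock


def weekly_capacity(docs_per_hour: int, hours_per_week: int = HOURS_PER_WEEK) -> int:
    """Documents that can be processed in one week at a given hourly rate."""
    return docs_per_hour * hours_per_week


def hours_needed(weekly_volume: int, docs_per_hour: int) -> float:
    """Processing hours required for the given weekly volume."""
    return weekly_volume / docs_per_hour


slowest = weekly_capacity(12)           # 2016 documents per week
fastest = weekly_capacity(20)           # 3360 documents per week
busy_hours_max = hours_needed(400, 12)  # roughly 33.3 hours per week
busy_hours_min = hours_needed(400, 20)  # 20.0 hours per week
```

Under this assumption, even the slowest category leaves capacity for roughly five times the current volume, since 400 documents occupy only about 20 to 33 of the 168 available hours.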

IDV solutions thus prove to be an important step towards digitizing and automating both the customer-facing and the non-customer-facing side of the value chain, as they offer increased efficiency with little implementation effort.

