MLNER (Machine Learning Named Entity Recognition) Service

Internal document processing service that replaces LLaMA 3.1 with faster, cost-effective ML models for entity extraction and document classification.

Python · spaCy · FastAPI · Docker · ONNX Runtime · 65% Cost Reduction

Project Overview

MLNER is an internal document text extraction and classification service developed to replace large LLM-based pipelines (such as LLaMA 3.1) with a faster, cost-effective, and scalable alternative. The service extracts key entities from documents such as IDs, financial statements, and utility bills with high accuracy, and classifies document types in real time. It was deployed to reduce GPU load while maintaining top-tier extraction performance across varied document layouts.

How It Works

1. Document Ingestion: a scanned document (PDF or image) is received via the API.
2. OCR Preprocessing: text is extracted using an optimized Tesseract pipeline with language-specific configs.
3. Entity Recognition: the extracted text is passed to the MLNER model to detect key entities (e.g., name, DOB, ID number, issue date).
4. Document Classification: a parallel model classifies the document type (e.g., National ID, Passport, Utility Bill, Bank Statement).
5. Post-processing: outputs are validated against custom regex and logic rules (e.g., passport number format, date consistency).
6. API Response: clean JSON with entity fields and the document classification is returned to the client application.
7. Logging & Analytics: metadata and inference times are logged for performance tracking and audit.

Illustrative Python sketches of these stages follow, under stated assumptions.
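
For the OCR stage (step 2), a minimal sketch using pytesseract and pdf2image might look like the following. The DPI, Tesseract flags, and fall-back-to-image handling are illustrative assumptions; the service's actual "optimized Tesseract pipeline" is not documented here.

```python
# OCR preprocessing sketch (step 2). Assumes Tesseract and Poppler are
# installed locally; dpi and Tesseract flags are illustrative defaults.
import io

import pytesseract
from PIL import Image
from pdf2image import convert_from_bytes

def run_ocr(raw: bytes, lang: str = "eng") -> str:
    """Rasterize a PDF (or decode an image) and run Tesseract per page."""
    try:
        pages = convert_from_bytes(raw, dpi=300)   # PDF input
    except Exception:
        pages = [Image.open(io.BytesIO(raw))]      # plain image input
    config = "--oem 1 --psm 6"                     # LSTM engine, uniform text block
    return "\n".join(
        pytesseract.image_to_string(page, lang=lang, config=config)
        for page in pages
    )
```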
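
Entity recognition (step 3) maps naturally onto a spaCy pipeline, consistent with the stack above. The package name `mlner_model` and the label-to-field mapping below are placeholders, since the internal model is not published.

```python
# NER sketch (step 3). "mlner_model" is a hypothetical package name for
# the internally trained spaCy pipeline; the label mapping is illustrative.
import spacy

nlp = spacy.load("mlner_model")

# Illustrative mapping from model labels to response fields.
FIELD_MAP = {"PERSON": "name", "DATE": "dob", "ID_NUMBER": "id_number"}

def extract_entities(text: str) -> dict[str, str]:
    doc = nlp(text)
    return {
        FIELD_MAP.get(ent.label_, ent.label_.lower()): ent.text
        for ent in doc.ents
    }
```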
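
Classification (step 4) could be served through ONNX Runtime, matching the stack listed above. The model file name, input tensor name, label set, and the toy hashing featurizer are all assumptions made for illustration.

```python
# Classification sketch (step 4). Model path, input name, labels, and the
# bag-of-words hashing featurizer are illustrative assumptions.
import zlib

import numpy as np
import onnxruntime as ort

LABELS = ["national_id", "passport", "utility_bill", "bank_statement"]
session = ort.InferenceSession("classifier.onnx")  # hypothetical model file

def featurize(text: str) -> np.ndarray:
    # Toy hashing featurizer; the real input representation is undocumented.
    vec = np.zeros((1, 1024), dtype=np.float32)
    for token in text.lower().split():
        vec[0, zlib.crc32(token.encode()) % 1024] += 1.0
    return vec

def classify_document(text: str) -> str:
    (logits,) = session.run(None, {"input": featurize(text)})
    return LABELS[int(np.argmax(logits[0]))]
```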
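
Post-processing (step 5) applies regex and logic checks to the raw predictions. The field names, passport-number pattern, and date format below are examples only, not the service's actual rule set.

```python
# Post-processing sketch (step 5). Field names, the passport pattern, and
# the date format are illustrative.
import re
from datetime import datetime

PASSPORT_RE = re.compile(r"[A-Z]{1,2}[0-9]{6,7}")  # example format only
DATE_FMT = "%Y-%m-%d"

def validate(entities: dict[str, str]) -> dict[str, str]:
    checked = dict(entities)
    passport = checked.get("passport_number")
    if passport and not PASSPORT_RE.fullmatch(passport):
        checked.pop("passport_number")             # failed format rule
    issue, expiry = checked.get("issue_date"), checked.get("expiry_date")
    if issue and expiry:
        try:
            if datetime.strptime(issue, DATE_FMT) >= datetime.strptime(expiry, DATE_FMT):
                checked.pop("expiry_date")         # inconsistent dates
        except ValueError:
            pass                                   # unparseable; leave for fallback
    return checked
```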
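
Steps 1 and 6 amount to a thin API layer around the pipeline. The sketch below wires the helpers from the previous sketches into a FastAPI endpoint; the route path and response shape are assumptions.

```python
# API layer sketch (steps 1 and 6). Reuses run_ocr, extract_entities,
# classify_document, and validate from the sketches above; the route path
# and response shape are illustrative.
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/v1/documents/extract")
async def extract(file: UploadFile) -> dict:
    raw = await file.read()                           # PDF or image bytes (step 1)
    text = run_ocr(raw)                               # step 2
    entities = validate(extract_entities(text))       # steps 3 and 5
    doc_type = classify_document(text)                # step 4
    return {"document_type": doc_type, "entities": entities}  # step 6
```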
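
For logging and analytics (step 7), per-request inference time can be captured with a middleware hook; the logger name and log fields are illustrative.

```python
# Logging sketch (step 7). Attaches to the FastAPI app from the sketch
# above; logger name and fields are illustrative.
import logging
import time

from fastapi import Request

logger = logging.getLogger("mlner")

@app.middleware("http")
async def log_timing(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("path=%s latency_ms=%.1f", request.url.path, elapsed_ms)
    return response
```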

Key Features

  • Lightweight Model: fast inference, averaging under 300 ms per document.
  • High Accuracy Training: entity recognition trained on over 200k annotated samples.
  • Multi-Document Support: classification across multiple document types and formats.
  • OCR-NER Pipeline: high OCR-to-NER compatibility, robust to noisy scanned images.
  • Built-in Fallbacks: intelligent fallbacks for low-confidence predictions (see the sketch after this list).
  • Scalable Microservice: containerized and easily deployable at scale.
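
The low-confidence fallback could be realized as a simple confidence gate. The source only states that fallbacks exist, so the threshold and escalation path below are assumptions.

```python
# Fallback sketch: gate on model confidence and escalate when it is low.
# The 0.85 threshold is an illustrative value, not a documented one.
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85

def with_fallback(
    primary: Callable[[str], tuple[str, float]],  # returns (label, confidence)
    fallback: Callable[[str], str],               # e.g., larger model or review queue
    text: str,
) -> str:
    label, score = primary(text)
    return label if score >= CONFIDENCE_THRESHOLD else fallback(text)
```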

Impacts & Results

  • 65% reduction in GPU inference costs after replacing LLaMA 3.1
  • 3× improvement in document processing speed (from ~1s to <300ms)
  • Enabled real-time identity document processing at high scale
  • Improved NER accuracy for low-resolution and multilingual documents
  • Enhanced throughput for high-volume onboarding workflows by 40%
  • Reduced hardware requirements while maintaining top-tier performance
  • Scalable microservice architecture for production deployment