MLNER (Machine Learning Named Entity Recognition) Service

Internal document processing service that replaces LLaMA 3.1 with faster, cost-effective ML models for entity extraction and document classification.

Python · spaCy · FastAPI · Docker · ONNX Runtime · 65% Cost Reduction

Project Overview

MLNER is an internal document text extraction and classification service developed to replace large LLM-based pipelines (such as LLaMA 3.1) with a faster, cost-effective, and scalable alternative. The service extracts key entities from documents such as IDs, financial statements, and utility bills with high accuracy, and classifies document types in real time. It was deployed to reduce GPU load while maintaining top-tier extraction performance across varied document layouts.

How It Works

1. Document Ingestion: a scanned document (PDF or image) is received via the API.
2. OCR Preprocessing: text is extracted using an optimized Tesseract pipeline with language-specific configs.
3. Entity Recognition: the extracted text is passed to the MLNER model to detect key entities (e.g., name, DOB, ID number, issue date).
4. Document Classification: a parallel model classifies the document type (e.g., National ID, Passport, Utility Bill, Bank Statement).
5. Post-processing: outputs are validated against custom regex and logic rules (e.g., passport number format, date consistency).
6. API Response: clean JSON with entity fields and the document classification is returned to the client application.
7. Logging & Analytics: metadata and inference times are logged for performance tracking and audit.

Illustrative Python sketches of these stages follow, under stated assumptions.
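
For the OCR stage (step 2), a minimal sketch using pytesseract and pdf2image might look like the following. The DPI, Tesseract flags, and fall-back-to-image handling are illustrative assumptions; the service's actual "optimized Tesseract pipeline" is not documented here.

```python
# OCR preprocessing sketch (step 2). Assumes Tesseract and Poppler are
# installed locally; dpi and Tesseract flags are illustrative defaults.
import io

import pytesseract
from PIL import Image
from pdf2image import convert_from_bytes

def run_ocr(raw: bytes, lang: str = "eng") -> str:
    """Rasterize a PDF (or decode an image) and run Tesseract per page."""
    try:
        pages = convert_from_bytes(raw, dpi=300)   # PDF input
    except Exception:
        pages = [Image.open(io.BytesIO(raw))]      # plain image input
    config = "--oem 1 --psm 6"                     # LSTM engine, uniform text block
    return "\n".join(
        pytesseract.image_to_string(page, lang=lang, config=config)
        for page in pages
    )
```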
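
Entity recognition (step 3) maps naturally onto a spaCy pipeline, consistent with the stack above. The package name `mlner_model` and the label-to-field mapping below are placeholders, since the internal model is not published.

```python
# NER sketch (step 3). "mlner_model" is a hypothetical package name for
# the internally trained spaCy pipeline; the label mapping is illustrative.
import spacy

nlp = spacy.load("mlner_model")

# Illustrative mapping from model labels to response fields.
FIELD_MAP = {"PERSON": "name", "DATE": "dob", "ID_NUMBER": "id_number"}

def extract_entities(text: str) -> dict[str, str]:
    doc = nlp(text)
    return {
        FIELD_MAP.get(ent.label_, ent.label_.lower()): ent.text
        for ent in doc.ents
    }
```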
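
Classification (step 4) could be served through ONNX Runtime, matching the stack listed above. The model file name, input tensor name, label set, and the toy hashing featurizer are all assumptions made for illustration.

```python
# Classification sketch (step 4). Model path, input name, labels, and the
# bag-of-words hashing featurizer are illustrative assumptions.
import zlib

import numpy as np
import onnxruntime as ort

LABELS = ["national_id", "passport", "utility_bill", "bank_statement"]
session = ort.InferenceSession("classifier.onnx")  # hypothetical model file

def featurize(text: str) -> np.ndarray:
    # Toy hashing featurizer; the real input representation is undocumented.
    vec = np.zeros((1, 1024), dtype=np.float32)
    for token in text.lower().split():
        vec[0, zlib.crc32(token.encode()) % 1024] += 1.0
    return vec

def classify_document(text: str) -> str:
    (logits,) = session.run(None, {"input": featurize(text)})
    return LABELS[int(np.argmax(logits[0]))]
```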
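
Post-processing (step 5) applies regex and logic checks to the raw predictions. The field names, passport-number pattern, and date format below are examples only, not the service's actual rule set.

```python
# Post-processing sketch (step 5). Field names, the passport pattern, and
# the date format are illustrative.
import re
from datetime import datetime

PASSPORT_RE = re.compile(r"[A-Z]{1,2}[0-9]{6,7}")  # example format only
DATE_FMT = "%Y-%m-%d"

def validate(entities: dict[str, str]) -> dict[str, str]:
    checked = dict(entities)
    passport = checked.get("passport_number")
    if passport and not PASSPORT_RE.fullmatch(passport):
        checked.pop("passport_number")             # failed format rule
    issue, expiry = checked.get("issue_date"), checked.get("expiry_date")
    if issue and expiry:
        try:
            if datetime.strptime(issue, DATE_FMT) >= datetime.strptime(expiry, DATE_FMT):
                checked.pop("expiry_date")         # inconsistent dates
        except ValueError:
            pass                                   # unparseable; leave for fallback
    return checked
```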
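
Steps 1 and 6 amount to a thin API layer around the pipeline. The sketch below wires the helpers from the previous sketches into a FastAPI endpoint; the route path and response shape are assumptions.

```python
# API layer sketch (steps 1 and 6). Reuses run_ocr, extract_entities,
# classify_document, and validate from the sketches above; the route path
# and response shape are illustrative.
from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/v1/documents/extract")
async def extract(file: UploadFile) -> dict:
    raw = await file.read()                           # PDF or image bytes (step 1)
    text = run_ocr(raw)                               # step 2
    entities = validate(extract_entities(text))       # steps 3 and 5
    doc_type = classify_document(text)                # step 4
    return {"document_type": doc_type, "entities": entities}  # step 6
```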
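
For logging and analytics (step 7), per-request inference time can be captured with a middleware hook; the logger name and log fields are illustrative.

```python
# Logging sketch (step 7). Attaches to the FastAPI app from the sketch
# above; logger name and fields are illustrative.
import logging
import time

from fastapi import Request

logger = logging.getLogger("mlner")

@app.middleware("http")
async def log_timing(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("path=%s latency_ms=%.1f", request.url.path, elapsed_ms)
    return response
```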

Key Features

  • Lightweight Model: fast inference, averaging under 300 ms per document.
  • High Accuracy Training: entity recognition trained on over 200k annotated samples.
  • Multi-Document Support: classification across multiple document types and formats.
  • OCR-NER Pipeline: high OCR-to-NER compatibility, robust to noisy scanned images.
  • Built-in Fallbacks: intelligent fallbacks for low-confidence predictions (see the sketch after this list).
  • Scalable Microservice: containerized and easily deployable at scale.
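
The low-confidence fallback could be realized as a simple confidence gate. The source only states that fallbacks exist, so the threshold and escalation path below are assumptions.

```python
# Fallback sketch: gate on model confidence and escalate when it is low.
# The 0.85 threshold is an illustrative value, not a documented one.
from typing import Callable

CONFIDENCE_THRESHOLD = 0.85

def with_fallback(
    primary: Callable[[str], tuple[str, float]],  # returns (label, confidence)
    fallback: Callable[[str], str],               # e.g., larger model or review queue
    text: str,
) -> str:
    label, score = primary(text)
    return label if score >= CONFIDENCE_THRESHOLD else fallback(text)
```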

Impacts & Results

  • 65% reduction in GPU inference costs after replacing LLaMA 3.1
  • 3× improvement in document processing speed (from ~1s to <300ms)
  • Enabled real-time identity document processing at high scale
  • Improved NER accuracy for low-resolution and multilingual documents
  • Enhanced throughput for high-volume onboarding workflows by 40%
  • Reduced hardware requirements while maintaining top-tier performance
  • Scalable microservice architecture for production deployment