Intelligently Entity Extraction Using OCR and NER for Business Cards

Authors

  • Thanh Thao Thai Thi, Ho Chi Minh City University of Foreign Languages and Information Technology (HUFLIT)
  • Xuan Thu Tuong Thi

Abstract

This paper introduces a framework for building a custom Named Entity Recognition (NER) model tailored to extracting important entities from scanned documents. The approach is adaptable to other financial documents, including invoices, shipping bills, and bills of lading; in this paper, however, we restrict the study to business cards to ensure data privacy. The project combines two main data science technologies: Computer Vision and Natural Language Processing (NLP). The Computer Vision component extracts text from document images using tools such as OpenCV, NumPy, and Pytesseract. The NLP phase covers entity recognition, text cleaning, and parsing, using SpaCy, Pandas, regular expressions, and string manipulation. This method provides a flexible and efficient solution for automating entity extraction across different types of financial documents.
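As an illustration of the text cleaning and parsing stage mentioned in the abstract, the following is a minimal sketch using only Python's standard library. It assumes the OCR step has already produced a plain string (for example, from Pytesseract), and it does not reproduce the paper's trained SpaCy model. The regular expressions, the function names `clean_token`, `clean_text`, and `extract_entities`, and the labels `EMAIL`, `PHONE`, and `WEB` are hypothetical choices made for this example, not taken from the paper.

```python
import re
import string

# Hypothetical patterns for illustration only; the paper does not specify them.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s,;]+", re.IGNORECASE)

# Characters stripped from token edges; a few symbols that carry meaning
# in phone numbers and e-mail addresses are kept (an assumption).
_KEEP = "@+()"
_STRIP_CHARS = "".join(c for c in string.punctuation if c not in _KEEP) + string.whitespace


def clean_token(token: str) -> str:
    """Remove whitespace and stray punctuation from both ends of a token."""
    return token.strip(_STRIP_CHARS)


def clean_text(ocr_text: str) -> str:
    """Clean each OCR line token by token and drop lines that end up empty."""
    lines = []
    for raw_line in ocr_text.splitlines():
        tokens = [clean_token(t) for t in raw_line.split()]
        tokens = [t for t in tokens if t]
        if tokens:
            lines.append(" ".join(tokens))
    return "\n".join(lines)


def extract_entities(ocr_text: str) -> dict:
    """Return pattern-based entities found in the cleaned OCR text."""
    text = clean_text(ocr_text)
    return {
        "EMAIL": EMAIL_RE.findall(text),
        # Normalize phone numbers to digits with an optional leading '+'.
        "PHONE": [re.sub(r"[^\d+]", "", m) for m in PHONE_RE.findall(text)],
        "WEB": URL_RE.findall(text),
    }


if __name__ == "__main__":
    sample = (
        "Jane Doe |\n"
        "Sales Manager,\n"
        "Phone: +1 (555) 010-2030;\n"
        "jane.doe@example.com ,\n"
        "www.example.com\n"
    )
    print(extract_entities(sample))
```

Pattern matching of this kind only suits entities with a regular surface form, such as e-mail addresses, phone numbers, and web addresses. Names, job titles, and organizations would still need a trained NER model such as the SpaCy-based one the abstract describes.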

Published

08-07-2025

How to Cite

Thai Thi, T. T., & Tuong Thi, X. T. (2025). Intelligently Entity Extraction Using OCR and NER for Business Cards. HUFLIT Journal of Science, 9(2), 22. Retrieved from https://vjst.net/index.php/hjs/article/view/262

Section

Review Articles
