Utilization of Optical Character Recognition and Text Feature Extraction to Build a Workforce Complaint Database

Pemanfaatan Optical Character Recognition Dan Text Feature Extraction Untuk Membangun Basisdata Pengaduan Tenaga Kerja

  • Yan Puspitarani, Universitas Widyatama
  • Yenie Syukriyah
Keywords: OCR, text feature extraction, database


Examining complaints of labor violations is one of the main activities of the labor inspection section within the Department of Manpower. Inspectors examine companies alleged to have violated labor laws on the basis of a complaint letter sent by the relevant union or legal aid agency. Because electronic communication is now so convenient, complaint letters are often sent directly as images through channels such as WhatsApp or email. This makes it difficult for administrative staff to recapitulate incoming complaints, because they have to read each letter and enter its data into the system manually. This research was therefore conducted to create a system that uses OCR technology and text feature extraction to enter complaint data automatically. It produced a prototype for letter input and a database for letter storage that can later be used for data mining and business intelligence. OCR is implemented with the Tesseract library, while text feature extraction uses the Natural Language Toolkit (NLTK) library. Testing of the prototype showed an accuracy of 66.7% on OCR output and 91.67% on manually typed letters.
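The pipeline described above (OCR a letter image, then pull structured fields out of the text) can be sketched roughly as follows. This is a minimal illustration and not the prototype itself: the field names, the regular expressions, the `ocr_image`, `extract_fields`, and `field_accuracy` helpers, and the choice of the Indonesian Tesseract language pack (`ind`) are all assumptions, and plain regular expressions stand in for the NLTK-based extraction so that the sketch has no dependencies beyond the optional OCR step.

```python
import re


def ocr_image(path, lang="ind"):
    """Run Tesseract on a scanned letter image and return the raw text.

    Requires the Tesseract binary plus the pytesseract and Pillow packages.
    They are imported lazily so the extraction step works without them.
    """
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(path), lang=lang)


# Hypothetical field patterns for an Indonesian complaint letter header.
# A real letter collection would need patterns tuned to its own layout.
FIELD_PATTERNS = {
    "letter_number": re.compile(r"^\s*(?:Nomor|No\.?)\s*:\s*(.+)$", re.I | re.M),
    "subject": re.compile(r"^\s*(?:Perihal|Hal)\s*:\s*(.+)$", re.I | re.M),
    "date": re.compile(r"\b(\d{1,2}\s+[A-Za-z]+\s+\d{4})\b"),
}


def extract_fields(text):
    """Return a dict of field name -> extracted value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    return fields


def field_accuracy(predicted, expected):
    """Fraction of expected fields whose extracted value matches exactly."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return correct / len(expected)


if __name__ == "__main__":
    # Made-up sample text standing in for OCR output or a manually typed letter.
    sample = (
        "Bandung, 12 Maret 2020\n"
        "Nomor : 015/SP/III/2020\n"
        "Perihal : Pengaduan pelanggaran upah\n"
    )
    print(extract_fields(sample))
```

The extracted dictionary maps naturally onto one database row per letter. Comparing it field by field against hand-entered values, as `field_accuracy` does, is one plausible way to arrive at accuracy figures of the kind reported, although the abstract does not state how its percentages were computed.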



