Semi-automatic Segmentation & Alignment of Handwritten



Table of Contents

  1. INTRODUCTION
  2. SUBJECT THEORY
    1. Segmentation
    2. Alignment
    3. HTR systems
    4. Related Work
    5. Machine learning for image segmentation
  3. DATA & SOFTWARE
    1. Complications with the data
    2. Software & Python packages
    3. Ethics and conflict of interest
  4. ALIGNMENT ALGORITHM DEVELOPMENT
    1. Page image preprocessing
    2. Line segmentation
    3. Word segmentation
    4. Interactive correction
    5. Self learning
    6. User interface
  5. PERFORMANCE EXPERIMENTS
  6. RESULTS
    1. Performance of the algorithm
    2. Visualisation of the algorithm pipeline
  7. DISCUSSION
    1. Performance analysis
    2. The importance of ground truth quality
  8. CLOSING REMARKS
    1. Conclusion
    2. Future work

REFERENCES
SUPPLEMENTARY FILES



Abbreviations

GMM Gaussian Mixture Model
GT Ground Truth
HPP Horizontal Projection Profiling
HTR Handwritten Text Recognition
IoU Intersection over Union
ROI Region Of Interest

  1. Introduction


Segmenting a document image into text lines or words and then aligning them to create an annotation is an important preprocessing step in many document understanding tasks. Handwritten text found in historical records, such as tables, cannot be transcribed automatically by traditional Optical Character Recognition systems. Handwritten Text Recognition (HTR) is used instead, and it has shown exceptional performance in text-line segmentation, keyword spotting, and character recognition, among other tasks (De Gregorio et al. 2023). HTR models usually require large amounts of annotated data for training. Such annotations can be obtained through a process called alignment, which uses manual transcriptions of parts of the document and links the corresponding text images to the transcribed words. This project started as part of an effort to digitize historical document images.
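To make the idea of alignment concrete, the following is a minimal sketch, not the algorithm developed in this thesis. It assumes the words have already been segmented into boxes with a known line index, and it simply pairs the boxes on each line, sorted left to right, with the transcribed words of that line. The `WordBox` class, the `align_words` function, and the example data are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordBox:
    """Hypothetical segmented word region: line index plus bounding box."""
    line: int
    x: int
    y: int
    width: int
    height: int


def align_words(
    boxes: List[WordBox], transcript: List[List[str]]
) -> List[Tuple[WordBox, Optional[str]]]:
    """Pair each box with the transcribed word at the same position.

    `transcript` holds one list of words per text line. Boxes on a line
    are sorted by x, then matched to words by index. A box without a
    matching word gets None. Boxes on lines that are not in the
    transcript are ignored.
    """
    pairs: List[Tuple[WordBox, Optional[str]]] = []
    for line_no, words in enumerate(transcript):
        line_boxes = sorted(
            (b for b in boxes if b.line == line_no), key=lambda b: b.x
        )
        for i, box in enumerate(line_boxes):
            label = words[i] if i < len(words) else None
            pairs.append((box, label))
    return pairs


# Made-up example: two boxes on line 0 (given out of order), one on line 1.
boxes = [
    WordBox(line=0, x=120, y=10, width=60, height=30),
    WordBox(line=0, x=20, y=12, width=80, height=28),
    WordBox(line=1, x=25, y=60, width=90, height=30),
]
transcript = [["annual", "meeting"], ["minutes"]]

for box, label in align_words(boxes, transcript):
    print(box.line, box.x, label)
```

Positional pairing of this kind breaks as soon as a word is missed or split during segmentation, which is one reason a practical pipeline needs some form of correction step.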


This thesis aims to develop a pipeline that takes a raw historical document image, segments it into words, and aligns the segmented words to a transcription, or ground truth (GT), with interactive self-learning built into the algorithms. The beginnings of a suitable user interface are also developed. The research objectives were:



    • Develop an interactive algorithm for the alignment of historical document images
    • Integrate self-learning in the algorithm
    • Evaluate the performance of the algorithm

The key questions to be answered were: What factors affect segmentation and alignment? How can self-learning be integrated?


The algorithm created for this thesis is intended to work on one image at a time and was primarily tested on the Labour’s Memory data set (Chapter 3). For the algorithm to work as intended, the images must fulfil some conditions. The text lines need to be approximately straight. The images may contain only a limited amount of noise; with too much noise, the algorithms will not perform as intended. Words may overlap only to a limited extent, although the characters within a word may overlap, since the algorithm only needs to segment the image into words. The algorithms are only intended to segment lines and words; symbols such as [ , : ; ” - ) ( ] are not segmented individually, as they are not words.
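The requirement for approximately straight text lines is easiest to see with horizontal projection profiling (HPP, listed among the abbreviations). The sketch below is a generic illustration of HPP under simple assumptions, not the implementation used in this thesis: the page is assumed to be already binarized into rows of 0 (background) and 1 (ink), and the function names and the `min_ink` threshold are hypothetical.

```python
from typing import List, Tuple


def horizontal_projection(binary: List[List[int]]) -> List[int]:
    """Count the ink pixels (value 1) in every pixel row."""
    return [sum(row) for row in binary]


def find_text_lines(
    binary: List[List[int]], min_ink: int = 1
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges with at least `min_ink` ink pixels.

    `bottom` is exclusive. Consecutive rows above the threshold are
    merged into a single text line.
    """
    profile = horizontal_projection(binary)
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y  # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))  # the text line ended on the row above
            start = None
    if start is not None:
        lines.append((start, len(profile)))  # line runs to the page bottom
    return lines


# Tiny made-up page: two text lines separated by an empty row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 3), (4, 5)]
```

If the text lines are slanted or curved, the ink of neighbouring lines falls into the same pixel rows, the empty rows between them disappear from the profile, and the lines merge into one range. Heavy noise has a similar effect, because stray ink fills the gaps.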

Figure 1: An example image, taken from the Labour’s Memory dataset


