Dissertation Summary: Document geometric layout analysis based on adaptive threshold

<p> Text recognition has been researched and applied for many years. A recognition system processes a page in four main steps: the input page image is preprocessed, the page is analyzed, the output of page analysis is passed to the recognition step, and the result is post-processed. The quality of a recognition system depends mainly on two of these steps: page analysis and recognition. Recognition of printed text is now almost completely solved: ABBYY's commercial product FineReader 12.0 recognizes printed text in many languages, and VnDOCR 4.0, the Vietnamese recognition software of the Hanoi Information Technology Institute, achieves accuracy above 98%. Page analysis, however, remains a major challenge, both in Vietnam and worldwide, and continues to attract many researchers. An international page analysis contest is held every two years to promote the development of page analysis algorithms. These facts motivated the dissertation to look for effective solutions to the page analysis problem. Many page analysis algorithms have been developed in recent years, especially hybrid approaches. These algorithms have different strengths and weaknesses, but most still suffer from two basic errors: over-segmentation, in which a correct text area is split into smaller areas so that information about text lines or paragraphs is distorted or lost, and under-segmentation, in which text areas belonging to different text columns or paragraphs are merged. The objective of the dissertation is therefore to study and develop page analysis algorithms that reduce both types of error simultaneously.
Because page analysis is a very broad topic, the dissertation restricts its scope to page images of text written in Latin script, specifically English, and focuses on the analysis of text areas. It does not address detecting and analyzing table structure, detecting image areas, or analyzing logical structure. In line with these objectives, the dissertation achieved the following results: 1. A solution that speeds up the algorithm for detecting background images. 2. An adaptive parameterization method that reduces the effect of font size and font type on page analysis results. 3. A new solution for detecting and using separator objects in page analysis algorithms. 4. A new solution that separates text areas into paragraphs based on context analysis. </p>
