<p> Text recognition has been researched and applied for many years. A text recognition system works in four main steps: the input page image is first preprocessed and then passed to page analysis; the output of page analysis is the input to the recognition step, which is followed by post-processing. The quality of a recognition system depends mainly on two of these steps: page analysis and recognition. Recognition of printed text is now almost completely solved (for example, ABBYY's commercial product FineReader 12.0 recognizes printed text in many languages, and VnDOCR 4.0, the Vietnamese text recognition software of the Hanoi Information Technology Institute, achieves accuracy above 98%). However, both internationally and in Vietnam, page analysis remains a major challenge and continues to attract the attention of many researchers; an international page analysis competition is held every two years to promote the development of page analysis algorithms. These were the motivations for this dissertation to seek effective solutions to the page analysis problem. Many page analysis algorithms have been developed in recent years, notably hybrid approaches. The proposed algorithms have different strengths and weaknesses, but most still suffer from two basic errors: over-segmentation, in which a single correct text region is split into smaller pieces, so that information about text lines or paragraphs is misread or lost; and under-segmentation, in which text regions belonging to different text columns or paragraphs are merged together. The objective of the dissertation is therefore to study and develop page analysis algorithms that reduce both types of error, over-segmentation and under-segmentation, simultaneously.
Because page analysis covers a very broad range of issues, the dissertation limits its scope to images of pages written in Latin-script languages, specifically English, and focuses on the analysis of text regions. It does not address detecting tables and analyzing their structure, detecting image regions, or logical structure analysis. In pursuit of these objectives, the dissertation has achieved the following results: 1. a solution that speeds up the algorithm for detecting the background of page images; 2. an adaptive parameterization method that reduces the influence of font size and font type on page analysis results; 3. a new solution for detecting separator objects and using them in page analysis algorithms; 4. a new solution that splits text regions into paragraphs based on context analysis. </p>
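<p> The two error types can be made concrete with a small counting sketch. It assumes, hypothetically, that text regions are axis-aligned bounding boxes and that one region counts as lying "mostly inside" another when at least a fraction <code>min_frac</code> of its own area overlaps it. The names <code>Box</code>, <code>inside_fraction</code>, <code>segmentation_errors</code> and the default threshold are illustrative only; this is not the dissertation's evaluation procedure or the official metric of the page analysis competition. </p>

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical region type: (x0, y0) is the top-left corner, (x1, y1) the bottom-right."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)


def overlap(a: Box, b: Box) -> int:
    """Area shared by two boxes; 0 if they do not intersect."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(0, w) * max(0, h)


def inside_fraction(part: Box, whole: Box) -> float:
    """Fraction of `part`'s own area that lies inside `whole`."""
    return overlap(part, whole) / part.area() if part.area() else 0.0


def segmentation_errors(truth: List[Box], detected: List[Box],
                        min_frac: float = 0.5) -> Tuple[int, int]:
    """Return (over, under) error counts.

    over:  ground-truth regions split across two or more detected regions.
    under: detected regions that merge two or more ground-truth regions.
    min_frac is an assumed threshold for "mostly inside", not a standard value.
    """
    over = sum(
        1 for g in truth
        if sum(inside_fraction(d, g) >= min_frac for d in detected) >= 2
    )
    under = sum(
        1 for d in detected
        if sum(inside_fraction(g, d) >= min_frac for g in truth) >= 2
    )
    return over, under


# Two ground-truth paragraphs stacked vertically.
truth = [Box(0, 0, 100, 50), Box(0, 60, 100, 110)]
# First paragraph split in half: one over-segmentation error.
print(segmentation_errors(truth, [Box(0, 0, 100, 25), Box(0, 25, 100, 50),
                                  Box(0, 60, 100, 110)]))  # (1, 0)
# One detected region swallowing both paragraphs: one under-segmentation error.
print(segmentation_errors(truth, [Box(0, 0, 100, 110)]))  # (0, 1)
```

<p> Measuring overlap relative to the smaller piece's own area is what lets a single helper detect both directions: a fragment lies almost entirely inside the region it was split from, while a merged region almost entirely contains each paragraph it swallowed. </p>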