
Based on the similarities between characters, we group similar characters together. McGraw, Rehling and Goldstone built a data table comparing the similarity between letters across a number of grid-based fonts. The following table shows part of their results:

Excerpt from a table of letter-similarity ratings.

Based on these results and on our experience recognizing words with Tesseract, we present a table that groups characters which look alike and are likely to be confused with one another.


4.4 Dictionary data structure

Every word in the dictionary consists of a headword and the meaning of that word (including its transcription and part of speech). The headword and the meaning of each word take up different amounts of storage. Table 4.1 describes these data fields.


No. | Data field name | Note
1 | Headword | Serves as the keyword of the dictionary data. Word lengths vary, so this field has a variable size.
2 | Content | Includes the transcription, part of speech and meaning of the word. It has a variable size.

Table 4.2. Table describing the data fields.

With such data fields, we have to organize a data structure that stores them so that they are easy to access. There are several methods of organizing the entries:

- Organize entries of the same fixed size.

- Organize entries of variable size.

We will consider each method in detail below.


4.4.1 Organize entries with the same fixed size.

To organize entries that all share one size, we must know the size of the largest possible entry, so that every entry's data can be stored.

Advantages: fast; a word's data is easy to retrieve once its starting position is known.

Disadvantages: this wastes memory, because entries whose data is very small still occupy a large data field, and memory on mobile devices is currently still quite limited. A fixed entry size also restricts what can be stored: entries whose data exceeds that size cannot be stored, which makes the storage scheme feel unnatural. It is also difficult to add entries when generating new dictionary data.
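To make the fixed-size layout concrete, here is a minimal Java sketch, assuming entries are UTF-8 strings padded with zero bytes up to the size of the largest entry. The class `FixedSizeEntries` and its methods are hypothetical names for illustration, not the application's actual code. It shows both sides of the trade-off: entry `i` is found directly at `i * recordSize`, and every shorter entry leaves padding behind.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: each entry is stored in a slot of exactly recordSize bytes,
// padded with zero bytes. Assumes entries are encoded as UTF-8.
class FixedSizeEntries {
    private final byte[] data;
    private final int recordSize;
    private int wasted = 0; // padding bytes that hold no real data

    FixedSizeEntries(List<String> entries) {
        // The slot size must equal the size of the largest entry.
        int max = 0;
        for (String e : entries) {
            max = Math.max(max, e.getBytes(StandardCharsets.UTF_8).length);
        }
        recordSize = max;
        data = new byte[recordSize * entries.size()]; // Java zero-fills new arrays
        for (int i = 0; i < entries.size(); i++) {
            byte[] b = entries.get(i).getBytes(StandardCharsets.UTF_8);
            System.arraycopy(b, 0, data, i * recordSize, b.length);
            wasted += recordSize - b.length;
        }
    }

    // Entry i always starts at i * recordSize, so no per-entry position is stored.
    String get(int i) {
        int start = i * recordSize;
        int end = start;
        while (end < start + recordSize && data[end] != 0) {
            end++; // stop at the first padding byte
        }
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    int wastedBytes() {
        return wasted;
    }

    public static void main(String[] args) {
        FixedSizeEntries dict = new FixedSizeEntries(List.of("a", "alone", "table"));
        System.out.println(dict.get(2));        // table
        System.out.println(dict.wastedBytes()); // 4
    }
}
```

With the entries "a", "alone" and "table", every slot is 5 bytes, so "a" wastes 4 bytes of padding. With real dictionary data, where one long headword sets the slot size for all entries, the waste grows accordingly.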


4.4.2 Organize variable-sized entries.

Each entry is given exactly as much memory as it needs to store its data, however large or small that is. Each entry therefore carries information about its starting position and the size of its associated data field.
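A minimal Java sketch of this scheme, assuming UTF-8 entries packed back to back in one byte array; `VariableSizeEntries`, `pos` and `length` are illustrative names, not taken from the application. Each entry is located through its own (pos, length) pair instead of a fixed slot.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: entries packed back to back (UTF-8 assumed), each described
// by its starting position (pos) and size (length) in the shared byte array.
class VariableSizeEntries {
    private final byte[] data;
    private final int[] pos;
    private final int[] length;

    VariableSizeEntries(List<String> entries) {
        int n = entries.size();
        byte[][] encoded = new byte[n][];
        pos = new int[n];
        length = new int[n];
        int total = 0;
        for (int i = 0; i < n; i++) {
            encoded[i] = entries.get(i).getBytes(StandardCharsets.UTF_8);
            pos[i] = total;                // starts where the previous entry ended
            length[i] = encoded[i].length; // occupies exactly the bytes it needs
            total += length[i];
        }
        data = new byte[total];
        for (int i = 0; i < n; i++) {
            System.arraycopy(encoded[i], 0, data, pos[i], length[i]);
        }
    }

    // Reads entry i using its stored position and length.
    String get(int i) {
        return new String(data, pos[i], length[i], StandardCharsets.UTF_8);
    }

    int totalBytes() {
        return data.length;
    }
}
```

For "a", "alone" and "table" the data occupies exactly 11 bytes with no padding; the price is the two extra arrays holding positions and lengths, which is the additional size-management data this scheme requires.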

Advantages: maximum saving of memory resources.

No constraint on entry size: a word's data can be as large or as small as needed without affecting other entries, which makes this approach more natural.

Disadvantage: because the entries differ in size, extra data is needed to manage their sizes, and it is harder for the data to support fast lookup.

Weighing these two ways of organizing the data on mobile devices, where memory is the priority, organizing entries with the same fixed size shows more drawbacks. Organizing entries with variable size, on the other hand, avoids wasting memory space, so this solution is used in the application.

To solve the problem of access speed, we must then organize the dictionary data appropriately.


4.4.3 Organize dictionary data for quick lookup

The problem of organizing the entries has been solved. Next, we solve the problem of organizing files to support searching. There are many different ways to organize files for fast lookup; some of them are:

- Sequential file: a file that stores entries one after another. To search for a keyword, the keywords are taken out in order and compared with the search key, scanning through the file. In the best case, the matching keyword is found immediately, first in the file; in the worst case, every headword in the whole file must be examined before the match is found. This method is easy to implement but slow: with a large amount of data, access becomes slower, especially on mobile devices.

- Index file: to make searching large files more efficient, an index file is used. The index file holds the keywords together with information describing where their data is located in the semantic file. Searching for a keyword then becomes easier: we only need to search the keyword file, and then use the accompanying location information to retrieve the semantics. Binary search can be used to speed up searching the index file.

- Hash file: instead of searching the entire set of keywords, we use a hash file to group keywords that share some property into the same cluster, which narrows the search range over a large data set during lookup.

- Binary search tree: we can read all the index records into a search tree and then save that tree to a file. A binary search tree has the advantage of fast searching, but the disadvantage of needing a lot of memory to store the tree.
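The difference between scanning a sequential file and binary-searching a sorted index can be sketched in Java as follows. This is an in-memory illustration with hypothetical method names; it assumes the keywords are already sorted in the same order that `String.compareTo` uses (plain lowercase ASCII in the example).

```java
import java.util.List;

// Hypothetical illustration of the two search strategies over a keyword list.
class KeywordSearch {
    // Sequential file: compare keywords one by one from the beginning.
    static int sequentialSearch(List<String> keys, String target) {
        for (int i = 0; i < keys.size(); i++) {
            if (keys.get(i).equals(target)) {
                return i;
            }
        }
        return -1; // not found after examining every keyword
    }

    // Sorted index file: halve the search range at each comparison.
    static int binarySearch(List<String> keys, String target) {
        int lo = 0;
        int hi = keys.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
            int cmp = keys.get(mid).compareTo(target);
            if (cmp == 0) {
                return mid;
            } else if (cmp < 0) {
                lo = mid + 1; // target lies in the upper half
            } else {
                hi = mid - 1; // target lies in the lower half
            }
        }
        return -1;
    }
}
```

Both methods return the position of "table" in the sorted list "a", "alone", "table", "zymotic", but the sequential scan may touch every keyword, while binary search needs only about log2(n) comparisons for n keywords.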

Having considered the file-organization methods above in terms of both speed and memory, we conclude that combining the index method with file hashing is the most suitable for fast searching. We now describe how this method is used in practice.


4.4.3.1 Organizing the index file combined with file hashing

The purpose of hashing the index file is to split up the search range and so support fast searching. The hashing must therefore satisfy the following criteria:

- The hash function must be fast to compute.

- Keywords must be evenly distributed in the hash table.

- The hash must preserve the order of the original dictionary data.

The index file is already sorted in ascending alphabetical order, so we hash by the first letters of the keyword. Hashing this way satisfies two of the criteria (fast computation and preserved order). To satisfy the remaining criterion (even distribution), we hash the index file at one or more levels, each corresponding to a number of leading characters. The more levels the hash table is divided into, the more finely the data is partitioned, so we must choose the number of hash levels carefully.

- First-level hash table: words with the same first character are grouped into one cluster, so the data is distributed across about 30 clusters. However, some clusters hold a lot of data and others little, and the amount of data in each cluster is still large. This slows down access and makes the application run slowly, so the data must be partitioned further.

- Second-level hash table: words with the same first two characters are collected into one cluster, which narrows the search range. The partitions are smaller, so the search time within a partition is significantly reduced.

- Third-level hash table: words with the same first three characters are grouped into one cluster. The files become fragmented, and both memory size and word-loading time increase.

First-level hashing therefore makes access slow, while hashing at level three or more wastes files and increases memory size unnecessarily. On today's mobile devices, whose processing power has increased significantly, level-two hashing reduces the memory occupied and strikes a balance between lookup time and loading time.
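The multi-level hashing idea can be sketched in Java: because the index is sorted, all words sharing a prefix form one contiguous run, so each prefix can be mapped to a (pos, length) pair. `PrefixHash.build` is a hypothetical helper, and here pos and length count words in an in-memory list rather than bytes in a file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hash a sorted word list by its first prefixLen characters.
class PrefixHash {
    // Maps each prefix to {pos, length}: the contiguous run of sorted words sharing it.
    // Assumes sortedWords is in ascending order, so equal prefixes are adjacent.
    static Map<String, int[]> build(List<String> sortedWords, int prefixLen) {
        Map<String, int[]> table = new LinkedHashMap<>(); // keeps prefixes in sorted order
        for (int i = 0; i < sortedWords.size(); i++) {
            String w = sortedWords.get(i);
            String key = w.substring(0, Math.min(prefixLen, w.length()));
            int[] range = table.get(key);
            if (range == null) {
                table.put(key, new int[] {i, 1}); // first word with this prefix
            } else {
                range[1]++; // extend the current run
            }
        }
        return table;
    }
}
```

For the sorted words "a", "alone", "table", "tack", "tea", "zymotic", a prefix length of 1 puts "table", "tack" and "tea" into one cluster "t" of size 3, while a prefix length of 2 splits them into "ta" (2 words) and "te" (1 word): the finer partitioning that the second level provides.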


4.4.3.2 Semantic file organization

From the previous section, we know the starting point and the length of each meaning in the semantic file. The semantic file therefore only contains the data fields, arranged one after another in the file. Each data field includes the international phonetic transcription, the word class and the other meanings of the word.


4.4.3.3 Dictionary lookup

From the analysis of the dictionary data structure above, we obtain the diagram of the dictionary data (Figure 4.4) and of how the data is organized, from which we derive a suitable word-lookup algorithm.


Level-1 hash file:  a_pos_length  b_pos_length  ...  t_pos_length  ...  z_pos_length

Level-2 hash file:  a _pos_length  aa_pos_length  ab_pos_length  ...  az_pos_length  b _pos_length  ba_pos_length  ...  ta_pos_length  ...  zz_pos_length

Index file:  a_pos_length  ...  alone_pos_length  ...  table_pos_length  ...  zymotic_pos_length

Semantic file:  [ei, ə] noun: First, Second, Third Generation  ...  [ə'loun] adjective & adverb: alone, ...  ...  ['teibl] noun: Food  ...


Table 4.4 Hash tables for organizing data files

- Lookup algorithm for the word “table”. Based on the diagram above, looking up the word “table” proceeds as follows:

- Step 1: Read the level-1 hash file, then use binary search to find the key “t” in it. From the keyword “t” we obtain the location information of the corresponding data cluster in the level-2 hash file: its starting position (pos) and length (length).

- Step 2: Read the level-2 hash file at the position and with the length just found. This yields a data block containing the two-letter keywords that start with “t”: “ta”, ..., “tw”. Continue with a binary search for the keyword “ta” in this block to obtain the location information, the starting position (pos) and length (length), of the data cluster in the index file.

- Step 3: Read the index file at the position and with the length just found. This yields a data block containing all the words that start with the two characters “ta”. Continue with a binary search for the keyword “table” in this block to obtain the location information, the starting position (pos) and length (length), of the data cluster in the semantic file.

- Step 4: Read the semantic file at the position and with the length just found to obtain the semantic data block containing the meaning of the word “table”. Return the result to the application.
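The four steps can be sketched end to end in Java. This is a simplified in-memory model under stated assumptions: the four files are represented as lists, pos and length count records instead of bytes, and a one-letter word such as "a" is keyed by the word itself rather than by the "a _" entry shown in the diagram. `DictionaryLookup`, `Entry`, `find` and `lookup` are hypothetical names, not the application's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the four files. pos/length count records
// here, whereas the real files would store byte offsets and byte lengths.
class DictionaryLookup {
    static final class Entry {
        final String key;
        final int pos;    // first record of the block in the next file
        final int length; // number of records in that block

        Entry(String key, int pos, int length) {
            this.key = key;
            this.pos = pos;
            this.length = length;
        }
    }

    private final List<Entry> level1 = new ArrayList<>(); // "t"     -> block in level2
    private final List<Entry> level2 = new ArrayList<>(); // "ta"    -> block in index
    private final List<Entry> index = new ArrayList<>();  // "table" -> record in semantic
    private final List<String> semantic = new ArrayList<>();

    // sortedWords must be in ascending order; meanings.get(i) belongs to sortedWords.get(i).
    DictionaryLookup(List<String> sortedWords, List<String> meanings) {
        if (sortedWords.size() != meanings.size()) {
            throw new IllegalArgumentException("one meaning per word is required");
        }
        for (int i = 0; i < sortedWords.size(); i++) {
            semantic.add(meanings.get(i));
            index.add(new Entry(sortedWords.get(i), i, 1));
        }
        group(index, 2, level2);  // level-2 hash: first two characters
        group(level2, 1, level1); // level-1 hash: first character
    }

    // Merges consecutive entries of src sharing a prefix into one block entry in dst.
    private static void group(List<Entry> src, int prefixLen, List<Entry> dst) {
        for (int i = 0; i < src.size(); i++) {
            String k = src.get(i).key;
            String p = k.substring(0, Math.min(prefixLen, k.length()));
            Entry last = dst.isEmpty() ? null : dst.get(dst.size() - 1);
            if (last != null && last.key.equals(p)) {
                dst.set(dst.size() - 1, new Entry(p, last.pos, last.length + 1));
            } else {
                dst.add(new Entry(p, i, 1));
            }
        }
    }

    // Binary search for key among records [pos, pos + length) of file.
    private static Entry find(List<Entry> file, int pos, int length, String key) {
        int lo = pos;
        int hi = pos + length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = file.get(mid).key.compareTo(key);
            if (cmp == 0) return file.get(mid);
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    // Returns the meaning of word, or null if it is not in the dictionary.
    String lookup(String word) {
        if (word.isEmpty()) return null;
        // Step 1: first letter in the level-1 hash file.
        Entry e1 = find(level1, 0, level1.size(), word.substring(0, 1));
        if (e1 == null) return null;
        // Step 2: first two letters inside the level-2 block found in step 1.
        Entry e2 = find(level2, e1.pos, e1.length, word.substring(0, Math.min(2, word.length())));
        if (e2 == null) return null;
        // Step 3: the whole word inside the index block found in step 2.
        Entry e3 = find(index, e2.pos, e2.length, word);
        if (e3 == null) return null;
        // Step 4: read the meaning record from the semantic file.
        return semantic.get(e3.pos);
    }
}
```

Looking up "table" follows the chain above: "t" in the level-1 list gives the block of level-2 keys starting with "t", "ta" in that block gives the block of index words starting with "ta", and "table" in that block gives the position of its meaning.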


Chapter 5: INSTALLATION AND EXPERIMENTATION

When developing a camera-based dictionary application for Android mobile phones, besides the algorithm and data-structure problems there are device-specific concerns such as screen display, camera processing, configuration storage and sound. Therefore, the following technical problems must be solved when programming the application:


5.1 Draw frames and controls on the Camera screen.

The camera screen is the main screen of the application and has a rather special design: it consists of two layers. The first (bottom) layer displays the camera image. The second layer, on top, holds the controls, including the capture frame, buttons, an input text field and a text view. Two problems arise with these two layers: display and user interaction. Figure 5.1 shows the main camera screen interface.



Figure 5.19 Camera screen interface

For the problem of displaying two layers stacked on top of each other, Android lets us design such layouts with FrameLayout. The controls are thus divided into two groups, one for each display layer. Group 1 is the SurfaceView that displays the camera image. Group 2 includes the frame that limits the capture area, the capture button, the focus button, the flash on/off button, the camera zoom button, the word lookup button and the EditText that displays the captured word. On some phone models that do not support autofocus or flash, the corresponding buttons are not displayed.


For the problem of recognizing interaction events, we need to draw a frame that limits the capture area. This frame has a variable size that can be adjusted to the content, so a graphics layer (Paint layer) is needed on top to support drawing it. Because the painted frame layer covers the entire screen, the user can no longer interact with the controls below it: when clicked, those controls do not receive the event. To solve this, we simulate button clicks: the event is taken by its coordinates, i.e. the touch position on the screen, and we check which control that position lies on; that control then calls its click event. If the user clicks on the frame and drags it, the frame changes position.

This handling is difficult, but it ensures that all controls can be interacted with even though they are stacked on top of each other.
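The simulated click and the frame drag reduce to coordinate tests, sketched here in plain Java so the logic runs without Android. `TouchDispatcher`, `Control`, `dispatchClick`, `setFrame` and `dragFrame` are hypothetical names. In the application this logic would presumably live in the overlay view's touch handling (for example an overridden `View.onTouchEvent`) and call the matched control's `performClick()`; this sketch replaces that call with a click counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the overlay's touch handling: the full-screen frame layer
// receives every touch and decides by coordinates what should react to it.
class TouchDispatcher {
    static final class Control {
        final String name;
        final float left, top, right, bottom;
        int clicks = 0;

        Control(String name, float left, float top, float right, float bottom) {
            this.name = name;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        boolean contains(float x, float y) {
            return x >= left && x < right && y >= top && y < bottom;
        }
    }

    private final List<Control> controls = new ArrayList<>();
    // Current bounds of the capture frame.
    float frameLeft, frameTop, frameRight, frameBottom;

    void addControl(Control c) {
        controls.add(c);
    }

    void setFrame(float left, float top, float right, float bottom) {
        frameLeft = left;
        frameTop = top;
        frameRight = right;
        frameBottom = bottom;
    }

    // Simulated click: forward the tap to the control lying under (x, y), if any.
    String dispatchClick(float x, float y) {
        for (int i = controls.size() - 1; i >= 0; i--) { // last added = drawn on top
            Control c = controls.get(i);
            if (c.contains(x, y)) {
                c.clicks++; // stands in for calling the control's click handler
                return c.name;
            }
        }
        return null;
    }

    // Drag: if the touch started inside the frame, move the frame by (dx, dy).
    boolean dragFrame(float startX, float startY, float dx, float dy) {
        boolean inside = startX >= frameLeft && startX < frameRight
                && startY >= frameTop && startY < frameBottom;
        if (inside) {
            frameLeft += dx;
            frameRight += dx;
            frameTop += dy;
            frameBottom += dy;
        }
        return inside;
    }
}
```

Checking controls from the last one added backwards means that, where controls overlap, the one drawn on top receives the click.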

The rectangle on the main screen limits the capture area, and its size can be changed to fit the size of the text to be captured. In the application, the class RectView is created to manage this. To draw the rectangle on the screen we need the graphics class Paint; we initialize it and declare the necessary objects as follows:

private Paint paint = new Paint(Paint.ANTI_ALIAS_FLAG);

private static float top;
private static float left;
private static float right;
private static float bottom;

The Paint class holds information about line style and color, and provides methods for drawing geometric shapes, text and bitmaps. The top, left, right and bottom parameters determine the top, left, right and bottom edges of a rectangle.

The function used to initialize parameters is as follows:

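private void Init() {
    // Place the initial capture frame near the centre of the view
    left = (MAX_WIDTH / 2) - 100;
    top = (MAX_HEIGHT / 2) - 40;
    right = left + 120;   // frame is 120 px wide
    bottom = top + 60;    // and 60 px tall
    // Draw only the outline of the frame: white, 3 px stroke
    paint.setColor(Color.WHITE);
    paint.setStrokeWidth(3);
    paint.setStyle(Paint.Style.STROKE);
    // Request a redraw so the new frame appears
    invalidate();
}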


5.2 Capture images from the phone camera.

The input to Tesseract optical character recognition is a bitmap image, so we need to program the camera of the Android phone to capture images of paper documents.
