Illustration of Basic Steps in an Image Processing and Recognition System.

+ M-link (mixed link): two points p and q with values from V are m-linked if they satisfy one of the following two conditions:

- First: q belongs to N4(p);

- Second: q belongs to ND(p) and the set N4(p) ∩ N4(q) contains no point whose value is from V.
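The mixed-adjacency test can be sketched in code. This is a hedged illustration under the standard definition (q in N4(p), or q in ND(p) with no shared 4-neighbor whose value is in V); the helper names, the value set V and the toy image are invented for the demo.

```python
import numpy as np

def n4(p):
    """4-neighborhood of p: the horizontal and vertical neighbors."""
    r, c = p
    return {(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)}

def nd(p):
    """Diagonal neighborhood of p."""
    r, c = p
    return {(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)}

def m_linked(img, p, q, V=(1,)):
    """True if p and q (values in V) satisfy the mixed-link conditions."""
    h, w = img.shape
    inside = lambda s: 0 <= s[0] < h and 0 <= s[1] < w
    if img[p] not in V or img[q] not in V:
        return False
    if q in n4(p):
        return True
    if q in nd(p):
        # no shared 4-neighbor may have a value from V
        return all(img[s] not in V for s in n4(p) & n4(q) if inside(s))
    return False

img = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]])
# (0,1) and (1,0) are diagonal neighbors with no shared 4-neighbor in V,
# so they are m-linked; filling the shared neighbor (1,1) breaks the link.
direct = m_linked(img, (0, 1), (1, 0))
img2 = img.copy(); img2[1, 1] = 1
blocked = m_linked(img2, (0, 1), (1, 0))
```

Mixed adjacency removes the ambiguous double paths that plain 8-adjacency creates between diagonal neighbors.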


1.1.9. Image boundary

Image boundaries are a major issue in image analysis, as segmentation techniques are largely based on them. According to [3], a pixel can be considered an edge point if there is a sudden, sharp change in gray level at it. The set of edge points forms the image boundary (Boundary). In other words, an image boundary is a place where neighboring pixels show a sudden, sharp change in intensity.

For example, in a binary image, a point can be called an edge if it is a black point and has at least one white point as its neighbor.
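The binary-image rule above can be sketched directly; a minimal illustration in which the object pixels have value 1, the background 0, and the function name is made up:

```python
import numpy as np

def is_edge_pixel(img, r, c):
    """Return True if img[r, c] is an object pixel with a background 4-neighbor."""
    if img[r, c] != 1:
        return False
    h, w = img.shape
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # N4 neighborhood
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w and img[rr, cc] == 0:
            return True
    return False

binary = np.array([[0, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 0]])
# Every object pixel of this 2x2 square touches the background,
# so all four are edge points.
edges = [(r, c) for r in range(4) for c in range(4) if is_edge_pixel(binary, r, c)]
```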


1.2. GENERAL INTRODUCTION TO IMAGE PROCESSING AND RECOGNITION

Like graphic data processing, image processing is a field of applied computing. Graphic data processing deals with artificial images, which are treated as data structures and created by programs. Image processing covers methods and techniques for transforming, transmitting, or encoding natural images. The purposes of image processing include:

- First: Transform and enhance images.

- Second: Automatically recognize and interpret images, and evaluate their content.

Image processing is usually performed according to the principle: processing through the use of image transformation functions. Image transformation is a process performed through operators. An operator takes an image as input to the system and produces another image according to the processing requirements. To perform the image transformation process, we are mainly interested in linear operators.

Suppose O(f) is the O operator of an image f, then the O operator is called linear if we have: O[af + bg] = aO(f) + bO(g) for all f, g and a, b.

In image processing, operators are defined as point spread functions. A point spread function of an operator is the result we get after applying the operator's rule to a point source: O[point source] = point spread function. Or we have: O[δ(x-α, y- β)] = h(x, α, y, β). In which δ(x-α, y-β) is a point source with a light intensity of 1 located at point (α,β) and if the operator is linear, we have: O[aδ(x-α, y-β)] = ah(x, α, y, β); that is, if the light intensity is increased a times, the result obtained also increases a times.
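The linearity property O[af + bg] = aO(f) + bO(g) can be verified numerically. A minimal sketch, assuming NumPy; the 3×3 mean filter and the random test images are illustrative choices, not the thesis's own operator:

```python
import numpy as np

def O(img):
    """A 3x3 mean filter implemented by shifting and averaging (zero padding)."""
    padded = np.pad(img, 1)
    acc = np.zeros_like(img, dtype=float)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            acc += padded[1 + dr:1 + dr + img.shape[0],
                          1 + dc:1 + dc + img.shape[1]]
    return acc / 9.0

rng = np.random.default_rng(0)
f = rng.random((5, 5))
g = rng.random((5, 5))
a, b = 2.0, -0.5
lhs = O(a * f + b * g)          # operator applied to the combination
rhs = a * O(f) + b * O(g)       # combination of the operator's outputs
linear_ok = np.allclose(lhs, rhs)
```

Any convolution-style filter passes this check; a nonlinear operator such as the median filter would fail it.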

Image recognition is the process of describing the objects that one wants to characterize, and it usually follows the extraction of the essential features of those objects. There are two types of object description:

- Parameter description (identification by parameter).

- Structured description (structured identification).

Recognizing and evaluating the content of an image is the analysis of an image into meaningful parts to distinguish one object from another. Based on that, we can describe the structure of the original image. Some basic recognition methods can be listed such as recognizing the edge of an object on the image, separating edges, segmenting images, etc. In practice, people have applied recognition techniques quite successfully to many different objects such as: fingerprint recognition, letter recognition, flower recognition, pet recognition, etc.

The processes of image processing and recognition can be carried out according to the following diagram:


Image acquisition → Image digitization → Image preprocessing → Feature extraction

Training: training data → classifier training → model

Recognition: feature extraction → identification (using the trained model) → identification results

Figure 1.1: Illustration of the basic steps in an image processing and recognition system.


1.2.1. Image acquisition

In the process of image processing, the first thing to do is to acquire the image (Image acquisition). The quality of the acquired image will greatly determine the result of the recognition. Then the image must be stored in a format suitable for the following processing steps.

Images can be captured by a camera or a phone. The signal is usually analog, from a tube-type camera (CCIR standard), but can also be digital, from a CCD (Charge Coupled Device) sensor. Images can also be acquired from satellites through sensors, or scanned with a scanner.

1.2.2. Image digitization

Image digitization is the process of converting the analog signal into a discrete signal (sampling) and then quantizing it, before moving on to the processing, analysis, or storage stage.
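The two digitization steps, sampling and uniform quantization, can be sketched as follows; the continuous signal and the 8-bit level count are invented for the demo:

```python
import numpy as np

def quantize_uniform(samples, levels=256):
    """Uniformly map samples in [0, 1] to integer levels 0..levels-1."""
    q = np.floor(samples * (levels - 1) + 0.5).astype(int)
    return np.clip(q, 0, levels - 1)

t = np.linspace(0.0, 1.0, 8)                 # spatial sampling positions
signal = 0.5 + 0.5 * np.sin(2 * np.pi * t)   # a continuous intensity in [0, 1]
digital = quantize_uniform(signal)           # discrete 8-bit gray levels
```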

1.2.3. Image preprocessing

This step improves the accuracy of identification and raises the quality of the image before analysis and recognition. For various reasons (the acquisition device, the light source, or noise), the image may have low contrast or may be degraded by distortion. The main job of image preprocessing is therefore to improve image quality. Image-quality improvement is an important step and the premise for later processing. Its main purpose is to highlight certain characteristics of the image (contrast, color enhancement, noise filtering, image smoothing, magnification, enhancement or restoration) in order to improve quality, or to bring the image back toward its original state (the state before it was distorted), etc.

Image enhancement is the process of improving an image so that it clearly represents its characteristics, such as gray-level control, contrast, noise reduction, etc. Depending on the requirements, two aspects are important: the gray-level statistics of the image and its frequency content.

1.2.3.1. Noise filtering

There are many types of noise, but they can be classified into noise from image acquisition devices, independent random noise, and noise from observed objects. People often approximate noise by linear invariant processes because there are more linear tools that can solve the problem of image restoration and enhancement than nonlinear ones, and they also allow easier processing on computers.

From the above problems, we can smooth the image by filtering noise with linear filters (Linear filter) or nonlinear filters. Linear filters include the spatial averaging filters (Mean filter, Average filter), the low-pass filter, the ideal high-pass filter, the Gaussian blur filter, etc. Nonlinear filters include the Median filter, the Pseudo-median filter, the Outlier filter, etc. In this thesis, I only discuss the linear Gaussian blur filter, as follows:

According to document [5] , Gaussian blurring is a way to blur an image using a Gaussian function. This method is widely and effectively applied in graphics processing software. It is also a popular tool for performing image preprocessing to make good input data for more advanced analysis in computer vision, or for algorithms that are performed in a different scale of the given image.

So, we can say that Gaussian blurring is a type of image-blurring filter that uses the Gaussian function, also known as the Normal distribution in statistics, to compute the transformation of each pixel of the image, helping to reduce noise and unwanted detail. Here is the equation of the Gaussian function (Gaussian distribution) in two-dimensional space:

Gaussian(m, n) = [1 / (2πσ²)] · exp(−(m² + n²) / (2σ²)) (1.1)

The mask of this filter is constructed from the Gaussian function (1.1) above. The Gaussian mask is square, and its size is usually chosen in the range of 4σ to 6σ, which gives good efficiency.
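The mask construction and its application can be sketched in code. A hedged illustration assuming NumPy; the 5×5 size, σ = 1 and the point-source image are arbitrary demo choices:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Square Gaussian mask from equation (1.1), normalized to sum to 1."""
    half = size // 2
    m, n = np.mgrid[-half:half + 1, -half:half + 1]
    k = np.exp(-(m**2 + n**2) / (2.0 * sigma**2))
    return k / k.sum()   # normalization preserves overall brightness

def convolve2d(img, kernel):
    """Naive 'same' convolution with zero padding (for clarity, not speed)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros_like(img, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * kernel[::-1, ::-1])
    return out

img = np.zeros((7, 7)); img[3, 3] = 1.0   # a unit point source
result = convolve2d(img, gaussian_kernel(5, 1.0))
```

Applied to a point source, the filter spreads the unit intensity into a Gaussian spot while keeping the total intensity unchanged, which is exactly the point-spread-function view of operators described in 1.2.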

1.2.3.2. Gray level


Grayscale histogram

The gray-level histogram of a gray image is a diagram representing the frequency of occurrence of each gray level; in other words, it is a discrete function. The diagram is drawn in (x, y) coordinates: the horizontal axis represents the gray levels from 0 to 255, and the vertical axis represents the number of pixels at each gray level on the horizontal axis.

So we have the relationship: y = f(x) = number of pixels with the same gray level x.

When the function is normalized so that its sum over all gray levels is 1, it can be considered a density function: the gray-level value found in the image is then treated as a random variable.

y = p(x) = h(x) / (L − 1), with L usually equal to 256 (1.2)

Thus, the gray level histogram provides information about the gray levels of an image, and is a useful tool in many stages of image processing.
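Computing h(x) and a normalized density from an 8-bit image is a one-liner with NumPy; a minimal sketch in which the tiny 3×3 image is invented (note the density here is normalized by the pixel count so that it sums to 1):

```python
import numpy as np

L = 256
img = np.array([[0,   0,   128],
                [128, 128, 255],
                [255, 255, 255]], dtype=np.uint8)

h = np.bincount(img.ravel(), minlength=L)   # h(x): pixel count at each gray level x
p = h / img.size                            # normalized density: sums to 1
```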

Grayscale transformation

A gray-level transformation aims to spread the histogram of an image as widely as possible. If x is the gray value of the original image, then the gray value of the new image is s = T(x), where T is called the gray-level transformation function.


Some measures to enhance images by grayscale transformation:

- Gray-level equalization:

v = LTHĐ[f(u)] = LTHĐ[∑_{x=0}^{u} p(x)], with x = 0, 1, …, u (1.3)

Where p(x) is the gray-level density function, and LTHĐ denotes the uniform quantization of the value of f(u) onto the gray levels from 0 to L − 1. Using the integer-part function Int, LTHĐ can be written as:

LTHĐ[x] = Int[(x − x_min) / (1 − x_min) · (L − 1) + 0.5], with x = f(u) (1.4)

- Gray level nonlinear transformation:

Before performing LTHĐ, a nonlinear function f(u) is applied to transform the gray level u. The following forms of f(u) are possible:

f(u) = ∑_{x=0}^{u} [p(x)]^(1/n) / ∑_{x=0}^{L−1} [p(x)]^(1/n), with n being an integer > 1 (1.5)


f(u) = log(1 + u), with u ≥ 0 (1.6)

f(u) = u^(1/n), with u ≥ 0 and n an integer > 1 (1.7)
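The equalization of (1.3)-(1.4) and the nonlinear transforms (1.6)-(1.7) can be sketched together; a hedged NumPy illustration in which the toy image, L = 256, and n = 2 are invented demo choices:

```python
import numpy as np

def equalize(img, L=256):
    """Gray-level equalization: v = LTHĐ[sum of p(x) for x = 0..u]."""
    h = np.bincount(img.ravel(), minlength=L)
    p = h / img.size                          # gray-level density p(x)
    f = np.cumsum(p)                          # f(u) = sum_{x=0}^{u} p(x)
    f_min = f[np.nonzero(f)[0][0]]            # smallest nonzero value of f
    # LTHĐ[x] = Int[(x - x_min)/(1 - x_min) * (L - 1) + 0.5]
    lut = np.floor((f - f_min) / (1.0 - f_min) * (L - 1) + 0.5)
    return np.clip(lut, 0, L - 1).astype(np.uint8)[img]

img = np.array([[50, 50, 60],
                [60, 70, 70],
                [70, 70, 200]], dtype=np.uint8)
eq = equalize(img)               # darkest level maps to 0, brightest to 255

u = np.linspace(0.0, 1.0, 5)
f_log = np.log1p(u)              # (1.6): f(u) = log(1 + u)
f_root = np.power(u, 1.0 / 2)    # (1.7): f(u) = u**(1/n) with n = 2
```

After equalization, the occupied gray levels are stretched across the full 0..255 range; the log and root transforms instead expand dark regions at the expense of bright ones.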


1.2.4. Image analysis and feature extraction

1.2.4.1. General introduction to image analysis

The enhanced image is more realistic and clearer. But images are not simply stored and displayed to the viewer; processing continues in order to automatically find the information contained in the image and supply it for human needs. Doing so is called image analysis, also known as image representation and description.

Image analysis is the next stage of image preprocessing. This process transforms the image to derive important features of the image. This is the most important stage of image processing. In an image, there can be many objects, each object carries different information, including information that needs to be known. In image analysis, we often find pixel feature areas such as image boundaries, image regions, image segmentation into separate regions, etc. and represent them through the feature pixels. Image analysis has two main meanings:

- Reduce unnecessary information in the image, leaving only characteristic information such as borders, skeleton, etc. of the object.

- Separate objects in the image separately.


1.2.4.2. Standardization and image feature extraction


Standardization

Variation is inherent in nature, and it is because of this diversity of forms that recognition problems exist. The main question for a recognition problem is: how can these variations be handled? Some features of an object are invariant to external influences, so feature extraction can rely on them, but other features are very difficult to capture when the object changes. That is why this standardization step is usually present and necessary in recognition problems: it reduces the parameters that are heavily affected by transformations; in other words, it reduces the data to a common form in which feature extraction can be performed correctly.

Feature extraction

Feature extraction is the step of representing samples by object features. During the process, the image data will be reduced. This is essential for saving memory in storage and calculation time. The task for this step is to extract the specific properties of the object in the image area. Then each feature of the object will be described in numerical form, these values ​​are collected into a vector describing the sample. Performing this task includes two jobs:

- Reduce data set.

- Concentrate the essential, class-discriminating information into those numbers.

Feature extraction also involves selecting the feature components. Transforming individual components may change the relative order of the quantities, which can affect classification. This problem is usually solved by applying a suitable linear transformation to the components of the feature vector.

A good feature extraction method extracts object features that help distinguish the different sample classes, while remaining stable under the variations inherent in the object or introduced by the image acquisition devices.

Up to now, there has not been an optimal mathematical method to meet the above requirements. Experts still have to rely on intuition and imagination to find suitable features of the object. Some typical selection methods are: Morphology method, PCA method [17] , Canny edge finding method [6], [16], [22], [23], Entropy measurement determination method [8], [20], etc.

1.2.4.3. Image boundary finding technique


1.2.4.3.1. Overview of finding boundaries


Boundary of image object:

An edge is the separation between two regions with relatively different gray levels. To determine the extent of an image object, people rely on its edges. The edges of an object carry much of its characteristic information, so the recognition process relies mainly on them. In signal terms, an image edge is a set of points at which a sudden change in light intensity occurs. This is the basis for edge-finding techniques.

Some common types of borders in practice:



(a) Step profile; (b) ramp (slope) profile; (c) square-pulse profile; (d) cone-shaped profile.

Figure 1.2: Some common types of image borders


Classification of edge finding techniques:

There are two methods of finding boundaries for an object:

- Direct method: use derivatives to find the variation of light intensity; for example, the Gradient technique uses first derivatives, while the Laplace technique uses second derivatives. This method is effective when the intensity changes abruptly at the edge, and it is less affected by noise.

- Indirect method: Perform image segmentation first and the edge of the separated image area is the required edge. This method is effective in cases where the intensity change at the edge is small. Some main image segmentation methods are: image segmentation based on amplitude threshold, image segmentation based on homogeneous region, image segmentation based on edge.

According to [1] , the direct edge detection method is quite effective and less affected by noise, but if the brightness variation is not sudden, the method is less effective. The indirect edge detection method, although difficult to install, is quite well applied in this case.

According to [6], traditional edge detection methods are often based on the convolution of the image under study f(x, y) with a 2-D filter h(x, y), often called the mask h:

h(x, y) ∗ f(x, y) = ∫∫ h(k1, k2) f(x − k1, y − k2) dk1 dk2, with both integrals taken from −∞ to +∞ (1.8)

If h(x,y) and f(x,y) are discrete, then formula (1.8) will be rewritten as:

h(n1, n2) ∗ f(n1, n2) = ∑∑ h(k1, k2) f(n1 − k1, n2 − k2), with the sums taken over k1, k2 from −∞ to +∞ (1.9)
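When the mask has finite support, the infinite sums in (1.9) reduce to the mask's extent. A direct, hedged rendering of the definition; the tiny image and mask are toy values:

```python
import numpy as np

def conv2d_full(f, h):
    """Discrete 2-D convolution ('full' output) straight from the definition."""
    fh, fw = f.shape
    hh, hw = h.shape
    out = np.zeros((fh + hh - 1, fw + hw - 1))
    for n1 in range(out.shape[0]):
        for n2 in range(out.shape[1]):
            s = 0.0
            for k1 in range(hh):
                for k2 in range(hw):
                    i, j = n1 - k1, n2 - k2     # f(n1 - k1, n2 - k2)
                    if 0 <= i < fh and 0 <= j < fw:
                        s += h[k1, k2] * f[i, j]
            out[n1, n2] = s
    return out

f = np.array([[1.0, 2.0], [3.0, 4.0]])
h = np.array([[1.0, 0.0], [0.0, 1.0]])   # copies f plus a diagonally shifted copy
g = conv2d_full(f, h)
```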

In practice, people often use h(n1, n2) as a 3 × 3 matrix, as follows:

h = [ h(−1, 1)   h(0, 1)   h(1, 1)
      h(−1, 0)   h(0, 0)   h(1, 0)
      h(−1, −1)  h(0, −1)  h(1, −1) ]

The structure and values of the edge detection operators determine the characteristic direction in which the operator is sensitive to edges. Some operators are suitable for horizontal edges, while others suit vertical or diagonal edges. There are many edge detection methods in use, but they can be divided into two basic groups: Gradient methods and Laplacian methods. In this thesis, I will only introduce the Gradient method, as follows:

1.2.4.3.2. Gradient Technique


Gradient Concept:

Gradient is a vector with two components representing the rate of change of light intensity value in two directions x and y. According to documents [1], [6] and [23] , if given a continuous image f(x,y), the two components of Gradient (symbol: G x , G y ) are the partial derivatives of f(x,y) in two directions x and y as the following two formulas (1.10 and 1.11):


Gx = fx = ∂f(x, y)/∂x ≈ [f(x + dx, y) − f(x, y)] / dx (1.10)

Gy = fy = ∂f(x, y)/∂y ≈ [f(x, y + dy) − f(x, y)] / dy (1.11)

In which dx, dy are the distances in the x and y directions. In practice, people often use dx = dy = 1.
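With dx = dy = 1, formulas (1.10)-(1.11) are plain forward differences; a minimal NumPy sketch on a synthetic ramp whose intensity grows only along x (the helper name and test image are invented):

```python
import numpy as np

def gradients(img):
    """Forward differences with dx = dy = 1: Gx along columns (x), Gy along rows (y)."""
    Gx = np.zeros_like(img, dtype=float)
    Gy = np.zeros_like(img, dtype=float)
    Gx[:, :-1] = img[:, 1:] - img[:, :-1]   # f(x + 1, y) - f(x, y)
    Gy[:-1, :] = img[1:, :] - img[:-1, :]   # f(x, y + 1) - f(x, y)
    return Gx, Gy

ramp = np.tile(np.arange(5, dtype=float), (4, 1))  # intensity equals the x coordinate
Gx, Gy = gradients(ramp)   # Gx is 1 everywhere inside, Gy is identically 0
```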

Gradient Technique:

This technique uses a pair of orthogonal masks h1 and h2 (in 2 perpendicular directions). According to document [6] , if we define G x , G y as the corresponding gradient in 2 directions x and y, then the amplitude of the gradient denoted by g at point (m,n) is calculated according to formula (1.12):


A0 = g(m, n) = √(Gx²(m, n) + Gy²(m, n)) (1.12)


The direction of the gradient vector is determined by the following formula (1.13):

θr(m, n) = tan⁻¹(Gy(m, n) / Gx(m, n)) (1.13)

The direction of the edge will be perpendicular to the direction of this gradient vector.
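A small numeric check of (1.12)-(1.13) at one pixel; the gradient values Gx = 3, Gy = 4 are made up so that the amplitude comes out as a clean 3-4-5 triangle:

```python
import math

Gx, Gy = 3.0, 4.0
amplitude = math.sqrt(Gx**2 + Gy**2)   # A0 = g(m, n) from (1.12)
theta = math.atan2(Gy, Gx)             # gradient direction from (1.13)
edge_direction = theta + math.pi / 2   # the edge runs perpendicular to it
```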

In addition, quite a lot of derivative operators are applied in practice, typically the Roberts, Sobel, Prewitt, and Canny operators (based on calculating the maxima and minima of the first derivative of the image), according to [6], [22], [23] and [25], as follows:

a) Sobel operator:

In practice, the Sobel operator uses two masks of size 3 × 3, where one mask is simply a 90° rotation of the other, as shown below. These masks are designed to respond best to vertical and horizontal edges. Convolving the image with these masks gives the vertical and horizontal gradients Gx, Gy. The Sobel operator has the form shown in Figure 1.3:


-1  -2  -1        -1   0  +1
 0   0   0        -2   0  +2
+1  +2  +1        -1   0  +1

Figure 1.3: Mask window for Sobel operator
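Applying the two Sobel masks of Figure 1.3 to a vertical step edge shows the directional sensitivity: the vertical-edge mask responds strongly while the other does not. The helper name and the toy step image are illustrative:

```python
import numpy as np

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)   # responds to horizontal edges
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)     # responds to vertical edges

def apply_mask(img, mask, r, c):
    """Correlate a 3x3 mask with the neighborhood centered at (r, c)."""
    return float(np.sum(img[r - 1:r + 2, c - 1:c + 2] * mask))

step = np.array([[0, 0, 1, 1]] * 4, dtype=float)  # a vertical step edge
gx = apply_mask(step, sobel_x, 1, 1)              # strong response across the step
gy = apply_mask(step, sobel_y, 1, 1)              # no response along the step
```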

b) Prewitt operator

The Prewitt method is similar to Sobel and is the most classical of these methods. The Prewitt operator is described in Figure 1.4 below:


-1  -1  -1        -1   0  +1
 0   0   0        -1   0  +1
+1  +1  +1        -1   0  +1

Figure 1.4: Mask window for Prewitt operator
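The Prewitt masks behave like Sobel's on the same vertical step edge, just with unweighted rows; a short sketch with an invented toy image:

```python
import numpy as np

prewitt_x = np.array([[-1, 0, 1],
                      [-1, 0, 1],
                      [-1, 0, 1]], dtype=float)   # responds to vertical edges
prewitt_y = np.array([[-1, -1, -1],
                      [ 0,  0,  0],
                      [ 1,  1,  1]], dtype=float) # responds to horizontal edges

step = np.array([[0, 0, 1, 1]] * 4, dtype=float)  # a vertical step edge
patch = step[0:3, 0:3]
gx = float(np.sum(patch * prewitt_x))             # strong response across the step
gy = float(np.sum(patch * prewitt_y))             # no response along the step
```

Compared with Sobel, Prewitt weights all three rows equally, so it smooths less in the direction along the edge.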
