Neural Network Coordination for ECG Signal Recognition Using Decision Tree Model

Extract the results of the QRS complex detection algorithm, tested with 100 records in the MIT-BIH database set, the results are shown in Figure 3.6-f.

(a)

(b)

(c)

(d)

(e)

(f)

Figure 3.6. Example steps of R peak detection: (a) original ECG signal, (b) filtered result,

(c) result after taking derivative, (d) result after taking absolute value, (e) result after taking average, (f) result of detecting R peak.

3.1.2. Analysis of QRS complex by basis Hermite functions

Analyzing the ECG signal by Hermite function is a quite popular method, has been used by many authors, the results are good and this method is also chosen for the thesis. In this section, the author will present the process of creating characteristic vectors of the ECG signal.

a) Hermite function

The Hermite polynomial is given in recursive form by the following formula:

H n  1 ( x )  2 x  H n ( x )  2 n  H n  1 ( x )

(3.6)

give

n  1,

with

H 0 ( x )  1; H 1 ( x )  2 x .

The Hermite function is defined by the following formula:



n ( x )  

 2 n n !  2e

x 2

2 H n ( x )

(3.7)

Some shapes of Hermite functions are shown in Figure 3.7. We can see that the higher the degree of the Hermite function, the greater the rate of change of the function, or in other words, the function will contain more high-order components. At the same time, the shape of the functions is quite similar to the shape of the basic components in the ECG signal. This is the basis for using Hermite functions to analyze electrocardiogram signals.

-10

-5	0	5	10 -10	-5	0	5	10
		(c)				(d)

Maybe you are interested!

-10 -5

(a)

5 10 -10 -5

(b)

Figure 3.7. Graph of Hermite function of degree n: a) n=0, b) n=1, c) n=3, d) n=10.

To represent a signal s(n) in terms of the first N Hermite basis functions, we need to find coefficients c i such that:

N  1

s ( t )  c i  i ( t )

i  0

When we have a digitized signal, instead of a time function

(3.8)

s ( t ) , we have a sequence

p values of the signal at times (sampling) t 0 , t 1 , , t p  1 , then the coefficients c i

Choose to best satisfy the following system of equations:

Okay

 c 0 0 ( t 0 ) c 1  1 ( t 0 ) c N  1 N  1 ( t 0 ) s ( t 0 )

 c ( t )  c ( t )   c ( t )  s ( t )

0 0 1 1 1 1

N  1

N  1 1 1



 ... ... ...

  c 0 0( t p)  c 1  1 ( t p)   c N 1 N 1 ( t p  1 )  s ( t p  1 )

with N=16, p=91. Or convert to matrix form:

(3.9)

 0 ( t 0 )

 0 ( t 1 )

 1 ( t 0 ) ...

 1 ( t 1 ) ...

N  1 ( t 0 ) 

N  1 ( t 1 ) 

c 0 

c 1 

 s ( t 0 ) 

 s ( t 1 ) 

       

 ... ... ... ...     

 ( t ) ( t ) ... ( t )   c   s ( t ) 

 0 p

We denote:

1 p N  1

p  1  

N  1  

p  1

  (3.10)

0 ( t 0 )

 0 ( t 1 )

 1 ( t 0 ) ...

 1 ( t 1 ) ...

N  1 ( t 0 ) 

N  1 ( t 1 )  

s ( t 0 )   c 0 

s ( t 1 )   c 1 

A    ; b    ; x   

 ... ... ... ...     

 ( t

) ( t

) ... 

( t )   s ( t

)  c

 0

p  1 1

p  1

N  1

p  1  

 p  1  

N  1 

Then, the system of equations (3.10) will have the matrix form:

A x b

good

min

A x b

(3.11)

The coefficients c i that best satisfy the above system of equations are equivalent to achieving

minimum of the error function min

A x b . Normally, the sampling point data p=91 is large

than the number of polynomials N=16 used for approximation, so this is a system of equations with

more equations than unknowns. The optimal solution of the system of equations can be determined using the Singular Value Decomposition (SVD ) method.

b) Using SVD method to determine the characteristics of electrocardiographic signals

To find the optimal solution of a system of first-degree equations with a large number of equations

more than hidden number

A x b , as presented above. According to the SVD method, first analyze

Matrix product A is the product of three special matrices:

AP  N

S p p V p  N D T  N

(3.12)

with S and D being orthogonal matrices, V being a diagonal matrix. Then, the matrix

pseudo inverse



N p

of matrix A can be determined by the formula:

A D V S T

N p N  NN pp p

(3.13)

where V + is the pseudo-inverse matrix of matrix V , determined by replacing the diagonal elements of matrix V with the inverse value, then transposing the resulting matrix . Once the pseudo-inverse matrix A + is determined , the optimal solution x

of the system of equations

A x b can be easily calculated by the formula:

x A b (3.14)

As presented in the above section, the solution result x is the expansion coefficients c i of the ECG signal that will be used as the values of the characteristic vector. ECG signal s  t b can be restored by the following formula:

N  1

b  c i   i ( t )  b  A  x

i  0

(3.15)

Figure 3.8 shows some examples of QRS complex analysis using Hermite functions.

In which, the green color represents the original signal.

s  t , red represents the signal approximately b .

Due to the signal

s  t 

is a combination of Hermite functions, the higher the Hermite degree, the more it will represent.

can represent fast-varying components , for example, in Figure 3.8-a, when using few Hermite functions (N=5) , only slow-varying components and quite large deviations can be represented, but when using more Hermite functions, for example in the case of N=10

(Figure 3.8-b) , N=12 (Figure 3.8-c) , N=16 (Figure 3.8-d) perform much better, especially with N=16 the overlap is relatively high.

(a)

(b)

Figure 3.8. Approximation of ECG signal by first N basis Hermite functions: a) N=5; b) N=10;

c) N=12; d) N=16.

However, the selection of the number of Hermite functions also needs to be specifically investigated, because if too few are used, the recognition model will lack information, so the results will be inaccurate. If too many are used, the model will become cumbersome.

large computational volume. Investigate the error between the original signal and the first N Hermite functions:

s  tand the signal is approximately

N  1

E  s ( t )  c i i ( t )

i  0

E 

s ( t )  b

(3.16)

Error E

4 8 12 14 16

Number of Hermite functions N

Figure 3.9. Graph of approximation error according to the number of basis Hermite functions

The E error survey charts show that when the number of Hermite functions used to analyze the electrocardiogram signal increases, the E error decreases, meaning it more closely matches the original signal. In Figure 3.9, the error result corresponding to the number of Hermite functions is N= 16, approaching min.

If we continue to increase the number of Hermite functions, the error will not decrease much further, so the thesis proposes to use the case N=16.

Continuing the investigation when expanding six different heart rates according to 16 Hermite functions, shown in Figure 3.10, the Hermite function still approximates quite well.

Ventricular arrhythmia (E)

Left bundle branch block (L)

Right bundle branch block (R)

Ventricular premature beats (V) Ventricular fibrillation (I) Atrial premature beats (A)

Figure 3.10. Image of the expansion of other types of violet rhythms according to the first 16 Hermite functions

c) Create characteristic vectors of ECG electrocardiogram signals

, c 15 , RR last , RR mean  

- According to the ECG signal feature extraction process in figure 2.4 in section

2.3, the feature vector consists of 18 components:

x   c 0 ,

of each beat (QRS complex)

- 16 expansion coefficients c i

 i  0  15 of the ECG signal according to the functions

Hermite as equation 3.9;

- 2 time domain characteristics of the electrocardiogram signal, are

RR last distance

between two consecutive R vertices (also called the RR distance), and

of the last 10 RR distances.

3.2. Building simple recognition models

3.2.1. Process of building simple models

RR mean mean value

In the thesis, the author uses four single recognition models, which are MLP, TSK, SVM and RF models. As presented above, the reasons for choosing these single models are:

- In opinion synthesis systems, the number of unit blocks should usually be greater than two.

(to avoid the case where two blocks give opposite results and then don't know which result to follow)

The three - block identification system is sufficient and not too complex for the experiments in the thesis;

- The selected networks are all classic networks, which have been used in many signal processing applications in general and in recognition problems in particular;

Next, we will briefly introduce the structure and algorithm for building networks based on the learning processes of these four basic networks.

3.2.2. Building MLP network model

Choose the structure of an MLP network with one hidden layer, structured as in [6].

The training (learning) process for an MLP neural network is as follows:

- First, initialize the initial values of the MLP neural network such as the weight matrix W , V and the number of hidden neurons M. Gradually increase the number of hidden neurons M from 1 until the desired accuracy is achieved, the thesis finds M in the range from 1 to 30;

- Next, is the adaptive adjustment process (learning process) to adjust the coupling weights between layers of the MLP network, which are the weight matrix W , the weight matrix V , in the thesis using the maximum reduction step algorithm;

- Check the arithmetic error, if it meets the requirement, stop, otherwise go back to the first step to increase the number of hidden neurons. The training process stops when the error meets the set requirement;

- Check the trained model with the test sample set, evaluate the error.

3.2.3. Building TSK network model

Similar to the MLP neural network, with the neural network, the number of hidden neurons M is increased, and with the TSK fuzzy logic network, the number of rules is found from 1 to 25, the structure of the TSK network is as in [5, 6], specifically:

- Initialize the initial values of linear and nonlinear parameters, set the starting rule number;

- Adjusting linear and nonlinear parameters, the thesis uses an iterative algorithm using the maximum descent step of the gradient function;

- Check the arithmetic error, if it meets the requirements, stop, otherwise go back to step 1 and increase the number of rules. The training process stops when the error meets the requirements;

- Check the trained model with the test sample set, evaluate the error.

3.2.4. Building a support vector model SVM

The process of building a recognition model using support vector machine SVM is different from the two models above. As presented above, because the SVM model is only a binary classification (two classes) , if for multi-class classification problems (greater than two classes), it is necessary to improve it into multi-class SVM models. Currently, there are many methods to build multi-class SVM models. In this thesis, the one-on-one method is chosen.

(Onevesus–One), that is, if N classes are to be classified, all models need to be built.

Binary SVM, they are connected together and through the voting method to evaluate the final classification result, the class with the most votes will be selected as the prediction result.

3.2.5. RF random forest model

The final single recognition model used in the thesis is the random forest model RF (Random Forest) developed by L. Breiman (2001) [19]. RF and SVM are two fast learning algorithms that are resistant to noise. The main construction steps are:

- Constructing a random forest RF (depicted in Figure 3.11) generates a set of M unpruned decision trees, each built on a bootstrap sample set (random sampling with replacement) ;

Figure 3.11. The process of building component decision trees

Tree 1

Study/test data set

Tree 2

Tree M

y 1

y 2

Study/test results

y M

Result aggregation block (voting method)

- The synthesis of recognition results from M popular decision trees uses crowd voting method to give the final result, as shown in Figure 3.12.

Figure 3.12. Testing process of the RF random forest model

3.3. Neural network coordination to recognize ECG signals using decision tree model

3.3.1. Proposed combination model

Figure 3.13 shows the general diagram of the combined model using multiple single recognition models , where M − number of single recognition models, x in − input ECG signal, P i − are preprocessing and feature extraction blocks, C i − classification blocks, z − final recognition result corresponding to input signal x in .

Figure 3.13. General diagram of the combined model using multiple single recognition models

In general, the basic recognition models work independently of the input ECG signal x in which can be from different leads, the preprocessing and feature extraction blocks P i use different methods. As presented in the introduction, the research orientation of the thesis is to use a common preprocessing and feature extraction method  P 1  P 2  ...  P M  for single recognition models C i .

If the problem is to identify K different types of heartbeats, then each single recognition model C i (with i=1, 2,…, M) will have M results y i (with i=1, 2,…, M) represented as vectors (formula 3.1), an ideal vector y i when one value is '1' and all the remaining values are '0', but usually their values often fluctuate in the range [0, 1]. In the thesis, the output results y i from the basic recognition models are combined into a total vector Y (of size M- K) as (3.2) and further processed in the result combination block to draw the final conclusion that the vector z (of size K ) corresponds to the codes of K different types of heartbeats.

- Result of single recognition model C i (with i=1, 2,…, M) :

y i  y i1 y i2 ... y iK  (3.17)

- Synthetic vector:

Y = [y 1 y 2 ... y M ]

= [y 11 y 12 ...y 1K y 21 y 22 ...y 2K ... y M1 y M2 ...y MK ] (3.18)

Some popular result combination solutions have been proposed by many other studies: Majority voting [11, 16], weighted voting [28, 34], aggregation according to Bayesian conditional probability [28]... In general, these solutions are quite simple, stable, and effective.

Comment