Quality Evaluation Criteria of ECG Signal Recognition Model


Table 3.6. Recognition results of Example 2

| ID | Decision tree recognition result |
|---|---|
| 25 | 1 |
| 26 | 1 |
| 27 | 2 |
| 28 | 2 |
| 29 | 3 |
| 30 | 3 |


3.4. Conclusion of chapter 3

In the first part, the author briefly presented the network structures and learning methods of four classic single recognition models: the multi-layer feedforward neural network (MLP), the Takagi-Sugeno-Kang fuzzy neural network (TSK), the support vector machine (SVM) and the random forest (RF).

In the next part, the author presented single recognition models for electrocardiogram (ECG) signals based on these popular methods (MLP, TSK, SVM and RF). The author then built a combined recognition model from them to increase recognition accuracy.

The next chapter presents the computational results of the recognition methods proposed in Chapter 3.


Chapter 4. CALCULATION AND SIMULATION RESULTS


Chapter 4 presents the method used to construct sample datasets from the MIT-BIH and MGH/MF databases. Using these datasets, the author applies single and combined recognition models to identify ECG signals in computer simulations. The results are then evaluated to demonstrate the solution proposed in the thesis.

4.1. Building sample data sets

4.1.1. MIT-BIH Database

In this thesis, we use sample ECG signals from the well-known MIT-BIH arrhythmia database [15], which can be downloaded from www.physionet.org. Many works on ECG signal classification have used this database to build and test recognition models, for example [5, 20, 25, 26, 27, 28, 29, 32].

For comparison with previous works, this thesis uses the same dataset as studies [5, 28]. The task is to recognize arrhythmias from the QRS complexes of the ECG signals of 19 patients (database records 100, 105, 106, 109, 111, 114, 116, 118, 119, 124, 200, 202, 207, 208, 209, 212, 214, 221 and 222). The lead I recording is used for all selected records. The selected patients often have more than one type of arrhythmia in their recordings. The worst case is patient 207, whose record contains all 7 types of arrhythmias; details are given in the appendix.

The main tests of the thesis consider 7 rhythm types:

- left bundle branch block (L);
- right bundle branch block (R);
- atrial premature beat (A);
- ventricular premature beat (V);
- ventricular flutter wave (I);
- ventricular escape beat (E);
- normal sinus rhythm (N).

The beat types are not evenly distributed across the records. Most patients have mainly normal N beats, while some abnormal rhythms, such as I and E beats, appear very rarely. The datasets must contain the abnormal patterns, and these may coexist in the same patient: in patient 207, for example, all six abnormal rhythm types appear. For this reason, the thesis does not use every beat of every record as the database. Instead, 6678 samples were taken from the records of the 19 patients above and divided into two datasets:

- 3611 samples for learning (building the model);
- 3068 samples for testing (checking reliability).

The number of samples taken from the records of the 19 patients is detailed in Table 4.1.


Table 4.1. Distribution of the number of training samples and test samples of 7 types of arrhythmias from the MIT-BIH database

| Rhythm type | Total number of samples | Number of training samples | Number of test samples |
|---|---|---|---|
| N | 2000 | 1065 | 935 |
| L | 1200 | 639 | 561 |
| R | 1000 | 515 | 485 |
| A | 902 | 504 | 398 |
| V | 964 | 549 | 451 |
| I | 472 | 271 | 201 |
| E | 105 | 68 | 37 |
| Total | 6643 | 3611 | 3068 |

Table 4.2. Division of training samples and test samples for the 2 rhythm classes

| Rhythm type | Total number of samples | Number of training samples | Number of test samples |
|---|---|---|---|
| Normal | 2000 | 1065 | 935 |
| Abnormal | 4643 | 2546 | 2133 |
| Total | 6643 | 3611 | 3068 |

Table 4.1 gives the number of beats of each type used for training and testing the recognition models. Some types have few samples (e.g., E and I beats) because the MIT-BIH database [15] contains few such beats. The database originally contains a very large number of normal N beats. These beats are relatively simple to classify, and the final results should depend as little as possible on the number of samples in each group. The author therefore limits the number of beats of each type used in the experiments to a reasonable level, usually around 1000.
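A per-class (stratified) selection and split of the kind described above can be sketched as follows. The random selection, the seed, the default cap and the per-class training fraction are hypothetical choices for illustration. The thesis does not specify how individual beats were chosen.

```python
import random
from collections import defaultdict

def cap_and_split(samples, max_per_class=1000, train_fraction=0.55, seed=0):
    """Cap each class at max_per_class samples, then split it into train/test.

    samples: list of (feature_vector, label) pairs.
    ASSUMPTIONS: random selection, a fixed per-class training fraction and
    all default parameter values are hypothetical, not taken from the thesis.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)              # random order within the class
        group = group[:max_per_class]   # limit over-represented classes (e.g. N)
        n_train = round(len(group) * train_fraction)
        train.extend(group[:n_train])
        test.extend(group[n_train:])
    return train, test
```

Splitting each class separately keeps rare classes such as E present in both sets. A single random split over all beats could leave very few of them in one set.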

To generate the feature vectors and the corresponding output signals from the ECG records in the MIT-BIH database, we use the following procedure:

- QRS complex separation: in the MIT-BIH database, the R-peak positions of the ECG signals have already been marked, and each beat has been labeled with its disease class by cardiologists. There is therefore no need to use the R-peak detection algorithm presented in Section 2.2. It is sufficient to read the R-peak positions in the ECG signal sequentially and to cut each QRS complex with a 250 ms window around the R peak (125 ms before and 125 ms after the R-peak position).
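The windowing step can be sketched as below. The sketch assumes the annotated R-peak sample indices have already been read from the record (for example with a WFDB reader; that step is not shown). MIT-BIH records are sampled at 360 Hz, so 125 ms corresponds to 45 samples and each window holds 90 samples. The function name and the policy of skipping beats too close to the record boundaries are illustrative assumptions.

```python
def extract_qrs_windows(signal, r_peaks, fs=360, half_window_ms=125):
    """Cut a window of +/- half_window_ms around each annotated R peak.

    signal: sequence of samples from one lead.
    r_peaks: R-peak sample indices (read from the database annotations).
    Beats whose window would run past either end of the record are skipped
    (an assumption; the thesis does not say how edge beats are handled).
    Returns a list of (r_peak, window) pairs.
    """
    half = round(fs * half_window_ms / 1000)  # 45 samples at 360 Hz
    windows = []
    for r in r_peaks:
        if r - half < 0 or r + half > len(signal):
            continue
        windows.append((r, list(signal[r - half:r + half])))  # 90 samples = 250 ms
    return windows
```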

- Expand each QRS complex extracted above in the Hermite basis functions of formula (4.1), and take the first 16 expansion coefficients as features:

$$\varphi_n(x) = \frac{1}{\sqrt{2^n \, n! \, \sqrt{\pi}}} \, e^{-x^2/2} \, H_n(x) \qquad (4.1)$$

where $H_n(x)$ is the Hermite polynomial of order $n$.
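Formula (4.1) and the coefficient extraction can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation:

- The Hermite functions are evaluated with the normalized three-term recurrence, which is equivalent to (4.1).
- The samples of a QRS window are mapped linearly onto a hypothetical interval [-x_max, x_max].
- Each coefficient is approximated by a discrete inner product, which relies on the functions being orthonormal.

A least-squares fit over the 16 functions is a common alternative. The scale parameter x_max and the function names are assumptions.

```python
import math

def hermite_functions(x, n_funcs):
    """Return [phi_0(x), ..., phi_{n_funcs-1}(x)] from formula (4.1).

    Uses the normalized recurrence
        phi_{n+1} = sqrt(2/(n+1)) * x * phi_n - sqrt(n/(n+1)) * phi_{n-1},
    which is algebraically equivalent to (4.1) but avoids the very large
    intermediate values of H_n(x).
    """
    phis = [math.pi ** -0.25 * math.exp(-x * x / 2.0)]
    if n_funcs > 1:
        phis.append(math.sqrt(2.0) * x * phis[0])
    for n in range(1, n_funcs - 1):
        phis.append(math.sqrt(2.0 / (n + 1)) * x * phis[n]
                    - math.sqrt(n / (n + 1)) * phis[n - 1])
    return phis[:n_funcs]

def hermite_coefficients(qrs, n_funcs=16, x_max=8.0):
    """Approximate the first n_funcs expansion coefficients of a QRS window.

    ASSUMPTION: samples are mapped linearly onto [-x_max, x_max] (x_max is a
    hypothetical scale parameter) and each coefficient is the discrete inner
    product sum(s(x_i) * phi_n(x_i) * dx), valid because the functions are
    orthonormal.
    """
    m = len(qrs)
    dx = 2.0 * x_max / (m - 1)
    coeffs = [0.0] * n_funcs
    for i, s in enumerate(qrs):
        x = -x_max + i * dx
        for n, phi in enumerate(hermite_functions(x, n_funcs)):
            coeffs[n] += s * phi * dx
    return coeffs

# Sanity check: a window shaped exactly like phi_3 should give c_3 close to 1
# and all other coefficients close to 0.
m = 90
grid = [-8.0 + i * 16.0 / (m - 1) for i in range(m)]
window = [hermite_functions(x, 4)[3] for x in grid]
print([round(c, 4) for c in hermite_coefficients(window)])
```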


- Take the RR interval from the considered R peak to the preceding R peak as the 17th feature. The mean of the last 10 RR intervals is the 18th feature of the considered QRS complex.
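The two RR features can be computed from the R-peak positions as sketched below. Two details are assumptions, since the text does not fix them:

- The 10-interval average includes the current RR interval.
- Beats with fewer than 10 preceding intervals use the average of the intervals available.

Intervals are returned in seconds, given a sampling rate fs (360 Hz for MIT-BIH).

```python
def rr_features(r_peaks, fs=360.0, n_avg=10):
    """Return (rr_prev, rr_mean) in seconds for every beat after the first.

    rr_prev: interval from the previous R peak to this one (17th feature).
    rr_mean: mean of the last n_avg RR intervals, including rr_prev
             (18th feature; including rr_prev is an assumption).
    Early beats with fewer than n_avg intervals average what is available.
    """
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]
    features = []
    for k, current in enumerate(rr):
        recent = rr[max(0, k - n_avg + 1):k + 1]
        features.append((current, sum(recent) / len(recent)))
    return features

# features[k] belongs to the beat at r_peaks[k + 1]
print(rr_features([0, 360, 540]))  # [(1.0, 1.0), (0.5, 0.75)]
```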

Thus, each QRS complex is described by 18 features. The output is the code of the rhythm type of the beat under consideration, which is already annotated in the MIT-BIH database. For example, with 7 rhythm types the output has 7 channels with values 0 or 1: the channel corresponding to the rhythm type of the beat is 1, and the other 6 channels are 0.
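Assembling the 18-value input vector and the 7-channel target might look like the following sketch. The channel order of the classes and the helper names are hypothetical. The decode helper assumes the predicted class is the channel with the largest output, which is a common convention rather than something stated here.

```python
CLASSES = ["N", "L", "R", "A", "V", "I", "E"]  # channel order is an assumption

def one_hot(label, classes=CLASSES):
    """7-channel target: 1 on the channel of the beat's class, 0 elsewhere."""
    return [1 if c == label else 0 for c in classes]

def feature_vector(hermite_coeffs, rr_prev, rr_mean):
    """Concatenate 16 Hermite coefficients with the two RR features (18 values)."""
    if len(hermite_coeffs) != 16:
        raise ValueError("expected 16 Hermite coefficients")
    return list(hermite_coeffs) + [rr_prev, rr_mean]

def decode(outputs, classes=CLASSES):
    """Map a 7-channel model output back to a label (largest channel wins)."""
    return classes[max(range(len(classes)), key=lambda i: outputs[i])]
```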

Figure 4.1 shows the superimposed QRS complexes of 6 abnormal rhythm types, with the numbers of samples given in Table 4.1. The figure shows that the major obstacle in recognizing ECG signals is the large variation in amplitude and shape among heartbeats of the same type [23]. Moreover, rhythms belonging to different diseases can be morphologically similar. ECG signal classification is therefore a difficult problem.


Figure 4.1. QRS complex patterns of rhythms A, E, L, R, I and V (panels: atrial premature beats, A; ventricular escape beats, E; left bundle branch block, L; right bundle branch block, R; ventricular fibrillation, I; premature ventricular contractions, V)

4.1.2. MGH/MF Database

In addition, the thesis uses a second database, MGH/MF [35, 36]. It contains 250 ECG records collected from 250 cardiovascular patients in intensive care units, operating rooms, cardiac catheterization laboratories, etc. at Massachusetts General Hospital. The thesis uses ECG samples from 20 records with codes 029, 030, 058, 105, 106, 107, 108, 110, 111, 114, 117, 119, 121, 123, 124, 125, 128, 131, 137 and 142. From these, a total of 4500 samples of 3 rhythm types were taken: normal sinus rhythm (N), premature ventricular contraction (V) and supraventricular premature beat (S). The numbers of samples used are listed in Tables 4.3 and 4.4 below:

Table 4.3. Division of training samples and test samples for the 3 rhythm types

| Rhythm type | Total number of samples | Number of training samples | Number of test samples |
|---|---|---|---|
| N | 3000 | 1997 | 1003 |
| S | 750 | 502 | 248 |
| V | 750 | 501 | 249 |
| Total | 4500 | 3000 | 1500 |

Table 4.4. Division of training samples and test samples for the 2 rhythm classes

| Rhythm type | Total number of samples | Number of training samples | Number of test samples |
|---|---|---|---|
| Normal | 3000 | 1997 | 1003 |
| Abnormal | 1500 | 1003 | 497 |
| Total | 4500 | 3000 | 1500 |


Figure 4.2. QRS complex patterns of V, S and N rhythms (panels: ventricular extrasystole, V; supraventricular arrhythmia, S; normal, N)

To generate the feature vectors and corresponding output signals from the ECG records in the MGH/MF database, we proceed in the same way as for the MIT-BIH database.

4.2. Criteria for evaluating the quality of the ECG signal recognition model

The recognition models (single and combined models) are evaluated using the following criteria:

- Number of incorrectly recognized samples;

- FN (False Negative): number of false negative diagnoses, i.e., cases where the patient has the disease but is misdiagnosed as normal;

- TN (True Negative): number of correct negative diagnoses, i.e., normal cases correctly diagnosed as normal;

- FP (False Positive): number of false positive diagnoses, i.e., cases where the patient is normal but is diagnosed with the disease;

- TP (True Positive): number of correct positive diagnoses, i.e., cases with the disease correctly diagnosed as diseased;

- Sensitivity (True Positive Rate): the rate of correct positive diagnoses,

$$\text{Sensitivity} = \frac{TP}{TP + FN} \cdot 100\% \qquad (4.2)$$

- Specificity (True Negative Rate): the rate of correct negative diagnoses,

$$\text{Specificity} = \frac{TN}{TN + FP} \cdot 100\% \qquad (4.3)$$
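As an illustration of formulas (4.2) and (4.3), the counts can be obtained from lists of reference and recognized labels. This sketch assumes, as in the definitions above, that a diseased (abnormal) beat is the positive case and N is the negative case. Under this convention, an abnormal beat recognized as a different abnormal type still counts as TP. Other conventions are possible, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, normal_label="N"):
    """Count TP, TN, FP, FN treating any non-normal label as 'disease' (positive)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        t_pos, p_pos = t != normal_label, p != normal_label
        if t_pos and p_pos:
            tp += 1
        elif not t_pos and not p_pos:
            tn += 1
        elif p_pos:      # normal beat flagged as diseased
            fp += 1
        else:            # diseased beat missed: the dangerous case
            fn += 1
    return tp, tn, fp, fn

def misclassified(y_true, y_pred):
    """Number of incorrectly recognized samples (exact class mismatch)."""
    return sum(t != p for t, p in zip(y_true, y_pred))

def sensitivity(tp, fn):
    return 100.0 * tp / (tp + fn)   # formula (4.2), in percent

def specificity(tn, fp):
    return 100.0 * tn / (tn + fp)   # formula (4.3), in percent

y_true = ["N", "N", "V", "A", "L", "N"]
y_pred = ["N", "V", "V", "N", "R", "N"]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```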

A recognition model has high quality and reliability when:

- the number of incorrectly recognized samples, the number of false positive diagnoses FP and the number of false negative diagnoses FN are low;

- the correct positive diagnosis rate (Sensitivity) and the correct negative diagnosis rate (Specificity) are high.

In practice, the quality requirements for ECG signal recognition models are very high. Among the criteria listed above, particular attention should be paid to FN, the number of false negative diagnoses. A high FN is dangerous for patients, because the model fails to detect a disease that is actually present. Therefore, the lower the FN of a recognition model, the higher its quality is rated.

4.3. Single recognition model results

4.3.1. On the MIT-BIH database

a) Test on the dataset in Table 4.1

For the recognition of 7 beat types, the base recognition models MLP, SVM, TSK and RF were trained independently on the same training dataset, with the following configurations:

- MLP Network

The structure is as in [6]: 18 input neurons (corresponding to the 18 values of the feature vector), 20 hidden neurons and 7 output neurons (corresponding to the 7 rhythm types);

- TSK Network

The structure is chosen as in [5, 6], with 21 inference rules and 7 outputs;

- Support Vector Machine SVM

The structure is chosen as in [31]. The SVM model classifies 7 classes using the one-versus-one (OVO) method and therefore has 7 × 6 / 2 = 21 binary SVM components;

- RF random forest

Following the method of L. Breiman (2001) [19], the RF random forest has 100 component decision trees.
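The figure of 21 binary SVMs follows from the one-versus-one scheme: one binary classifier per unordered pair of classes, 7 · 6 / 2 = 21, combined by majority voting. The sketch below shows only this decomposition and voting step in plain Python. The binary classifiers are hypothetical stand-in callables, the tie-breaking rule is an assumption, and no SVM training is shown.

```python
from collections import Counter
from itertools import combinations

CLASSES = ["N", "L", "R", "A", "V", "I", "E"]
PAIRS = list(combinations(CLASSES, 2))  # 7 * 6 / 2 = 21 binary problems

def ovo_predict(x, binary_classifiers):
    """One-versus-one prediction by majority vote.

    binary_classifiers maps each (a, b) pair in PAIRS to a callable that
    returns either a or b for input x (hypothetical stand-ins for the 21
    trained binary SVMs). Ties go to the class listed first in CLASSES,
    an assumption; libraries break ties differently.
    """
    votes = Counter(binary_classifiers[pair](x) for pair in PAIRS)
    return max(CLASSES, key=lambda c: (votes[c], -CLASSES.index(c)))

print(len(PAIRS))  # 21

# Toy stand-ins: every pairwise classifier that involves "V" votes for "V".
toy = {p: (lambda x, p=p: "V" if "V" in p else p[0]) for p in PAIRS}
print(ovo_predict(None, toy))  # V
```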

The test results of the base recognition models on the same test dataset of Table 4.1 are presented as matrices in Tables 4.5 to 4.9. In these matrices, the columns are the reference (pre-annotated) classes and the rows are the recognition results. The main diagonal gives the number of correctly recognized samples, and the off-diagonal values are the misrecognized samples. The specific results are as follows:
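To make the matrix layout concrete, the short script below reads the MLP matrix of Table 4.5 with rows as recognized classes and columns as reference classes. It recomputes the "Total error" row and collapses the 7-class matrix into the binary counts of Section 4.2. The collapse rule is an assumption made for illustration: normal N is the negative class, and any abnormal class is positive regardless of the abnormal type. The resulting sensitivity and specificity are derived values, not figures reported in the thesis.

```python
LABELS = ["N", "L", "R", "A", "V", "I", "E"]

# Table 4.5 (MLP): rows = recognized class, columns = reference class.
MLP_MATRIX = [
    [905, 5, 4, 5, 4, 0, 0],
    [5, 536, 0, 0, 2, 0, 1],
    [3, 0, 476, 5, 1, 0, 1],
    [17, 6, 2, 382, 6, 4, 0],
    [3, 9, 1, 5, 429, 2, 0],
    [2, 3, 2, 0, 7, 195, 0],
    [0, 2, 0, 1, 2, 0, 35],
]

def column_errors(matrix):
    """Misrecognized samples per reference class (the 'Total error' row)."""
    k = len(matrix)
    return [sum(matrix[r][c] for r in range(k)) - matrix[c][c] for c in range(k)]

def binary_counts(matrix, normal_index=0):
    """Collapse to normal (negative) vs abnormal (positive): (TP, TN, FP, FN).

    ASSUMPTION: an abnormal beat recognized as any abnormal class is a TP.
    """
    k = len(matrix)
    total = sum(map(sum, matrix))
    tn = matrix[normal_index][normal_index]
    fn = sum(matrix[normal_index][c] for c in range(k) if c != normal_index)
    fp = sum(matrix[r][normal_index] for r in range(k) if r != normal_index)
    return total - tn - fn - fp, tn, fp, fn

tp, tn, fp, fn = binary_counts(MLP_MATRIX)
print(column_errors(MLP_MATRIX))  # [30, 25, 9, 16, 22, 6, 2]
print(tp, tn, fp, fn)             # 2115 905 30 18
print(100 * tp / (tp + fn), 100 * tn / (tn + fp))  # Sensitivity, Specificity (%)
```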

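Table 4.5. Distribution matrix of recognition results of 7 types of rhythm patterns using MLP network

| Result / Sample | N | L | R | A | V | I | E |
|---|---|---|---|---|---|---|---|
| N | 905 | 5 | 4 | 5 | 4 | 0 | 0 |
| L | 5 | 536 | 0 | 0 | 2 | 0 | 1 |
| R | 3 | 0 | 476 | 5 | 1 | 0 | 1 |
| A | 17 | 6 | 2 | 382 | 6 | 4 | 0 |
| V | 3 | 9 | 1 | 5 | 429 | 2 | 0 |
| I | 2 | 3 | 2 | 0 | 7 | 195 | 0 |
| E | 0 | 2 | 0 | 1 | 2 | 0 | 35 |
| Total error | 30 | 25 | 9 | 16 | 22 | 6 | 2 |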


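Table 4.6. Distribution matrix of recognition results of 7 types of rhythm patterns using TSK network

| Result / Sample | N | L | R | A | V | I | E |
|---|---|---|---|---|---|---|---|
| N | 920 | 8 | 2 | 9 | 4 | 0 | 0 |
| L | 0 | 518 | 0 | 1 | 3 | 0 | 0 |
| R | 4 | 0 | 481 | 12 | 1 | 0 | 0 |
| A | 9 | 19 | 0 | 375 | 2 | 2 | 0 |
| V | 2 | 14 | 1 | 1 | 439 | 0 | 0 |
| I | 0 | 0 | 1 | 0 | 3 | 199 | 0 |
| E | 0 | 2 | 0 | 0 | 0 | 0 | 37 |
| Total error | 15 | 43 | 4 | 23 | 13 | 2 | 0 |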

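Table 4.7. Distribution matrix of recognition results of 7 types of rhythm patterns using SVM

| Result / Sample | N | L | R | A | V | I | E |
|---|---|---|---|---|---|---|---|
| N | 919 | 7 | 1 | 7 | 0 | 0 | 0 |
| L | 2 | 546 | 1 | 1 | 3 | 3 | 1 |
| R | 1 | 0 | 479 | 4 | 0 | 0 | 0 |
| A | 11 | 2 | 2 | 385 | 0 | 0 | 0 |
| V | 2 | 5 | 1 | 1 | 448 | 1 | 0 |
| I | 0 | 1 | 1 | 0 | 0 | 197 | 0 |
| E | 0 | 0 | 0 | 0 | 0 | 0 | 36 |
| Total error | 16 | 15 | 6 | 13 | 3 | 4 | 1 |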

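Table 4.8. Distribution matrix of recognition results of 7 types of rhythm patterns using RF

| Result / Sample | N | L | R | A | V | I | E |
|---|---|---|---|---|---|---|---|
| N | 914 | 6 | 3 | 11 | 0 | 0 | 0 |
| L | 1 | 547 | 0 | 0 | 4 | 0 | 1 |
| R | 1 | 0 | 478 | 4 | 0 | 0 | 0 |
| A | 18 | 4 | 1 | 382 | 3 | 0 | 0 |
| V | 1 | 3 | 1 | 1 | 443 | 3 | 0 |
| I | 0 | 1 | 2 | 0 | 1 | 198 | 0 |
| E | 0 | 0 | 0 | 0 | 0 | 0 | 36 |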
