Using decision trees to classify imbalanced data - 9


Table 4-17: Contraceptive Method Choice Dataset Results


Classifier    TPR      FPR      AUC
SC4.5         0.225    0.076    0.574
CSC4.5        0.333    0.092    0.621
AUC4.5        0.661    0.430    0.616

(AUC4.5 values are the TPR mean, FPR mean and AUC mean over the 10 test runs.)


The AUC4.5 algorithm gives a slightly lower AUC mean (0.616) than the CSC4.5 algorithm (0.621), though higher than SC4.5 (0.574). With a much higher TPR mean (0.661), however, the AUC4.5 algorithm classifies the minority class more accurately than both SC4.5 and CSC4.5, although the absolute results are not high.

In addition, the attributes of the Contraceptive Method Choice dataset are continuous-valued, which strongly influences the classification process. The test results have a standard deviation of 0.02028.

Tic-Tac-Toe Endgame: Discrete attributes = 9, minority class ratio = 34.62%.


Table 4-18: Results of 10 tests on the Tic-Tac-Toe Endgame dataset


Test    TPR      FPR      AUC
1       0.745    0.105    0.820
2       0.807    0.104    0.851
3       0.708    0.070    0.819
4       0.779    0.098    0.840
5       0.748    0.101    0.823
6       0.776    0.105    0.835
7       0.794    0.112    0.841
8       0.753    0.097    0.828
9       0.785    0.126    0.829
10      0.764    0.151    0.807
Mean    0.766    0.107    0.829

Variance = 0.00017; standard deviation = 0.01285

Source: author's research
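The mean, variance, and standard deviation reported in Table 4-18 can be reproduced from the ten AUC values. A minimal sketch (assuming, as the numbers confirm, that the variance column is the sample variance of the AUC column, with n - 1 in the denominator):

```python
# Ten AUC values from Table 4-18, one per test run.
auc_runs = [0.820, 0.851, 0.819, 0.840, 0.823, 0.835,
            0.841, 0.828, 0.829, 0.807]

n = len(auc_runs)
auc_mean = sum(auc_runs) / n
# Sample variance (n - 1 denominator) reproduces the table's 0.00017.
variance = sum((x - auc_mean) ** 2 for x in auc_runs) / (n - 1)
std_dev = variance ** 0.5

print(round(auc_mean, 3), round(variance, 5), round(std_dev, 5))
# 0.829 0.00017 0.01285
```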


Table 4-19: Tic-Tac-Toe Endgame dataset results


Classifier    TPR      FPR      AUC
SC4.5         0.631    0.062    0.784
CSC4.5        0.640    0.062    0.789
AUC4.5        0.766    0.107    0.829

(AUC4.5 values are the TPR mean, FPR mean and AUC mean over the 10 test runs.)

The AUC4.5 algorithm gives better classification results in TPR and AUC on this imbalanced dataset, at the cost of a somewhat higher FPR. Again, this confirms that datasets with discrete-valued attributes give better classification results than datasets with continuous-valued attributes. The standard deviation (0.01285) is quite small.
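For reference, the three metrics compared in these tables follow directly from the confusion matrix, with the minority class as the positive class. A minimal sketch with illustrative counts (not taken from the experiments); the AUC here is the single-operating-point approximation (1 + TPR - FPR)/2:

```python
# TPR, FPR, and a single-point AUC from confusion-matrix counts.
# The counts passed below are illustrative, not thesis results.
def classification_metrics(tp, fn, fp, tn):
    tpr = tp / (tp + fn)         # minority-class recall
    fpr = fp / (fp + tn)         # majority examples misclassified as minority
    auc = (1 + tpr - fpr) / 2    # AUC of a single-point ROC curve
    return tpr, fpr, auc

tpr, fpr, auc = classification_metrics(tp=76, fn=24, fp=11, tn=89)
print(round(tpr, 2), round(fpr, 2), round(auc, 3))  # 0.76 0.11 0.825
```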

4.5 Evaluation of experimental results


From the experimental results, analyzed on eight datasets, each tested 10 times on the D Test set, and taking the average results for the TPR mean, FPR mean and AUC mean indices (Table V) and the variance and standard deviation (Table IV), we make the following comments:

+ The imbalance ratio between classes does not greatly affect the classification results of the proposed algorithm AUC4.5.

+ For data sets with discrete-valued attributes:


- The algorithm gives good classification results for the minority class on imbalanced datasets.


- All of these datasets give good classification results, superior to the two algorithms SC4.5 and CSC4.5. In particular, the Car Evaluation and Mushroom datasets are classified with 100% accuracy.

- The standard deviation of the Car Evaluation and Mushroom sets is zero, and the deviation of the Nursery and Tic-Tac-Toe Endgame sets is small, demonstrating the stability of the algorithm on data with discrete-valued attributes.

+ For data sets with attributes having continuous values:


- Only the Ecoli dataset has higher classification results than both the SC4.5 and CSC4.5 algorithms. However, the standard deviation of the Ecoli dataset is quite high, second only to the Wine Quality – Red dataset, indicating that this type of data needs to be examined carefully.


- The remaining three datasets, Wine Quality – Red, Wine Quality – White and Contraceptive Method Choice, have a higher TPR mean than the SC4.5 and CSC4.5 algorithms. Setting aside the higher FPR mean (majority-class examples misclassified into the minority class), the AUC mean remains high, so the AUC4.5 algorithm has achieved its goal of improving classification accuracy for the minority class in imbalanced datasets.


- The standard deviations of the four continuous-valued datasets are the highest among the eight datasets, at 0.02028, 0.02631, 0.03022 and 0.09520 respectively. This shows that stability and the data distribution of continuous-valued datasets are issues to consider.


CHAPTER 5. CONCLUSION AND DEVELOPMENT DIRECTION


In this thesis, the AUC4.5 algorithm is developed as an improvement of the C4.5 algorithm, using the AUC value instead of gain entropy in the tree splitting and pruning criteria to improve classification performance on imbalanced data, specifically on the minority class, and is suited to binary imbalanced classification. Experimental results on eight real imbalanced datasets from the UCI machine learning repository [28] show that the improved AUC4.5 algorithm achieves better classification performance than the SC4.5 and CSC4.5 algorithms. This confirms the value of using the AUC directly during training on datasets whose characteristics affect the classification process. Notably, the improved method does not sacrifice the FPR value to raise the TPR value in pursuit of the highest AUC.
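The splitting criterion described above can be sketched as follows: each candidate binary split is scored by the AUC of the simple two-leaf classifier it induces, and the best-scoring attribute and threshold are chosen. This is an illustrative reconstruction under simplifying assumptions (binary splits only, single-point AUC), not the thesis implementation:

```python
# Score one binary split by the AUC of the two-leaf classifier it induces.
def split_auc(left_labels, right_labels, minority=1):
    # The leaf with the higher minority fraction predicts the minority class.
    def rate(labels):
        return labels.count(minority) / len(labels) if labels else 0.0
    pos_leaf, neg_leaf = left_labels, right_labels
    if rate(right_labels) > rate(left_labels):
        pos_leaf, neg_leaf = right_labels, left_labels
    tp = pos_leaf.count(minority)
    fp = len(pos_leaf) - tp
    fn = neg_leaf.count(minority)
    tn = len(neg_leaf) - fn
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return (1 + tpr - fpr) / 2  # single-point AUC of this split

def best_split(rows, labels):
    # rows: list of feature vectors; try every attribute/value threshold.
    best = (0.0, None, None)  # (auc, attribute index, threshold)
    for a in range(len(rows[0])):
        for t in sorted(set(r[a] for r in rows)):
            left = [l for r, l in zip(rows, labels) if r[a] <= t]
            right = [l for r, l in zip(rows, labels) if r[a] > t]
            if not left or not right:
                continue
            score = split_auc(left, right)
            if score > best[0]:
                best = (score, a, t)
    return best

# A perfectly separable toy set: attribute 0 <= 2 isolates the minority class.
print(best_split([[1], [2], [3], [4]], [1, 1, 0, 0]))  # (1.0, 0, 2)
```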

The proposed method does not need to assign different costs, such as misclassification costs in cost-sensitive learning, so training time is shorter while classification performance is better.

The method improves the correct classification rate for the minority class in imbalanced datasets. However, continuous-valued data remains an issue that needs to be considered and preprocessed before classification with the AUC4.5 algorithm.
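One common preprocessing direction for this issue is to discretize continuous attributes before training. A minimal sketch using unsupervised equal-width binning (an assumed choice for illustration; the thesis does not prescribe a particular method):

```python
# Equal-width discretization: map each continuous value to one of n_bins
# integer bin labels, so the attribute becomes discrete-valued.
def equal_width_bins(values, n_bins=4):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant attribute
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

print(equal_width_bins([1.0, 2.0, 6.0, 9.0], n_bins=2))  # [0, 0, 1, 1]
```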

With the results achieved by the algorithm, applying it to medical diagnostic systems would improve diagnostic efficiency, and applying it to intrusion and attack detection would improve the efficiency of system monitoring. However, no method is currently optimal for all real datasets, and this is accepted in the data mining field. Based on this research and its results, we see many issues that merit further research and development, contributing to the field of imbalanced data classification in particular and data mining in general.


REFERENCES


[1] J. R. Quinlan, “Induction of Decision Trees,” Mach. Learn., vol. 1, no. 1, pp. 81–106, 1986.

[2] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Elsevier/Morgan Kaufmann, 2012.

[3] I. H. Witten, E. Frank, and M. A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. 2011.

[4] V. Ganganwar, “An overview of classification algorithms for imbalanced datasets,” Int. J. Emerg. Technol. Adv. Eng., vol. 2, no. 4, pp. 42–47, 2012.


[5] Y. Yang and G. Ma, “Ensemble-based active learning for class imbalance problem,” J. Biomed. Sci. Eng., vol. 3, no. 10, pp. 1022–1029, Oct. 2010.

[6] B. Zadrozny, J. Langford, and N. Abe, “Cost-sensitive learning by cost-proportionate example weighting,” in Third IEEE Int. Conf. on Data Mining, 2003, pp. 435–442.

[7] Y. Tang, S. Krasser, D. Alperovitch, and P. Judge, “Spam Sender Detection with Classification Modeling on Highly Imbalanced Mail Server Behavior Data,” in Proc. of Int. Conf. on Artificial Intelligence and Pattern Recognition, 2008, pp. 174–180.

[8] V. Engen, “Machine learning for network based intrusion detection,” Bournemouth University, 2010.

[9] X.-Y. Liu, J. Wu, and Z.-H. Zhou, “Exploratory Under-Sampling for Class-Imbalance Learning,” in Sixth Int. Conf. on Data Mining (ICDM '06), 2006, pp. 965–969.

[10] S.-J. Yen and Y.-S. Lee, “Cluster-based under-sampling approaches for imbalanced data distributions,” Expert Syst. Appl., vol. 36, no. 3, pp. 5718–5727, Apr. 2009.

[11] N. M. Phuong, T. T. Anh Tuyet, N. T. Hong, and D. X. Tho, “Random Border Undersampling: A new algorithm to reduce random elements on the border in imbalanced data,” in FAIR - Basic and Applied Research in Information Technology, 2015.


[12] N. Japkowicz, “Learning from Imbalanced Data Sets: A Comparison of Various Strategies,” AAAI Workshop on Learning from Imbalanced Data Sets, vol. 68, pp. 10–15, 2000.

[13] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic Minority Over-sampling Technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, 2002.

[14] H. Han, W.-Y. Wang, and B.-H. Mao, “Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning,” Springer, Berlin, Heidelberg, 2005, pp. 878–887.

[15] G. Weiss, K. McCarthy, and B. Zabar, “Cost-sensitive learning vs. sampling: Which is best for handling unbalanced classes with unequal error costs?,” DMIN, pp. 1–7, 2007.

[16] C. Drummond and R. C. Holte, “Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria,” Int. Conf. Mach. Learn., vol. 1, no. 1, pp. 239–246, 2000.

[17] W. Fan, S. Stolfo, J. Zhang, and P. Chan, “AdaCost: Misclassification Cost-Sensitive Boosting,” in Proc. 16th Int. Conf. Mach. Learn. (ICML '99), 1999, pp. 97–105.

[18] Y. Sun, M. S. Kamel, A. K. C. Wong, and Y. Wang, “Cost-sensitive boosting for classification of imbalanced data,” Pattern Recognit., vol. 40, no. 12, pp. 3358–3378, 2007.

[19] H. Guo and H. L. Viktor, “Learning from Imbalanced Data Sets with Boosting and Data Generation: The DataBoost-IM Approach,” ACM SIGKDD Explor. Newsl., special issue on learning from imbalanced datasets, vol. 6, no. 1, pp. 30–39, 2004.

[20] M. A. Maloof, “Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown,” Analysis, vol. 21, no. II, pp. 1263–1284, 2003.

[21] J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[22] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognit. Lett., vol. 27, no. 8, pp. 861–874, 2006.

[23] M. R. Tolun and S. M. Abu-Soud, “An Inductive Learning Algorithm for Production Rule Discovery,” 1999.

[24] P. T. Huan and L. H. Bac, “Mining frequent itemsets from transaction data with multiple minimum frequent thresholds on multi-core processors,” Can Tho Univ. J. Sci., vol. CN, p. 155, Oct. 2017.

[25] A. Tran, T. Truong, and L. H. Bac, “Efficiently mining association rules based on maximum single constraints,” Vietnam J. Comput. Sci., vol. 4, no. 4, pp. 261–277, Nov. 2017.

[26] D. Nguyen, B. Vo, and L. H. Bac, “CCAR: An efficient method for mining class association rules with itemset constraints,” Eng. Appl. Artif. Intell., vol. 37, pp. 115–124, Jan. 2015.

[27] M. R. Tolun, H. Sever, M. Uludag, and S. M. Abu-Soud, “ILA-2: An Inductive Learning Algorithm For Knowledge Discovery,” Cybern. Syst., vol. 30, no. 7, pp. 609–628, Oct. 1999.

[28] C. L. Blake and C. J. Merz, “UCI Repository of machine learning databases,” Univ. of California, http://archive.ics.uci.edu/ml/, 1998.

[29] J.-S. Lee, J. Lee, and B. Gu, “AUC-based C4.5 decision tree algorithm for imbalanced data classification,” 2016.
