DATA BALANCING METHODS BY FUZZY ROUGH SETS

Tran Thanh Huyen

Authors

Tran Thanh Huyen (Corresponding author), University of Engineering and Technology, Vietnam National University, Hanoi

Keywords:

Rough set theory; Fuzzy rough sets; Granular computing; Instance selection.

Abstract

The effectiveness of rough set theory for data cleaning has been demonstrated in many studies. Recently, fuzzy rough sets have also been applied to imbalanced data, following two approaches. The first combines balancing methods with fuzzy rough instance selection. The second uses different criteria to clean the majority and minority classes of the imbalanced data. This work extends the second approach, which was presented in [15]. The paper gives a full study of the second approach together with several proposed algorithms. It focuses mainly on binary classification of imbalanced data with kNN and SVM. Experiments and comparisons among related methods confirm the advantages and tendencies of each method in terms of classification accuracy and time consumption.
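The second approach, cleaning the majority and minority classes under different criteria, can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper. It assumes a range-normalised mean absolute difference as the dissimilarity, the Łukasiewicz implicator for the fuzzy rough lower approximation, and two hypothetical thresholds, `t_major` and `t_minor`. For crisp class labels, the lower-approximation membership of an instance to its own class then reduces to its smallest dissimilarity to any instance of another class.

```python
import numpy as np

def lower_approx_membership(X, y):
    """Membership of each instance to the fuzzy rough lower approximation
    of its own class (Lukasiewicz implicator, crisp class labels)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                      # avoid division by zero
    Xn = (X - X.min(axis=0)) / span            # scale each feature to [0, 1]
    # Pairwise dissimilarity in [0, 1]; similarity is 1 - dist.
    dist = np.abs(Xn[:, None, :] - Xn[None, :, :]).mean(axis=2)
    different = y[:, None] != y[None, :]
    # I(R(x, z), A(z)) is 1 for same-class z and 1 - R(x, z) = dist otherwise.
    vals = np.where(different, dist, 1.0)
    return vals.min(axis=1)

def edit_training_set(X, y, minority_label, t_major=0.1, t_minor=0.0):
    """Keep an instance only if its membership reaches the threshold of its
    class; the minority class gets the more lenient threshold (assumption)."""
    y = np.asarray(y)
    mu = lower_approx_membership(X, y)
    keep = np.where(y == minority_label, mu >= t_minor, mu >= t_major)
    return np.asarray(X)[keep], y[keep], keep

# Toy data: class 1 is the minority; the last majority instance (0.95)
# lies inside the minority region.
X_demo = [[0.0], [0.1], [0.2], [0.3], [0.9], [1.0], [0.95]]
y_demo = [0, 0, 0, 0, 1, 1, 0]
mu_demo = lower_approx_membership(X_demo, y_demo)
X_kept, y_kept, keep = edit_training_set(X_demo, y_demo, minority_label=1)
```

With these thresholds only the majority instance inside the minority region is dropped, while every minority instance is kept. The edited set could then be passed to a kNN or SVM classifier.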

References

[1]. Jesus Alcala-Fdez, Alberto Fernandez, Julian Luengo, Joaquin Derrac, and Salvador Garcia. “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework”. Multiple-Valued Logic and Soft Computing, 17(2-3):255–287, 2011.

[2]. K. Bache and M. Lichman. “UCI Machine Learning Repository”, 2013.

[3]. Yaile Caballero, Rafael Bello, Delia Alvarez, Maria M. Garcia, and Yaimara Pizano. “Improving the k-NN method: Rough set in edit training set”. In John Debenham, editor, Professional Practice in Artificial Intelligence, volume 218 of IFIP International Federation for Information Processing, pages 21–30. Springer US, 2006.

[4]. Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. “SMOTE: Synthetic minority over-sampling technique”. Journal of Artificial Intelligence Research, 16:321–357, 2002.

[5]. Chris Cornelis, Nele Verbiest, and Richard Jensen. “Ordered weighted average based fuzzy rough sets”. In Rough Set and Knowledge Technology - 5th International Conference, RSKT 2010, Beijing, China, October 15-17, 2010. Proceedings, pages 78–85, 2010.

[6]. Corinna Cortes and Vladimir Vapnik. “Support-vector networks”. Machine Learning, 20(3):273–297, 1995.

[7]. T. Cover and P. Hart. “Nearest neighbor pattern classification”. IEEE Trans. Inf. Theor., 13(1):21–27, September 1967.

[8]. Didier Dubois and Henri Prade. “Rough fuzzy sets and fuzzy rough sets”. International Journal of General Systems, 17:191–209, 1990.

[9]. Didier Dubois and Henri Prade. “Putting rough sets and fuzzy sets together”. In Roman Slowinski, editor, Intelligent Decision Support, volume 11 of Theory and Decision Library, pages 203–232. Springer Netherlands, 1992.

[10]. Friedman, M. “The use of ranks to avoid the assumption of normality implicit in the analysis of variance”. Journal of the American Statistical Association, 32(200):675–701, 1937.

[11]. Grzymala-Busse, J. W., Clark, P. G., and Kuehnhausen, M. “Generalized probabilistic approximations of incomplete data”. International Journal of Approximate Reasoning, 55(1, Part 2):180–196. Special issue on Decision-Theoretic Rough Sets, 2014.

[12]. Jin Huang and C.X. Ling. “Using AUC and accuracy in evaluating learning algorithms”. Knowledge and Data Engineering, IEEE Transactions on, 17(3):299–310, March 2005.

[13]. R. Jensen and C. Cornelis. “Fuzzy-rough instance selection”. In Fuzzy Systems (FUZZ), 2010 IEEE International Conference on, pages 1–7, July 2010.

[14]. Marzena Kryszkiewicz. “Rough set approach to incomplete information systems”. Inf. Sci., 112(1-4):39–49, December 1998.

[15]. Victoria Lopez, Alberto Fernandez, Salvador Garcia, Vasile Palade, and Francisco Herrera. “An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics”. Information Sciences, 250:113–141, 2013.

[16]. Do Van Nguyen, Keisuke Ogawa, Kazunori Matsumoto, and Masayuki Hashimoto. “Editing training sets from imbalanced data using fuzzy-rough sets”. In IFIP Advances in Information and Communication Technology, volume 458, pages 115–129, France, 2015.

[17]. Do Van Nguyen, Koichi Yamada, and Muneyuki Unehara. “Extended tolerance relation to define a new rough set model in incomplete information systems”. Advances in Fuzzy Systems, 2013. Article ID 372091.

[18]. Do Van Nguyen, Koichi Yamada, and Muneyuki Unehara. “On probability of matching in probability based rough set definitions”. In IEEESMC2013, pages 449–454, Manchester, The UK, 2013.

[19]. Nguyen, D. V., Yamada, K., and Unehara, M. “Rough set approach with imperfect data based on Dempster-Shafer theory”. Journal of Advanced Computational Intelligence and Intelligent Informatics, 18(3):280–288, 2014.

[20]. Nguyen, H. S. “Discretization problem for rough sets methods”. In Proceedings of the First International Conference on Rough Sets and Current Trends in Computing, RSCTC ’98, pages 545–552, London, UK. Springer-Verlag, 1998.

[21]. Zdzislaw Pawlak. “Rough sets”. International Journal of Computer and Information Sciences, 11:341–356, 1982.

[22]. Zdzislaw Pawlak. “Rough Sets: Theoretical Aspects of Reasoning about Data”. Kluwer Acad., 1991.

[23]. Anna Maria Radzikowska and Etienne E. Kerre. “A comparative study of fuzzy rough sets”. Fuzzy Sets Syst., 126(2):137–155, March 2002.

[24]. Enislay Ramentol, Yaile Caballero, Rafael Bello, and Francisco Herrera. “SMOTE-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory”. Knowl. Inf. Syst., 33(2):245–265, 2011.

[25]. Enislay Ramentol, Nele Verbiest, Rafael Bello, Yaile Caballero, Chris Cornelis, and Francisco Herrera. “SMOTE-FRST: A new resampling method using fuzzy rough set theory”. In Cengiz Kahraman, Etienne Kerre, and Faik Tunc Bozbura, editors, World Scientific Proceedings Series on Computer Engineering and Decision Making, volume 7, pages 800–805. World Scientific, 2012.

[26]. Enislay Ramentol, Sarah Vluymans, Nele Verbiest, Yaile Caballero, Rafael Bello, Chris Cornelis, and Francisco Herrera. “IFROWANN: Imbalanced fuzzy rough ordered weighted average nearest neighbor classification”. In IEEE Transactions on Fuzzy Systems, volume 23, 2012.

[27]. Verbiest, N. “Multi threshold FRPS: A new approach to fuzzy rough set prototype selection”. In RSCTC 2014, LNAI, volume 8536, pages 83–91, 2014.

[28]. Nele Verbiest, Chris Cornelis, and Francisco Herrera. “FRPS: A fuzzy rough prototype selection method”. Pattern Recognition, 46(10):2770–2782, 2013.

[29]. Nele Verbiest, Enislay Ramentol, Chris Cornelis, and Francisco Herrera. “Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection”. Appl. Soft Comput., 22:511–517, 2014.

[30]. Sarah Vluymans, Danel Sanchez Tarrago, Yvan Saeys, Chris Cornelis, and Francisco Herrera. “Fuzzy rough classifier for class imbalanced multi-instance data”. Pattern Recognition, 53:36–45, 2016.

[31]. Wilcoxon, F. “Individual comparisons by ranking methods”. Biometrics Bulletin, 1(6):80–83, 1945.

[32]. Ronald R. Yager. “On ordered weighted averaging aggregation operators in multicriteria decisionmaking”. IEEE Trans. Syst. Man Cybern., 18(1):183–190, January 1988.

[33]. Y. Y. Yao. “Combination of rough and fuzzy sets based on α-level sets”. In Rough sets and data mining: Analysis for imprecise data, pages 301–321. Kluwer Academic, 1997.

[34]. Hans-Jurgen Zimmermann. “Fuzzy Set Theory and its Applications”. Springer, 2001.



ISSN: 1859-1043
