Optimizing dataset classification through hybrid grid partition and rough set method for fuzzy rule generation
DOI:
https://doi.org/10.35335/emod.v17i2.22
Keywords:
Dataset Classification, Fuzzy Rule Generation, Grid Partition, Hybrid Approach, Rough Set Method
Abstract
This research presents a novel approach to dataset classification that integrates a hybrid grid partition and rough set method for fuzzy rule generation. The objective is to improve classification accuracy and interpretability while handling uncertainty in the dataset effectively. The proposed approach combines grid partitioning, rough set theory, and fuzzy logic: it identifies the relevant attributes within each grid cell, generates fuzzy rules from them, and classifies new instances by fuzzy inference. The research demonstrates improved accuracy for the hybrid approach compared with traditional methods, along with enhanced interpretability of the generated fuzzy rules. The scalability and generalizability of the approach are validated through its application to a case example of customer churn prediction in the telecommunications industry. Certain limitations need to be considered, however, including the selection of the partitioning scheme, computational complexity, and the handling of missing data. Further research is required to address these limitations and to benchmark the approach against state-of-the-art techniques. The proposed hybrid approach contributes to the field of dataset classification by offering an effective and interpretable methodology for improved classification performance and actionable insights in real-world applications.
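The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify membership functions, the reduct search, or the inference scheme, so everything here is an assumption. Attributes are assumed to be scaled to [0, 1], each attribute is partitioned into k evenly spaced triangular fuzzy sets (k at least 2), attribute relevance is scored with the standard rough set dependency degree computed over the whole grid rather than per cell, rules are built in a Wang-Mendel style, and classification uses the single strongest-firing rule. All function names and the toy churn data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def tri_membership(x, j, k):
    """Membership of x in the j-th of k evenly spaced triangular sets on [0, 1]."""
    center, width = j / (k - 1), 1.0 / (k - 1)
    return max(0.0, 1.0 - abs(x - center) / width)

def best_label(x, k):
    """Index of the fuzzy set (grid label) where x has the highest membership."""
    return max(range(k), key=lambda j: tri_membership(x, j, k))

def firing_strength(row, attrs, antecedent, k):
    """Product of memberships of row in the antecedent's fuzzy sets."""
    strength = 1.0
    for a, j in zip(attrs, antecedent):
        strength *= tri_membership(row[a], j, k)
    return strength

def dependency(X, y, attrs, k):
    """Rough set dependency degree: share of samples in label-pure grid cells."""
    cells = defaultdict(list)
    for row, label in zip(X, y):
        cells[tuple(best_label(row[a], k) for a in attrs)].append(label)
    pure = sum(len(ls) for ls in cells.values() if len(set(ls)) == 1)
    return pure / len(y)

def select_attributes(X, y, k):
    """Smallest attribute subset that preserves the full dependency degree.
    Exhaustive search: assumed acceptable only for a handful of attributes."""
    all_attrs = tuple(range(len(X[0])))
    target = dependency(X, y, all_attrs, k)
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if dependency(X, y, attrs, k) >= target:
                return attrs
    return all_attrs

def generate_rules(X, y, attrs, k):
    """One rule per occupied grid cell; the class with most accumulated strength wins."""
    votes = defaultdict(lambda: defaultdict(float))
    for row, label in zip(X, y):
        antecedent = tuple(best_label(row[a], k) for a in attrs)
        votes[antecedent][label] += firing_strength(row, attrs, antecedent, k)
    return {ant: max(cls, key=cls.get) for ant, cls in votes.items()}

def classify(row, rules, attrs, k):
    """Single-winner inference: return the class of the strongest-firing rule."""
    winner = max(rules, key=lambda ant: firing_strength(row, attrs, ant, k))
    return rules[winner]

# Made-up toy data: attribute 0 separates the classes, attribute 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5], [0.9, 0.8], [0.8, 0.2], [0.95, 0.5]]
y = ["stay", "stay", "stay", "churn", "churn", "churn"]

attrs = select_attributes(X, y, k=3)
rules = generate_rules(X, y, attrs, k=3)
prediction = classify([0.3, 0.7], rules, attrs, k=3)
```

On this toy data the attribute selection keeps only attribute 0, so each rule reads as a single linguistic condition (for example, "if attribute 0 is low then stay"), which is the kind of interpretability the abstract refers to. The exhaustive subset search grows exponentially with the number of attributes, which is one concrete form of the computational complexity limitation noted above.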
License
Copyright (c) 2023 Randrianja Velo, Jérôme Tamatave, Solofo Sahambala

This work is licensed under a Creative Commons Attribution 4.0 International License.
