Hybrid Grid Partitioning and Rough Set Method for Enhanced Dataset Clustering and Interpretable Rule Generation in Big Data Analysis
DOI:
https://doi.org/10.35335/emod.v14i2.32
Keywords:
Hybrid grid partitioning, Rough set method, Dataset clustering, Interpretable rule generation, Big data analysis
Abstract
This research introduces an approach that combines hybrid grid partitioning with rough set theory to improve dataset clustering and generate interpretable rules in big data analysis. The method targets three challenges common to large, complex datasets: scalability, high dimensionality, and interpretability. Grid partitioning divides a large dataset into manageable subsets, which enables parallel processing and reduces computational complexity. Rough set theory then identifies the essential attributes that contribute to cluster formation, reducing the dimensionality of the data and improving clustering accuracy. A key contribution is the generation of interpretable rules from the clustering results: rough set-based attribute selection identifies the attributes that determine cluster assignments, and the resulting rules expose the relationships between attributes and clusters, which helps in understanding the underlying patterns in the data. A numerical example demonstrates the method; the results show improved clustering accuracy and clear, interpretable rules based on the dataset attributes. The method has limitations, including potential challenges in generalizability, sensitivity to parameter settings, and computational complexity. Future research should validate and evaluate the method on diverse datasets and compare it with other state-of-the-art clustering algorithms. In conclusion, the hybrid grid partitioning and rough set method offers a promising solution for enhanced dataset clustering and interpretable rule generation in big data analysis. The research contributes to the advancement of data analytics methodologies and provides practical approaches for extracting knowledge from complex datasets, supporting decision-making processes, and enabling a better understanding of underlying data patterns.
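The pipeline outlined in the abstract (grid partitioning, rough set attribute selection, rule generation) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the equal-width grid, the exhaustive reduct search, the toy data, and every function name here (`to_grid_cells`, `dependency`, `smallest_reduct`, `extract_rules`) are assumptions made for illustration, and the cluster labels are taken as given, as if produced by an earlier clustering step.

```python
from collections import defaultdict
from itertools import combinations


def to_grid_cells(points, bins):
    """Equal-width grid partitioning: map each point to a tuple of bin indices."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    cells = []
    for p in points:
        cell = []
        for d in dims:
            width = (highs[d] - lows[d]) / bins
            index = int((p[d] - lows[d]) / width) if width > 0 else 0
            cell.append(min(index, bins - 1))  # keep the maximum in the last bin
        cells.append(tuple(cell))
    return cells


def blocks(cells, attrs):
    """Group object indices that are indiscernible on the chosen attributes."""
    groups = defaultdict(list)
    for i, cell in enumerate(cells):
        groups[tuple(cell[a] for a in attrs)].append(i)
    return groups


def dependency(cells, labels, attrs):
    """Rough set dependency degree: share of objects in single-label blocks."""
    consistent = sum(
        len(members)
        for members in blocks(cells, attrs).values()
        if len({labels[i] for i in members}) == 1
    )
    return consistent / len(cells)


def smallest_reduct(cells, labels):
    """Smallest attribute subset that keeps the full dependency degree.

    Exhaustive search, exponential in the number of attributes; acceptable
    only for a toy example.
    """
    all_attrs = tuple(range(len(cells[0])))
    target = dependency(cells, labels, all_attrs)
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if dependency(cells, labels, attrs) == target:
                return attrs
    return all_attrs


def extract_rules(cells, labels, attrs):
    """One rule per consistent block: bin indices on attrs -> cluster label."""
    rules = {}
    for key, members in blocks(cells, attrs).items():
        found = {labels[i] for i in members}
        if len(found) == 1:
            rules[key] = found.pop()
    return rules


# Hypothetical toy data: three numeric attributes, two given clusters.
points = [(1.0, 5.0, 0.2), (1.2, 1.0, 0.9), (0.8, 9.0, 0.45),
          (9.0, 5.2, 0.1), (8.8, 0.9, 0.8), (9.1, 8.7, 0.4)]
labels = [0, 0, 0, 1, 1, 1]

cells = to_grid_cells(points, bins=2)
reduct = smallest_reduct(cells, labels)
rule_table = extract_rules(cells, labels, reduct)
print(reduct, rule_table)
```

In this toy data only the first attribute separates the two clusters, so the search keeps that attribute alone and each rule reads "if attribute 0 falls in bin b, assign cluster c". A practical version would replace the exhaustive search with a greedy or heuristic reduct computation and could process grid cells in parallel.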
License
Copyright (c) 2020 Miankoff Gordion JKacker

This work is licensed under a Creative Commons Attribution 4.0 International License.
