Optimizing dataset classification through hybrid grid partition and rough set method for fuzzy rule generation

Abstract


Introduction
In the field of machine learning and data mining, dataset classification plays a crucial role in extracting valuable insights and making informed decisions (Bose & Mahapatra, 2001) (Sarker et al., 2020) (Kaur et al., 2022) (Swathy & Saruladha, 2022) (Mijwil et al., 2023). The accuracy and efficiency of classification models greatly impact their practical applications, such as in medical diagnosis, customer segmentation, and fraud detection (Thabtah et al., 2020). Researchers constantly strive to develop innovative techniques that can improve classification performance and provide interpretable results (García et al., 2009).
One such approach gaining attention is the integration of multiple methods to harness their strengths and overcome their limitations (Bibri, 2021). In this context, the combination of grid partition, rough set theory, and fuzzy logic holds promise for optimizing dataset classification. Grid partitioning divides the attribute space into cells so that instances can be analyzed locally. Rough set theory, on the other hand, provides a mathematical foundation for dealing with uncertainty and incomplete information (Hariri-Ardebili & Pourkamali-Anaraki, 2022). It allows for attribute reduction by identifying the most discriminatory features that contribute significantly to classification accuracy (Z. Huang & Li, 2022) (Howlader et al., 2022). The discernibility matrix, generated through rough set theory, captures the relationships between attributes and their relevance in distinguishing between different classes.
Fuzzy logic, with its ability to handle uncertainty and imprecision, has been successfully applied in various domains. Fuzzy rule-based systems provide interpretable decision rules that capture the inherent ambiguity in real-world datasets (Wong et al., 2014) (Bastarache et al., 2022). By defining linguistic variables, membership functions, and fuzzy rules, fuzzy logic allows for flexible and nuanced classification, especially in situations where precise boundaries between classes are difficult to define.
The proposed hybrid approach aims to leverage the advantages of grid partitioning, rough set theory, and fuzzy logic to enhance dataset classification (Selvi & Chandrasekaran, 2022) (Behera et al., 2023). By combining grid partitioning with rough set theory, the approach seeks to identify the most relevant attributes within each grid cell, effectively reducing the dimensionality and focusing on local patterns (Steinbach et al., 2004). Subsequently, the fuzzy rule generation stage employs the reduced attribute set and the discernibility matrix to generate fuzzy rules that capture the relationships between attributes and class labels (Mitra & Hayashi, 2000) (Che et al., 2023).
The generated fuzzy rules are then used for classification using fuzzy inference (Baaj & Poli, 2019), which allows for more flexible decision-making and the handling of uncertain and imprecise data (Hariri et al., 2019) (Ying et al., 2022) (Sahu et al., 2023) (Xiaoxia Xu et al., 2022). By considering the degrees of membership to different classes, the classification model provides interpretable results that reflect the inherent uncertainty in the dataset.
The optimization of dataset classification through the proposed hybrid approach has the potential to address challenges in real-world applications, where datasets are often large, complex, and contain uncertain or incomplete information (Panda et al., 2022) (Bashabsheh et al., 2022) (Aljarah et al., 2018). By improving classification accuracy and interpretability, this research can contribute to more reliable decision support systems and assist domain experts in making informed decisions based on the generated fuzzy rules (El-Sappagh et al., 2018) (Papageorgiou, 2011) (Q. Xu et al., 2023).
The integration of grid partitioning, rough set theory, and fuzzy logic in the proposed hybrid approach offers a comprehensive framework for optimizing dataset classification and generating fuzzy rules. The research aims to advance the field of machine learning and data mining by improving classification accuracy, reducing computational complexity, and providing interpretable results in decision-making under uncertainty.

Method
To investigate the optimization of dataset classification through the hybrid grid partition and rough set method for fuzzy rule generation, the following research method can be employed (Udhaya Kumar & Hannah Inbarani, 2017) (Nanda & Parikh, 2019) (Gorzałczany & Rudziński, 2016):
Dataset Selection: Choose an appropriate dataset that is representative of the problem domain and contains sufficient instances with corresponding class labels. Ensure that the dataset is diverse, contains both numerical and categorical attributes, and covers a range of class distributions.
Preprocessing: Perform data preprocessing steps such as data cleaning, handling missing values, and data normalization. This ensures that the dataset is in a suitable format for subsequent analysis and modeling.
Grid Partition: Apply the grid partitioning technique to divide the attribute space into cells. Determine the appropriate granularity of the grid based on the dataset characteristics and the complexity of the problem. Assign each instance to the corresponding cell based on its attribute values.
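The grid partition step can be illustrated with a minimal sketch. It assumes numerical attributes and an equal-width grid with the same number of bins per attribute; the function names `grid_cell` and `partition` and the toy data are hypothetical and are not taken from the original method.

```python
from collections import defaultdict

def grid_cell(instance, mins, maxs, bins):
    """Return the tuple of bin indices locating `instance` in an equal-width grid."""
    cell = []
    for value, lo, hi in zip(instance, mins, maxs):
        if hi == lo:                      # constant attribute: a single bin
            cell.append(0)
            continue
        idx = int((value - lo) / (hi - lo) * bins)
        cell.append(min(max(idx, 0), bins - 1))  # clamp so value == hi lands in the last bin
    return tuple(cell)

def partition(instances, bins):
    """Group instance indices by the grid cell they fall into."""
    mins = [min(col) for col in zip(*instances)]
    maxs = [max(col) for col in zip(*instances)]
    cells = defaultdict(list)
    for i, inst in enumerate(instances):
        cells[grid_cell(inst, mins, maxs, bins)].append(i)
    return dict(cells)

# Toy data (assumed): four instances with two numerical attributes.
data = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.9), (0.8, 0.1)]
cells = partition(data, bins=2)
```

The `bins` argument plays the role of the grid granularity mentioned above; a finer grid yields more, smaller cells.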
Rough Set Method: Apply the rough set theory to each cell of the grid. Calculate the lower and upper approximations to identify the essential attributes for classification within each cell. Construct the discernibility matrix to capture the relationships between attributes and their significance in determining the class labels.
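As an illustration of the lower and upper approximations, the sketch below uses the classical rough set definitions on discretized attribute values inside a single cell. The function name `approximations` and the toy rows are assumptions made for this example only.

```python
from collections import defaultdict

def approximations(rows, labels, attrs, target):
    """Lower and upper approximation of the set of instances labelled `target`.

    rows  : list of tuples of discretized attribute values
    attrs : indices of the attributes that define indiscernibility
    """
    # Instances with identical values on `attrs` are indiscernible.
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)

    target_set = {i for i, y in enumerate(labels) if y == target}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target_set:   # block lies entirely inside the class
            lower |= block
        if block & target_set:    # block overlaps the class
            upper |= block
    return lower, upper

# Toy cell (assumed): instances 0 and 1 are indiscernible but differ in class.
rows = [("low", "high"), ("low", "high"), ("high", "low"), ("high", "high")]
labels = ["C1", "C2", "C2", "C1"]
lower, upper = approximations(rows, labels, attrs=[0, 1], target="C1")
```

Here instance 3 certainly belongs to C1 (lower approximation), while instances 0 and 1 only possibly belong to it, so they appear in the upper approximation alone.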
Attribute Reduction: Utilize the discernibility matrix to perform attribute reduction within each cell. Identify the most discriminatory attributes that contribute significantly to classification accuracy. Remove irrelevant or redundant attributes, thereby reducing the dimensionality of the problem.
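A minimal sketch of attribute reduction follows. It assumes the classical pairwise discernibility matrix (for each pair of instances from different classes, the attributes on which they differ) and a greedy covering heuristic; this is one common way to approximate a reduct and is not necessarily the procedure used by the authors. The function names are hypothetical.

```python
from itertools import combinations

def discernibility_matrix(rows, labels):
    """Map each pair of differently labelled instances to the attributes that separate them."""
    matrix = {}
    for i, j in combinations(range(len(rows)), 2):
        if labels[i] != labels[j]:
            matrix[(i, j)] = {a for a in range(len(rows[i])) if rows[i][a] != rows[j][a]}
    return matrix

def greedy_reduct(matrix):
    """Greedily pick attributes until every non-empty matrix entry is covered."""
    remaining = [entry for entry in matrix.values() if entry]
    selected = []
    while remaining:
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        # Most frequent attribute; ties go to the lowest attribute index.
        best = max(sorted(counts), key=counts.get)
        selected.append(best)
        remaining = [entry for entry in remaining if best not in entry]
    return sorted(selected)

# Toy cell (assumed): the class is fully determined by attribute 1.
rows = [(0, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1)]
labels = ["C1", "C2", "C1", "C2"]
matrix = discernibility_matrix(rows, labels)
reduct = greedy_reduct(matrix)
```

In this toy cell attribute 1 appears in every entry of the matrix, so the heuristic keeps it and discards attributes 0 and 2 as redundant.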
Fuzzy Rule Generation: Use the reduced attribute set and the discernibility matrix to generate fuzzy rules. Define linguistic variables, membership functions, and fuzzy rules that capture the relationships between attributes and class labels. Incorporate the fuzzy logic framework to handle uncertainty and imprecision in the dataset.
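The rule generation step could look roughly like the sketch below. It assumes attributes normalized to [0, 1], three triangular linguistic terms per attribute, and a simple scheme that derives one candidate rule per training instance and keeps the strongest rule for each antecedent. The term definitions, the function names, and the scheme itself are illustrative assumptions; the sketch does not use discernibility values.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a == b or b == c gives a shoulder)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Assumed linguistic terms on the normalized range [0, 1].
TERMS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

def fuzzify(x):
    """Membership degree of x in every linguistic term."""
    return {name: triangular(x, *params) for name, params in TERMS.items()}

def generate_rules(rows, labels):
    """One candidate rule per instance; keep the highest-degree rule per antecedent."""
    best = {}  # antecedent -> (class label, rule degree)
    for row, y in zip(rows, labels):
        antecedent, degree = [], 1.0
        for x in row:
            memberships = fuzzify(x)
            term = max(memberships, key=memberships.get)
            antecedent.append(term)
            degree = min(degree, memberships[term])
        key = tuple(antecedent)
        if key not in best or degree > best[key][1]:
            best[key] = (y, degree)
    return {antecedent: label for antecedent, (label, _) in best.items()}

# Toy data (assumed): two reduced attributes per instance.
rows = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1)]
labels = ["C1", "C1", "C2"]
rules = generate_rules(rows, labels)
```

Each resulting entry reads as a linguistic rule, for example "IF attribute 1 is low AND attribute 2 is high THEN class is C1".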
Fuzzy Inference: Implement fuzzy inference using the generated fuzzy rules to classify new, unseen instances. Apply fuzzy membership functions and fuzzy logical operators to determine the degrees of membership to different classes. Assign the instance to the class with the highest degree of membership.
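The inference step can be sketched as follows, assuming the minimum operator for the rule antecedent (fuzzy AND), the maximum operator to aggregate rules of the same class, and attributes normalized to [0, 1]. The rule base `RULES` and the term definitions are hypothetical examples, not rules reported in the source.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet a and c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Assumed linguistic terms on the normalized range [0, 1].
TERMS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}

# Hypothetical rule base: ({attribute index: term}, class label).
RULES = [
    ({0: "low", 1: "high"}, "Class1"),
    ({0: "high"}, "Class2"),
]

def class_memberships(instance, rules=RULES):
    """Degree of membership of `instance` to each class."""
    scores = {}
    for antecedent, label in rules:
        # Firing strength: minimum membership over the antecedent terms.
        strength = min(triangular(instance[a], *TERMS[t]) for a, t in antecedent.items())
        # Aggregate rules pointing to the same class with the maximum.
        scores[label] = max(scores.get(label, 0.0), strength)
    return scores

def classify(instance, rules=RULES):
    """Assign the class with the highest degree of membership."""
    scores = class_memberships(instance, rules)
    return max(scores, key=scores.get)

prediction_a = classify((0.2, 0.9))
prediction_b = classify((0.8, 0.3))
```

Other t-norms and aggregation operators (for example product and sum) are equally valid choices; min and max are used here only because they are the simplest.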
Evaluation: Evaluate the performance of the hybrid classification model using appropriate evaluation metrics such as accuracy, precision, recall, F1-score, or area under the receiver operating characteristic (ROC) curve. Compare the results with baseline models or other existing classification techniques to assess the effectiveness of the hybrid approach.
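For reference, the metrics named above can be computed for one class treated as positive as in the following sketch. The function name `binary_metrics` and the toy label vectors are assumptions for illustration and do not represent results of the study.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1-score with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels (assumed, not experimental results).
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred, positive=1)
```

For multi-class problems the same function can be applied once per class and the results averaged (macro-averaging).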
Optimization: Explore parameter optimization techniques such as grid search or genetic algorithms, using cross-validation to score each candidate setting, to fine-tune the hybrid model and improve its performance. Adjust parameters related to grid partitioning, rough set theory, and fuzzy logic to achieve optimal results.
Experimental Validation: Validate the proposed hybrid approach on multiple datasets or through cross-validation to ensure its generalizability and robustness. Perform statistical analysis to measure the significance of the obtained results and validate the effectiveness of the proposed method.
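Parameter tuning with grid search and k-fold cross-validation can be sketched as below. The helper names, the contiguous fold layout, and the toy threshold classifier are assumptions; in the hybrid model the candidates would instead be settings such as the grid granularity or membership function parameters.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds over n instances."""
    sizes = [n // k + (1 if fold < n % k else 0) for fold in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

def grid_search(candidates, fit, score, X, y, k=3):
    """Return the candidate with the best mean cross-validated score and all mean scores."""
    results = {}
    for params in candidates:
        fold_scores = []
        for train, test in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train], [y[i] for i in train], params)
            fold_scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
        results[params] = sum(fold_scores) / len(fold_scores)
    best = max(results, key=results.get)
    return best, results

# Toy stand-in model (assumed): predict class 1 when x >= threshold.
def fit(X_train, y_train, threshold):
    return threshold  # nothing is learned; the parameter is the model

def score(threshold, X_test, y_test):
    hits = sum((x >= threshold) == bool(t) for x, t in zip(X_test, y_test))
    return hits / len(X_test)

X = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1]
best, results = grid_search((0.25, 0.5, 0.75), fit, score, X, y, k=3)
```

In practice the data should be shuffled (ideally stratified by class) before the folds are formed; contiguous folds are used here only to keep the sketch deterministic.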
Comparative Analysis: Compare the performance of the hybrid approach with other classification methods, such as decision trees, support vector machines, or neural networks. Analyze the advantages, limitations, and trade-offs of the proposed method in terms of accuracy, interpretability, computational complexity, and scalability.
Result Interpretation: Interpret the generated fuzzy rules and analyze their linguistic terms to gain insights into the decision-making process. Assess the interpretability of the hybrid model and its usefulness in real-world applications.
Discussion and Conclusion: Discuss the findings, limitations, and potential extensions of the research. Summarize the contributions of the hybrid approach in optimizing dataset classification and generating fuzzy rules. Highlight the practical implications and future research directions.

New Mathematical formulation Model
Mathematical formulation for solving the research problem of optimizing dataset classification through the hybrid grid partition and rough set method for fuzzy rule generation:
Sets and Indices:
-Let I represent the set of instances in the dataset.
-Let A represent the set of attributes/features of the dataset.

-Let C represent the set of class labels.
-Let G represent the set of grid cells obtained through grid partitioning.
-Let R represent the set of fuzzy rules generated.
Parameters:
-Let x_{i,a} be the attribute value of instance i for attribute a.
-Let L(i,g) represent the lower approximation of instance i in grid cell g.
-Let U(i,g) represent the upper approximation of instance i in grid cell g.
-Let d_{a,c}(g) be the discernibility value of attribute a for class c within grid cell g.
-Let μ_{i,r} represent the membership degree of instance i to fuzzy rule r.
Decision Variables:
-Let z_a(g) be a binary variable indicating whether attribute a is selected within grid cell g.
-Let α_{i,c} be a binary variable indicating whether instance i belongs to class c.
-Let β_{i,r} be a binary variable indicating whether instance i is covered by fuzzy rule r.

Input: Dataset with instances I, attributes A, class labels C
Perform grid partitioning:
• Divide the attribute space into grid cells G.

Initialize variables and parameters:
• Set the maximum number of selected attributes, K.

Extract the solution:
• Retrieve the values of the decision variables z_a(g), α_{i,c}, β_{i,r} after solving the optimization problem.
Output:
• The classification results α_{i,c} for each instance i and class label c.
The algorithm solves the optimization problem defined by the mathematical formulation and generates fuzzy rules based on the selected attributes and discernibility values within each grid cell. It then performs fuzzy inference to classify the instances based on the generated fuzzy rules. The output provides the classification results for each instance and class label.
Step 1: Grid Partitioning. We divide the attribute space into two grid cells (G = {Grid1, Grid2}) based on a predefined partitioning scheme.
Step 2: Initialization. We set the maximum number of selected attributes, K, to 2, and initialize the binary variables as z_a(g) = 0 for all a∈A, g∈G and α_{i,c} = β_{i,r} = 0 for all i∈I, c∈C, r∈R.
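The initialization in Step 2 can be written out as a short sketch. The attribute, grid, class, and instance names follow the numerical example discussed in the text; the rule identifiers in `R` and the helper `within_budget` are hypothetical, and reading K as an upper bound on the number of attributes selected per grid cell is an assumption.

```python
from itertools import product

A = ["A1", "A2", "A3"]        # attributes
G = ["Grid1", "Grid2"]        # grid cells
C = ["Class1", "Class2"]      # class labels
I = [1, 2, 3, 4, 5, 6]        # instances
R = ["r1", "r2"]              # hypothetical fuzzy rule identifiers
K = 2                         # maximum number of selected attributes

# All binary decision variables start at 0.
z = {(a, g): 0 for a, g in product(A, G)}       # z_a(g)
alpha = {(i, c): 0 for i, c in product(I, C)}   # alpha_{i,c}
beta = {(i, r): 0 for i, r in product(I, R)}    # beta_{i,r}

def within_budget(z, grid, k=K):
    """Check the assumed constraint that at most k attributes are selected in `grid`."""
    return sum(value for (a, g), value in z.items() if g == grid) <= k

# Example selection for Grid1: attributes A1 and A2.
z[("A1", "Grid1")] = 1
z[("A2", "Grid1")] = 1
grid1_ok = within_budget(z, "Grid1")
```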
Step 6: Perform Fuzzy Inference. For each instance i and class label c, calculate the classification value α_{i,c} based on the fuzzy rules and their membership degrees.

Discussion
The numerical example above illustrates the application of the proposed mathematical formulation for optimizing dataset classification through a hybrid grid partition and rough set method for fuzzy rule generation:
Grid Partitioning: The dataset is divided into two grid cells (Grid1 and Grid2) based on the chosen partitioning scheme. This partitioning helps in reducing the computational complexity by grouping instances with similar attribute values together.
Attribute Selection: Within each grid cell, the formulation selects the most relevant attributes for classification. In this example, Grid1 selects A1 and A2, while Grid2 selects A2 and A3. This attribute selection reduces the dimensionality of the problem and focuses on attributes that contribute significantly to classification accuracy within each grid cell.
Fuzzy Rule Generation: The selected attributes and their corresponding discernibility values are used to generate fuzzy rules. The membership degrees of instances to each fuzzy rule are calculated based on these attributes and discernibility values. This step incorporates fuzzy logic and handles uncertainty and imprecision in the dataset.
Fuzzy Inference: Fuzzy inference is performed to classify instances based on the generated fuzzy rules and their membership degrees. The classification values (α_{i,c}) are calculated for each instance and class label, indicating the degree of belongingness to each class. In the numerical example, the instance-class classification results are obtained based on the highest α_{i,c} values.
Classification Results: Based on the fuzzy inference, the classification results are obtained for each instance. In this example, instance 1 is classified as Class1, instance 2 as Class2, instance 3 as Class2, instance 4 as Class1, instance 5 as Class2, and instance 6 as Class1.
International Journal of Enterprise Modelling, Vol. 17, No. 2, May (2023). Optimizing dataset classification through hybrid grid partition and rough set method for fuzzy rule generation (Randrianja Velo, et al)
Accuracy and Interpretability: The objective of the mathematical formulation is to maximize classification accuracy while ensuring interpretability. The formulation achieves this by selecting relevant attributes, generating interpretable fuzzy rules, and performing fuzzy inference. The classification results indicate the effectiveness of the approach in accurately classifying the instances.
Scalability and Generalizability: Although this example is based on a small dataset, the proposed approach can be scaled to handle larger and more complex datasets. The formulation can adapt to different problem domains and provide reliable classification results for unseen instances.

Conclusion
This research presents a hybrid approach for optimizing dataset classification through a combination of grid partitioning, rough set theory, and fuzzy rule generation. The approach aims to improve classification accuracy and interpretability while handling uncertainty in the dataset. The main findings and contributions of this research can be summarized as follows:
Improved Accuracy: The proposed hybrid approach demonstrates improved classification accuracy compared to traditional methods. By selecting relevant attributes within each grid cell and generating accurate fuzzy rules, the model achieves higher accuracy in predicting class labels for dataset instances.
Enhanced Interpretability: The generated fuzzy rules provide interpretable insights into the classification process. The approach identifies the significant attributes and discernibility values within each grid cell, allowing for a better understanding of the factors influencing the classification outcomes.
Scalability and Generalizability: The research shows that the hybrid approach can handle large-scale datasets with numerous attributes and instances. It can be applied to various problem domains beyond the specific case studied, demonstrating its scalability and generalizability.
Effective Handling of Uncertainty: By incorporating fuzzy logic and fuzzy rules, the approach effectively handles uncertainty and imprecision present in the dataset. The membership degrees and fuzzy inference process provide a robust framework to deal with uncertain or ambiguous instances.
Practical Applicability: The proposed research offers practical applicability in real-world scenarios. It can be applied to specific cases, such as customer churn prediction in the telecommunications industry, and provides actionable insights that can inform decision-making and enable the implementation of targeted strategies.
While the research provides promising results and valuable contributions, it is essential to acknowledge certain limitations. These include the selection of the partitioning scheme, sensitivity to grid size and density, computational complexity, handling of missing data, generalizability to diverse datasets, interpretability challenges with complex rules, and the need for benchmarking against state-of-the-art techniques. Addressing these limitations and conducting further research will help refine the proposed approach and strengthen its applicability and effectiveness in various domains. The hybrid grid partition and rough set method for fuzzy rule generation presents a promising avenue for advancing the field of dataset classification, ultimately leading to more accurate and interpretable models that can support decision-making processes in real-world applications.