A novel feature selection approach using a modified fuzzy C-means

Feature selection, also known as dimensionality reduction, is a common preprocessing step in pattern recognition, data mining, and machine learning, and a critical issue when mining high-dimensional, massive data sets. Preprocessing the data to obtain a smaller set of representative features, while retaining the salient properties of the original data, yields more compact trained models, better generalization, and shorter processing times. The goal of dimension reduction is therefore to retain only the most important information from the original data, as determined by some optimality criterion. In this paper we propose a new framework that combines a modified fuzzy C-means (MFCM) algorithm with accelerated particle swarm optimization (APSO) for feature selection in a host intrusion detection system (HIDS).


Introduction
In the feature selection process, an ideal feature subset is always defined with respect to a certain criterion. Generally speaking, different criteria may yield different optimal feature subsets, though each criterion measures the capacity of a feature, or a combination of features, to discriminate between the class labels. The dependency or correlation measure is primarily used to quantify the correlation between two features, or between a feature and a class [8], whereas the distance measure is a conventional discrimination or divergence measure. Because these two measures depend on the actual values of the training data, they are extremely sensitive to noise and outliers in the data set. On the other hand, information measures such as entropy and mutual information compute the amount of information, or the uncertainty, that a feature carries about the classification [9,10]. The information measure has been widely used in feature selection because it depends only on the probability distribution of a random variable, not on its actual values [5-7, 9, 10].
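As a hedged illustration of the information measure described above, the mutual information between a discrete feature X and the class Y can be estimated from their empirical joint distribution. This is a minimal sketch; the function name and the toy samples are our own, not from the paper:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the feature
    py = Counter(ys)            # marginal counts of the class
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

# A feature identical to the class labels carries maximal information (1 bit
# for a balanced binary class); an independent feature carries none.
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0
print(mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))  # 0.0
```

A feature ranking for selection would then simply sort features by this score against the class attribute.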
The quick reduct (QR) approach [21,22], one of the common rough-set-based feature selection techniques, first determines the dependency, or quality of approximation, of a given attribute with respect to the class labels (the decision attribute). Once the best attribute has been chosen, further attributes are added to it to increase this quality. Attribute additions stop when the final subset of attributes reaches the highest quality achievable for the data set, or when the quality of the chosen attributes no longer improves. The discernibility-matrix-based technique [23,24] and dynamic reducts [25] are two further noteworthy methods. Recently, Parthalain et al. [26] reported a distance-measure-based strategy that exploits the rough set boundary region for feature selection. All of these methods, however, require considerable computational power. Other heuristic feature selection methods built on rough set theory have also been developed [27,28], and several techniques have been proposed [29-31] that combine rough sets with evolutionary algorithms to find an optimal or near-optimal subset of features.
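The greedy QR loop described above can be sketched as follows. This is our own illustrative implementation, assuming discrete-valued attributes; the rough-set dependency is computed as the fraction of rows whose attribute values determine the class unambiguously (the positive region):

```python
def dependency(data, labels, attrs):
    """Rough-set dependency: fraction of rows whose values on `attrs`
    determine the class label unambiguously (the positive region)."""
    groups = {}
    for row, y in zip(data, labels):
        key = tuple(row[a] for a in attrs)
        groups.setdefault(key, set()).add(y)
    consistent = sum(1 for row in data
                     if len(groups[tuple(row[a] for a in attrs)]) == 1)
    return consistent / len(data)

def quick_reduct(data, labels, n_attrs):
    """Greedily add the attribute giving the largest dependency gain,
    stopping when the quality of the full attribute set is reached."""
    reduct, best = [], 0.0
    full = dependency(data, labels, list(range(n_attrs)))
    while best < full:
        gains = [(dependency(data, labels, reduct + [a]), a)
                 for a in range(n_attrs) if a not in reduct]
        score, a = max(gains)
        if score <= best:       # no attribute improves quality any further
            break
        reduct.append(a)
        best = score
    return reduct

# Attribute 1 alone determines the label, so the reduct is just [1]:
print(quick_reduct([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 0, 1], 2))
```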
Maji and Pal [13] provided a feature selection method that uses fuzzy-rough sets to efficiently handle discrete or real-valued noisy data, or a combination of both, without requiring user-specified information. The technique can be applied to classification and regression data sets, i.e. to data with nominal or continuous decision attributes. The approach chooses a subset of features from the entire feature set by maximizing the relevance and minimizing the redundancy of the chosen features. The f-information measures on fuzzy approximation spaces are used to determine the relevance and redundancy of the feature sets, and the f-information measures for both condition and decision attributes are derived using the fuzzy equivalence partition matrix (FEPM). As a result, the only input this feature selection approach needs is a fuzzy partition for each attribute, which can be generated automatically from the data set. This removes the requirement for subject-matter experts to provide details about the data at hand, and ties into the benefit of rough sets: it needs only the data set itself.

Fuzzy C-Means
FCM is an unsupervised learning technique that groups data samples without using class label information. The fundamental benefit of fuzzy c-means clustering is that it allows data points to belong to clusters gradually, with membership degrees in [0,1]; this makes it possible to express that a data point may be part of several clusters at once. FCM divides a data set into N clusters, each of which contains some of the data points. Because the fuzzy c-means algorithm performs better than k-means, we choose fuzzy c-means here.
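A minimal sketch of the standard FCM iteration follows, alternating the center update and the membership update. The function name, the fuzzifier value `m = 2`, and the toy data are illustrative choices of ours; this is plain FCM, not the paper's modified (MFCM) variant:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means: returns (centers, membership matrix U).

    U[i, k] in [0, 1] is the degree to which sample i belongs to
    cluster k, and each row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # random fuzzy partition
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # weighted means
        # distances of every sample to every center (small eps avoids /0)
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)      # renormalize memberships
    return centers, U

# Two well-separated pairs of points resolve into two fuzzy clusters:
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers, U = fuzzy_c_means(X, c=2)
```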

Particle swarm optimization
PSO is an optimization strategy based on studies of the flocking behaviour of animals and birds, and an effective and efficient global search method [4,5]. Its compact representation, capacity to explore large search spaces, low computational cost, ease of implementation, and small number of parameters make it a suitable technique for feature selection problems.
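To show how PSO applies to feature selection, here is a hedged sketch of the common binary-PSO formulation, where each particle is a 0/1 mask over the features and velocities are squashed through a sigmoid. The helper name, parameter values, and toy fitness function are our own illustrations, not the paper's algorithm:

```python
import math
import random

def pso_feature_select(fitness, n_feats, n_particles=20, n_iter=50, seed=1):
    """Binary PSO sketch: each particle is a 0/1 mask over features.
    `fitness(mask)` returns a score to maximize (e.g. classifier accuracy)."""
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    pos = [[rng.randint(0, 1) for _ in range(n_feats)] for _ in range(n_particles)]
    vel = [[0.0] * n_feats for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best masks
    pbest_f = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]    # global best mask
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_feats):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # sample the bit with probability sigmoid(velocity)
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f

# Toy fitness: features 0 and 2 are informative, every extra feature costs 0.1.
toy = lambda mask: mask[0] + mask[2] - 0.1 * sum(mask)
best, score = pso_feature_select(toy, n_feats=6)
```

In a real wrapper setup, `fitness` would train and score a classifier on the masked feature subset.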

Intrusion Detection system
This is why we have settled on the IDS as the target of our feature selection framework. An IDS scans a network or system for suspicious behaviour or policy violations. It monitors network transmissions for signs of malicious behaviour and notifies administrators immediately. Training an IDS to distinguish between benign and malicious network activity requires carefully curated traffic data.

Host Intrusion Detection System (HIDS):
A host intrusion detection system (HIDS) is a type of IDS that runs on individual computers or other network nodes. A HIDS watches only the packets entering and leaving its own device, and notifies the administrator if anything unusual is identified. It also takes a snapshot of the current system files and then checks for changes since the last snapshot: as soon as any monitored file is modified or removed, an alert is issued to the administrator.

Methodology
We propose a seven-step approach comprising pre-processing, feature selection, MFCM and APSO, classification, training, testing, and assessment. The proposed model was evaluated on the NSL-KDD data set, an updated version of the KDD'99 data set that serves as a useful standard against which researchers can evaluate host intrusion detection systems (HIDS). Preprocessing is required because the raw features confuse the classifier, leading to false alarms; moreover, the computational and memory requirements of the classification methods are not fully explored, and the few symbolic attributes that accompany the numeric ones are underutilised.

The NSL-KDD raw feature set can be expressed as

Af = {ax1, ax2, ax3, …, axn} (1)

where n = 41 is the number of attributes in the raw data set. To reduce unnecessary overhead in the learning process, the symbolic features are removed from the raw feature set, giving

Af' = {ax1, ax2, ax3, …, axm} (2)

where the raw feature count m is 38. This raw feature collection needs further preprocessing to choose significant features based on their sensitivity. In feature selection we explore the many methods that may be used to transform features and pick the most salient ones for this purpose. Accelerated particle swarm optimization (APSO) can be a powerful statistical selection tool for determining which features to prioritise; here, APSO is used to turn the raw features into primary features, making the significant information in the data more readily apparent. This method has been employed in a variety of fields over the last several years [21]. Features with higher eigenvalues are chosen, whereas features with lower eigenvalues are disregarded. Because some crucial features may be lost, this way of performing feature selection is not optimal on its own: which feature is crucial, and which is not? How do we know which feature is picked, and which is not, in the primary space? Both APSO and MFCM have proven effective on similar optimization problems in the past, making them the best choices here. Therefore, in this study APSO and MFCM were employed for feature selection, while the proposed new framework was used for feature transformation. The swarm algorithm used is outlined below.
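The removal of symbolic attributes in Eq. (2) can be sketched as follows. This is a hedged example: the assumption that NSL-KDD's three symbolic attributes (protocol_type, service, and flag) sit at column indices 1-3 is ours, as is the helper name:

```python
def drop_symbolic(rows, symbolic_idx=(1, 2, 3)):
    """Remove symbolic columns from NSL-KDD-style records, keeping the
    numeric raw features (the step from Eq. (1) to Eq. (2))."""
    keep = [i for i in range(len(rows[0])) if i not in set(symbolic_idx)]
    return [[row[i] for i in keep] for row in rows]

# One 41-attribute record: a numeric field, three symbolic fields, then
# 37 more numeric fields (values here are placeholders).
record = [0, "tcp", "http", "SF"] + [0.0] * 37
reduced = drop_symbolic([record])[0]
print(len(reduced))  # 38
```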

Accelerated particle swarm optimization
In [14], a hybrid algorithm is suggested as a means of improving optimization. APSO and MFCM work regardless of the initial values and parameter settings. However, APSO matures slowly because of its variable step length and unpredictable behaviour; as a result, the optimization accuracy is rarely good enough, leading to blind spots in the search and an inability to strike a good balance between exploration and exploitation. The inclusion of crossover grants the particle swarm the ability to jump to and investigate a more varied region: with a little help from the crossover operator, particle swarms can skip past the blind search phase and inherit their parents' advantages. The MFCM concept motivated the incorporation of the normative and situational knowledge stored in APSO's belief space into the MFCM. In this model the swarm serves as a population space from which domain expertise can be mined; accordingly, we model and influence the population's evolution iteratively by creating and storing domain knowledge in the belief space.
The following criterion is used to decide whether the algorithm is stuck in a local minimum:

………….. (3)
When the criterion in Eq. (3) is satisfied, the crossover operator is applied to the i-th particle.
The main steps of the proposed algorithm are:
1. Initialize the N particles at random positions within the search ranges and set all parameter values.
2. Start with an empty belief space and evaluate all particles with the fitness function y.
3. To find the best offspring, simulate each particle's preying, swarming, and chasing behaviour independently. If the offspring is superior, substitute it for the i-th particle.
4. Update the belief space.
5. If the crossover condition holds, apply the crossover operator to the i-th particle from Step 3.
6. Return to Step 3 until the stopping criterion is met.
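The loop above can be sketched as follows. This is our own minimal illustration of an accelerated PSO with a stagnation-triggered crossover, using a standard sphere test function; the parameter values, the stagnation threshold, and the crossover form (uniform mixing with a random particle) are assumptions, not the paper's exact operators, and the belief-space bookkeeping is reduced to tracking the global best:

```python
import random

def apso(f, dim, bounds, n=15, iters=200, alpha=0.2, beta=0.5, seed=2):
    """Accelerated PSO (global-best only) with a crossover step applied
    when a particle stagnates, loosely following Steps 1-6 above."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n)]
    fit = [f(x) for x in X]
    g = min(range(n), key=lambda i: fit[i])
    gbest, gbest_f = X[g][:], fit[g]
    stall = [0] * n                       # iterations without improvement
    for t in range(iters):
        a = alpha * 0.97 ** t             # shrinking random-step size
        for i in range(n):
            # APSO-style move: pull toward gbest plus a random kick
            X[i] = [(1 - beta) * x + beta * gb + a * rng.gauss(0, 1)
                    for x, gb in zip(X[i], gbest)]
            fi = f(X[i])
            stall[i] = stall[i] + 1 if fi >= fit[i] else 0
            fit[i] = fi
            if stall[i] > 5:              # stagnation: uniform crossover
                j = rng.randrange(n)
                X[i] = [xa if rng.random() < 0.5 else xb
                        for xa, xb in zip(X[i], X[j])]
                fit[i] = f(X[i])
                stall[i] = 0
            if fit[i] < gbest_f:
                gbest, gbest_f = X[i][:], fit[i]
    return gbest, gbest_f

# Minimize the sphere function; the optimum is the origin with value 0.
sphere = lambda x: sum(v * v for v in x)
best, best_f = apso(sphere, dim=4, bounds=(-5, 5))
```

For feature selection, `f` would be replaced by a (negated) classifier-accuracy fitness over thresholded feature masks.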

Result and discussion
Figure 3 shows a direct comparison of feature selection methods using different numbers of features. The raw data set contains 38 features; the conventional PCA-based method retains 20, the PCA + GA based selection method retains 10, and the PCA + PSO based method retains 8 features. The final approach therefore delivers a more manageable collection of features that enhances the classifier's efficiency and reduces the complexity of its underlying architecture.

Conclusion
There is no shortage of intrusion detection methods, but most of them suffer from poor performance. By proposing a good approach to feature selection and classification, we can boost performance. This study proposes a feature selection technique for host intrusion detection in wireless sensor networks that, using APSO and MFCM, can pick the best features from either the primary space or the sample KNN space. The proposed method is evaluated on the NSL-KDD data set, which is widely recognised as a standard benchmark for testing intrusion detection approaches. After validating the chosen feature subset (based on APSO) on a modular neural network, and contrasting it with another feature subset (based on MFCM) and other approaches (KNN based, raw), the former is deemed the superior choice. The outcomes show that the feature selection approach based on APSO and MFCM works better than the existing methods.

Disclosure of conflict of interest
No conflict of interest to be disclosed.

Figure 1
Figure 1 Proposed framework for feature selection and evaluation. The goal of feature selection is to establish how few features are required to adequately train a classifier to distinguish between benign and malicious associations or activities. The number of selected features p is always smaller than m, the initial number of features in the set from which it was derived. Figure 2 illustrates the selection of feature subsets; in what follows, we go through each of the steps in Figure 2 in turn.

Figure 2
Figure 2 Feature selection flow

Figure 3
Figure 3 Comparison of feature selection techniques based on number of features. Figure 4 compares several feature selection methods in terms of their detection rates. The obtained feature subsets outperform the raw feature set; moreover, the MFCM+APSO-based feature selection shows the best performance with the fewest features.

Figure 4
Figure 4 Comparison of feature selection techniques based on detection rate. The false alarm rate comparison among the feature selection strategies is shown in Figure 5: compared with the alternative approaches, the MFCM+APSO-based feature selection technique has a lower false-alarm rate.

Figure 5
Figure 5 Comparison of feature selection techniques based on false alarm rate