A comparative analysis of recurrent neural network and support vector machine for binary classification of spam short message service

Over the years, communication through the Short Message Service (SMS) has been a primary tool for mobile subscribers. SMS has varied applications in health, industry, finance, education and social networking, among others. The growth of mobile devices and SMS usage has consequently increased the attack surface for cyber-criminals, culminating in the proliferation of malicious activities introduced through SMS spam, phishing, spyware, malware, etc. Ham messages are the normal messages people exchange with one another and are usually wanted by the recipient, while spam messages are unsolicited, redundant junk messages that may be sent to a large number of people at once and are usually unwanted. Various spam detection models have been developed using traditional machine learning and deep learning techniques. However, most studies that compare deep and traditional machine learning algorithms have unfortunately omitted K-Nearest Neighbors and Support Vector Machine (SVM), which are empirically the most popular traditional machine learning algorithms. In this study, therefore, we develop a deep learning model based on a Recurrent Neural Network (RNN) for spam and ham SMS classification and compare its performance against an SVM model on the same University of California (UCI) SMS dataset. The results show that the RNN has a slightly higher training and validation accuracy of 0.98 compared to the SVM at 0.94; however, the false positive rate of the SVM is marginally lower. Exploring deep learning with better optimization algorithms such as RNN improves accuracy, reduces computational complexity in terms of memory consumption and speed, and thus minimizes false positive rates. For future work, we suggest the use of varied performance metrics to validate the model in a distributed dataset environment.


Introduction
Critical infrastructures are essential for the economy, society and governments [1]-[5]. The communication sector is one of the critical infrastructures essential for the economic, social and financial stability of any country [6]. Communication over the Internet is significant to all users that operate in cyberspace [7]. The main aim of the Internet of Things (IoT) is to connect everything within a common infrastructure to enable control and updates of devices from anywhere and at any time [8]-[13]. Critical infrastructure components are vulnerable to varied threats including natural disasters, terrorism and cybercrime, among others [6]. Organizations that require the Internet for communication are vulnerable to cyber-attacks and thus exposed to cybercrime risks. Basically, the interconnected nature of the Internet and the increasing dependence on digital systems make organizations potential targets for malicious actors seeking to exploit vulnerabilities for financial gain, data theft, or disruption of operations [14]-[17]. Table 1 presents some of the cybercrime risks faced by organizations.

Risk Description
Data Breaches Cybercriminals may attempt to breach an organization's network security to gain unauthorized access [18] to sensitive information such as customer data, employee records, or intellectual property. This stolen data can be used for identity theft, financial fraud, or sold on the dark web.
Ransomware Attacks Ransomware is a type of malware that encrypts an organization's data and demands a ransom in exchange for the decryption key. These attacks can cripple operations and result in significant financial losses if organizations are unable or unwilling to pay the ransom [19]-[23].

Phishing and Social Engineering
Cybercriminals often employ deceptive tactics, such as phishing emails or phone calls, to trick employees into revealing sensitive information like login credentials or financial details [24], [25], [26], [27]. This information can be used to gain unauthorized access to systems or carry out fraudulent activities.

Distributed Denial of Service (DDoS) Attacks
DDoS attacks involve overwhelming a targeted organization's network or website with a flood of traffic, rendering it inaccessible to users [28]-[31]. This can disrupt operations, cause financial losses, and damage an organization's reputation.

Security Audits and Penetration Testing
Conduct regular security audits and penetration testing to identify vulnerabilities and weaknesses in the communication infrastructure. This allows organizations to proactively address security gaps and strengthen their defenses [59]-[65].

Employee Training and Awareness
Educate employees about cybersecurity best practices, including safe browsing habits, recognizing phishing attempts, and maintaining strong passwords [66]-[68]. Regular training and awareness programs can help mitigate human error, which is often exploited by cybercriminals.
Incident Response Plan Develop and implement an incident response plan to effectively respond to security incidents [69]-[71]. This includes defined procedures for reporting and addressing security breaches, isolating affected systems, and restoring operations while minimizing the impact.
Continuous Monitoring Implement continuous monitoring and logging of network activity to detect any suspicious behavior or anomalies [72]-[76]. This enables organizations to identify and respond to potential security incidents in a timely manner.
Smart-phones are emerging as versatile devices enabling the user to perform various activities [77]. According to a report by Statista, smart-phone mobile network subscriptions worldwide stood at almost 6.6 billion in 2022, and this is expected to rise beyond 7.8 billion subscriptions by 2028.
The primary communication tool widely used by mobile subscribers is the Short Message Service, popularly known as SMS [78], [79], [80], [81], [82], [83]. There are also existing SMS and chatting utilities such as WhatsApp, Hangouts, Viber, WeChat, etc. with similar and more advanced functionalities [84]. The role of SMS ranges across various areas of life, i.e. health, education, finance, security, career and social networking, among others. According to [84] and [85], the growth of mobile devices and SMS communication in support of various facets of life is tremendous, but on the flip side it has led to a proliferation of malicious activities [86] as well. Text messages, or SMS, are a part of smart-phones through which attackers can target users [87], [85], [88].
Multimedia SMS pose a challenge of malicious content, and it is therefore imperative to adopt a technique that can process languages, images, emoji, and videos. Malicious activities can be introduced using SMS spam, phishing, spyware, malware, etc. Spam comprises unsolicited messages which may contain viruses, malware, adverts and unrequested content targeting individuals, companies and business organizations [88], [89]. Ham messages are the everyday messages that individuals exchange with each other and are not junk, while spam messages can be classified as redundant messages sent to a large number of people at once [83]. The rise of spam messages is driven by factors such as access to affordable bulk SMS plans [83]. For instance, a ham message may state "are you available on Thursday?", while a spam message may use the expression "free melodies and ringtones" [83]. Unfortunately, these unsolicited messages are increasing at an alarming rate, and according to [87], attackers use various communication mechanisms such as SMS phishing to obtain sensitive information from mobile users [89]-[93]. Subscribers do not require an Internet connection in order to receive an SMS, making it convenient and efficient for cybercriminals to exploit.
There is a need for researchers to exploit machine learning-based algorithms for classification, clustering and association to analyze data such as SMS in order to gain more insight into the protection of communication systems [94]-[99]. This approach will ultimately support confidentiality, one of the tenets of information security, through the development of robust security systems.

Motivation
Research in machine learning techniques is paramount to understanding data patterns and challenges. We therefore seek to work on the solution of practical problems in order to contribute to research, especially the development of text-based and language-based machine learning algorithms. This paper designs and develops a deep learning model based on a Recurrent Neural Network (RNN) for spam and ham SMS classification and compares its performance against an SVM model on the same dataset.

Contribution
The contributions of this study are as follows:
• Mathematically describe and propose a framework of RNN for spam SMS classification.
• Design a comparative model of RNN and SVM in order to illustrate the process flow from data preprocessing, training, fitting of the SMS dataset, and compilation, to evaluation of results.

• Provide a comparison between RNN and SVM for SMS spam detection.
The rest of the paper is organized as follows: Section 2 is the background of the study; Section 3 presents related work; Section 4 is the proposed methodology; Section 5 presents a discussion of the experimental performance of both the RNN and the SVM classifier; and finally, Section 6 gives the conclusion of the study.

Background of Study
A number of supervised algorithms such as Naïve Bayes, Support Vector Machine (SVM), neural networks and regression have been used to develop most SMS spam classifiers [83], [89], [85], [100], [101]. This is perhaps due to the availability of the output column (labelled data) of the SMS dataset, which makes it possible to train classifiers. The authors in [102] designed an artificial model [103] that applies the concept of functional biological neurons in the human brain to process input and produce output as the sum of weighted inputs, as shown in the formula below:

z = w1x1 + w2x2 + w3x3 + … + wnxn + b

Like human biological neurons, the computer-based artificial neuron accepts inputs x1, x2, x3, …, xn, and each input is multiplied by a corresponding weight w1, w2, w3, …, wn. The sum of the products of the weighted inputs gives the resultant sum z, which is considered the logit of the neuron. Sometimes the logit also includes a constant value, b, called the bias.
The logit is then passed through a function, f, to produce the desired output y = f(z).
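The weighted-sum neuron described above can be sketched in a few lines of Python. The sigmoid activation and the example inputs below are illustrative assumptions, not values taken from the paper:

```python
import math

def neuron(x, w, b):
    # Logit z: sum of weighted inputs plus the bias term
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    # Activation f turns the logit into the output y = f(z);
    # a sigmoid is assumed here purely for illustration
    return 1 / (1 + math.exp(-z))

# Two inputs with illustrative weights and bias: z = 0.5 - 0.5 + 0.1 = 0.1
y = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
```

Any other activation function could be substituted for the sigmoid without changing the weighted-sum structure.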
The researchers in [104] describe deep learning as a derivative of machine learning that applies algorithms, processes data, and develops abstractions. Deep learning is an emerging technology that drives artificial intelligence (AI) and the processing of big data; it is ubiquitous, and its algorithms model high-level abstract views of data by means of processing layers that encompass complex structures [104]-[108]. Researchers in [109] reported that various studies have shown that neural networks have been used to understand human beings in a psychological way, and in certain cases they have even outperformed human beings. Studies using neural networks include a deep learning model in which the California Renewable Production 2010-2018 dataset is used to train a predictor of solar photovoltaic output (California ISO, 2020), and deep learning for natural language processing in [110], among others. Researchers in [111] also used CNNs extensively for image identification, object detection, face detection and image classification. The technique is applied in optical character recognition (OCR), where text is identified from images.
The authors in [104] discussed deep learning architectures applicable in healthcare systems, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), auto-encoders (AEs), and restricted Boltzmann machines (RBMs). The major application of deep learning in healthcare is image processing, especially predicting Alzheimer's disease from magnetic resonance imaging (MRI) scans [112], [113].

Related Work
Machine learning (ML) and deep learning (DL) methods have been preferred by researchers across different disciplines for providing solutions to the detection of different categories of attacks [109], [114], [115], [116], [117]. Existing methods for SMS spam classification show that machine learning, statistical analysis and evolutionary methods account for 49%, 39%, and 12% of approaches, respectively [84]. A number of machine learning algorithms have been used to develop most SMS spam classifiers.
The authors in [115] created a dictionary using the Term Frequency-Inverse Document Frequency (TF-IDF) Vectorizer algorithm, which included all the word features a spam SMS would possess based on the content of the message; referring to this dictionary, the system classifies an SMS as either spam or ham. TF-IDF is used in machine learning (ML) and text mining as a weighting factor for identifying word features [115]. The weight increases as the word frequency in a document increases; however, an offset is also used to differentiate important words from common words (stop words) like 'the' or 'a' that appear often in documents [115]. The TF-IDF Vectorizer is therefore often used in relevance ranking and scoring and to remove stop words from ML models. According to [115], ML algorithms [118]-[122] can play a vital role in identifying spam SMS, because the accuracy obtained in their study was more than 95% in every trial.
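The TF-IDF weighting idea can be sketched in plain Python, rather than with the vectorizer library the authors used; the toy corpus and the exact idf formula (log of inverse document frequency, one common variant) are illustrative assumptions:

```python
import math

def tf_idf(term, doc, corpus):
    # Term frequency: how often the term appears in this document
    tf = doc.count(term) / len(doc)
    # Inverse document frequency: terms that are rare across the
    # corpus receive a higher weight; stop words score near zero
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df)
    return tf * idf

corpus = [["free", "ringtones", "now"],
          ["are", "you", "free", "thursday"],
          ["see", "you", "thursday"]]

# "ringtones" appears in only one document, so it outweighs
# the more common word "you"
w_rare = tf_idf("ringtones", corpus[0], corpus)
w_common = tf_idf("you", corpus[1], corpus)
```

A word that appears in every document gets idf = log(1) = 0, which is exactly how the offset suppresses common stop words.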
Results based on existing approaches show that 83% are content-based, 5% non-content-based and 12% hybrid [84]. Analysis of the status of existing anti-spam solutions shows that Evaluated (E), Implemented (I), Proposed (P), Proposed and Evaluated (PE) and Proposed and Implemented (PI) account for 29%, 6%, 20%, 35%, and 10%, respectively [84]; their study therefore concludes that the majority of existing SMS spam filtering solutions are at the ''Proposed'' or ''Proposed and Evaluated'' status. Methods such as Random Forest, the Dendritic Cell Algorithm, SVM, Naive Bayes, and Artificial Immune Systems (AIS) show optimal performance results with higher accuracy [84].
According to [123]-[125], SVM is one of the most robust algorithms for solving classification problems; it plots the data items as points in n-dimensional space, where the features of the data act as the coordinates. SVM is a supervised machine learning algorithm that can be used for both classification and regression challenges; however, it is mostly used for classification [114], [126]-[128]. SVM separates the different data groups using decision boundaries and supports both binary and multi-class classification; a set of instances with different class values is separated into two groups by decision boundaries [123]-[125].
Researchers in [83] presented the detection of spam and ham messages using various supervised machine learning algorithms, Naive Bayes, SVM, and maximum entropy, and compared their performance in filtering ham and spam messages. They concluded that building an SMS spam classifier using SVM gives the best possible results, with an accuracy of 97.4%. Maximum entropy gave an accuracy of 91.95% while Naïve Bayes gave an accuracy of 94.55% [83].
Multimedia SMS also pose a security challenge by increasing the likelihood of SMS spam through rich media including images, videos and emoji. It is therefore imperative to adopt deep learning techniques such as Recurrent Neural Networks (RNNs), which can process languages, images, emoji, and videos [84], [129]-[134].
Neural networks have been applied to separate unwanted SMS (spam) messages from normal (ham) messages [129]. Chandra and Khatri (2019) proposed a method utilizing RNN and LSTM with Keras models and a TensorFlow backend to detect spam and ham in the Spam SMS Collection dataset from the University of California (UCI) ML repository, and achieved 98% accuracy. The proposed method in [129], which applied RNN SMS spam filtering, achieved a prediction accuracy of 98.11%, a considerable improvement compared to SVM, token-based SVM and Bayesian algorithms with accuracies of 97.81%, 97.64%, and 80.54% respectively. This paper therefore presents a comparative analysis of RNN and SVM in binary classification of spam SMS.

Challenges with existing works
Google's Bidirectional Encoder Representations from Transformers (BERT) has been used for spam detection on four different datasets [135], [136], recording very promising results of not less than 97% in each case. However, that model was not validated against traditional machine learning models [137]-[140]. Machine learning researchers such as those in [141] and [142] also developed spam detection models using deep learning and LSTM, respectively.
Researchers in [143]-[145] have developed deep learning models for spam detection and compared them against various traditional machine learning algorithms; however, there is no empirical evidence of comparison with the popular traditional algorithms, such as K-Nearest Neighbors and SVM, nor of how fitting on the same dataset was done. It is also evident that most models do not verify and validate their results against others, which would support an evidence-based choice of algorithm in relation to the type of data. The authors in [141] contributed towards the solution by comparing deep learning and traditional ML algorithms [146], but unfortunately omitted KNN and SVM from the result analysis. According to [83], SVM is poised as the best fit for binary classification of an SMS spam dataset, based on an empirical result of 94% accuracy.
In this study, a deep learning model is used for binary classification of SMS data and compared against an SVM model in order to validate its performance. Exploring deep learning with better optimization algorithms such as RNN will improve accuracy, reduce computational complexity with respect to memory consumption and speed, and thus minimize false positives.

Methodology
The RNN algorithm is used to train on the dataset because it is able to use information from the past, and therefore prediction of high temporal dependencies in the dataset is possible. RNNs, used mainly for language tasks, are an improvement developed after convolutional neural networks (CNNs), which are mainly used for image processing [109]. Use cases of RNNs include Gmail, which auto-completes a sentence as you type it; Google Translate; named entity recognition; and sentiment analysis [147], [148]. TensorFlow comes with RNN models out of the box [149]. Figure 2 below illustrates learning through the back-propagation technique using a feed-forward artificial neural network (ANN) [150], [151], [152].

Figure 2 ANN technique
When the input (X) is multiplied by the weights and the bias is added, the result passes through the activation functions of the hidden layers (H) to finally produce the output (Y); this is known as feed-forward propagation, and it is followed by back-propagation. The purpose of back-propagation is to modify the weights in order to minimize the error [153]. The error is the square of the difference between the model output and the actual (known) output. We differentiate this error with respect to the individual weights [154]-[158]. The combination of back-propagation with feed-forward multi-layer networks usually generates finer results [125].
The challenge with an ANN is that it has no concept of memory, i.e. there is no connection between the previous input-output pair and the next [147], [159]. Therefore, in order to have dependencies between the data we need to use a recurrent method such as an RNN. Feedback loops present in RNNs store information in 'memory' for long periods [160]. This allows quick classification of a message as spam or ham just by parsing a few initial words [141].
Figure 3 above shows how multiple ANNs are combined to create an RNN, such that every layer has an input (Xt-1, Xt, Xt+1, …, Xt+n), a hidden layer (Ht-1, Ht, Ht+1, …, Ht+n) and an output (Yt-1, Yt, Yt+1, …, Yt+n). The hidden layer Ht-1 passes its output to Ht, which then passes it to Ht+1, and so on. We use gradient descent to minimize the loss (L) in each layer. The change in weight is

∆w = -η (∂L/∂w),

where η is the learning rate, and the weight is then updated as

w_new = w_old + ∆w.

The change in weight, ∆w, can be a very small number close to zero, especially when the RNN structure is made up of very many layers of ANN [161]-[164]; this is known as the vanishing gradient challenge, which is solved by the Long Short-Term Memory (LSTM) neural network.
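The weight-update rule above can be sketched numerically; the quadratic toy loss and the learning rate below are illustrative assumptions, not values from the paper:

```python
# One gradient-descent step: delta_w = -eta * dL/dw, then w = w + delta_w
def gradient_step(w, grad, eta=0.1):
    delta_w = -eta * grad(w)
    return w + delta_w

# Toy loss L(w) = (w - 3)^2 with gradient dL/dw = 2(w - 3)
grad = lambda w: 2 * (w - 3)

w = 0.0
for _ in range(50):
    w = gradient_step(w, grad)
# w converges toward the loss minimum at w = 3
```

The vanishing gradient problem is precisely what happens when grad(w), multiplied through many layers, becomes so small that each delta_w barely changes the weights.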

Performance Comparison between RNN and SVM
TensorFlow provides basic building blocks such as fully connected layers, convolutional layers, recurrent neural network modules [166]-[170], and non-linear activation functions (Developers, 2022). The forward pass of the model produces the output, on which the loss and its gradients are computed. Some of the loss functions that TensorFlow provides include mean squared error and cross-entropy. Minimization of errors is performed using auto-differentiation, which automatically calculates the gradients. To import TensorFlow in the Google Colab environment we use the line of code: import tensorflow as tf.
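The two loss functions mentioned, mean squared error and cross-entropy, can be sketched in plain Python (TensorFlow provides them as tf.keras.losses.MeanSquaredError and tf.keras.losses.CategoricalCrossentropy); the toy prediction values below are invented for illustration:

```python
import math

def mse(y_true, y_pred):
    # Mean squared error: average of the squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Categorical cross-entropy: -sum(t * log(p)) over the classes;
    # eps guards against log(0)
    return -sum(t * math.log(p + eps) for t, p in zip(y_true, y_pred))

# Toy two-class case: the true label "spam" one-hot encoded as [1, 0],
# and a confident, correct prediction gives a small loss
loss = cross_entropy([1, 0], [0.9, 0.1])
```

Cross-entropy is the natural choice for the two-class spam/ham output used later in this paper, while mean squared error is more common for regression targets.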
TensorFlow is an open-source machine learning framework developed by the Google Brain team. It has gained significant popularity and has become one of the most widely used frameworks for building and deploying machine learning models. Table 3 describes some key points about TensorFlow.
TensorFlow's versatility, scalability, and extensive ecosystem make it a powerful framework for various machine learning tasks. Its wide adoption and active community support ensure that it continues to evolve and stay at the forefront of the machine learning and deep learning landscape.
Various software libraries are available that accelerate the research and application of neural network models to various problems. TensorFlow, originally created by researchers at Google, is the most popular among the different deep learning libraries [165]. Neural networks are flexible and scalable and thus have the potential to promote data analysis and modeling applications; however, implementing and optimizing algorithms in neural networks is time-consuming and prone to errors [165]. TensorFlow is an end-to-end open-source platform for machine learning with a wide-ranging set of tools and libraries for easy building and deployment of machine learning applications. TensorFlow greatly eases and accelerates the research and application of neural network models [165]. TFX is a platform built on top of TensorFlow that provides end-to-end machine learning pipeline orchestration. It includes components for data ingestion, data validation, preprocessing, training, evaluation, and deployment. TFX helps streamline the process of developing and maintaining machine learning workflows at scale.

TensorFlow Hub and Model Zoo
TensorFlow Hub is a repository that hosts a wide range of pre-trained models, including both TensorFlow-native models and models from other frameworks. It allows users to easily discover, reuse, and transfer learned representations or entire models for their specific tasks. The TensorFlow Model Zoo offers a curated collection of state-of-the-art models for computer vision, natural language processing, and other domains.

Community and Documentation
TensorFlow has a large and active community of developers and researchers. This vibrant community contributes to the development of new features, provides support through forums and mailing lists, and shares resources, tutorials, and research papers. The official TensorFlow website provides extensive documentation, tutorials, and examples to help users get started and explore different functionalities.

Materials
The stages of implementation applied in this research include data collection and pre-processing, rule formulation, importing libraries in the Google Colab platform, model development and training using deep learning (TensorFlow), and evaluation of the developed deep neural network spam detection model. The file spam.csv, mounted on /gdrive, is read with pandas (pd) and assigned to a variable called df. The preprocessing stage involves case conversion, punctuation mark removal, abbreviation expansion, tokenization, stemming and stop-word removal.
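The preprocessing steps listed above can be sketched as follows. The pandas call is shown only as a comment mirroring the paper's Colab setup (the exact path and column names are not given), and the stop-word list is an illustrative subset, not the full list the authors used; abbreviation expansion and stemming are omitted from this sketch:

```python
import string

# In the paper's Colab setup the dataset is read with pandas, e.g.:
#   df = pd.read_csv("/gdrive/spam.csv")   # assumed path

STOP_WORDS = {"the", "a", "is", "to", "you"}  # illustrative subset

def preprocess(text):
    # Case conversion
    text = text.lower()
    # Punctuation mark removal
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Tokenization
    tokens = text.split()
    # Stop-word removal
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("FREE melodies and ringtones! Reply NOW.")
```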

SVM Algorithm
In the proposed system, we have one dataset with a single feature, the source text, and a label. The only feature selected here is "text", which is used to perform binary classification with both a recurrent deep neural network and SVM, in order to find the most dominant and accurate model for classification. Refer to the SVM algorithm in Table 4 below. In order to consider the worst case possible, we have used the following characteristics: a sigmoid kernel for the SVC and a sigmoid kernel for the neural network. First, the data is split into train data (80%) and test data (20%). As with the SVM model, we have used the spam dataset in its entirety without balancing the outcome values; see Table 5 below. Next, we tokenized the data frames, followed by padding of the text and finally encoding of the labels after splitting them.
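The 80%/20% split can be sketched without library helpers (in practice scikit-learn's train_test_split does the same job); the fixed seed below stands in for the paper's random state, whose value is not given:

```python
import random

def train_test_split(data, test_size=0.2, seed=42):
    # Shuffle a copy with a fixed seed (the "random state"), then slice:
    # the first 80% becomes training data, the rest test data
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))
    return shuffled[:cut], shuffled[cut:]

data = list(range(100))       # stand-in for 100 SMS rows
train, test = train_test_split(data)
```

Fixing the seed is what makes the comparison fair: both the SVM and the RNN see exactly the same train and test rows.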

Tokenization
The text is essentially turned into tokens so that it can be processed easily by the system. The text, which is in alphanumeric format, must be converted into numerical format. Tokenization tends to keep the order of words intact, unlike vectorization. Each word is indexed and mapped to the corresponding entry in the training dataset. This can easily be checked by printing the first SMS text both before and after tokenization:

print(x_test[0]) # small x, prints the actual text of the first message
print(X_test[0]) # big X, prints the indexed tokens as numbers

After the words have been tokenized, they are made the same length, because the model structure must be the same for training, testing and deployment. We therefore pad our inputs to fit that size.
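The indexing and padding behaviour described above can be sketched in plain Python (the authors used Keras utilities for this); the word index, toy texts and pad length are illustrative assumptions:

```python
def build_index(texts):
    # Map each word to an integer index; 0 is reserved for padding
    index = {}
    for text in texts:
        for word in text.split():
            index.setdefault(word, len(index) + 1)
    return index

def encode_and_pad(text, index, max_len):
    ids = [index.get(w, 0) for w in text.split()]  # unknown words -> 0
    return ids + [0] * (max_len - len(ids))        # pad to a fixed length

index = build_index(["free ringtones now", "are you free thursday"])
x = encode_and_pad("free ringtones", index, max_len=6)
```

Note that the integer sequence preserves word order, which is exactly the property that distinguishes tokenization from bag-of-words vectorization.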

Encoding
Encoding transforms categorical variables into vectors of zeros (0s) and ones (1s), where each vector has a length equal to the number of output categories. Each time the model receives an input label it takes it as a 1, while the other labels are represented by 0. For instance, if there are two labels (x1, x2) and the model receives x1, it will receive the vector (1, 0), where x1 is 1 and x2 is 0. This is known as one-hot/dummy encoding [172].
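The one-hot encoding described above, sketched for the two-label spam/ham case (the label order is an assumption for illustration):

```python
def one_hot(label, labels=("ham", "spam")):
    # The active label becomes 1; every other label is 0
    return [1 if label == name else 0 for name in labels]

y_ham = one_hot("ham")    # ham  -> [1, 0]
y_spam = one_hot("spam")  # spam -> [0, 1]
```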

RNN Model Algorithm
We use a sequential neural network with 9 layers, where the first layer is the embedding input layer, which is determined by the maximum length calculated before (790). The word embedding (output) essentially acts as the input of the next layer. The next layer is a dropout layer, which randomly drops 20% of the neurons while training on the dataset. The dropouts are added to help with overtraining. There is a pooling layer, which reduces the number of neurons, and additional hidden layers with 50 neurons each and an activation function. Finally, there is a prediction layer with 2 neurons, which represents the number of labels we have (SPAM, HAM). After building the model, we compile it by specifying the optimizer as "adam", the loss as "categorical_crossentropy" and the metrics as "accuracy". Evaluation of the model before fitting shows an accuracy of 0.696179211139679 and a loss of 0.22576302289962769, which is significant. At this point our concern is mainly the reduction of the loss.
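A reduced Keras sketch of the layer stack described above, under stated assumptions: the vocabulary size and embedding dimension are not given in the paper and are chosen here for illustration, and fewer repeated dropout/hidden layers are shown than the full 9-layer model:

```python
import tensorflow as tf

VOCAB_SIZE = 10000   # assumed vocabulary size (not stated in the paper)
MAX_LEN = 790        # maximum padded sequence length from the paper

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),       # embedding input layer
    tf.keras.layers.Dropout(0.2),                    # drop 20% of neurons
    tf.keras.layers.GlobalMaxPooling1D(),            # pooling layer reduces neurons
    tf.keras.layers.Dense(50, activation="relu"),    # hidden layer, 50 neurons
    tf.keras.layers.Dense(50, activation="relu"),    # hidden layer, 50 neurons
    tf.keras.layers.Dense(2, activation="softmax"),  # prediction layer: SPAM / HAM
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The 2-neuron softmax output pairs with the one-hot labels and the categorical cross-entropy loss chosen at compile time.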

Figure 10 Accuracy and loss
The model is then trained by fitting it to the dataset and specifying the epochs, which determine the number of times the model will run through the data. We performed training, testing and evaluation of the SVM and RNN models with the intention of comparing them. We used the same dataset, sample size and random state, and kept all other parameters constant. The validation accuracy and loss of the RNN outperform those of the SVM classifier on the same dataset. Based on the graph, the RNN has a slightly higher training and validation accuracy of 0.98 compared to the SVM at 0.94; however, the false positive rate of the SVM is marginally lower.

Key findings
It has been noted that RNN and SVM are both popular machine learning algorithms used for binary classification tasks, including spam detection in Short Message Service (SMS) messages. While they approach the problem from different perspectives, both methods can be effective in identifying spam messages. Table 7 gives a summary of each of these machine learning algorithms.

Challenges: While Recurrent Neural Networks (RNNs) have proven to be effective for spam detection in SMS, they also face certain challenges. Here are some of the key challenges associated with using RNNs for spam detection in SMS:

Limited context modeling: RNNs are designed to capture sequential dependencies in data by maintaining a hidden state [173]. However, in the case of SMS messages, the available context is often limited due to the short length of messages. This can make it challenging for RNNs to effectively capture and model the context necessary for accurate spam detection.

Data sparsity: SMS spam detection often deals with imbalanced datasets, where the number of spam messages is significantly lower than that of non-spam messages. This data sparsity can make it difficult to train RNNs effectively, as the network may not have sufficient examples of spam messages to learn from. Addressing data sparsity requires careful pre-processing, sampling techniques, or specialized loss functions that account for class imbalance.
Out-of-vocabulary words: SMS messages can contain informal language, slang, abbreviations, or misspelled words which may not be present in the vocabulary used during training [174]. RNNs can struggle to handle out-of-vocabulary words, as they rely on pre-existing word representations. Handling these words often requires preprocessing techniques like word normalization, stemming, or incorporating external sources to enrich the vocabulary.
Over-fitting: RNNs can be prone to overfitting, especially when the training data is limited. Overfitting occurs when the model memorizes the training data instead of generalizing well to new, unseen SMS messages. Regularization techniques, such as dropout and L2 regularization, can be applied to mitigate overfitting. Additionally, techniques like data augmentation or incorporating external data sources can help improve generalization [176].
Computational requirements: RNNs, especially with complex architectures like LSTM or GRU, can be computationally expensive and require significant resources for training. Training large-scale RNN models on large SMS datasets may be time-consuming and require high-performance hardware or distributed computing infrastructure.

Interpretability: RNNs, particularly those with deep architectures, are often considered black boxes, meaning it can be challenging to interpret their internal workings and understand the decision-making process. Interpreting RNN-based spam detection models and providing explanations for their predictions can be difficult, which may be a requirement in certain applications or industries.

SVM
SVM is a widely used machine learning algorithm for binary classification tasks. It aims to find an optimal hyperplane that separates the data points of different classes with a maximum margin. In the case of spam classification, SVM tries to find a decision boundary that distinguishes between spam and non-spam messages.
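The hyperplane decision rule can be sketched directly; the weights and bias below are illustrative stand-ins, since in practice they are learned by maximizing the margin over training data:

```python
def svm_predict(x, w, b):
    # Decision function f(x) = w.x + b; the sign of f picks the
    # side of the hyperplane, and hence the class
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "spam" if score >= 0 else "ham"

# Toy learned parameters over two numeric text features
w, b = [2.0, -1.5], -0.5
label = svm_predict([1.0, 0.2], w, b)  # score = 2.0 - 0.3 - 0.5 = 1.2
```

Kernel SVMs apply the same rule after mapping x into a higher-dimensional space, which is what makes non-linear boundaries possible.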
To use SVM for text classification, SMS messages need to be represented as numerical feature vectors. One common approach is to convert the messages into a numerical representation using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) or word embeddings (such as Word2Vec or GloVe). These representations capture the important characteristics of the text, allowing SVM to learn the decision boundary. SVMs can handle high-dimensional feature spaces and are known for their ability to generalize well even with limited training data. They work effectively when the data is linearly separable or when it can be mapped, through kernel functions, into a higher-dimensional space where linear separation is possible.
Challenges: While Support Vector Machines (SVMs) have been widely used for spam detection in various domains, including Short Message Service (SMS), they also face certain challenges when applied to this specific task. Here are some challenges associated with using SVMs for spam detection in SMS:
Text representation: SVMs require numerical representations of text data as input. Converting SMS messages into suitable feature vectors can be challenging, especially considering the unique characteristics of SMS language. Dealing with slang, abbreviations, misspellings, and the use of informal language in SMS requires careful preprocessing and feature engineering to ensure relevant information is captured effectively.
Curse of dimensionality: SMS messages can contain a large number of features or words, leading to high-dimensional feature spaces. The curse of dimensionality refers to the fact that as the number of features increases, the sparsity of the data increases as well [176]. This sparsity can adversely affect the performance of SVMs, as the available training instances become sparser and it becomes harder to find an optimal decision boundary.
Imbalanced datasets: Like many classification tasks, SMS spam detection often involves imbalanced datasets, where the number of spam messages is much lower than that of non-spam messages. SVMs can be sensitive to class imbalance, leading to biased models that favor the majority class. Techniques like resampling (e.g., oversampling or under-sampling), cost-sensitive learning, or specialized loss functions can help address the class imbalance issue.
Selection of kernel function: SVMs rely on kernel functions to map the input data into a higher-dimensional space where linear separation is possible. Selecting an appropriate kernel function can be challenging, as different kernels may work better for different SMS spam detection scenarios. Choosing the wrong kernel can result in poor performance or difficulties in achieving an effective separation of spam and non-spam classes [177].
Scalability and computational requirements: SVMs can become computationally expensive, particularly when dealing with large-scale SMS datasets or high-dimensional feature spaces. Training SVMs with large datasets or with complex kernels can require significant computational resources and time. This can be a challenge for real-time or high-throughput SMS spam detection systems.
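Two of the mitigations above, cost-sensitive learning for class imbalance and data-driven kernel selection, can be combined in one short scikit-learn sketch. The dataset here is synthetic and purely illustrative; the candidate kernels and C values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Imbalanced toy data: roughly 10% minority ("spam") class.
X, y = make_classification(n_samples=300, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# class_weight="balanced" re-weights errors inversely to class frequency,
# while the grid search picks the kernel/C pair by cross-validated F1 score.
grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(class_weight="balanced"), grid, cv=3, scoring="f1")
search.fit(X, y)

best = search.best_params_  # e.g. the kernel and C chosen by cross-validation
```

Scoring on F1 rather than plain accuracy matters here: with a 90/10 split, a classifier that always predicts the majority class already reaches 90% accuracy, so accuracy would hide a model that never detects spam.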
Interpretability: While SVMs can provide clear decision boundaries and support vectors, they may lack interpretability when applied to text classification tasks. Understanding the importance of individual features or words in the classification decision can be challenging with SVMs, as they primarily focus on optimizing the separation between classes rather than providing feature-level interpretability.
Despite these challenges, RNNs have demonstrated good performance in spam detection for SMS messages.
Researchers continue to explore techniques to address these challenges, such as incorporating attention mechanisms, transfer learning, or combining RNNs with other models to improve overall performance and overcome limitations. On the other hand, addressing SVM challenges often requires careful preprocessing, feature engineering and hyperparameter tuning to achieve optimal performance. Additionally, incorporating techniques such as dimensionality reduction, ensemble methods, or combining SVMs with other algorithms may help mitigate some of the challenges associated with SVM-based SMS spam detection. Table 8 compares and contrasts RNN and SVM algorithms using key performance concepts.
Table 8 Comparison of RNN and SVM algorithms

Training: RNNs require more computational resources and training time compared to SVMs, especially when dealing with large datasets. SVMs, on the other hand, can be trained relatively quickly.

Feature Engineering: RNNs can automatically learn representations from the input data, eliminating the need for extensive feature engineering. SVMs, however, typically require manual feature engineering to convert text data into numerical representations.

Interpretability: SVMs provide better interpretability as they offer clear decision boundaries and support vectors that can be inspected. RNNs, being more complex, are often considered black boxes, making it challenging to understand their internal workings.

Handling Sequential Data: RNNs process messages as ordered sequences of words, capturing context and word order; SVMs operate on fixed-length feature vectors and do not model word order directly.

In practice, both RNNs and SVMs have been used successfully for spam detection in SMS messages. The choice between the two depends on the specific requirements of the problem, available computational resources, and the trade-offs between interpretability and performance.
In summary, both RNNs and SVMs have been successfully applied to spam SMS detection. RNNs excel at capturing sequential information and context but require more computational resources and lack interpretability. SVMs are efficient, interpretable, and generalize well, but rely on effective text representation and manual feature engineering. The choice between the two depends on the specific requirements, available resources, and the trade-offs between interpretability and performance.

Conclusion
A mathematical formulation of RNN with backpropagation was used to operationalize the model in the Google Colab development environment using TensorFlow libraries. The study also provides a block diagram illustrating the process flow, from reading the UCI spam SMS dataset mounted on \gdrive, through preprocessing, training, fitting and compilation, to evaluation of results. It is evident that both SVM and RNN provide strong classification results, with a marginal performance advantage in favor of RNN. The backpropagation technique in RNN minimizes the gradient loss, which reduces false positives in every given epoch; the higher the epoch count, the lower the loss. Therefore, our results show that RNN has a higher probability of correctly identifying and classifying spam SMS than SVM, as shown by the area under the curve (AUC) score above. In general, the experimental setting shows that the validation accuracy and loss of RNN outperform those of the SVM classifier on the same UCI spam SMS dataset. As future work, we suggest the use of more performance metrics to validate the model in a distributed dataset environment.

Figure 3 Unfolded Representation of RNN

Figure 5 SMS dataset on Google drive

Figure 6 SVM model result

The train data is again split into a Training set (80% of the train data) and a Validation set (20% of the train data), as shown in Table 5 below. Next, we tokenized the data frames, padded the text, and finally encoded the labels after splitting them.
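The tokenize, pad and label-encode steps described above can be illustrated with a minimal, dependency-free sketch (a real pipeline would typically use a library tokenizer; the two messages here are made up).

```python
# Hypothetical labelled messages.
texts = ["free prize waiting", "see you at noon"]
labels = ["spam", "ham"]

# Tokenize: build a word index (0 is reserved for padding) and
# map each message to a list of integer ids.
vocab = {w: i + 1 for i, w in enumerate(sorted({w for t in texts for w in t.split()}))}
seqs = [[vocab[w] for w in t.split()] for t in texts]

# Pad: right-pad with 0 (and truncate) so every sequence has a fixed length.
maxlen = 5
padded = [(s + [0] * maxlen)[:maxlen] for s in seqs]

# Encode labels: "spam" -> 1, "ham" -> 0.
encoded = [1 if y == "spam" else 0 for y in labels]
```

Fixed-length integer sequences are what an embedding layer expects as input, which is why padding happens before the data reaches the network.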

Figure 9 Model summary

Table 3
Key points about TensorFlow

Architecture: TensorFlow follows a flexible architecture that allows users to define and execute computational graphs. The core of TensorFlow is based on a data flow graph, where nodes represent mathematical operations and edges represent the flow of data between operations. This graph-based approach enables efficient parallel computation and distributed training across multiple devices or machines.

Ecosystem and APIs: TensorFlow provides a comprehensive ecosystem of tools, libraries and high-level APIs that simplify the process of developing and deploying machine learning models. The most commonly used high-level API is TensorFlow Keras, which offers a user-friendly interface for defining and training models with minimal boilerplate code.

Model Deployment: TensorFlow offers various options for deploying trained models in production environments. It provides tools for model serialization and serving, allowing models to be deployed as web services or integrated into existing applications. TensorFlow Serving, TensorFlow Lite, and TensorFlow.js are examples of deployment frameworks that cater to different deployment scenarios.
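The define-serialize-reload cycle summarized above can be sketched with the Keras API. The model and file name here are illustrative assumptions; a real deployment would hand the saved file to a serving framework such as TensorFlow Serving.

```python
import tensorflow as tf

# Define a tiny model through the high-level Keras API.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Serialize the model to disk, then reload it as a deployment step would.
model.save("demo_model.keras")
restored = tf.keras.models.load_model("demo_model.keras")

out = restored(tf.zeros((1, 4)))  # reloaded model accepts the same input shape
```

Because the serialized file carries the architecture, weights, and compile configuration together, the reloaded model behaves identically to the original without any retraining.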

Table 6
model = tf.keras.Sequential([3 layers of relu and sigmoid function])
Step 11: Compile the model by invoking the Optimizer class with Adam's learning rate and computing the loss function (BinaryCrossentropy()) and accuracy
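One hedged reading of the pseudocode above is a Sequential model with ReLU hidden layers and a sigmoid output, compiled with the Adam optimizer and binary cross-entropy. The input dimension and layer widths are assumptions for illustration, not values taken from the paper.

```python
import tensorflow as tf

# Three dense layers: two ReLU hidden layers and a sigmoid output,
# matching the "relu and sigmoid" structure in the pseudocode.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(100,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Step 11: Adam optimizer with an explicit learning rate,
# binary cross-entropy loss, and accuracy as the reported metric.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["accuracy"],
)

out = model(tf.zeros((2, 100)))  # a batch of two feature vectors
```

Binary cross-entropy pairs naturally with the sigmoid output: the model emits a probability in [0, 1], and the loss penalizes confident wrong predictions most heavily.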

Table 7
Summary of RNN and SVM for SMS spam detection RNNs are well-suited for text-based classification tasks, including spam detection. They have the ability to capture the temporal dependencies in SMS messages, considering the ordering and context of words within the message. RNNs process input sequences step by step, maintaining a hidden state that captures the previous context. They learn to predict the next word or classify the text based on the information accumulated from previous steps. This hidden state allows RNNs to capture long-term dependencies, making them effective for spam detection, where the presence of certain words or patterns throughout the message is important. RNNs can be trained using various architectures, such as vanilla RNNs, Long Short-Term Memory (LSTM), or Gated Recurrent Units (GRU). LSTM and GRU are more commonly used due to their ability to mitigate the vanishing gradient problem, allowing them to capture long-term dependencies more effectively.
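The LSTM variant described above can be sketched as a Keras classifier. The vocabulary size, embedding dimension and hidden width are illustrative assumptions, not the paper's hyperparameters.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=5000, output_dim=64),  # word ids -> dense vectors
    tf.keras.layers.LSTM(64),                                  # hidden state carries context step by step
    tf.keras.layers.Dense(1, activation="sigmoid"),            # probability that the message is spam
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One tokenized, integer-encoded message passed through the untrained model.
probs = model(np.array([[12, 7, 45, 3]]))
```

Swapping `LSTM` for `GRU` or `SimpleRNN` changes only the recurrent cell; the gated cells (LSTM, GRU) are preferred because their gates let gradients flow across long sequences, mitigating the vanishing gradient problem mentioned above.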