Pothole detection using image surveillance system: A review

Researchers have proposed various techniques for pothole detection using data collected from different parts of the world. Automating pothole detection will go a long way toward providing safe driving for road users and supporting intelligent transportation systems. Such automation is necessary not only to guarantee safe and adequate performance, but also to adapt to drivers' needs, improve acceptance, and ultimately meet drivers' preferences on bad roads. Object detection algorithms are either traditional machine learning methods or deep learning based, and deep learning algorithms are now the mainstream method of object detection in many fields. This paper reviews pothole detection systems across different road characteristics and dataset locations. It highlights the machine learning and object detection techniques that can be applied to pothole detection, the road characteristics to which they have been applied, and the corresponding forms of dataset presented by researchers across the world.


Introduction
"Surveillance" comes from a French word meaning "to watch over". Surveillance is the act of monitoring persistent and transient objects within a certain environment. Visual surveillance attempts not only to detect, recognize, and track objects of interest in the scene but, most importantly, to understand and describe their behavior [1,2]. These systems are mainly used to detect suspicious activity within the monitored environment. They are considered important tools that assist humans by extending their perception and reasoning capabilities about situations of interest. Over the last decade, these systems have gained much attention from the academic community, industry, and governments. Visual surveillance has grown remarkably due to security and safety concerns, and a high level of precautionary and defensive measures is needed in all fields.
Video surveillance systems produce huge amounts of data for storage and display. Long-term human monitoring of the acquired video is impractical and ineffective. An automatic abnormal-motion detection system that can effectively attract operator attention and trigger recording is considered the key to successful video surveillance in dynamic scenes such as airport terminals. Video surveillance architectures typically have only limited computing power available near the camera, which is used for compression and communication. The algorithm in [3] therefore uses the macro-block motion vectors that are generated in any case as part of the video compression process, and derives its motion features from these vectors.
The algorithm is modular, in the sense that different feature vectors can be used and alternative probability density estimation or modeling methods can be substituted. Its input is the set of macro-block motion vectors and related flags (such as intra-frame and intra-block flags) that are produced anyway by the compression process, which is an essential part of many modern video surveillance systems [3]. The algorithm is used mainly to detect the object that generated the abnormal motion, triggering video recording for later human analysis or transmission to a human observer.
Another family of algorithms builds a model of the static scene (i.e. without moving objects), called the background, and compares every frame of the sequence to this background in order to discriminate the regions of abnormal motion, called the foreground (the moving objects) [4]. These algorithms do not require much computing power or memory. They do imply that statistical measures of the temporal activity are kept locally for every pixel and constantly updated. The simplest approach uses a single model, such as the previous frame or a temporal average, for the background and a global threshold for the decision. Other background estimation methods analyze the histogram of the values taken by each pixel within a fixed number of past frames: the mean, median, or mode of the histogram can be chosen as the background value, and the foreground is discriminated by comparing the difference between the current frame and the background with the histogram variance. Because storing a histogram for each pixel is costly, video surveillance systems favor recursive methods, which keep only a fixed number of estimates per pixel and update them recursively. The variance estimated in this way is then used directly to detect moving areas [4].
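The recursive idea can be illustrated with a minimal sketch. It is not the specific method of [4]: the exponential running mean and variance, the function name update_background, and the parameters alpha, k, and min_var are all assumptions chosen for illustration. Each pixel keeps only two estimates, and a pixel is flagged as foreground when it deviates from the background estimate by more than k standard deviations.

```python
def update_background(mean, var, frame, alpha=0.05, k=2.5, min_var=4.0):
    """One recursive update of a per-pixel running mean/variance model.

    mean, var and frame are flat lists of floats (a flattened grayscale
    image). mean and var are updated in place; the foreground mask for
    this frame is returned as a list of booleans.
    """
    mask = []
    for i, pixel in enumerate(frame):
        diff = pixel - mean[i]
        # Foreground test: squared deviation against k^2 times the variance.
        is_foreground = diff * diff > k * k * var[i]
        mask.append(is_foreground)
        if not is_foreground:
            # Assumption of this sketch: only background pixels update the model.
            mean[i] += alpha * diff
            var[i] = max(min_var, (1 - alpha) * var[i] + alpha * diff * diff)
    return mask


background_mean = [100.0, 100.0, 100.0, 100.0]
background_var = [4.0, 4.0, 4.0, 4.0]
mask = update_background(background_mean, background_var,
                         [100.0, 101.0, 100.0, 180.0])
print(mask)  # [False, False, False, True]
```

No per-pixel histogram is stored here; memory use is two numbers per pixel regardless of how many frames have been processed.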

Related Works
Object detection is also referred to as image segmentation or foreground detection. It aims at identifying the constituent regions of an image, or partitioning the image into a set of meaningful regions that cover it [5]. A common definition of detection is the decomposition of an image into its significant objects and the background on which these objects lie. The type of objects detected depends on the problem being solved. Object detection has been widely employed in computer vision applications, including face detection, 3-D reconstruction, video compression, medical imaging, augmented reality, robotics, content-based indexing and retrieval, and video surveillance.
Each of these applications requires identifying different types of objects. In surveillance systems, detection involves decomposing an image into two main categories: foreground and background. The foreground includes moving objects such as human beings, cars, and machines. The background represents the rest of the image, such as the inside of a room, a hall, a highway, a production site, or a forest, and is considered the stationary part of the image [6].

Object Detection Approaches
Object detection has received a lot of attention because it is needed in a wide range of vision applications. Of the several techniques available, the three main approaches are optical flow, temporal difference, and background subtraction [7].
Optical flow techniques use the flow vectors of moving objects over time to identify the foreground in an image. For example, displacement vectors can be computed to initialize contour-based tracking of articulated objects. This type of technique is usually employed with moving cameras. Optical flow schemes are, however, computationally complex and cannot be used in real-time applications without specialized hardware [8].
Temporal difference involves subtracting two or more consecutive frames, followed by thresholding to identify moving objects [9]. The underlying idea is that objects change location over time whereas the background is mostly static. This translates into almost constant intensity values for background pixels from frame to frame and varying ones for foreground pixels. When consecutive frames are subtracted pixel by pixel, pixels whose intensity values changed considerably are identified as moving foreground. This technique is very adaptive and suitable for dynamic environments. However, it suffers from the aperture problem: it fails to extract all relevant interior object pixels when objects are static or slowly moving [6].
Background subtraction techniques are the most common schemes in surveillance systems with stationary cameras. A background image or reference model is kept and updated regularly. It represents an estimate of the appearance of the background in the image. Every frame is compared to this background model.
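The temporal difference approach can be sketched in a few lines. This is an illustrative sketch only: the function name temporal_difference and the threshold of 25 grey levels are assumptions, and frames are represented as nested lists of intensities rather than real video data.

```python
def temporal_difference(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold`
    between two consecutive grayscale frames (lists of rows)."""
    return [
        [abs(curr - prev) > threshold for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# A bright, uniform object (value 200) shifts one pixel to the right.
previous = [[0, 0, 200, 200, 200, 200, 0, 0]]
current = [[0, 0, 0, 200, 200, 200, 200, 0]]
print(temporal_difference(previous, current))
# [[False, False, True, False, False, False, True, False]]
```

Only the trailing and leading edges of the object are detected; its interior pixels are unchanged between frames and are missed, which is the limitation for slowly moving objects noted above.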

Machine Learning Algorithms
There are three types of machine learning algorithms [15]:
Supervised learning: These algorithms have a target or outcome variable (the dependent variable) that is to be predicted from a given set of predictors (independent variables). Using this set of variables, they generate a function that maps inputs to desired outputs. The training process continues until the model achieves a desired level of accuracy on the training data. Examples of supervised learning: regression, decision tree, random forest, KNN, and logistic regression.
Unsupervised learning: These algorithms have no target or outcome variable to predict or estimate. They are used for clustering a population into different groups, for example segmenting customers into groups for specific interventions. Examples of unsupervised learning: the Apriori algorithm and K-means.
Reinforcement learning: Here the machine is trained to make specific decisions. It is exposed to an environment where it trains itself continually using trial and error, learning from past experience and trying to capture the best possible knowledge to make accurate business decisions. An example of a reinforcement learning formulation is the Markov decision process.

Common Machine Learning Algorithms
According to [16], the following are commonly used machine learning algorithms, which can be applied to almost any data problem:

Linear Regression
Linear regression is used to estimate real values (cost of houses, number of calls, total sales, etc.) based on continuous variable(s). A relationship between the independent and dependent variables is established by fitting a best-fit line, known as the regression line. There are two forms: simple linear regression, characterized by one independent variable, and multiple linear regression, characterized by more than one independent variable. When a straight line does not fit the data well, a curve can be fitted instead; this is known as polynomial or curvilinear regression.
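As an illustration of fitting the regression line, the following sketch computes the ordinary least-squares slope and intercept for simple linear regression. The function name fit_line and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
```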

Logistic Regression
Despite its name, logistic regression is a classification algorithm, not a regression algorithm. It is used to estimate discrete values (binary values such as 0/1, yes/no, true/false) based on a given set of independent variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function; hence, it is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1.
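The prediction step can be sketched as follows, assuming the weights and bias have already been fitted. The names predict_probability and classify, and the numeric values in the usage lines, are hypothetical.

```python
import math


def predict_probability(features, weights, bias):
    """Logistic model: squash a weighted sum through the sigmoid
    so that the output lies strictly between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features, weights, bias, threshold=0.5):
    """Turn the predicted probability into a 0/1 decision."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical fitted parameters for a two-feature problem.
weights, bias = [1.5, -2.0], 0.25
print(classify([2.0, 0.5], weights, bias))
print(predict_probability([0.0, 0.0], [1.5, -2.0], 0.0))  # 0.5
```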

Decision Tree
A decision tree is a supervised learning algorithm that is mostly used for classification problems. It works for both categorical and continuous dependent variables. In this algorithm, the population is split into two or more homogeneous sets, based on the most significant attributes (independent variables), so that the resulting groups are as distinct as possible.

SVM (Support Vector Machine)
Support Vector Machines (SVM) are a supervised learning method used for regression and classification [17]. The algorithm tries to find an optimal hyperplane that separates the d-dimensional training data into its classes. An optimal hyperplane is one that maximizes the margin, i.e. the distance between the hyperplane and the closest examples of each class. These examples on the margin are the so-called support vectors. Since training data is often not linearly separable, SVM maps data into a high-dimensional feature space through some nonlinear mapping, and an optimal separating hyperplane is constructed in this space. To reduce the computational cost, the mapping is performed implicitly by kernel functions, which depend only on input space variables. The most used kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
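The four kernels named above can be written directly in terms of the input vectors. The sketch below uses common textbook forms; the parameter names gamma, degree, and coef0 and their default values are assumptions. The helper explicit_quadratic_map is included only to show that a degree-2 polynomial kernel (with coef0 = 0) equals an inner product in a higher-dimensional feature space without ever constructing that space.

```python
import math


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * linear_kernel(x, y) + coef0)


def explicit_quadratic_map(x):
    """Feature map for 2-D inputs whose inner product equals (x . y)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, 4.0)
via_kernel = polynomial_kernel(x, y, degree=2, coef0=0.0)
via_mapping = linear_kernel(explicit_quadratic_map(x), explicit_quadratic_map(y))
print(via_kernel, via_mapping)  # both are 121 up to rounding
```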

Naive Bayes
Naive Bayes is a classification technique based on Bayes' theorem with an assumption of independence between predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other, a Naive Bayes classifier treats each of them as contributing independently to the probability that the fruit is an apple.
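The independence assumption means the class score is simply the prior multiplied by one likelihood per observed feature. The sketch below applies this to the fruit example; the function name naive_bayes_posterior and every probability value are made up for illustration and are not taken from any dataset.

```python
def naive_bayes_posterior(priors, likelihoods, observed_features):
    """Posterior over classes under the naive independence assumption:
    P(class | features) is proportional to
    P(class) * product of P(feature | class)."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed_features:
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Hypothetical probabilities, for illustration only.
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_3_inches": 0.7},
    "other": {"red": 0.3, "round": 0.4, "about_3_inches": 0.2},
}
posterior = naive_bayes_posterior(priors, likelihoods,
                                  ["red", "round", "about_3_inches"])
print(max(posterior, key=posterior.get))  # apple
```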

KNN (K-Nearest Neighbors)
KNN can be used for both classification and regression problems, although it is more widely used for classification in industry. K-nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of their k neighbors: a case is assigned to the class that is most common among its k nearest neighbors, as measured by a distance function. The distance function can be the Euclidean, Manhattan, Minkowski, or Hamming distance.
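A minimal sketch of the majority vote with Euclidean distance follows. The function names, the two-dimensional feature vectors, and the labels "smooth" and "pothole" are hypothetical and chosen only to tie the example to the subject of this review.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=3):
    """Assign `query` to the most common label among its k nearest
    training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for road patches.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["smooth", "smooth", "smooth", "pothole", "pothole", "pothole"]
print(knn_classify(points, labels, (6.5, 6.5)))  # pothole
```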

K-Means
K-means is an unsupervised algorithm that solves the clustering problem. Its procedure partitions a given data set into a certain number of clusters (assume k clusters). Data points inside a cluster are homogeneous with each other and heterogeneous with respect to other clusters. The activity is somewhat similar to figuring out shapes from ink blots: one looks at the shape and spread to decipher how many different clusters or populations are present [16].
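The procedure can be sketched as the standard alternation between assigning points to their nearest centroid and moving each centroid to the mean of its points (Lloyd's algorithm). In this sketch the initial centroids are passed in explicitly, which is an assumption made to keep the example deterministic; the function names and sample points are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=10):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the centroids stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: mean of the members; empty clusters keep their centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, [(1, 1), (8, 8)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```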

Random Forest
Random Forests (RF) are sets of decision trees that vote together in a classification. Each tree is built on a random subset of the data points and considers a randomly selected subset of features. The tree is trained on these data points (using only the selected features), and the remaining out-of-bag data points are used to evaluate the tree [18].
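The sampling that precedes the training of each tree can be sketched as below. This shows only the random selection of data points and features, not the tree construction itself. Sampling with replacement (bootstrap) is the usual choice and is assumed here; the function names are hypothetical.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement (the in-bag set) and
    return them with the indices never drawn (the out-of-bag set)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def random_feature_subset(n_features, subset_size, rng):
    """Pick the feature indices that one tree is allowed to use."""
    return sorted(rng.sample(range(n_features), subset_size))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
in_bag, out_of_bag = bootstrap_split(10, rng)
features = random_feature_subset(8, 3, rng)
print(len(in_bag), out_of_bag, features)
```

Each tree would be trained on the in_bag rows restricted to its feature subset and evaluated on the out_of_bag rows.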

Dimensionality Reduction Algorithms
In the last four to five years, there has been an exponential increase in data capture at every possible stage. Corporations, government agencies, and research organisations are not only finding new data sources but also capturing data in great detail. For example, e-commerce companies capture details about customers such as their demographics, web crawling history, likes and dislikes, purchase history, and feedback in order to give them personalized attention. The resulting data sets contain many features, which sounds good for building a robust model but poses a challenge: how can the highly significant variable(s) be identified out of 1000 or 2000? In such cases, dimensionality reduction algorithms help, along with techniques such as decision trees, random forests, PCA, factor analysis, identification based on the correlation matrix, and the missing value ratio.

Gradient Boosting Algorithms
Gradient Boosting Machine (GBM) is a boosting algorithm used when dealing with plenty of data to make predictions with high predictive power. Boosting is an ensemble learning technique that combines the predictions of several base estimators in order to improve robustness over a single estimator: it combines multiple weak or average predictors to build a strong predictor. Boosting algorithms often perform well in data science competitions such as Kaggle, AV Hackathon, and CrowdANALYTIX. Other gradient boosting implementations include (i) Extreme Gradient Boosting (XGBoost), (ii) LightGBM, and (iii) Categorical Boosting (CatBoost).
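How weak predictors combine into a strong one can be illustrated with a small sketch of gradient boosting for squared error, where each round fits a one-split decision stump to the current residuals. This is a teaching sketch under stated assumptions (one-dimensional inputs, squared-error loss, hypothetical function names), not the implementation used by XGBoost, LightGBM, or CatBoost.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split of 1-D inputs that
    minimizes squared error. Needs at least two distinct x values."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals
    and add a shrunken copy of it to the ensemble."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


model = fit_boosted([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(round(boosted_predict(model, 1), 2), round(boosted_predict(model, 6), 2))
```

Each stump on its own is a poor predictor; the sum of many small corrections approaches the targets.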

Bayesian Networks (BNs)
According to [19], Bayesian Networks (BNs) belong to the family of probabilistic graphical models. These graph structures are used to represent knowledge about an uncertain domain. In particular, each node in the graph represents a random variable, while the edges between the nodes represent probabilistic dependencies among the corresponding random variables. Such conditional dependencies in the graph are often estimated using known statistics and computational methods. Thus, Bayesian networks combine principles of graph theory, probability theory and statistics.

Artificial neural networks
Artificial Neural Networks (ANN) are composed of several computational elements that interact through connections with different weights. Inspired by the human brain, neural networks exhibit features such as the ability to learn complex patterns in data and to generalize learned information [20]. The simplest form of an ANN is the multilayer perceptron (MLP), consisting of three layers: the input layer, the hidden layer, and the output layer. According to [21], the learning process of an artificial neural network is determined by how its parameters change, and different types of neural networks use different principles to determine their own learning rules.

Feedforward Neural Network - Artificial Neuron
This is one of the simplest types of artificial neural networks. In a feedforward neural network, data enters through the input nodes and moves in only one direction, layer by layer, until it reaches the output node. This is also known as a front-propagated wave, which is usually achieved by using a classifying activation function.

Radial Basis Function Neural Network
A radial basis function considers the distance of any point relative to the centre. Such neural networks have two layers.
In the inner layer, the features are combined with the radial basis function. Then the output of these features is taken into account when calculating the same output in the next time-step.

Multilayer Perceptron
A multilayer perceptron has three or more layers. It is used to classify data that cannot be separated linearly. It is a type of artificial neural network that is fully connected. This is because every single node in a layer is connected to each node in the following layer.
A multilayer perceptron uses a nonlinear activation function (mainly hyperbolic tangent or logistic function).
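The XOR function is a standard example of data that cannot be separated linearly. The sketch below shows a fully connected 2-2-1 perceptron with logistic activations computing XOR. The weights are hand-picked for illustration rather than learned, and the function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Fully connected layer: every input feeds every output node."""
    return [
        logistic(bias + sum(w * x for w, x in zip(row, inputs)))
        for row, bias in zip(weights, biases)
    ]


def xor_network(x1, x2):
    # Hidden layer: the first node behaves like OR, the second like NAND.
    hidden = dense_layer([x1, x2],
                         weights=[[20.0, 20.0], [-20.0, -20.0]],
                         biases=[-10.0, 30.0])
    # Output node behaves like AND of the two hidden nodes.
    output = dense_layer(hidden, weights=[[20.0, 20.0]], biases=[-30.0])
    return output[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_network(a, b)))
```

A single-layer network with the same activation could not reproduce this table, since no single line separates the inputs (0,1) and (1,0) from (0,0) and (1,1).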

Convolutional Neural Network
A convolutional neural network (CNN) uses a variation of the multilayer perceptron. A CNN contains one or more convolutional layers, which can be either fully connected or pooled. The convolutional layer applies a convolution operation to its input before passing the result to the next layer. Because of this operation, the network can be much deeper while having far fewer parameters. For this reason, convolutional neural networks are very effective in image and video recognition, natural language processing, and recommender systems. They also show strong results in semantic parsing and paraphrase detection, and are applied in signal processing and image classification. CNNs are also used for image analysis and recognition in agriculture, where weather features are extracted from satellites like LSAT to predict the growth and yield of a piece of land.
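The convolution operation itself can be sketched in plain Python. As in most deep learning libraries, the kernel is slid over the image without being flipped (strictly a cross-correlation), and no padding is used. The function name conv2d_valid and the sample values are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (both lists of rows) and return the
    weighted sum at every position where the kernel fits entirely."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d_valid(image, kernel))  # [[-4, -4], [-4, -4]]
```

The same kernel weights are reused at every position, which is why the parameter count stays small: a 3x3 kernel has 9 weights whatever the image size, whereas a fully connected layer mapping a 32x32 input to a 30x30 output would need 1024 x 900 = 921,600 weights.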

Recurrent Neural Network (RNN) - Long Short-Term Memory
A Recurrent Neural Network is a type of artificial neural network in which the output of a particular layer is saved and fed back to the input. This helps predict the outcome of the layer.
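The feedback of a layer's output into its own input can be sketched with a single recurrent unit. This is a plain recurrent cell with scalar weights, not an LSTM, which adds gating on top of the same idea. The function name rnn_forward and the weight values are assumptions.

```python
import math


def rnn_forward(inputs, w_in, w_rec, bias, initial_state=0.0):
    """Run one recurrent unit over a sequence. The state computed at
    each step is saved and fed back in at the next step."""
    state = initial_state
    states = []
    for x in inputs:
        state = math.tanh(w_in * x + w_rec * state + bias)
        states.append(state)
    return states


# Only the first input is non-zero, yet later states remain non-zero
# because the saved state is fed back.
with_feedback = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=1.0, bias=0.0)
without_feedback = rnn_forward([1.0, 0.0, 0.0], w_in=1.0, w_rec=0.0, bias=0.0)
print(with_feedback)
print(without_feedback)
```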

Modular Neural Network
A modular neural network has a number of different networks that function independently and perform sub-tasks. The different networks do not really interact with or signal each other during the computation process. They work independently towards achieving the output.

Sequence-To-Sequence Models
A sequence-to-sequence model consists of two recurrent neural networks. There's an encoder that processes the input and a decoder that processes the output. The encoder and decoder can either use the same or different parameters. This model is particularly applicable in those cases where the length of the input data is not the same as the length of the output data. Sequence-to-sequence models are applied mainly in chatbots, machine translation, and question answering systems.

Conclusion
Our main goal in this paper was to review how safe driving can be improved through an image surveillance system, with special reference to Nigeria. The unavailability of data in most African countries, including Nigeria, has made it difficult for researchers and industry to build systems specific to this part of the world. Many of the systems available for supporting safe driving lose efficiency when applied in developing countries, such as those in Africa, because trainable data is unavailable or because generic data is applied that may not suit specific areas within the continent. Therefore, the need for collection of local data cannot be overemphasized. When models are trained on such data and deployed, they will help improve driver and vehicle safety through active and passive corrective feedback towards safer and friendlier practices. Local data will also provide an enhanced feature for car manufacturers when deployed in embedded systems designed for vehicles used in this part of the world. Prospective researchers who have gone through this work can understand what has been done in surveillance for pothole detection, and are provided with choices regarding machine learning methods, data collection methods, and the form of data to apply in their own research. This review further exposes the need for researchers to work in the African environment, since not much has been done in terms of data capture for intelligent transport systems on the continent. When this work is done, Nigeria in particular and Africa as a continent will be closer to curtailing the disturbing effect of road accidents in this part of the world.