LIMITING COVID-19 INFECTION BY AUTOMATIC REMOTE FACE MASK MONITORING AND DETECTION USING DEEP LEARNING WITH IOT

Introduction
Researchers working on the remote control of electronic devices are increasingly interested in the following fields, which aim to simplify everyday life: Human-Computer Interaction (HCI), Artificial Intelligence (AI), and Embedded Systems (ES) [1]. AI has also become an essential part of the medical sector, especially with the development of integrated circuits, sensors, and cameras [2]. The worldwide spread of COVID-19 and other contagious diseases led countries to impose complete lockdowns across all sectors of life. These lockdowns disrupted normal activities such as daily labor and business and, as a result, affected the world economy. However, the lockdowns did not resolve the global situation; moreover, they raised other problems such as poverty, unemployment, and fear of the future. Therefore, the fight against COVID-19 must continue in order to safeguard human life until an effective vaccine is available. AI and deep learning can help respond to the situation by examining different aspects of the pandemic [3]. COVID-19, an acronym for Coronavirus Disease 2019, is a respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), an infectious single-stranded RNA virus. SARS-CoV-2 affects the respiratory system and induces symptoms such as cough, fever, exhaustion, and breathlessness [4].
In this paper, a proposed framework based on AI and deep learning is used to help combat the COVID-19 outbreak. AI offers several ways to confront, or at least limit, this pandemic, as explained in [5]. One of them is to ensure that all individuals wear a face mask. Faces with and without masks are depicted in Fig. 1, which shows the differences between the two classes (masked face and non-masked face). The photos in Fig. 1 are taken from the published SMFRD (Simulated Masked Face Recognition Dataset) database, which is used in this research paper.

Literature review and problem statement
The review of recent work is divided into two parts. The first part covers face mask detection based on computer vision and supervised deep learning, and the second covers Internet of Things (IoT) work; the connection between the two parts forms the literature background of this paper's topic. The significance of this work is that it contributes to updating and developing smart cities and logistics to fight the COVID-19 pandemic [6].
Several works have addressed predicting whether a face is wearing a mask. In [7], transfer learning with InceptionV3 is used to build a classifier that identifies people who are or are not wearing a mask. It was trained and tested on the Simulated Masked Face Recognition Dataset (SMFRD) and achieved 100 % accuracy during testing. Mask monitoring nevertheless remains an unresolved issue, because that system works only in a small area and is not generalized to a whole building or local area to improve control. Another work [8] used deep learning for real-time face mask recognition with an alarm system. Its database has 25,000 images at 224×224 pixel resolution, and the trained model achieved an accuracy of 96 %. The implementation performs real-time facemask recognition on a Raspberry Pi; however, its recognition rate is not sufficiently accurate. An alternative face mask detection method is presented in [7], where a one-stage detector with a feature pyramid network fuses high-level semantic information with multiple feature maps. In the classification stage, a cross-class object removal algorithm rejects predictions with low confidence and high intersection over union (IoU). The results on a public face mask dataset show that the method is suitable for implementation on MobileNet, a lightweight neural network for mobile devices [9]. Still, the problem is not resolved, because that system also works only in a small area, which is a disadvantage.
The approach presented in [10] attempts a face mask detection system for a large area, such as an entire city, by developing a framework that restricts the spread of COVID-19. It searches for people who are not wearing a face mask in a smart city network of Closed-Circuit Television (CCTV) cameras. This method also trains a deep learning model and achieved 98.7 % accuracy in distinguishing people with and without a face mask. However, that accuracy could still be improved to reduce detection errors.
As noted in the literature, deep learning (DL) algorithms have also been used successfully, with the assistance of medical IoT devices, to mitigate the COVID-19 pandemic. Medical IoT devices are increasingly becoming part of pandemic control systems; one example is the detection of face masks from camera images [11]. Another concept restricts the spread of COVID-19 using drones, which help monitor patients who violate quarantine and ensure that face masks are worn. These drones carry cameras and are designed to send guidance and warnings to residents who are not wearing a face mask or who ignore emergency protocols [12]. However, drones need batteries and recharging, which makes this idea expensive to implement.
More recently, Internet of Things (IoT) frameworks have also been used to mitigate COVID-19 challenges, as in [13]. The paper [14] confronts COVID-19 by monitoring temperature remotely with an infrared thermal camera; if the temperature is abnormal, an alert is sent to the authorities. The drawback of this work is that an abnormal temperature may be caused by a disease other than COVID-19. Reference [15] presents several IoT-based solutions that aim to increase indoor safety against COVID-19. For instance, contactless temperature sensing is implemented on an Arduino Uno with an infrared sensor [16], while face mask detection and social distancing checks are performed with a selected computer vision technique on a camera-equipped Raspberry Pi.
Embedded systems can also be controlled remotely via IoT, as in [17]. There, control is triggered by a signal or image after the smart system has analyzed and interpreted it, and the corresponding action is taken based on that analysis. For example, facial expressions, with smiling as one state and non-smiling as the other, form the vision signal (biosignal) captured by the remote camera. Such monitoring is achieved by combining machine learning (smile recognition using HOG feature extraction and SVM classification) with an embedded system (IoT remote control) consisting of MCU nodes (TX and RX) and an Arduino.
The main problem noticed in the literature is that some works handle face mask detection but are limited to a small area or room without extending the coverage range, or they have a low detection rate. At the same time, the spread of COVID-19 around the world is affecting normal human life, such as daily labor and business. This, in turn, has affected the world economy and raised further difficulties, for instance, poverty and unemployment.

The aim and objectives of the study
The aim of this study is to limit the spread of COVID-19 in public areas, especially large areas or entire cities where it is difficult for officers to make individuals wear a face mask.
To achieve this aim, the following objectives were accomplished:
- to propose a high-accuracy face mask detection system that monitors individuals using deep learning;
- to extend control beyond a single area by connecting the monitoring to a remote management and control site, transferring the signals over the Internet as an Internet of Things (IoT) application for COVID-19 [18].
This is beneficial when the program runs in one place, such as a server room, that controls several places or buildings, monitoring people and raising an alarm for those who are not wearing a mask. Therefore, this research is needed to improve the situation in smart cities in the coming years with respect to the medical sector, by monitoring whether people's faces are covered with a mask as part of confronting COVID-19 infection in its different variants.

Materials and methods
The proposed system is divided into two main phases, as depicted in Fig. 2. The first phase is dedicated to deep learning and covers both training and testing with a Convolutional Neural Network (CNN) [19]. The second phase is dedicated to the Internet of Things (IoT) [20], transmitting and receiving the resulting signal over a WAN as an IoT application [21].
1. Deep learning. Deep learning (DL) [22] can be used for prediction tasks by means of a convolutional neural network [23]. In the proposed system (Fig. 2), DL starts with a dataset of images of human faces in two categories: faces wearing a mask properly (covering nose and mouth) and faces without a mask. The next step is pre-processing, which applies image processing tools to enhance the images and resize them to a unified size for the subsequent steps.
In this paper, histogram equalization is used for contrast enhancement, and bilinear interpolation is used to resize all images of the dataset to a unified size of 46×46×3 (height, width, and three RGB channels). A face detection algorithm is then applied so that only the face, not the whole background, is considered; in this research, the Viola-Jones algorithm is used to extract the region of interest (ROI). Next, the convolutional neural network performs feature extraction, with training and testing operations. The convolutional layers were selected after a large number of experiments to obtain the best accuracy, while keeping the number of layers as small as possible so that the model is suitable for lightweight devices. The sequence of the layers is shown in Table 1.
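The two pre-processing operations named above can be illustrated with a minimal, dependency-free sketch. It assumes 8-bit channels stored as lists of rows and that each RGB channel is equalized separately; the paper does not say how equalization is applied to colour images. The resize aligns corner pixels, a simplification: libraries such as OpenCV's `cv2.resize` use pixel-center alignment, so their outputs differ slightly. The Viola-Jones face detection step is not reproduced here; `preprocess_rgb` is a hypothetical helper, not the authors' code.

```python
def equalize_histogram(channel, levels=256):
    """Histogram equalization of one 8-bit channel (list of rows of ints)."""
    pixels = [p for row in channel for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram (unnormalized CDF) of the grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    total = len(pixels)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in channel]
    # Look-up table stretching the CDF over the full 0..levels-1 range.
    lut = [round((c - cdf_min) / (total - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in channel]


def resize_bilinear(channel, out_h, out_w):
    """Bilinear resize of one channel, aligning the corner pixels."""
    in_h, in_w = len(channel), len(channel[0])
    out = []
    for i in range(out_h):
        # Fractional source row for output row i.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = channel[y0][x0] * (1 - dx) + channel[y0][x1] * dx
            bottom = channel[y1][x0] * (1 - dx) + channel[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def preprocess_rgb(image_rgb, size=46):
    """Hypothetical helper: equalize, then resize each of the three channels."""
    return [resize_bilinear(equalize_histogram(ch), size, size) for ch in image_rgb]
```

Equalizing before resizing follows the order in the text; the interpolated output is left as floats, which suits a CNN input layer that normalizes values anyway.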
After training, the model is saved for future prediction of whether a face is wearing a mask. Further actions are then taken remotely using IoT, as explained in the IoT subsection. The dotted arrows in Fig. 2 represent the testing path, while the solid arrows represent the training path. Table 1 lists the layers with their names and specifications. The first layer convolves the image window with 10 filters, and padding is applied to the input so that the output size equals the input size. A batch normalization layer then normalizes each channel across a mini-batch to reduce sensitivity to variations in the data. Next, a rectified linear unit (ReLU) layer performs a simple threshold operation in which any input value less than zero is set to zero. Finally, a max pooling layer is applied.
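The padded convolution, batch normalization, and ReLU steps just described can be sketched in plain Python on tiny inputs. This is an illustration, not the authors' implementation (the paper does not name its framework). The kernel size is an assumption, since it is not stated in the text, and batch normalization takes one channel's values gathered from the whole mini-batch as a flat list, with the learnable scale and shift fixed to defaults.

```python
def conv2d_same(image, kernel):
    """Single-channel 2-D convolution (cross-correlation, as in CNN layers),
    stride 1, zero 'same' padding so the output size equals the input size."""
    h, w = len(image), len(image[0])
    k = len(kernel)              # square kernel with odd size assumed
    pad = (k - 1) // 2           # zeros added on each side
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    y, x = i + u - pad, j + v - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += image[y][x] * kernel[u][v]
            out[i][j] = acc
    return out


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's values across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in batch]


def relu(values):
    """Threshold operation: negative inputs become zero."""
    return [max(0.0, v) for v in values]


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(conv2d_same(img, identity))   # same 3x3 size as the input
print(relu(batch_norm([1.0, 2.0, 3.0, 4.0])))
```

With 'same' padding, a 46×46 input stays 46×46 after each convolution, so only the pooling layers change the spatial size.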
A max pooling layer divides the input into rectangular pooling regions and outputs the maximum of each region, i.e., it down-samples by a factor of n (in this research, n = 2). These four sub-layers (convolution, batch normalization, ReLU, and max pooling) are repeated exactly three times, differing only in the number of convolution filters: Conv-1, Conv-2, and Conv-3 use 10, 64, and 30 filters, respectively. These filter numbers were selected by trial and error to achieve the best accuracy.
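A short sketch of 2×2 max pooling with stride 2, plus a walk through the feature-map shapes of the three blocks for a 46×46×3 input with 10, 64, and 30 filters. The 3×3 kernel size used for the parameter counts is an assumption (the text does not state it), and pooling is assumed to use no padding, so odd sizes are floored.

```python
def max_pool(channel, size=2, stride=2):
    """Max pooling without padding: keep the maximum of each size x size region."""
    h, w = len(channel), len(channel[0])
    out_h = (h - size) // stride + 1   # odd sizes are floored
    out_w = (w - size) // stride + 1
    return [
        [max(channel[i * stride + u][j * stride + v]
             for u in range(size) for v in range(size))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def block_shapes(input_hw=46, in_channels=3, filters=(10, 64, 30), kernel=3):
    """Feature-map shape and conv parameter count after each conv-BN-ReLU-pool block.
    kernel=3 is an assumption, not taken from the paper."""
    hw, c = input_hw, in_channels
    shapes, conv_params = [], []
    for f in filters:
        conv_params.append(kernel * kernel * c * f + f)  # weights + biases
        # 'Same' convolution keeps hw; 2x2 pooling with stride 2 then halves it.
        hw = (hw - 2) // 2 + 1
        c = f
        shapes.append((hw, hw, c))
    return shapes, conv_params


print(max_pool([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
# [[6, 8], [14, 16]]
print(block_shapes())
# ([(23, 23, 10), (11, 11, 64), (5, 5, 30)], [280, 5824, 17310])
```

Under these assumptions the last block yields a 5×5×30 map (750 values) that would feed the classification layers listed in Table 1, and the convolutions hold only about 23,000 weights, consistent with the stated goal of a lightweight model.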
2. Internet of Things (IoT). In this work, an airport entrance scenario is adopted to illustrate the proposed system. The proposed IoT system consists of a NodeMCU [24,25] acting as a transmitter (TX) connected to a cloud server, and another, remotely located NodeMCU acting as a receiver (RX). TX is connected directly to the AI system and receives the predicted result (mask or no mask). This result is encapsulated in a message with a field number and an API key and sent in real time via the Internet to the server, which forwards it directly to RX. The field number lets the server receive messages from multiple entrances in the airport and identify the specific entrance each message comes from. On the other side, RX is connected to the airport entrance and admits the passenger according to the received message: if the passenger is wearing a mask, a positive signal is sent to RX, which opens the entrance gates for the passenger; if not, RX issues an alarm message asking the passenger to wear a mask. As shown in Fig. 3, the IoT phase follows the most basic IoT architecture, with three layers: the perception layer, the network layer, and the application layer.
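The transmitter message described above (a value, a field number identifying the entrance, and an API key) can be made concrete as an HTTP request. The paper does not name its cloud server; its vocabulary of API key, channel ID, and field number matches ThingSpeak, so this sketch assumes ThingSpeak's write endpoint (`https://api.thingspeak.com/update` with `api_key` and `fieldN` query parameters). The entrance names and field numbers are hypothetical, and on the real system the request would be issued by the NodeMCU firmware rather than Python.

```python
from urllib.parse import urlencode

# Assumption: a ThingSpeak-style REST endpoint; the paper does not name its cloud service.
UPDATE_URL = "https://api.thingspeak.com/update"

# Hypothetical mapping from airport entrance to channel field number.
ENTRANCE_FIELDS = {"entrance_1": 1, "entrance_2": 2, "entrance_3": 3}


def build_update_url(write_api_key, entrance, masked):
    """Encode one prediction for one entrance: field value 1 = masked, 0 = non-masked."""
    field = ENTRANCE_FIELDS[entrance]
    query = urlencode({"api_key": write_api_key, f"field{field}": 1 if masked else 0})
    return f"{UPDATE_URL}?{query}"


print(build_update_url("YOUR_WRITE_KEY", "entrance_2", masked=True))
# https://api.thingspeak.com/update?api_key=YOUR_WRITE_KEY&field2=1
# Sending it would be an HTTP GET, e.g. urllib.request.urlopen(url).
```

Using one field per entrance is what lets a single channel serve several gates. Because such services rate-limit channel updates, sending a message only when the predicted status changes, rather than for every frame, is one reasonable design choice.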
First, the perception layer is the physical layer, comprising the camera that captures the human face and the NodeMCU that receives the mask-wearing decision from the AI phase. Second, the network layer connects the NodeMCU (the physical device) to the server and any other network devices. In this paper, the NodeMCU (TX) is connected via the Internet to the cloud server and, through it, to the remotely located NodeMCU (RX); TX is the first party and RX is the second party. Third, the application layer is responsible for delivering a specific service to the user as an application.
Using a cloud server allows the system to be controlled remotely from the data center, so the controller can monitor passengers' mask wearing at different airport entrances. A second reason for using a cloud server is to increase the security of the IoT system. To open the airport doors, the NodeMCU (TX) sends the cloud server a message containing the mask-wearing signal together with an API key, a channel ID, and a field number.
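On the receiving side, one simple design, assumed here because the text says the server forwards the message to RX but not how, is for RX to poll the latest value of its entrance's field and map it to an action. The sketch again assumes ThingSpeak's read endpoint for the last entry of a field (`/channels/<channel_id>/fields/<field>/last.txt`, with `api_key` needed only for private channels). The action names are hypothetical labels for the LED and gate behavior described in the text.

```python
from urllib.parse import urlencode

# Assumption: ThingSpeak-style "last entry of a field" endpoint.
READ_URL = "https://api.thingspeak.com/channels/{channel_id}/fields/{field}/last.txt"


def build_read_url(channel_id, field, read_api_key=None):
    """URL that RX could poll to fetch the latest value of its entrance field."""
    url = READ_URL.format(channel_id=channel_id, field=field)
    if read_api_key:  # only needed for private channels
        url += "?" + urlencode({"api_key": read_api_key})
    return url


def gate_action(last_value):
    """Map the latest field value to the RX action described in the text."""
    value = last_value.strip()
    if value == "1":
        return "open_gate"   # masked: green LED, open the entrance
    if value == "0":
        return "alarm"       # non-masked: red LED, ask the passenger to wear a mask
    return "no_change"       # empty or unexpected reply: keep the current state


print(build_read_url(123456, 2))
print(gate_action("1\n"), gate_action("0"))
```

Treating unexpected replies as "no change" keeps the gate from opening on a network error, which matters if the cloud server is also relied on for security.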
This section describes the experimental setup used to evaluate the proposed system. The experiment consists of two stages. The first stage covers deep learning training and testing to build a reference model for mask detection. The second stage uses the built reference model to control an LED remotely. Once the LED can be controlled successfully, other applications can be controlled in the same manner.
For the first stage, experiments were conducted on a public face mask detection database. The well-known Simulated Masked Face Recognition Dataset (SMFRD) was used to determine the accuracy of the proposed method in phase 1, with a different number of observations in each experiment. The database is described in Table 2, which gives the number of images with and without masks: the dataset is labeled with_mask and without_mask, with 690 and 686 samples, respectively [26].
As Table 3 shows, the system has two classes, Masked and Non-masked. The confusion matrix, as explained in [27,28], contains four main parameters, with the non-masked class treated as positive: TP is the number of non-masked test samples correctly predicted as non-masked; TN is the number of masked samples correctly predicted as masked; FP is the number of masked samples incorrectly predicted as non-masked; and FN is the number of non-masked samples incorrectly predicted as masked.
Table 3. Confusion matrix for face mask detection, two classes: Masked, Non-masked
Based on this confusion matrix, the accuracy metric is calculated as
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (1)$$
where TP is true positive; TN is true negative; FP is false positive; FN is false negative. From (1), the best accuracy is obtained by making TP and TN as large as possible, i.e., FP and FN as close to zero as possible.
For the second stage of the experiment, depicted in Fig. 4, the second party (receiver) must respond to the first party (transmitter). The first party acts as the commander, sending a control instruction based on the detection result, i.e., whether the signal belongs to the masked or non-masked class. Fig. 4 illustrates the IoT procedure: the information flows from the first party to the second through the Internet using NodeMCU TX and RX.
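Eq. (1) and the confusion-matrix convention (non-masked treated as the positive class) can be checked with a short sketch. The function names are illustrative; the counts in the usage line are the experiment 2 values reported with Table 5 (TP = TN = 136, FP = 1, FN = 2).

```python
def confusion_counts(actual, predicted, positive="non-masked"):
    """Count TP, TN, FP, FN with the non-masked class as positive."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # non-masked correctly predicted as non-masked
        elif a != positive and p != positive:
            tn += 1          # masked correctly predicted as masked
        elif p == positive:
            fp += 1          # masked wrongly predicted as non-masked
        else:
            fn += 1          # non-masked wrongly predicted as masked
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Eq. (1), returned as a percentage."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


print(f"{accuracy(136, 136, 1, 2):.2f} %")  # 98.91 %
```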
The experiment was performed with the following specifications: training ran for 200 iterations and took 6.5 minutes on a single CPU with a learning rate of 0.01, on a Windows 7 workstation with a Core i3 processor and 6 GB of RAM.

1. Face Mask Detection (FMD)
The results of this research are presented in three parts. The first concerns the training and testing of the CNN-based deep learning model. The second concerns controlling remote embedded devices via the trained model. The third shows the server responses while transferring the masked and non-masked signals generated by the detection device. Table 4 shows the numbers of training and testing samples used in four independent experiments, with the corresponding accuracy. The database used in this research is the Simulated Masked Face Recognition Dataset (SMFRD) [26].
Table 4. Accuracy of the trained reference model
The confusion matrices of the four experiments are shown in Table 5. Experiment 1 reaches a total accuracy of 99.64 % when tested on 20 % of the full dataset. Since the dataset contains 1,376 images (690 + 686), 20 % corresponds to 275 test samples. In experiment 1, only 1 sample was predicted incorrectly (an image with a mask was predicted as non-masked). The accuracy is therefore 274/275 ≈ 99.64 %.
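The 20 % hold-out described above (275 of 1,376 images) can be sketched as a shuffled split. The paper does not state how the test images were chosen, so a plain random split with a different seed per experiment is an assumption; a stratified variant would split each class separately. The `(label, index)` tuples stand in for the image files.

```python
import random


def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle once and hold out the first test_fraction of samples for testing."""
    rng = random.Random(seed)          # a different seed per experiment (assumption)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


# SMFRD subset from Table 2: 690 with_mask and 686 without_mask images.
dataset = [("with_mask", i) for i in range(690)] + [("without_mask", i) for i in range(686)]
train, test = holdout_split(dataset, seed=1)
print(len(train), len(test))  # 1101 275
```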
Similarly, in experiment 2 (Table 5), the confusion matrix shows that two samples were wrongly predicted as masked although they are non-masked, and one sample was wrongly predicted as non-masked although it is masked. The accuracy is thus (136+136)/(136+2+136+1) ≈ 98.91 %; experiments 3 and 4 are computed in the same way, as specified in Table 5. According to these results, all four experiments show acceptable accuracy, with an average of 98.98 %. Consequently, the trained mask detection model can be used for future prediction with an IoT connection.

2. Face Mask Detection (FMD) with the Internet of Things (IoT)
The second part of the results shows and visualizes the mask detection process remotely via the Internet, where an Arduino Uno serves as the interface between the PC and the transmitter Node-MCU-TX. The mask-wearing status is sent in real time to the Node-MCU-RX, which performs a specific action (such as opening the airport entrance for masked persons) depending on the signal received via the Internet. Fig. 5, a depicts the case in which the face is detected as wearing a mask: a green box appears on the masked image, and TX sends a signal to RX to turn the green LED on, as shown in Fig. 5, b. Similarly, Fig. 6 shows the case in which a face is detected as not wearing a mask: the TX indicator turns red, and the RX indicator then turns red as well. In both cases, the signal is transferred remotely between TX and RX over a Wireless LAN (WLAN) as a form of IoT connection.
Likewise, when the TX LED turns green, the RX LED switches to green immediately, indicating that the face is wearing the mask again.
Fig. 7 presents a result graph showing the status of the detected face over 10 minutes of running the proposed system. Logic "1" means that the passenger is wearing a mask (masked face detected), while logic "0" means that the passenger is not wearing a mask (non-masked face).
As Fig. 7 shows, the state switched several times between "1" (wearing a mask) and "0" (not wearing a mask) during the 10-minute period. These switches occurred because the individual put on and took off the mask during simulation and testing. In total, the graph shows approximately 16 switches from "0" to "1" and vice versa.
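Counting the switches in a logged status signal like the one in Fig. 7 reduces to comparing consecutive samples. The sample log and sampling period below are hypothetical; the paper does not report the logging rate of the Web server.

```python
def count_switches(status):
    """Number of 0->1 and 1->0 changes in a sampled mask-status signal."""
    return sum(1 for prev, cur in zip(status, status[1:]) if prev != cur)


def switch_times(status, sample_period_s):
    """Time stamps (seconds from start) at which the status changed."""
    return [i * sample_period_s for i in range(1, len(status)) if status[i] != status[i - 1]]


log = [1, 1, 0, 0, 1, 1, 1, 0]   # hypothetical samples: 1 = masked, 0 = non-masked
print(count_switches(log))        # 3
print(switch_times(log, 15.0))    # [30.0, 60.0, 105.0]
```

Applied to the exported Fig. 7 data, the same count would give the approximately 16 switches reported above.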

Discussion of the experimental results of the proposed mask monitoring system
The experimental results confirm that the system can be implemented, combining face mask detection with remote control of an embedded system, or remote warnings, over the Internet. As Fig. 7 shows, the proposed system was tested by running it for 10 minutes, during which an individual changed the state of his face (mask on or off) many times, and the IoT system responded correctly each time.
In this approach, detection is based on computer vision: the face is recognized from the camera image, so performance depends on the recognition accuracy. This approach is relatively better than temperature measurement, because a high temperature is not specific to COVID-19 and may be caused by another disease.
By comparison, the work in [10] has the same aim as the proposed system, namely monitoring mask wearing across an entire city or a large building. However, the recognition rate of the proposed system (98.98 %) is slightly higher than that of [10] (98.7 %). Moreover, monitoring an entire city with IoT avoids the battery and charging obstacles of the drones discussed in the literature review [12]. Therefore, IoT-based FMD is more secure and easier to implement. A limitation of this methodology concerns handling more than one individual at the same time: this depends on the processing speed, since each individual must be processed before walking out of the camera's view. In other words, the embedded system running this software must have sufficient capacity and processing speed.
Fig. 7. Status of wearing and not wearing a mask in the Web server of the IoT
Light reflection and brightness, however, negatively affect recognition accuracy. To mitigate this, the camera lighting should be carefully adjusted, which is essential for the system to work properly and to send correct alerts to the management center or authorities.
The difficulties of this study lie in implementing face mask detection and informing the authorities throughout the city in order to realize a smart city, as well as in organizing a management committee in charge of the control operation.

Conclusions
1. The proposed prediction system achieves an average accuracy of 98.98 %. Prediction uses a CNN-based deep learning model evaluated in four experiments, with training and testing on the SMFRD database.
2. The system can work over wide areas: an Arduino Uno serves as the interface between the PC and the transmitter Node-MCU, and the mask-wearing status (signal) is transmitted in real time via the Internet to the receiver Node-MCU located on the remote side, as a form of IoT. The receiver Node-MCU acts according to the mask-wearing status (e.g., opening the airport entrance).