DISTINGUISHING OF DIFFERENT TISSUE TYPES USING K-MEANS CLUSTERING OF COLOR SEGMENTATION


Introduction
Despite ongoing scientific progress, cancer can be lethal if it is not detected at an early stage. Rapid detection of malignant cells has the potential to save millions of lives. Image classification algorithms may be used to detect the shape of cancerous cells, which is crucial in determining the severity of the disease. Over the years, many image segmentation algorithms and approaches have been developed that use domain knowledge to solve segmentation problems in a variety of applications, including object detection, medical imaging, iris recognition, machine vision, and video surveillance.
With the rapid advancement of digital technology, digital images now play a critical role, with fast-growing applications in the medical and visualization fields. A digital image is composed of a large number of distinct dots, each with its own brightness level; these dots are called pixels, or picture elements. Image segmentation is defined as "the search for homogenous regions in an image and then the classification of these images". Segmentation is the separation of a picture into distinct objects or related parts that do not overlap. Real-world image segmentation problems frequently pursue several goals at once, such as lowering overall deviation, maximizing property, reducing alternatives, or lowering the classifier's error rate. It is presently unknown how to tell whether one approach in image processing is more accurate than another, whether for a single image, a group of images, or, more often, an entire class of images [1].
Clustering is the partition of a data collection into clusters (subsets) so that the data in each subset ideally share some common feature, generally according to a distance metric. There are two basic types of clustering. The first is hierarchical clustering, in which successive clusters are built from previously established clusters; it comprises agglomerative ("bottom-up") and divisive ("top-down") methods. Agglomerative methods start with every element as a separate cluster and gradually merge them into bigger clusters, whereas divisive approaches start with the full dataset and gradually split it into smaller clusters. The second is partitioned clustering, which includes K-Means and its derivatives, the QT clustering method, and Fuzzy C-means clustering [2,3]. Fig. 1 shows the clustering classification and the area on which this study focuses.

Zinah R. Hussein
Literature review and problem statement
The K-Means algorithm was introduced in [19] as one of the simplest and fastest-converging algorithms, but it was found to be very sensitive to additive noise and incapable of handling noisy pictures. A possible reason is the difficulty of ignoring some image content. The paper [20] addressed this issue by proposing the Fuzzy C-means (FCM) method to mitigate the high sensitivity to surrounding noise. However, this approach also requires complex computations and usually leads to over-segmentation. Similarly, the classic K-Means technique requires prior knowledge about the pictures, such as the initial centroids and the number of clusters. These conventional clustering methods, including FCM and K-Means, are used to classify items into separate groups based on comparable characteristics of the data objects [6,21]. An option for overcoming the noise sensitivity is discussed in the studies [22,23], which provide a complete definition of image segmentation. These papers examined uniformity criteria by suggesting uniform regions, straightforward (not tattered) boundaries, and neighboring regions with a large difference. However, the user-defined parameters have a considerable impact on the clustering outcomes because the user has no prior knowledge of the number of clusters. To deal with these difficulties, various studies have suggested adaptive cluster initialization strategies.
Another option for overcoming these difficulties is the Mean Shift (MS) method, which, as discussed in [24], does not require any parameterization or prior knowledge of the number of clusters. Alternatively, the paper [25] offers the Automated Forensic Handwriting Analysis (AFHA) as an adaptive unsupervised clustering technique. The AFHA algorithm combines the Fuzzy C-means and Ant System techniques [26]. The Ant System finds compact and identifiable groupings. This approach is a modified form of the traditional K-Means clustering methodology (MKM) [27], in contrast to X-means [28], Normalized cut [29], and mean shift (MS) [30]. The MKM system divides clusters into subcategories until the required number of K clusters is obtained and the inter-cluster correspondence falls below a given threshold. The paper [31] also presented a hybrid adaptive clustering technique for image segmentation, which is only suitable for gray-scale images. Taken together, these limitations suggest that it is appropriate to conduct a study devoted to clustering-based color segmentation that can find the core points of image clusters by establishing automated initialization settings for K-Means clustering.

The aim and objectives of the study
The aim of the study is to develop a cluster-based color image segmentation approach that can accurately search the RGB pairings for the core points of clusters without requiring prior information. This will make it possible to obtain the image segmentation feature.
To achieve the aim, the following objectives were set:
- to create images for segmenting H&E images by color, separating the elements of the original image into three images;
- to use the 'L*' layer in the {L*a*b*} CS to differentiate the dark-colored nuclei of the stained tissue cells from their surroundings.

Materials and methods
Image segmentation is the technique of dividing a digital image into several sections, each consisting of a set of pixels (sometimes known as superpixels) with comparable features. Image segmentation aims to transform an image representation into something more meaningful and easier to understand. It is frequently used to find objects and boundaries (lines, curves, and so on) in pictures. To put it another way, segmentation is the procedure of assigning a label to each pixel of a picture so that pixels with the same label have similar attributes. Segmentation strategies include region extraction and identification, thresholding, and edge detection approaches [4,5], data clustering, and physically-based systems methods [6,7]. The data clustering technique divides objects into classes and subclasses so that datapoints in the same class are more similar to one another than to datapoints in other classes. Clustering is one of the most frequently used methods for image data analysis, data mining, and segmentation [6]. Clustering techniques are also utilized in medicine for early identification of lung nodules [8], MRI [9,10], clustering for bipolar disorder [11,12], and automated clustering methods for superparticles [13]. The K-Means technique, developed by MacQueen in 1967 [14], is one of the most basic clustering algorithms. The K-Means algorithm divides a database into k groups [15,16]. It takes a dataset of unlabeled datapoints and distributes them among K clusters, assigning each datapoint to a centroid according to a distance criterion.
Color image segmentation assumes that the homogeneous colors in an image correspond to discrete clusters, so that meaningful objects in the picture can be identified from the color characteristics of the image pixels. In other words, each cluster designates a group of pixels with similar color attributes. Because the outcome of segmentation depends on the Color Space (CS) used, no single CS can produce satisfactory results for all types of images. As a result, several authors have attempted to identify the CS that best suits their color image segmentation problem [17,18].
Tissue segmentation in whole-slide photographs is a crucial task in digital pathology, as it is necessary for fast and accurate computer-aided diagnoses. When a tissue picture is stained with hematoxylin and eosin (H&E), precise tissue segmentation is especially important for a successful diagnosis. This kind of staining aids pathologists in distinguishing between different tissue types.
The main goal is to use the {L*a*b*} CS and K-Means clustering to automate color segmentation. K-Means clustering aims to minimize the sum of squared distances between all points and their cluster centers.
The distance measure influences the shape of the clusters and determines how the similarity of two items is assessed. Common distance measures include the following:
1. The Euclidean distance (also known as the 2-norm distance) is calculated as
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2},$$
where $(x_1, y_1)$ and $(x_2, y_2)$ are the coordinates of the first and second point, respectively, and $d$ is the distance between these points.
4. Mahalanobis distances adjust data to account for varied scales and correlations.
5. Inner-product space: when clustering high-dimensional data, the angle between two vectors can be employed as a distance metric.
6. Hamming distance (a form of edit distance that counts only substitutions) measures how many replacements are required to turn one member into another of the same length.
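As an illustration, the distance measures that survive in the list above can be sketched in Python with NumPy. This is a minimal sketch and not the MATLAB code used in the study; the function names are hypothetical, and the covariance matrix passed to the Mahalanobis distance is assumed to be estimated from the data beforehand.

```python
import numpy as np

def euclidean(x, y):
    """2-norm distance between two points of any dimension."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def mahalanobis(x, y, cov):
    """Distance rescaled by the covariance matrix `cov` of the data."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def angle_between(x, y):
    """Angle in radians between two vectors, from their inner product."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(u != v for u, v in zip(a, b))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is a convenient sanity check.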
Based on their attributes, the K-Means technique splits n items into k partitions. It is comparable to the expectation-maximization technique for Gaussian mixtures in that both try to find the centers of the natural clusters in the data. The object attributes are assumed to form a vector space.
Using this method, N datapoints are partitioned (or clustered) into K disjoint subsets $S_j$ so as to minimize the sum-of-squares criterion
$$J = \sum_{j=1}^{K} \sum_{n \in S_j} \lVert X_n - \mu_j \rVert^2,$$
where $X_n$ denotes the vector of the n-th datapoint and $\mu_j$ denotes the geometric centroid of the datapoints in $S_j$. Put simply, K-Means clustering is a technique for grouping objects into K groups based on their attributes (features), where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroids. Fig. 2 depicts the steps of the K-Means clustering method.
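The sum-of-squares criterion described above can be minimized with the standard alternating scheme (often called Lloyd's algorithm). The following Python sketch is only an illustration under stated assumptions: it initializes the centroids with randomly chosen datapoints, which is not the automated initialization proposed in this study, and the function name and parameters are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Partition `points` (N, D) into k clusters by Lloyd's algorithm."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initial centroids are k distinct datapoints chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its subset S_j;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and the sum-of-squares criterion J for that partition.
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    J = float(d2[np.arange(len(points)), labels].sum())
    return labels, centroids, J
```

Because the result depends on the initial centroids, the procedure may stop in a local minimum of J; this is the sensitivity to initialization that motivates automated initialization settings.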
If brightness variations are ignored, the H&E image contains three colors: white, blue, and pink. The {L*a*b*} CS, also known as CIELAB or CIE L*a*b*, makes it possible to quantify these visual differences.
The {L*a*b*} CS is derived from the CIE XYZ tristimulus values. The {L*a*b*} space consists of three layers: a luminosity layer 'L*', a chromaticity layer 'a*' indicating where the color falls along the red-green axis, and a chromaticity layer 'b*' indicating where the color falls along the blue-yellow axis. All of the color information is in the 'a*' and 'b*' layers. The difference between two colors can be measured using the Euclidean distance metric. The algorithm steps are demonstrated below.
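The conversion from RGB through the CIE XYZ tristimulus values to L*a*b* can be sketched as follows. The text does not state the RGB working space or the reference white, so this Python sketch assumes 8-bit sRGB input and the D65 white point; the function names are hypothetical, and the color difference shown is the plain Euclidean distance in L*a*b* (the CIE76 formula).

```python
import numpy as np

# Assumption: sRGB primaries with a D65 reference white (standard values).
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def rgb_to_lab(rgb):
    """Convert 8-bit sRGB values of shape (..., 3) to CIE L*a*b*."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma to obtain linear RGB.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ tristimulus values, normalised by the white point.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    # CIELAB nonlinearity: cube root with a linear segment near zero.
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0          # luminosity layer
    a = 500.0 * (f[..., 0] - f[..., 1])   # red-green axis
    b = 200.0 * (f[..., 1] - f[..., 2])   # blue-yellow axis
    return np.stack([L, a, b], axis=-1)

def delta_e(lab1, lab2):
    """Euclidean difference between two L*a*b* colors (CIE76)."""
    diff = np.asarray(lab1, dtype=float) - np.asarray(lab2, dtype=float)
    return float(np.linalg.norm(diff))
```

Since 'L*' carries only brightness, clustering on the 'a*' and 'b*' layers alone groups pixels by color regardless of how light or dark they are.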

1. Segmenting H&E images by color-separating elements
A picture of tissue stained with hematoxylin and eosin (H&E) was used in this study to test the newly developed K-Means clustering method. This kind of staining aids pathologists in distinguishing between different tissue types. A MATLAB-based code was developed to obtain the results of this work. Fig. 3 shows the image of H&E-stained tissue, which was read from [32].
K-Means clustering treats each object as having a location in space. It finds partitions such that the objects within each cluster are as close to each other as possible and as far as possible from the objects in other clusters. K-Means clustering requires the number of clusters to partition into, as well as a distance metric that defines how close two objects are to each other.
Since the color information is in the 'a*b*' color space, the objects are pixels with 'a*' and 'b*' values. The data are converted to single precision and the pixels are grouped into three clusters. The picture presented in Fig. 4 is created by labeling each pixel in the image with its cluster label.
Using the pixel labels, the objects of the original H&E image are separated by color, which results in three images. Fig. 5 shows the objects of cluster 1, Fig. 6 shows the objects of cluster 2, and Fig. 7 shows the objects of cluster 3.
The color segmentation thus succeeds in separating the objects of the original H&E image by color into three cluster images.
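The separation of the original image into one image per cluster can be sketched in Python as follows. This is a minimal illustration, not the MATLAB code of the study: the function names are hypothetical, and the label image in the example is made up by hand rather than produced by an actual K-Means run.

```python
import numpy as np

def ab_features(lab):
    """Flatten the a* and b* layers into an (N, 2) single-precision array."""
    return lab[..., 1:3].reshape(-1, 2).astype(np.float32)

def split_by_cluster(rgb, label_image, k):
    """Return k images; image j keeps the pixels of cluster j, others are black."""
    images = []
    for j in range(k):
        mask = label_image == j                      # pixels of cluster j
        images.append(np.where(mask[..., None], rgb, 0).astype(rgb.dtype))
    return images

# Tiny illustration: a 2x3 image whose pixels are assumed to have been
# labelled 0 = white background, 1 = pink, 2 = blue.
rgb = np.array([[[255, 255, 255], [230, 150, 190], [70, 60, 160]],
                [[228, 140, 185], [255, 255, 255], [60, 50, 150]]], dtype=np.uint8)
label_image = np.array([[0, 1, 2],
                        [1, 0, 2]])
cluster_images = split_by_cluster(rgb, label_image, k=3)
```

Each pixel appears in exactly one of the three images, so summing the cluster images reproduces the original picture.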

2. Results of Segmenting the Nuclei
Color is one of the features used in our approach; segmenting by color generally gives a good sense of the objects in an image and can be a useful preprocessing step for stages such as object recognition. The image that results from segmenting the blue nuclei is shown in Fig. 8. The blue objects are found in cluster 3, which contains both dark and light blue hues. Light blue can be separated from dark blue by using the 'L*' layer in the {L*a*b*} CS. The nuclei of the cells are dark blue.

Discussion of the research results of K-Means clustering for color segmentation
To show the effectiveness of this approach, four images of tissue stained with hematoxylin and eosin were used, all at ×100 magnification: a) healthy tissue; b) adenomas; c) ulcerative colitis; d) adenocarcinomas.
The 'L*' layer holds the brightness value of each color. The brightness values of the pixels in the blue cluster were extracted and a global threshold was applied to them, which gives the indices of the light blue pixels. The final objects of cluster 3 were obtained by copying the blue object mask, removing the light blue pixels from the mask, and then applying the new mask to the original image. Obtaining the correct K with our suggested method was quite challenging. These results demonstrate that H&E images can be segmented by color-separating elements. The final image shows only the dark blue cell nuclei, which, compared with existing studies, makes the result less sensitive to the surrounding noise.
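The thresholding step on the 'L*' layer can be sketched as follows. The text only says that a global threshold is applied; this Python sketch assumes Otsu's method for choosing that threshold, which may differ from the rule actually used, and all function names are hypothetical.

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Global threshold that maximises the between-class variance (Otsu)."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)                        # pixels in bins 0..i
    w1 = w0[-1] - w0                            # pixels in bins i+1..end
    s0 = np.cumsum(hist * centers)
    m0 = s0 / np.maximum(w0, 1)                 # mean brightness, lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)      # mean brightness, upper class
    between = w0 * w1 * (m0 - m1) ** 2
    # Use the upper edge of the best bin so the lower class is `<= threshold`.
    return edges[1:][np.argmax(between)]

def dark_nuclei_mask(L, blue_mask):
    """Copy the blue mask and remove its light blue pixels."""
    threshold = otsu_threshold(L[blue_mask])
    return blue_mask & (L <= threshold)

def apply_mask(rgb, mask):
    """Keep the masked pixels of the original image; set the rest to black."""
    return np.where(mask[..., None], rgb, 0).astype(rgb.dtype)
```

Here `L` is the luminosity layer of the whole image and `blue_mask` is the Boolean mask of the blue cluster; pixels outside the blue cluster are never selected, however dark they are.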
This type of tissue segmentation in whole-slide photographs is a crucial task in digital pathology, as it is necessary for fast and accurate computer-aided diagnoses. When a tissue picture is stained with eosin and hematoxylin, precise tissue segmentation is especially important for a successful diagnosis. This kind of staining aids pathologists in distinguishing between different tissue types.
To evaluate the acquired images, eight segmentation evaluation metrics were used: Accuracy, Precision, F-Measure, Matthews Correlation Coefficient (MCC), Sensitivity, Specificity, Dice index, and Jaccard index [33,34]. Table 1 lists the main factors for comparing the presented work with the most closely related one for the blue nuclei of Fig. 8.
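For a binary segmentation, the metrics named above all follow from the counts of true and false positives and negatives. The Python sketch below uses the standard definitions and assumes that a ground-truth mask of the nuclei is available; the function name is hypothetical and the code is not taken from [33,34].

```python
import math
import numpy as np

def segmentation_metrics(pred, truth):
    """Compare a predicted binary mask with a ground-truth binary mask."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))      # nucleus pixels found
    tn = int(np.sum(~pred & ~truth))    # background pixels rejected
    fp = int(np.sum(pred & ~truth))     # background marked as nucleus
    fn = int(np.sum(~pred & truth))     # nucleus pixels missed

    def ratio(num, den):
        # Convention assumed here: a ratio with an empty denominator is 0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }
```

For binary masks the Dice index and the F-Measure coincide, so the two values serve as a consistency check on each other.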
We observed that, while the K-Means method performs well, the need to choose K in advance makes it somewhat difficult to use. Automating this choice turned out not to be a simple task, and although our results are not flawless, they are not far off.
In the future, we will strive to improve our suggested automated K-Means method in order to obtain the correct K in all CSs.
This work experimented with only two Color Spaces, which do not include CMYK, L*a*b, YCbCr, HSL, and other CSs. Future work will extend the study to these color spaces.

Conclusions
1. Color is one of the features used in our approach; segmenting by color generally gives a good sense of the objects in an image and can be a useful preprocessing step for stages such as object recognition. The values used to stop the K-reduction were found from samples of stained tissue photos and manipulations of the findings; they are not intended to work for generic pictures.
2. The presented approach succeeded in using the 'L*' layer in the {L*a*b*} CS to differentiate the nuclei in stained tissue photos, with the values used to stop the K-reduction found from samples and manipulations of the findings. However, we want to stress that no one representation is superior to the other.
Table 1. The main factors for comparing the presented work with the most closely related one