Wikipedia defines k-means clustering as:
k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
That is, we take n observations and try to group them into k clusters. This allows us to:
1. Learn about the properties that points in the same cluster have in common.
2. Learn about the differences between points in different clusters.
3. Learn about the number of clusters within the same or different time frames.
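To make the definition above a bit more concrete, here is a minimal sketch of the usual iterative procedure (often called Lloyd's algorithm) in plain Python. The function names, the toy 2-D points, the fixed seed and the choice of k = 2 are all assumptions made for illustration; for real work you would normally reach for a library implementation instead.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    # Coordinate-wise mean of a non-empty list of points.
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    # Illustrative sketch only: pick k of the points as starting centroids.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so we are done.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = mean_point(members)
    return centroids, labels


# Toy data (made up): three points near (1, 1.5) and three near (8.5, 8.5).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, k=2)
print(centroids)
print(labels)
```

Which cluster ends up with index 0 depends on the random starting centroids, so only the grouping itself is meaningful, not the label numbers.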
Umm… but how is this useful? One might ask, so let’s go through some examples and use cases of k-means clustering.
This article explains how UPI payment messages can be clustered into different topics to learn about the receiver’s products of interest, which can then be used for targeted advertising. In other words, the algorithm lets the user learn about the behavior and interests of the customer.
I found this amazingly well-written article, in case you’d like to learn more about clustering and how it is implemented: https://blog.dataiku.com/clustering-how-it-works-in-plain-english
Finally, I would like to tell you how clustering is also helpful in cyber security:
The use of AI in cyber security has been growing along with the number of users and the number of attackers. One reason is that tools such as IDS (intrusion detection systems) and IPS (intrusion prevention systems) are unable to adapt over time, i.e. whenever an attacker uses a new kind of attack, so relying completely on such systems can be fatal when a new kind of attack appears. Cyber security analytics is an alternative to such traditional security systems; it uses big data analytics techniques to provide a faster, scalable framework for handling large amounts of cyber security related data in real time.
K-means clustering is one of the most commonly used clustering algorithms in cyber security analytics. It aims to divide security related data into groups of similar entities, which in turn can help in gaining important insights about known and unknown attack patterns. This technique lets a security analyst focus on the data specific to certain clusters, reducing the time required to analyze, detect and protect against incoming attacks.
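As a rough illustration of how an analyst might use such clusters, the sketch below assigns new events to the nearest centroid and sets aside anything that is far from every centroid for a closer look. Everything in it is hypothetical: the centroids (assumed to come from an earlier k-means run on historical traffic), the two features, the cluster names and the distance threshold are made up for the example, and flagging far-away points is one simple heuristic rather than the method of any particular tool.

```python
import math

# Hypothetical centroids, assumed to come from an earlier k-means run.
# Features (also assumed): (requests per minute, failed logins per minute).
centroids = {
    "normal browsing": (20.0, 0.0),
    "brute-force login": (60.0, 40.0),
}


def nearest_cluster(event):
    # Return the name of the closest centroid and the distance to it.
    name = min(centroids, key=lambda n: math.dist(event, centroids[n]))
    return name, math.dist(event, centroids[name])


def triage(events, max_distance=15.0):
    # Group events by nearest centroid; events farther than max_distance
    # (an arbitrary, illustrative threshold) from every centroid are kept
    # apart as possible unknown patterns.
    groups = {name: [] for name in centroids}
    unmatched = []
    for event in events:
        name, distance = nearest_cluster(event)
        if distance <= max_distance:
            groups[name].append(event)
        else:
            unmatched.append(event)
    return groups, unmatched


# Made-up events for illustration.
events = [(18.0, 1.0), (25.0, 0.0), (58.0, 35.0), (300.0, 2.0)]
groups, unmatched = triage(events)
print(groups)
print(unmatched)
```

The analyst can then look only at the "brute-force login" group and the unmatched events instead of going through every record one by one.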