The range and depth of applications that depend on IoT sensors continue to grow: from collecting real-time data on the floors of smart factories, to monitoring supply chains, to enabling smart cities, to tracking our health and wellness behaviors. Networks of IoT sensors can provide critical insights into the inner workings of vast systems, empowering engineers to take better-informed actions and ultimately bring far greater efficiency, safety, and performance to these ecosystems.
One prominent example: IoT sensors can support predictive maintenance by detecting anomalies, data that deviates from baseline behavior and suggests a potential mechanical failure. This lets an IoT-driven organization repair or replace components before issues become serious or downtime occurs. Because IoT sensors provide such a large volume of data about each piece of equipment while it is in good working condition, anomalies in that same data can clearly indicate problems.
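To make the idea of "deviating from baseline behavior" concrete, here is a minimal sketch in Python. It is an illustration only, not the detection method used in the experiment described below: it assumes a simple z-score test, and the function name `find_anomalies`, the threshold of 3 standard deviations, and the sample temperature readings are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(baseline, readings, threshold=3.0):
    """Return (index, value) pairs for readings far from the baseline.

    A reading is flagged when it lies more than `threshold` sample
    standard deviations from the mean of the known-good baseline.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [(i, x) for i, x in enumerate(readings) if x != mu]
    return [
        (i, x)
        for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical sensor data: readings taken while equipment was healthy,
# followed by new readings to check against that baseline.
baseline = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 70.1, 69.7]
readings = [70.0, 70.2, 75.5, 69.9]

print(find_anomalies(baseline, readings))
```

With this sample data only the third reading (75.5) is flagged. A production detector would typically work per device and over a sliding window rather than against a fixed baseline.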
From a data science perspective, anomalies are rare events that cannot be classified using currently available data examples. They can also stem from cybersecurity threats or fraudulent transactions. It is therefore vital to the integrity of IoT systems to have solutions in place for detecting these anomalies and taking preventative action. Anomaly detection systems require a technology stack that combines machine learning, statistical analysis, algorithm optimization, and data-layer technologies that can ingest, process, analyze, disseminate, and store streaming data from myriad IoT sources.
That said, building an IoT anomaly detection system remains especially challenging given the scale inherent to IoT environments, where millions or even billions of data events occur daily. The data-layer technologies supporting an IoT anomaly detection system must be able to meet the scalability, computational, and performance needs fundamental to a successful IoT deployment.
I don’t work for a company that sells anomaly detection, but my colleagues on our engineering team and I recently built an experimental anomaly detection solution to see whether pure open source data-layer technologies (that is, in their 100% open source form) could stand up to the specific needs of large-scale IoT environments. The testing used Apache Kafka and Apache Cassandra to produce an architecture capable of delivering what IoT anomaly detection requires in terms of scalability, performance, and realistic cost effectiveness. Beyond meeting those requirements, Kafka and Cassandra are highly compatible and complementary technologies that lend themselves to being used in tandem. Not fully knowing what to expect, we went to work.
In our experiment, Kafka, Cassandra, and our anomaly detection application were combined in a Lambda architecture, with Kafka and our streaming data pipeline serving as the speed layer, and Cassandra acting as the batch and serving layer. (Full details are available on GitHub.) Kafka enables rapid and scalable ingestion of streaming data, and its “store and forward” approach acts as a buffer that keeps Cassandra from being overwhelmed when data volumes spike. At the same time, Cassandra provides a linearly scalable, write-optimized database well suited to storing the high-velocity streaming data produced by IoT environments. The experiment also used Kubernetes on AWS EKS to automate the provisioning, deployment, and scaling of the experimental application.
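The buffering effect of “store and forward” can be illustrated with a toy simulation. This is a sketch under stated assumptions, not Kafka or Cassandra code: it uses no client library, and the function `simulate_buffer`, the tick-based timing, and the event counts are hypothetical. It shows how a buffer lets a downstream store write at a steady maximum rate while a burst is absorbed as backlog and drained later.

```python
def simulate_buffer(arrivals, drain_rate):
    """Simulate a store-and-forward buffer in discrete ticks.

    arrivals   -- number of events arriving in each tick
    drain_rate -- maximum events the downstream store can write per tick

    Returns two lists: events written per tick, and backlog left in the
    buffer at the end of each tick.
    """
    backlog = 0
    writes, backlogs = [], []
    for arrived in arrivals:
        backlog += arrived                  # store: buffer incoming events
        written = min(backlog, drain_rate)  # forward: at most drain_rate
        backlog -= written
        writes.append(written)
        backlogs.append(backlog)
    return writes, backlogs


# Hypothetical load: steady traffic with one large burst in the third tick.
arrivals = [100, 100, 900, 100, 100, 0, 0]
writes, backlogs = simulate_buffer(arrivals, drain_rate=300)

print("written per tick:", writes)
print("backlog per tick:", backlogs)
```

In this run the downstream write rate never exceeds 300 events per tick, the backlog peaks at 600 during the burst, and every event is eventually written once the burst passes.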
We developed and tested the anomaly detection application incrementally, continually optimizing capabilities, monitoring, debugging, and refining as we went. Then we tested scale: the system processed 19 billion real-time events per day, enough to satisfy the requirements of almost any IoT use case. Achieving this result meant scaling out from three to 48 Cassandra nodes and using 574 CPU cores across the Cassandra, Kafka, and Kubernetes clusters. It also meant handling a peak of 2.3 million writes per second into Kafka while sustaining 220,000 anomaly checks per second.
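As a quick sanity check on these figures, the short calculation below converts the sustained rate into a daily total. It assumes that the 19 billion events per day corresponds to the sustained 220,000 anomaly checks per second, which the numbers suggest but the text does not state outright; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

checks_per_second = 220_000         # sustained anomaly checks (from the text)
peak_writes_per_second = 2_300_000  # peak Kafka writes (from the text)

# Sustained checks over a full day.
checks_per_day = checks_per_second * SECONDS_PER_DAY

# How far the peak ingest rate exceeds the sustained check rate; the gap
# is what the Kafka buffer has to absorb during peaks.
peak_to_sustained = peak_writes_per_second / checks_per_second

print(f"checks per day: {checks_per_day:,}")
print(f"peak writes vs. sustained checks: {peak_to_sustained:.2f}x")
```

The daily total works out to 19,008,000,000, consistent with the 19 billion figure, and the peak write rate is roughly ten times the sustained check rate.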
With this experiment, we’ve demonstrated an approach that IoT-centric organizations can use to build their own highly scalable, performant, and affordable anomaly detection applications for IoT use cases, drawing on the advantages that pure open source Apache Kafka and Cassandra offer at the all-important data layer.