Rule-Based Stream Data Processing at the Edge with eKuiper
October 3, 2023
Governed by LF and designed to shift data analytics closer to devices, this lightweight IoT tool is actively advancing toward v2.0.
Limitations at the edge
One of the main concerns in edge computing is efficiently processing streaming data generated close to the Internet of Things (IoT) network edge. Unfortunately, traditional cloud-centric models are unable to meet all the latency/throughput demands for data-intensive, real-time streaming analytics. At the same time, typical software used for stream analytics, such as Apache Spark and Flink, may not be suited for these scenarios, often resulting in high latencies and increased data transmission costs. Therefore, at the edge, apps have to be lightweight, efficiently utilizing CPU and memory resources.
Recognizing these challenges, in early 2019, EMQ, a provider of MQTT services and IoT tools, launched an internal stream analytics project now known as eKuiper (initially, EMQX Kuiper). It was then officially open-sourced on October 23, 2019. The stable version (v1.0.0) was released on October 22, 2020.
On April 21, 2021, the product was presented to LF Edge’s Technical Advisory Council (see details). LF Edge is an umbrella organization under The Linux Foundation that aims to build an open edge computing ecosystem/framework, fostering complementary IoT products. In a previous post, we covered Project Alvarium, which focuses on establishing a trusted data fabric in edge environments by using distributed ledgers. In turn, eKuiper looks to provide low-latency streaming data analytics at the edge, integrating with other LF Edge platforms and tools.
On May 24, 2021, the Governing Board’s Strategic Planning Committee approved the proposal to donate the project to LF Edge. The project’s technical charter was adopted on June 23, 2021.
How eKuiper works
The goal behind the development of eKuiper is to provide a powerful rules engine for processing and analyzing streaming data right at the edge. Written in Go, it aims to achieve high throughput and low latency while handling large volumes of information. Working with eKuiper typically involves three steps:
- Creating a stream to ingest data, specifying the source and data schema.
- Creating a rule with SQL-like syntax to process the data from the stream, defining the conditions and actions.
- Running the rule to start processing data, so that eKuiper will execute the defined actions when the conditions are met.
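The three steps above can be sketched as the request bodies a client might send to eKuiper’s REST API. This is a minimal illustration, not a reference: the base URL, the `/streams` and `/rules` paths, the stream name `demo`, the topic `sensors/demo`, the schema, and the `log` action are all assumptions made for the example and should be checked against the official documentation for the version in use.

```python
import json

# Assumption: a local eKuiper instance listening on its REST port.
EKUIPER_URL = "http://127.0.0.1:9081"


def stream_payload(name: str, topic: str) -> dict:
    """Step 1: describe a stream, its schema, and its data source."""
    sql = (
        f"CREATE STREAM {name} (temperature FLOAT, humidity BIGINT) "
        f'WITH (DATASOURCE="{topic}", FORMAT="JSON", TYPE="mqtt")'
    )
    return {"sql": sql}


def rule_payload(rule_id: str, stream: str, threshold: float) -> dict:
    """Step 2: pair a SQL condition with the actions to run on a match."""
    sql = (
        f"SELECT temperature, humidity FROM {stream} "
        f"WHERE temperature > {threshold}"
    )
    # Hypothetical action: write matching records to the eKuiper log.
    return {"id": rule_id, "sql": sql, "actions": [{"log": {}}]}


stream = stream_payload("demo", "sensors/demo")
rule = rule_payload("high_temp", "demo", 30)

# Step 3: submitting these bodies would register the stream and the rule.
# They are only printed here, so the sketch runs without a server.
print("POST", EKUIPER_URL + "/streams", json.dumps(stream))
print("POST", EKUIPER_URL + "/rules", json.dumps(rule))
```

Keeping the payload builders separate from the transport makes it easy to review the SQL before anything is sent to a device.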
eKuiper has a built-in visual editor called Flow Editor. This enables users to easily create and edit rules by dragging and dropping elements in the UI.
To ensure that eKuiper can run even with limited resources, the project underwent performance tests. On an AWS t2.micro instance, it processed 10,000 messages per second while using 25% of the CPU and 20 MB of memory.
“When running on a Raspberry Pi 3B+, eKuiper achieves about 12,000 transactions per second with CPU consumption at 70%. With eKuiper, data extraction, transformation, and loading can be done on the edge.”
—Rocky Jin, EMQ
Rocky also noted how lightweight eKuiper is. The core server package is only about 4.5 MB with a memory footprint of 10 MB. It can be installed at the edge using Docker images, Helm charts, and binary packages.
Main eKuiper scenarios
The proliferation of the industrial IoT presents massive opportunities in intelligent manufacturing. As the amount of data being collected at the edge grows higher and higher, the ability to harness all that information is crucial. In this case, a lightweight stream processing engine, such as eKuiper, can be deployed at the edge to ensure real-time IIoT analysis. The low-latency processing of eKuiper can become crucial for time-sensitive industrial use cases like accident prevention or ongoing machinery diagnostics, enhancing safety and efficiency.
In such a scenario, eKuiper, along with a few other IoT/edge tools, can be combined to facilitate the connection, movement, processing, storage, and analysis of industrial equipment data.
By constantly analyzing industrial IoT data, eKuiper can support energy consumption monitoring, predictive maintenance, and product quality tracing. More specifically, the product can help with:
- Real-time analysis of factory production data for efficient control of product quality
- Data cleansing, transformation, and compression
- Detecting irregularities, sending alerts, and resolving issues
- Production process optimization with artificial intelligence
As automotive manufacturers invest more in the development of connected vehicles each year, the requirements for streaming analytics in the Internet of Vehicles (IoV) keep growing. Here, eKuiper can be instrumental in collecting and analyzing vehicle data locally at the edge, making it possible to get immediate insights and act on them. For instance, the tool can help to analyze sensor data for abnormal patterns to predict maintenance needs or detect unsafe conditions, triggering alerts or actions accordingly.
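To make the idea of spotting abnormal patterns in sensor data more concrete, here is a small Python simulation of a tumbling (fixed, non-overlapping) time window with a threshold on the average value. It is only a sketch of the concept: in eKuiper, this kind of logic would normally be expressed as a SQL rule with a window, and the sample readings, the 10-second window, and the 100-degree threshold below are made up for illustration.

```python
from collections import defaultdict


def tumbling_window_alerts(readings, window_s, threshold):
    """Group (timestamp_s, value) readings into fixed, non-overlapping
    windows and return (window_start, average) for every window whose
    average exceeds the threshold."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Each reading belongs to exactly one window.
        window_start = (ts // window_s) * window_s
        buckets[window_start].append(value)

    alerts = []
    for window_start in sorted(buckets):
        values = buckets[window_start]
        average = sum(values) / len(values)
        if average > threshold:
            alerts.append((window_start, average))
    return alerts


# Hypothetical engine-temperature samples: (seconds since start, degrees C).
samples = [(0, 88.0), (4, 90.0), (9, 92.0), (12, 104.0), (15, 108.0), (21, 91.0)]
alerts = tumbling_window_alerts(samples, window_s=10, threshold=100.0)
print(alerts)  # [(10, 106.0)]
```

Averaging over a window rather than reacting to single readings is one simple way to avoid raising an alert on a momentary spike.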
According to its documentation, eKuiper can also be a useful tool for analyzing public data due to its lightweight stream processing engine that can handle real-time streaming analytics. In this case, it can be crucial in providing immediate insights in public data scenarios—such as emergency response, urban planning, and public transportation monitoring.
During Kubernetes on Edge Day, Rocky highlighted another use case, where eKuiper was deployed at a power plant to monitor, analyze, and make use of data in real time. In addition to these large-scale scenarios, eKuiper can be deployed to a Raspberry Pi device to create a smart home hub. This enables heating, ventilation, and air conditioning (HVAC) systems to communicate with IoT devices, such as temperature and humidity sensors.
Integrations and extensions
Through its integration with other edge or cloud systems, eKuiper can facilitate broader analytics and intelligent decision-making. One such integration is with the EdgeX Foundry platform, made official on May 21, 2020. Here, eKuiper is deployed as the default rules engine, analyzing data from the EdgeX message bus. Based on the rules, the data is analyzed, sent back to the message bus, and then processed further by other EdgeX microservices.
In addition to these scenarios, the product can be integrated with KubeEdge to enable a containerized deployment of an eKuiper instance. This particular implementation was deployed at China Mobile, where eKuiper receives data from the edge and then sends the processed information to the cloud.
eKuiper also integrates with OpenYurt, which is built on native Kubernetes, to simplify the deployment and management of eKuiper in edge computing setups.
eKuiper can also work with the MQTT protocol by receiving data from message brokers and processing it in real time. Users can configure eKuiper to subscribe to specific topics on an MQTT broker and then process the incoming data using a streaming SQL engine. (If you work with this protocol and care about latencies/throughput, make sure you have read this collection of 20+ MQTT broker performance benchmarks published in 2020–2023.)
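Subscribing to "specific topics" usually relies on MQTT topic filters, where `+` matches a single topic level and `#` matches all remaining levels. The sketch below shows how such filters select messages; it is a simplified matcher written for illustration (it ignores the special handling of topics that start with `$`), not code taken from eKuiper or any broker, and the `factory/...` topic names are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    '+' matches exactly one level; '#' matches the parent level and all
    remaining levels, and is only valid as the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: accept only if it ends the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    # No '#' seen, so both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


for flt in ("factory/+/temperature", "factory/#", "factory/+"):
    print(flt, topic_matches(flt, "factory/line1/temperature"))
```

With a filter such as `factory/+/temperature`, a single stream could receive temperature readings from every production line without listing each topic separately.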
Specifically, eKuiper can be integrated with other open-source MQTT tools supported by EMQ, such as EMQX and NanoMQ. It also integrates into Neuron, pairing protocol connectivity and stream data processing.
Baetyl—another LF Edge project that extends cloud computing, data, and services to IoT devices—can also be integrated with eKuiper. Officially announced on March 27, 2023, the integration enables quicker and more convenient deployments of eKuiper.
It is also possible to use eKuiper with Azure IoT Hub.
For engineers who want to extend the product’s functionality, eKuiper supports plugins. Created with Python, Golang, or JSON, these extensions can serve various purposes—such as adding new sources, sinks, or functions to the eKuiper processing engine.
The full list of built-in source/sink connectors, as well as plugin-based bridges for custom systems, can be found in the documentation.
eKuiper v1.11 was just released on September 19, 2023. The release brought about substantial upgrades and new features, improving, for instance, the SQL syntax to offer more comprehensive rules. Other advances include enhancements to the management and operation of rules in edge environments. Finally, a more flexible sink cache retransmission policy was introduced, reducing the impact of edge network instability.
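The general idea behind caching sink output and retransmitting it can be illustrated with a short sketch. This is not eKuiper’s implementation or configuration: the `CachingSink` class, its `max_cached` limit, and the simulated `send` function are hypothetical names chosen only to show how results produced while the network is down can be buffered and delivered in order once it recovers.

```python
from collections import deque


class CachingSink:
    """Buffers results that fail to send and resends them, oldest first."""

    def __init__(self, send, max_cached=1000):
        self._send = send  # callable that returns True on success
        # Bounded buffer: when full, the oldest entry is dropped.
        self._cache = deque(maxlen=max_cached)

    def emit(self, item):
        self._cache.append(item)
        self.flush()

    def flush(self):
        while self._cache:
            if not self._send(self._cache[0]):
                return  # still failing; keep everything cached
            self._cache.popleft()

    @property
    def pending(self):
        return len(self._cache)


# Simulated unstable network.
network = {"up": False}
delivered = []


def send(item):
    if network["up"]:
        delivered.append(item)
    return network["up"]


sink = CachingSink(send)
sink.emit("a")
sink.emit("b")          # both stay cached while the network is down
network["up"] = True
sink.emit("c")          # recovery: cached items go out first, in order
print(delivered, sink.pending)  # ['a', 'b', 'c'] 0
```

A bounded buffer is one possible trade-off for constrained edge devices: it limits memory use at the cost of dropping the oldest results during long outages.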
As eKuiper continues to develop, the focal point remains on enhancing the core capabilities to cater to the growing demands of edge deployments. The project’s roadmap includes refining interoperability with other edge and cloud technologies. There are also plans to expand the community of contributors and introduce more features to bolster performance and ease of use.
According to one of the latest Technical Steering Committee (TSC) meetings held on September 12, 2023, the community is currently working on fixes for v1.11.1 and v1.11.2. Meanwhile, eKuiper v1.12 is estimated to be released in November 2023.
While there is no exact timeline for the v2.0 update just yet, the eKuiper team expects it might be released in either December 2023 or in the first quarter of 2024. Some of the features planned for v2.0 include improvements to portable plugins, as well as support for load balancing and high availability.
Anyone interested in learning more about eKuiper can find its source code in this GitHub repository. The project also has official documentation and a wiki. The presentation delivered to LF Edge’s Technical Advisory Council in 2021 can be found here. If you want to follow the project more closely or contribute, check out its mailing lists. There are also the #ekuiper, #ekuiper-user, and #ekuiper-tsc channels in the LF Edge Slack workspace.
Frequently asked questions (FAQ)
What is eKuiper and why was it developed?
eKuiper is a project under LF Edge’s umbrella, developed to provide low-latency streaming data analytics at the edge. It was designed to address the challenges of efficiently processing streaming data generated close to the IoT network edge.
When was eKuiper open-sourced and who are the primary contributors to the project?
eKuiper was initially developed by EMQ in early 2019 and was open-sourced on October 23, 2019. In 2021, eKuiper was donated to the LF Edge foundation and is now being supported by members from EMQ, China Mobile, VMware, Intel, IOTech, INTECH Process Automation, etc.
What are the main goals and features of eKuiper?
eKuiper provides low-latency, high-throughput streaming data analytics at the edge. Its features include a lightweight core server package ensuring it runs efficiently even on devices with limited resources, a visual Flow Editor for intuitive rule creation and editing, and a powerful rules engine that processes and analyzes streaming data in real time. Moreover, eKuiper’s extensible architecture allows integration with various edge or cloud systems, enhancing its capability to provide broader analytics and intelligent decision-making across different scenarios.
How does eKuiper ensure high throughput and low latency in data processing?
eKuiper is designed to be lightweight, with a core server package of about 4.5 MB and a memory footprint of 10 MB, enabling high throughput and low latency even on limited resources. On a Raspberry Pi 3B+ device, eKuiper achieves about 12,000 transactions per second with CPU consumption at 70%.
What are the primary use cases for deploying eKuiper?
eKuiper can be deployed for real-time analysis of factory production data, data cleansing, and detecting irregularities in IIoT. In IoV, it’s used for analyzing sensor data to predict maintenance needs or detect unsafe conditions. For public data scenarios, eKuiper provides immediate insights for emergency response, urban planning, and public transportation monitoring. In smart homes, it facilitates communication between HVAC systems and IoT devices.
What are the main integrations of eKuiper with other platforms or projects?
eKuiper has integrations with EdgeX Foundry, KubeEdge, OpenYurt, MQTT brokers (EMQX, NanoMQ, etc.), and Baetyl among others, to enhance real-time data processing in multiple environments.
How can users extend eKuiper’s functionality?
Users can create plugins with Python, Golang, or JSON to extend eKuiper’s functionality by adding new sources, sinks, or functions to the processing engine.
Want details? Watch the video!
In this session from Kubernetes on Edge Day in 2021, Rocky Jin provides an overview of eKuiper and how the product enables lightweight data processing at the edge.
- LF’s Project Alvarium: Ensuring Trusted IoT Data at the Edge with DLT
- A Collection of 20+ MQTT Broker Performance Benchmarks (2020–2023)
- Alternatives to Google Cloud IoT Core—Where to Migrate?
About the expert
Rocky Jin is a cofounder of EMQ. He is a committer for the EdgeX Foundry project under the Linux Foundation and an initiator of the eKuiper project. Rocky works closely with the KubeEdge team to support edge streaming analytics by integrating eKuiper and KubeEdge. He is a former IBM China Software Development Lab technology and product leader, architect, and senior software development engineer.