A COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING PIPELINES
2020 (English) Independent thesis Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits
Student thesis
Abstract [en]
In recent years there has been increasing demand for real-time streaming applications that handle large volumes of data with low latency. Examples include real-time monitoring and analytics, electronic trading, advertising, and fraud detection. The first step in a streaming pipeline is ingesting the incoming data events, after which they can be passed on for processing. Choosing a tool that satisfies the application's requirements is therefore an important technical decision. This thesis focuses entirely on the data ingestion stage by evaluating three platforms: Apache Kafka, Apache Pulsar, and Redis Streams. The platforms are compared on both characteristics and performance. Architectural and design differences reveal that Kafka and Pulsar are better suited to use cases involving long-term persistent storage of events, whereas Redis is a potential solution when only short-term persistence is required. All three provide means for scalability and fault tolerance, ensuring high availability and reliable service. Two metrics, throughput and latency, were used to evaluate performance on a single-node cluster. Kafka proves the most consistent in throughput but performs the worst in latency. Pulsar achieves high throughput with small message sizes but struggles with larger ones. Pulsar performs best in overall average latency across all message sizes tested, followed by Redis. The tests also show that Redis is the most inconsistent platform, with its throughput varying the most between message sizes.
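The abstract names throughput and latency as the two performance metrics. The sketch below shows one common way to compute them from per-message timestamps. It is a hedged Python illustration, not the benchmark harness used in the thesis: the `MessageRecord` type and all function names are hypothetical, and it assumes producer and consumer timestamps are taken from the same clock (which is feasible on a single-node setup).

```python
"""Minimal sketch: computing throughput and latency from per-message timestamps.

Assumption: every delivered message has a producer-side send timestamp and a
consumer-side receive timestamp, in seconds, taken from the same clock.
Names here are hypothetical and not taken from the thesis.
"""
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class MessageRecord:
    # Hypothetical record: one entry per message that reached the consumer.
    sent_at: float      # producer-side timestamp, seconds
    received_at: float  # consumer-side timestamp, seconds
    size_bytes: int     # payload size in bytes


def _window_seconds(records: List[MessageRecord]) -> float:
    """Length of the test window: first send to last receive."""
    if not records:
        raise ValueError("no records to measure")
    start = min(r.sent_at for r in records)
    end = max(r.received_at for r in records)
    if end <= start:
        raise ValueError("test window must have positive length")
    return end - start


def throughput_msgs_per_sec(records: List[MessageRecord]) -> float:
    """Messages delivered per second over the whole test window."""
    return len(records) / _window_seconds(records)


def throughput_mb_per_sec(records: List[MessageRecord]) -> float:
    """Payload megabytes (10**6 bytes) delivered per second over the window."""
    total_bytes = sum(r.size_bytes for r in records)
    return total_bytes / 1e6 / _window_seconds(records)


def average_latency_ms(records: List[MessageRecord]) -> float:
    """Mean end-to-end latency (receive time minus send time) in milliseconds."""
    if not records:
        raise ValueError("no records to measure")
    return mean((r.received_at - r.sent_at) * 1000.0 for r in records)


if __name__ == "__main__":
    # Synthetic data: four 1000-byte messages, the last one delivered slowly.
    records = [
        MessageRecord(sent_at=0.0, received_at=0.002, size_bytes=1000),
        MessageRecord(sent_at=0.1, received_at=0.104, size_bytes=1000),
        MessageRecord(sent_at=0.2, received_at=0.203, size_bytes=1000),
        MessageRecord(sent_at=0.3, received_at=0.5, size_bytes=1000),
    ]
    print(f"{throughput_msgs_per_sec(records):.1f} msg/s")
    print(f"{throughput_mb_per_sec(records):.3f} MB/s")
    print(f"{average_latency_ms(records):.2f} ms average latency")
```

In this synthetic data a single slow message pulls the mean latency well above the typical 2 to 4 ms, which is why benchmark reports often give latency percentiles alongside the average.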
Place, publisher, year, edition, pages
2020, p. 35
Keywords [en]
stream processing, data ingestion, Redis Streams, Apache Kafka, Apache Pulsar, performance benchmark, real-time streaming
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:mdh:diva-48744
OAI: oai:DiVA.org:mdh-48744
DiVA, id: diva2:1440436
External cooperation
Addiva AB
Subject / course
Computer Science
Presentation
2020-06-05, Zoom, track 5 afternoon session, 14:05 (English)
2020-06-16, 2020-06-15, 2020-06-16. Bibliographically approved