https://www.mdu.se/

A COMPARISON OF DATA INGESTION PLATFORMS IN REAL-TIME STREAM PROCESSING PIPELINES
Mälardalen University, School of Innovation, Design and Engineering.
2020 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits. Student thesis
Abstract [en]

In recent years there has been an increasing demand for real-time streaming applications that handle large volumes of data with low latency. Examples of such applications include real-time monitoring and analytics, electronic trading, advertising, and fraud detection. In a streaming pipeline the first step is ingesting the incoming data events, after which they can be sent off for processing. Choosing a tool that satisfies the application requirements is an important technical decision. This thesis focuses entirely on the data ingestion part by evaluating three different platforms: Apache Kafka, Apache Pulsar and Redis Streams. The platforms are compared on both characteristics and performance. Architectural and design differences reveal that Kafka and Pulsar are more suited for use cases involving long-term persistent storage of events, whereas Redis is a potential solution when only short-term persistence is required. They all provide means for scalability and fault tolerance, ensuring high availability and reliable service. Two metrics, throughput and latency, were used to evaluate performance in a single-node cluster. Kafka proves to be the most consistent in throughput but performs the worst in latency. Pulsar manages high throughput with small message sizes but struggles with larger message sizes. Pulsar performs the best in overall average latency across all message sizes tested, followed by Redis. The tests also show Redis to be the most inconsistent in terms of throughput potential between different message sizes.

Place, publisher, year, edition, pages
2020, p. 35
Keywords [en]
stream processing, data ingestion, Redis Streams, Apache Kafka, Apache Pulsar, performance benchmark, real-time streaming
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:mdh:diva-48744
OAI: oai:DiVA.org:mdh-48744
DiVA, id: diva2:1440436
External cooperation
Addiva AB
Subject / course
Computer Science
Presentation
2020-06-05, Zoom, track 5 afternoon session, 14:05 (English)
Supervisors
Examiners
Available from: 2020-06-16. Created: 2020-06-15. Last updated: 2020-06-16. Bibliographically approved.

Open Access in DiVA

fulltext (1241 kB), 3203 downloads
File information
File name: FULLTEXT01.pdf
File size: 1241 kB
Checksum SHA-512:
5ad1a406505678258749383ca024281a703a2e510510e4d8240f74867d704f1a659e3dfb924ac3b0a31c5e645870867bdfc82648ff043e628210f9971e8df2fc
Type: fulltext
Mimetype: application/pdf

Author/editor
Tallberg, Sebastian

Total: 3203 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1411 hits