A comparative analysis of application-level and system-level container runtimes for state-of-the-art data deduplication techniques
Date
Publisher
BRAC University
Citation
Abstract
Containerization has become a cornerstone of modern software deployment, offering
lightweight isolation and rapid scalability across diverse environments. However,
the growing variety of container runtimes introduces uncertainty regarding their behavior
under data-intensive workloads such as deduplication, where computational
efficiency and resource utilization directly affect scalability and responsiveness. To
investigate this, we design a structured experimental pipeline that executes three
hash-based deduplication algorithms (CRC32, MD5, and SHA-256) within three
container runtimes: Docker, LXC, and Podman. Each algorithm is run ten times
across datasets of 1M, 5M, and 10M records to ensure statistical consistency, generating
over 3,700 performance samples consolidated into 180+ representative instances.
Building on this pipeline, we develop a holistic scalability assessment framework that
quantifies container efficiency through throughput trends, variability in CPU and
memory usage, and collision rates, offering a comprehensive perspective on runtime
behavior. Experimental findings show that Docker maintains balanced scalability
with stable throughput growth through efficient daemon-managed scheduling,
while LXC delivers superior computational efficiency under heavy workloads due
to its direct kernel namespace access. Podman, though optimized for lightweight
and security-focused tasks, demonstrates performance variability when scaled. Finally,
we introduce a decision tree to assist in selecting optimal container–algorithm
configurations tailored to workload requirements. This work establishes an empirical
foundation for understanding container performance in deduplication contexts,
providing actionable insights for building efficient and resilient cloud-native data
processing infrastructures.
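
As an illustration of the hash-based deduplication the abstract describes, the following is a minimal sketch, not the thesis's actual implementation. It assumes records are byte strings held in memory; the names `HASHERS`, `crc32_digest`, and `deduplicate` are hypothetical. It reports the number of unique records, the number of digest collisions (same digest, different content), and a rough throughput figure.

```python
import hashlib
import time
import zlib


def crc32_digest(data: bytes) -> bytes:
    """Return the 32-bit CRC of data as 4 bytes."""
    return zlib.crc32(data).to_bytes(4, "big")


# Hypothetical registry of the three hash functions named in the abstract.
HASHERS = {
    "CRC32": crc32_digest,
    "MD5": lambda data: hashlib.md5(data).digest(),
    "SHA-256": lambda data: hashlib.sha256(data).digest(),
}


def deduplicate(records, hasher):
    """Deduplicate records by digest and collect simple metrics.

    Assumption: a record whose digest matches an earlier record with
    different content is counted as a collision and dropped, which is
    what a digest-only deduplicator would do.
    """
    seen = {}  # digest -> first record observed with that digest
    unique = 0
    collisions = 0
    start = time.perf_counter()
    for record in records:
        digest = hasher(record)
        first = seen.get(digest)
        if first is None:
            seen[digest] = record
            unique += 1
        elif first != record:
            collisions += 1
    elapsed = time.perf_counter() - start
    throughput = len(records) / elapsed if elapsed > 0 else float("inf")
    return {
        "unique": unique,
        "collisions": collisions,
        "records_per_second": throughput,
    }
```

Running `deduplicate(records, HASHERS["CRC32"])` inside each container runtime and repeating the call several times would give per-runtime throughput samples of the kind the abstract mentions. CPU and memory usage would have to be measured separately, for example with the runtime's own statistics tooling.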
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 45-49).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.
Publisher Link
Type
Thesis