Integrating compression and encryption into a modular system to maximize storage efficiency and robust data security

TEXT | Egor Shulgin & Tommi Rintala
Permalink http://urn.fi/URN:NBN:fi-fe2025082893018

In today’s era of big data, organizations and individuals alike generate vast amounts of information. Some of it is public, but much is private and calls for two complementary safeguards: compression to keep it manageable and encryption to keep it confidential. Traditionally, separate software tools handle each task, but this separation invites data leakage and human error at the hand-offs between processing stages. The solution is an integrated approach that addresses these and related problems.

Developing such a system runs into the “security-efficiency paradox”. Encrypting data before compression removes the redundancy that compression algorithms depend on, drastically reducing their effectiveness. Conversely, compressing data first may expose metadata (such as filenames or directory structure) that attackers can exploit. To resolve the paradox, the system should apply novel anonymization practices during the compression stage and only then encrypt; the encryption stage also becomes cheaper (e.g. in RAM) because there is less data left to encrypt. This raises the question: which algorithms make the optimal combination?
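The effect of stage ordering can be demonstrated with a small sketch. The XOR keystream below is a toy for illustration only (it is not a secure cipher; a real system would use a vetted AEAD such as AES-GCM), but it shows why ciphertext no longer compresses:

```python
import hashlib
import os
import zlib

def toy_keystream_encrypt(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher: XOR with a SHA-256-based keystream.
    Illustration only -- NOT secure. A real system would use a
    vetted AEAD cipher (e.g. AES-GCM) from a crypto library."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

key = os.urandom(32)
plaintext = b"highly redundant payload " * 4000  # ~100 KB of repetitive text

# Compress first, then encrypt: zlib still sees the redundancy.
compressed = zlib.compress(plaintext, level=6)
ct_after_compress = toy_keystream_encrypt(compressed, key)

# Encrypt first, then compress: the ciphertext looks random, zlib gains nothing.
ciphertext = toy_keystream_encrypt(plaintext, key)
compressed_ct = zlib.compress(ciphertext, level=6)

print(len(plaintext), len(ct_after_compress), len(compressed_ct))
```

Compress-then-encrypt shrinks the repetitive payload to a small fraction of its size, while encrypt-then-compress actually grows it slightly, since zlib adds framing overhead to data it cannot compress.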

Another key feature of such an integrated system is adaptability. For real-time applications it can use lightweight algorithms such as DEFLATE, while more resource-intensive algorithms such as LZMA are better suited for archival purposes. To optimize throughput, the system splits files into chunks that are then processed in parallel across CPU threads. Hardware security modules (HSMs) can be used to accelerate encryption. To avoid metadata leakage between steps, proper cleanup tools are needed: they overwrite the disk areas where the system stored any intermediate data. Such a system is also modular: it is built from “blocks” of code that can be updated or replaced when new algorithms are released, without changing the whole program.
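A minimal sketch of the chunked, parallel pipeline with a pluggable codec registry might look as follows (names like `compress_chunked` and `CODECS` are illustrative, not from the thesis). Python's `zlib` and `lzma` release the GIL while compressing, so a thread pool yields real multi-core speedup:

```python
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor

# Pluggable codec registry: new algorithms can be registered here
# without touching the pipeline code (the "modular blocks" idea).
CODECS = {
    "deflate": zlib.compress,  # fast, suits real-time use
    "lzma": lzma.compress,     # slower but denser, suits archival
}

def compress_chunked(data: bytes, codec: str, chunk_size: int = 1 << 20) -> list:
    """Split data into fixed-size chunks and compress them in parallel.

    Chunk size is a tunable trade-off: smaller chunks parallelize
    better but compress slightly worse (less shared redundancy).
    """
    compress = CODECS[codec]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(compress, chunks))  # order is preserved

payload = b"example payload block " * 200_000  # ~4 MB
parts = compress_chunked(payload, "deflate", chunk_size=1 << 20)
restored = b"".join(zlib.decompress(p) for p in parts)
assert restored == payload
```

Because each chunk is independent, the same structure also lets the later encryption stage run per chunk, and a damaged chunk can be re-processed without redoing the whole file.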


Artificial intelligence and machine learning techniques can be applied to mitigate human error and improve the overall efficiency of the system. AI can dynamically select the most appropriate algorithms based on data type, file size, context, available resources, and identified cyber-threats. If the organization uses cloud services, AI can also scale resources up on demand to avoid system crashes during high-load periods. To be future-proof, developers should not only adopt AI but also plan for integrating quantum computing features into such a system.

One major risk remains in such systems: an error at any stage can lead to data loss or even a complete system crash. To avoid this, error-detection mechanisms must be applied. They catch errors early and either restart the affected process or alert an operator to resolve them.
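One possible shape for such a mechanism is a per-chunk integrity check with bounded retries; the sketch below (the function name and retry policy are illustrative) verifies that each compressed chunk round-trips to the original before it is handed to the next stage:

```python
import hashlib
import zlib

def process_with_verification(chunk: bytes, max_retries: int = 2) -> bytes:
    """Compress a chunk, then verify it decompresses back to the original.

    A verification failure (bit flip, faulty RAM, buggy codec build)
    triggers a retry; after max_retries the chunk is escalated to an
    operator instead of silently corrupting the archive.
    """
    digest = hashlib.sha256(chunk).digest()
    for attempt in range(1, max_retries + 2):
        compressed = zlib.compress(chunk)
        if hashlib.sha256(zlib.decompress(compressed)).digest() == digest:
            return compressed  # verified: safe to pass to the encryption stage
        print(f"chunk verification failed on attempt {attempt}, retrying")
    raise RuntimeError("chunk failed verification; alerting operator")

payload = b"payload " * 1000
out = process_with_verification(payload)
assert zlib.decompress(out) == payload
```

Checking at chunk granularity means a failure restarts one small unit of work rather than the whole pipeline.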

To summarize, combining compression and encryption into a single, integrated system is not merely convenient but essential in today’s changing threat environment. Developing a modular system capable of anonymization, adaptation, and scaling is critical for maximizing both efficiency and security. To remain future-proof, it should use AI solutions and be ready to integrate quantum computing features.

This thesis was written by Egor Shulgin and supervised by Tommi Rintala. The full thesis is available at: https://urn.fi/URN:NBN:fi:amk-2025061122292
