Data Compression

What is Data Compression?

Data compression is used everywhere, and many different file types store compressed data. Without data compression, a 3-minute song at CD quality would take up roughly 30 MB, while a 10-minute high-definition video would run to many gigabytes. Data compression shrinks big files into much smaller ones. It does this by getting rid of unnecessary data while retaining the information in the file.
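A rough back-of-the-envelope calculation shows where uncompressed sizes like these come from. The sketch below assumes CD-quality stereo audio (44,100 samples per second, 16 bits per sample) and uncompressed 1080p video at 30 frames per second with 3 bytes per pixel. None of these parameters come from the text above, so treat the results as illustrative only.

```python
# Assumed CD-quality audio parameters (illustrative, not from the article).
SAMPLE_RATE_HZ = 44_100      # samples per second, per channel
BYTES_PER_SAMPLE = 2         # 16-bit samples
CHANNELS = 2                 # stereo
AUDIO_DURATION_S = 3 * 60    # a 3-minute song

audio_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * AUDIO_DURATION_S
audio_megabytes = audio_bytes / 1_000_000
audio_megabits = audio_bytes * 8 / 1_000_000

# Assumed video parameters: 1920x1080 pixels, 3 bytes per pixel, 30 frames/s.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3
FRAMES_PER_SECOND = 30
VIDEO_DURATION_S = 10 * 60   # a 10-minute video

video_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL * FRAMES_PER_SECOND * VIDEO_DURATION_S
video_gigabytes = video_bytes / 1_000_000_000

print(f"Audio: {audio_megabytes:.1f} MB ({audio_megabits:.0f} megabits)")
print(f"Video: {video_gigabytes:.1f} GB")
```

Under these assumptions the song comes to about 31.8 MB (about 254 megabits) and the video to about 112 GB, which is why almost all audio and video is stored in compressed form.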

Data compression can be described as a reduction in the number of bits needed to represent data. Compressing data can save storage capacity, speed up file transfer, and reduce the cost of storage hardware and network bandwidth.

How Does Compression Work?

Compression is carried out by a program that uses an algorithm to work out how to reduce the size of the data.

Text compression can be done by removing unnecessary characters, replacing a run of repeated characters with a single character and a repeat count, and substituting a shorter bit string for a commonly occurring bit string. Data compression can cut a text file to 50% of its original size, or even less.
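The idea of replacing repeated characters with a character and a count is known as run-length encoding. The following is a minimal sketch of it, not the method used by any particular file format. The function names rle_encode and rle_decode are made up for this illustration.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Rebuild the original text by repeating each character `count` times."""
    return "".join(char * count for char, count in pairs)


original = "AAAAAABBBCCCCCCCCD"
encoded = rle_encode(original)
print(encoded)  # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]

# Decoding gives back exactly what we started with.
assert rle_decode(encoded) == original
```

Run-length encoding only helps when the data contains long runs. On ordinary prose, where letters rarely repeat, the list of pairs can end up larger than the original text.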

For data transmission, compression can be applied to the data content alone or to the transmission unit as a whole. When data needs to be transferred over the internet, larger files can be sent in ZIP, GZIP or another compressed format.

What is the Purpose of Compression?

The purpose of compression is to make a file, message, or any other chunk of data smaller. Data compression can significantly decrease the amount of storage space a file takes up. If we had a 10 MB file and shrank it down to 5 MB, we would have compressed it with a compression ratio of 2, since the compressed file is half the size of the original. If we compressed the 10 MB file to 1 MB, it would have a compression ratio of 10, because the new file is a tenth the size of the original. The higher the compression ratio, the better the compression. Because compressed data needs less space, administrators save money and time that would otherwise be spent on storage.
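The compression ratio described above is simply the original size divided by the compressed size. The short sketch below reproduces the two worked examples. The helper names compression_ratio and space_saving are chosen for this illustration, and space saving is an additional, commonly used way of expressing the same result as a fraction of the original size.

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Original size divided by compressed size (both in the same unit)."""
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    return original_size / compressed_size


def space_saving(original_size: float, compressed_size: float) -> float:
    """Fraction of the original size that compression removed."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 1 - compressed_size / original_size


# The two examples from the text, with sizes in megabytes.
for compressed in (5, 1):
    ratio = compression_ratio(10, compressed)
    saving = space_saving(10, compressed)
    print(f"10 MB -> {compressed} MB: ratio {ratio:.0f}, saving {saving:.0%}")
```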

Compression makes backup storage more efficient and is increasingly used to reduce the amount of data held on primary storage as well. Compression will continue to play a significant role in data reduction as the volume of data keeps growing.

Almost any type of file can be compressed, but it is important to choose carefully which files to compress. For example, some files are already compressed, so compressing them again would have little effect.
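The point about already-compressed files can be checked with a small experiment. This sketch uses Python's standard zlib module and compresses the same data twice. The sample text is generated from a made-up word list with a fixed random seed, purely for illustration, and the exact byte counts will vary with the zlib version.

```python
import random
import zlib

# Build some sample text from a small, made-up vocabulary (illustrative only).
words = ["data", "file", "size", "bits", "lossy", "lossless", "ratio", "storage"]
rng = random.Random(0)  # fixed seed so the run is repeatable
text = " ".join(rng.choices(words, k=20_000)).encode("utf-8")

once = zlib.compress(text)    # first pass: plenty of redundancy to remove
twice = zlib.compress(once)   # second pass: input is already compressed

print(f"original:         {len(text)} bytes")
print(f"compressed once:  {len(once)} bytes")
print(f"compressed twice: {len(twice)} bytes")
```

The first pass should shrink the text considerably, while the second pass should leave the size almost unchanged, and may even add a few bytes of overhead.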

Data Compression Methods

There are two kinds of compression: Lossless and Lossy.

Lossy compression loses data, while lossless compression keeps all the data. With lossless compression we don’t get rid of any data. Instead, the technique is based on finding smarter ways to encode the data. With lossy compression we do get rid of data, which is why we need to distinguish data from information: some data can be thrown away without losing the information that matters.

Lossless compression allows a file to be restored exactly to its original form, without the loss of a single bit of data, when the file is uncompressed. Lossless compression is the usual approach taken with executables, as well as with text and spreadsheet files, where the loss of words or numbers would change the information. Lossless compression works whenever redundancy is present in the data, because it takes advantage of that redundancy.
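To see how lossless compression depends on redundancy, the sketch below compresses two inputs of the same size with Python's standard zlib module. One is highly repetitive and the other is random bytes from os.urandom. The inputs are invented for this illustration, and the exact compressed sizes depend on the zlib version.

```python
import os
import zlib

redundant = b"ABCD" * 2500        # 10,000 bytes with an obvious repeating pattern
random_like = os.urandom(10_000)  # 10,000 bytes with no pattern to exploit

for label, data in (("redundant", redundant), ("random", random_like)):
    packed = zlib.compress(data)
    restored = zlib.decompress(packed)
    # Lossless: every byte comes back exactly as it was.
    assert restored == data
    print(f"{label}: {len(data)} -> {len(packed)} bytes")
```

The repetitive input should shrink to a tiny fraction of its size, while the random input should stay at roughly 10,000 bytes because there is no redundancy to remove. In both cases the decompressed data is identical to the original.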

Lossy compression permanently removes bits of data that are redundant, insignificant or unnoticeable. Lossy compression is suitable for graphics, audio and video, where deleting some data bits has little or no apparent effect on how the content looks or sounds. In lossy compression, files become smaller by getting rid of unwanted data. Lossy compression reduces the size of the data while retaining most of the information.
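A toy example can make the lossy idea concrete. The sketch below throws away the lowest 4 bits of some 8-bit sample values, so each value needs half as many bits. This is a deliberately simplified illustration and is not how JPEG or MP3 work. The function names and sample values are invented for the example.

```python
def quantize(samples: list[int], bits_dropped: int) -> list[int]:
    """Discard the lowest bits of each sample (this is the lossy step)."""
    return [sample >> bits_dropped for sample in samples]


def reconstruct(codes: list[int], bits_dropped: int) -> list[int]:
    """Scale the codes back up; the discarded bits cannot be recovered."""
    return [code << bits_dropped for code in codes]


samples = [12, 13, 200, 203, 97, 98, 255, 0]  # 8-bit values, e.g. pixel brightness
codes = quantize(samples, 4)                  # each code now fits in 4 bits
approx = reconstruct(codes, 4)

print(approx)  # [0, 0, 192, 192, 96, 96, 240, 0]
print(max(abs(a - b) for a, b in zip(samples, approx)))  # 15
```

The reconstructed values are close to the originals but not identical, and no amount of processing can bring the missing bits back. Real lossy formats use far more sophisticated methods to decide which data a viewer or listener is least likely to miss.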

Graphical image compression can be either lossy or lossless. Graphic image file formats are usually designed to compress information, since the files tend to be big. JPEG is an image file format that uses lossy compression, while formats such as GIF and PNG use lossless compression.