Huffman coding and decoding – Step by step

What is huffman coding?

Huffman coding is an algorithm to generate the binary code based on the frequencies of corresponding characters in the input string.

What is huffman decoding?

Huffman decoding is to recover the original text from the binary code generated from the huffman coding process.

By applying huffman algorithms, we compress the input string by using the generated code. We can also use the code to decompress to get the original string. Here are the steps of huffman coding and decoding. The code is available in Java, JavaScript and Python.

How to generate huffman code, coding and decoding the message?

  1. Build a frequency map from the input string for each character.

    First we construct a frequency map from the input string. Then create a queue of nodes, which contains the char and its frequency. Sort this queue by their frequency. We can see this queue as priority queue. The node with lowest frequency has highest priority.

  2. Build a queue of characters sorted by their frequency showing in the string.

    Next is to construct the binary tree from the sorted queue. Starting from the low frequency characters, take two characters as leaves and form a subtree. Build tree from the bottom up, until all characters have been included as leaves. A finished tree has n nodes, n=2*m-1.

  3. Build frequency-sorted binary tree based on the sequence of elements in the queue.

    After the tree is built, now traverse the tree to generate a dictionary which maps the char to binary code. The path from the root to that character (leaf) is the binary code for that character, such as A(01), B(100). Bit ‘0’ means going to the left child, bit ‘1’ means going to the right child. This is the preorder of the tree. Please note the huffman code in not necessary unique.

  4. create Huffman code map from the tree.

    Once the frequency-sorted binary tree and the Huffman code map is generated, we can encode the input string to binary code (compressed) by using the huffman code map. We can decode the binary code to the original string (uncompressed) by using the binary tree.

    huffman coding 6 step

  5. encode the input string to binary code (compressed) by using the huffman code map.

    Use a for loop to go through each character in the string, use it as key to get the code from the map, concatenate them as output binary string.

  6. Decode the binary code to the original string (uncompressed) by using the binary tree.

    This is the opposite of the last step. It is to convert back to the original string by using the same huffman code in step 4.


Amazon Interview Question
(modified) Implement the Huffman compression algorithm as shown in the example below:
1) Given a string AAAAAABBCCDDEEFFFFF, group them according to the number of occurrences: A => 6, B => 2, C => 2, D => 2, E => 2, F => 5
2) Put the string in a tree-like structure:
3. Output the Huffman code.

Java

JavaScript

Python

Output:
code: {‘F’: ’00’, ‘A’: ’01’, ‘B’: ‘100’, ‘C’: ‘101’, ‘D’: ‘110’, ‘E’: ‘111’}
encode message: 0101010101011001001011011101101111110000000000
decoding string: AAAAAABBCCDDEEFFFFF

O Notation:
Time complexity: O(nlogn + m + s)
Space complexity: O(n+m+o)
s is input string length; n is number of unique chars in input; m is number of nodes in tree, m = 2n-1;
o is encoded string length.


Download HuffmanCoding.java
Download HuffmanCoding.js
Download HuffmanCoding.py

Comments are closed