The three rules of Ruby Q.:
Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.
Support Ruby Q. by submitting ideas as often as you can:
Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby T. follow the discussion. Please reply to the original quiz message
if you can.
Huffman Coding is a common form of data compression where none of the original
data gets lost. It begins by analyzing a string of data to determine which
pieces occur with the highest frequencies. These frequencies and pieces are
used to construct a binary tree. It is the “path” from root node to the
leaf with this data that forms its encoding. The following example
illustrates this:
Data: ABRRKBAARAA (11 bytes)
In Huffman Tree form, with frequency weights in parentheses:
          (11)
         /    \
     A (5)    RBK (6)
              /     \
          R (3)     BK (3)
                    /    \
                B (2)    K (1)
The encoding for each character is simply the path to that character, taking
0 for a left branch and 1 for a right branch:
A  0
R  10
B  110
K  111
Here is the original data encoded:
01101010 11111000 1000 (fits in 3 bytes)
We have compressed the original information by about 73% (11 bytes down to 3)!
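The example above can be reproduced with a short sketch. This is only one possible approach, not the quiz's reference solution: the names Node, build_tree, code_table and encode are made up for illustration, and the input is assumed to be non-empty. It counts character frequencies, repeatedly merges the two lightest nodes, and then reads each code off the path from the root.

```ruby
# Illustrative names only; none of these come from the quiz itself.
Node = Struct.new(:symbol, :weight, :left, :right) do
  def leaf?
    left.nil? && right.nil?
  end
end

# Count how often each character occurs, then build the tree by
# repeatedly merging the two lightest nodes into a new parent.
# Assumes data is a non-empty String.
def build_tree(data)
  counts = data.each_char.each_with_object(Hash.new(0)) { |ch, h| h[ch] += 1 }
  nodes = counts.map { |sym, count| Node.new(sym, count) }
  while nodes.size > 1
    nodes = nodes.sort_by(&:weight)
    a, b = nodes.shift(2)
    nodes << Node.new(nil, a.weight + b.weight, a, b)
  end
  nodes.first
end

# Walk the tree, appending "0" for a left branch and "1" for a right one.
# A tree consisting of a single leaf gets the one-bit code "0".
def code_table(node, prefix = "", table = {})
  if node.leaf?
    table[node.symbol] = prefix.empty? ? "0" : prefix
  else
    code_table(node.left, prefix + "0", table)
    code_table(node.right, prefix + "1", table)
  end
  table
end

# Replace every character with its code and join the pieces.
def encode(data, table)
  data.each_char.map { |ch| table.fetch(ch) }.join
end

data  = "ABRRKBAARAA"
table = code_table(build_tree(data))
bits  = encode(data, table)
puts table.inspect
puts "#{bits.length} bits"
```

For this input the code lengths come out as 1 bit for A, 2 for R, and 3 each for B and K, for 20 bits in total. The exact bit patterns depend on how ties and the left/right order are resolved, so they may differ from the encoded data shown in the example.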
A key point to note is that every character encoding has a unique prefix,
corresponding to the unique path to that character within the tree. If this
were not so, then decoding would be impossible due to ambiguity.
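To make the prefix point concrete, here is a small sketch that checks the property and decodes the example bits. The codes A = 0, R = 10, B = 110, K = 111 are read off the example tree with left as 0 and right as 1, which agrees with the encoded data shown earlier. The names CODES, prefix_free? and decode are illustrative assumptions, not part of the quiz.

```ruby
# Codes read off the example tree (left = 0, right = 1).
CODES = { "A" => "0", "R" => "10", "B" => "110", "K" => "111" }.freeze

# True when no code is a prefix of a different code.
def prefix_free?(codes)
  values = codes.values
  values.none? do |a|
    values.any? { |b| a != b && b.start_with?(a) }
  end
end

# Greedy decode: accumulate bits until they match a code, emit the
# symbol, then start over. This is only unambiguous because the
# codes are prefix-free.
def decode(bits, codes)
  lookup  = codes.invert
  out     = +""
  current = +""
  bits.each_char do |bit|
    current << bit
    if lookup.key?(current)
      out << lookup[current]
      current = +""
    end
  end
  raise ArgumentError, "trailing bits do not form a code" unless current.empty?
  out
end

puts prefix_free?(CODES)
puts decode("01101010111110001000", CODES)
```

A table such as A = 0, B = 01 would fail the check: after reading a 0 the decoder could not tell whether it had seen an A or the start of a B.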
The quiz this time is to write a compression program using Huffman encoding.
Perform the actual encoding using your tree. You may encounter one issue
during the decompression/decoding phase. Your encoded string may not be a
multiple of 8 bits long. This means that when you compress your encoding into
a binary number, padding 0’s get added. Then, upon decompression, you may
see extra characters. To counter this, one solution is to add your own
padding of 1 extra character every time, and then simply strip it off
once you have decoded.
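One way to handle the padding problem is sketched below. Note that it is a different technique from the extra-character idea described above: it stores the number of padding bits in a leading byte so the decoder knows exactly how many trailing 0's to drop. The names pack_bits and unpack_bits are hypothetical, and the one-byte header is an assumption of this sketch rather than anything the quiz requires.

```ruby
# Pack a string of "0"/"1" characters into bytes. The first byte
# records how many padding bits (0..7) were appended to fill the
# last byte.
def pack_bits(bits)
  pad = (8 - bits.length % 8) % 8
  [pad].pack("C") + [bits + "0" * pad].pack("B*")
end

# Reverse of pack_bits: read the pad count, expand the remaining
# bytes back into a bit string, and drop the padding bits.
def unpack_bits(bytes)
  pad  = bytes.unpack1("C")
  bits = bytes.byteslice(1, bytes.bytesize - 1).unpack1("B*")
  bits[0, bits.length - pad]
end

bits   = "01101010111110001000"   # the 20 encoded bits from the example
packed = pack_bits(bits)
puts packed.bytesize              # 1 header byte + 3 data bytes
puts unpack_bits(packed) == bits
```

The cost is one extra byte per message. In exchange the decoder never has to guess whether trailing zeros are data or padding.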
You may also wish to provide a way to serialize the Huffman Tree so it
can be shared among copies of your program.
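Serializing the tree could look something like the sketch below. It assumes a deliberately simple representation, where a leaf is a one-character String and an internal node is a two-element Array of [left, right], and it uses Ruby's standard JSON library as the interchange format. The representation and the names TREE, dump_tree, load_tree and codes_from are all assumptions for illustration; Marshal or a custom binary layout would work just as well.

```ruby
require "json"

# Assumed representation: a leaf is a one-character String, an
# internal node is a two-element Array [left, right]. This is the
# example tree for ABRRKBAARAA.
TREE = ["A", ["R", ["B", "K"]]].freeze

# Turn the nested arrays into a JSON string that can be written
# to a file or sent to another copy of the program.
def dump_tree(tree)
  JSON.generate(tree)
end

# Parse the JSON text back into nested arrays.
def load_tree(text)
  JSON.parse(text)
end

# Rebuild the code table from a tree: "0" for left, "1" for right.
def codes_from(tree, prefix = "", table = {})
  if tree.is_a?(String)
    table[tree] = prefix
  else
    codes_from(tree[0], prefix + "0", table)
    codes_from(tree[1], prefix + "1", table)
  end
  table
end

text = dump_tree(TREE)
puts text
puts codes_from(load_tree(text)).inspect
```

Frequency weights are left out on purpose, since decoding only needs the shape of the tree and the symbol at each leaf.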
./huffman_encode.rb I want this message to get encoded!
11111111 11111110 11111111 11101111 10111111
01100110 11111111 11110111 11111111 11011100
11111111 11010111 01110111 11011110 10011011
11111100 11110101 10010111 11101111 11111011
11111101 11111101 01111111 01111111 11111110
I want this message to get encoded!