The three rules of Ruby Q.:
-
Please do not post any solutions or spoiler discussion for this quiz
until
48 hours have passed from the time on this message. -
Support Ruby Q. by submitting ideas as often as you can:
- Enjoy!
Suggestion: A [QUIZ] in the subject of emails about the problem helps
everyone
on Ruby T. follow the discussion. Please reply to the original quiz
message,
if you can.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
by Harlan
Huffman Coding is a common form of data compression where none of the
original
data gets lost. It begins by analyzing a string of data to determine
which
pieces occur with the highest frequencies. These frequencies and pieces
are
used to construct a binary tree. It is the “path†from root node to the
leaf with this data that forms its encoding. The following example
should
explain things:
Data: ABRRKBAARAA (11 bytes)
Frequency counts:
A 5
R 3
B 2
K 1
In Huffman Tree form, with frequency weights in parentheses:
ARBK (11)
/
0 1
/
A (5) RBK (6)
/
0 1
/
R (3) BK (3)
/
0 1
/
B (2) K (1)
The encoding for each character is simply the path to that character:
A 0
R 10
B 110
K 111
Here is the original data encoded:
01101010 11111000 1000 (fits in 3 bytes)
We have compressed the original information by 80%!
A key point to note is that every character encoding has a unique
prefix,
corresponding to the unique path to that character within the tree. If
this
were not so, then decoding would be impossible due to ambiguity.
The quiz this time is to write a program to implement a compression
program
using Huffman encoding.
Extra Credit:
Perform the actual encoding using your tree. You may encounter one
issue
during the decompression/decoding phase. Your encoded string may not
be a
multiple of 8. This means that when you compress your encoding into a
binary number, padding 0’s get added. Then, upon decompression, you
may
see extra characters. To counter this, one solution is to add your
own
padding of 1 extra character every time. And then simply strip it off
once you have decoded.
You may also wish to provide a way to serialize the Huffman Tree so it
can be shared among copies of your program.
./huffman_encode.rb I want this message to get encoded!
Encoded:
11111111 11111110 11111111 11101111 10111111
01100110 11111111 11110111 11111111 11011100
11111111 11010111 01110111 11011110 10011011
11111100 11110101 10010111 11101111 11111011
11111101 11111101 01111111 01111111 11111110
Encoded Bytes:
25
Original:
I want this message to get encoded!
Original Bytes:
35
Compressed: 28%