SOLUTION QUIZ 123 # Huffman Encoder

Here’s an interim solution. I’m working on writing and reading binary
huffman encoded files. There is code here which isn’t used yet by the
demo code. I figured that the time had come to publish my approach to
building the tree, and encoding and decoding. The current code is
more pedagogical than useful since the ‘compressed’ data is
represented by a string representation of the bit string.

There are three files attached. huffman.rb is a module and some core
extensions. huffman_demo1.rb runs the example from the quiz, dumping
the tree and various computed data. Then ‘compresses’ and
‘decompresses’ the sample string. huffman_wikipedia.rb is an earlier
version which uses the algorithm in the wikipedia article.

A few remarks.

I started out using the wikipedia algorithm. Once I got that working,
I decided that it would be better to use a priority queue instead of
two separate queues. For this I used Brian Schroeder’s PriorityQueue
gem.

To handle the problem of padding to a given word length, I add an
artificial ‘character’ to the tree with an occurrence count of 0. This
should ensure that it doesn’t take a shorter encoding away from a
character which actually occurs in the input data.

In order to decode the compressed data, I use a regex which is a
simple alternation of all of the string representations of the codes,
anchored at the begining of the string. Since the huffman encoding
algorithm ensures that no code is a prefix of any other code, this
regex can be used to find the code at the head of the input stream.

So walking through a string representation of a huffman encoded bit
stream can be done by matching this pattern against the string, the
match will be the code. After this match, the string can be replaced
by the string remaining after the match.

Extending this to binary data will involve making an input stream
class which reads the binary data and converts it into reasonably
sized binary string representations, and an output stream which
accumulates the string representation of the codes and writes them as
binary as enough data is produced. The huffman.rb file contains these
classes in an embrionic form.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

“Rick DeNatale” [email protected] writes:

Extending this to binary data will involve making an input stream
class which reads the binary data and converts it into reasonably
sized binary string representations, and an output stream which
accumulates the string representation of the codes and writes them as
binary as enough data is produced. The huffman.rb file contains these
classes in an embrionic form.

Note that I did this with just a few methods, that yielded strings of
zeros and ones to other methods. The structure was like this:

encoding:
encode called encode_bits, passing in the input io stream
encode_bits would yield strings of 0s and 1s into a block
that was inside encode, which used pack to squish them into
binary bytes.

decoding:
decode called decode_bits, which yielded strings of 0s and 1s
that were read from the (binary) input - unpack was used to
make those strings.

I admit that I don’t really like the separation of the logic here -
it’s not really parallel between encoding and decoding, for instance -
but this works and didn’t require that much code.

The Vlsi design implementation of huffman encoding from scratch to
complete with verilog codes and vhdl codes, systematic diagram can be
found in the link below, here
http://blog.ektel.com.np/2012/03/vlsi-architecture-and-design-for-mpeg-encoder-decoder
I thought this view is also helpful for possible cross reference between
ruby language and hdl language