Determine the Huffman codes for encoding the folowing input text: YABBADABBADABBADOO
Determine the LZW encoding for the following input text: YABBADABBADABBADOO
Implement a simple compression/decompression tool, in one of the variants:
A - Huffman codes
B - LZW compression
FileCompresser mode inputfile outputfile
where mode can be -c or -d, meaning compression or decompression.
The tool must accept for compression files of any type (text or binary). After using the tool for a sequence of compression followed by decompression, the obtained file must be identical with the original file.
Implementation: For doing compression, the program reads the inputfile as a binary file, each byte being a character of an alphabet of 256 symbols, and counts the frequencies of the characters. Huffman codes are built. After this, encoding (writing the outputfile) can begin. Information about the used codes (frequencies of characters) has to be also written in the outputfile, in order to enable later decoding. The inputfile is then read again and each character replaced by its Huffman code. For writing in the output file (the compressed file), the Huffman codes must be constructed at bit-level using bitwise operations ! If you are not working at bitlevel, but just output characters '0' and '1', then a code such as 10011 that was meant to be 5 bits will take 5 bytes and the tool will actually expand the size of the input file instead of compressing it ! Also, if the total number of bits used in the encoding of the input file is not a multiple of 8, some "padding" bits have to be added at the end of the output file.
For doing decompression, the program reads from the file the table of character frequencies and builds the Huffman codes (the Huffman tree), using the same algorithm as for the compression. Then, it reads the compressed data and using the Huffman tree decodes the original characters and writes them into the decompressed file.
FileCompresser mode inputfile outputfile
where mode can be -c or -d, meaning compression or decompression.
The tool must accept for compression files of any type(text or binary). After using the tool for a sequence of compression followed by decompression, the obtained file must be identical with the original file.
Implementation:For doing compression, the program reads the inputfile as a binary file, each byte being a symbol of an alphabet of 256 symbols. For every symbol read, it updates the dictionary and maybe outputs some encoding (an index in the current dictionary). Use for the length of the index-word 12 bits in your implementation. If the size of the dictionary increases such that it exceeds the addressable size, exit with an error code.
For doing decompression, the program reads the encoded file as a binary file, where every 12 bits represent an index into the dictionary. For every read index, the program outputs the corrsponding string from the dictionary. The dictionary is continuously updated according to the LZW algorithm.