Interface
#include <codecogs/computing/io/compression/variablelzw.h>
using namespace Computing::IO::Compression;
| class | VariableLZW
Implements the variable compression size LZW algorithm | [constructor] | VariableLZW () Implicit constructor | | [constructor] | VariableLZW (std::string &alphabet) Constructor that sets the alphabet | | void | setAlphabet (std::string &alphabet) Sets the alphabet. | | std::string | getAlphabet () Returns the alphabet. | | void | setInitCodeSize (int initCodeSize) Sets the initial code size. | | int | getInitCodeSize () Returns the initial code size. | | void | calculateInitCodeSize () Calculates the initial code size. | | void | setMaxCodeSize (int maxCodeSize) Sets the maximum code size. | | int | getMaxCodeSize () Returns the maximum code size. | | std::string | encode (std::string &input, bool calcInitCodeSize = true) Encodes the given raw data. | | std::string | decode (std::string &input) Decodes the given compressed data. | | std::string | m_alphabet | | unsigned short | m_cc | | std::vector<...> | m_dictionary | | int | m_initCodeSize | | void | initDictionary () | | std::string | packBits (std::vector<bool> &bits) | | std::vector<bool> | unpackBytes (std::string &bytes) | | void | addCode (std::vector<bool> &compressed, int codeSize, unsigned short code) | | void | addCurrentString (std::vector<bool> &compressed, int codeSize, std::string &currString) | | unsigned short | getNextCode (std::vector<bool> &input, int &pos, unsigned int size) |
|
Use the following HTML code to embed the calculators within other websites:
Class VariableLZW
This class implements a modified version of the Lempel-Ziv-Welch (LZW) compression method, allowing for the symbols in the compressed and uncompressed data streams to contain a variable number of bits. On contrast, when using the classic LZW algorithm available as
IO/Compression/LZW, each symbol contains a fixed number of bits.
Using a variable number of bits per symbol allows for higher compression rates, which is why this version of the algorithm is better than the original one. It is also the algorithm used by the GIF file format to encode and decode its image data.
The example below shows how to encode a random string of 1000 characters, only consisting of the capital letters in the English alphabet.
Example 1
#include <codecogs/computing/io/compression/variablelzw.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
// the number of characters to generate in the sample string
#define N 1000
using namespace Computing::IO::Compression;
int main()
{
// initialize random seed
srand(time(0));
// set the alphabet to include the letters A, B, C, D
std::string alphabet = "ABCD";
// create a new variable LZW coder
// with the alphabet set to the one defined above
IO::Compression::VariableLZW coder(alphabet);
// generate a random string of N capital letters
// from the set {A, B, C, D} to allow more repetitions
std::string sample;
for (int i = 0; i < N; ++i)
{
sample += 'A' + rand() % 4;
}
// compress the sample string
std::string compressed = coder.encode(sample);
// uncompress the compressed data
std::string uncompressed = coder.decode(compressed);
// compare the sizes of the compressed and uncompressed strings
std::cout << " Size of the sample string: " << N << std::endl;
std::cout << " Size of the compressed string: " << compressed.size() << std::endl;
std::cout << "Size of the uncompressed string: " << uncompressed.size() << std::endl;
std::cout << std::endl;
// test if the sample and the uncompressed strings are identical
// (this proves that the LZW compression algorithm does not affect the initial data)
bool identical = (N == uncompressed.size());
for (size_t i = 0; identical && i < uncompressed.size(); ++i)
if (sample[i] != uncompressed[i])
identical = false;
if (identical)
std::cout << "The sample and uncompressed strings are identical." << std::endl;
else
std::cout << "Error! The sample and uncompressed strings are distinct." << std::endl;
return 0;
}
Output:
Size of the sample string: 1000
Size of the compressed string: 342
Size of the uncompressed string: 1000
The sample and uncompressed strings are identical.
Authors
- Lucian Bentea (August 2009)
References
- LZW and GIF explained by Steve Blackstock, http://www.cis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html
- "Compressed image file formats: JPEG, PNG, GIF, XBM, BMP" by John Miano, Chapter 12, http://books.google.com/books?id=_nJLvY757dQC&dq=Compressed+image+file+formats&printsec=frontcover&source=bn&hl=ro&ei=ImaPSuGxIsiK_Abyh5SvAg&sa=X&oi=book_result&ct=result&resnum=4#v=onepage&q=&f=false
- Implementation of the original LZW algorithm using various programming languages, http://rosettacode.org/wiki/LZW_compression
Source Code
Source code is available when you agree to a GP Licence or buy a Commercial Licence.
Not a member, then Register with CodeCogs. Already a Member, then Login.
Members of VariableLZW
VariableLZW
The implicit constructor sets the default alphabet to the extended ASCII table (codes 0 through 255) and also sets the initial code size to 9 bits, correspondingly. It also sets the maximum code size to 12, allowing the dictionary to hold at most 4096 entries.
VariableLZW
The alphabet is the set of values that the symbols in your raw or uncompressed data can take. For instance, if you would like to encode sequences like ABAABBAAABBB, only containing the letters A and B, then the length two alphabet "AB" is sufficient.
This version of the constructor sets the alphabet to the one given through the alphabet argument.| alphabet | the alphabet to use when encoding and decoding |
SetAlphabet
Sets the alphabet to the one specified by the alphabet argument.| alphabet | the alphabet to use when encoding and decoding |
GetAlphabet
Returns the current alphabet used by the encoding and decoding methods.Returns
- the currently used alphabet
SetInitCodeSize
Sets the initial code size used when encoding and decoding data.| initCodeSize | the value to set the initial code size to |
GetInitCodeSize
Returns the current value of the initial code size.
CalculateInitCodeSize
This method calculates the initial code size to use when encoding
and decoding data, based on the specified alphabet size.
SetMaxCodeSize
Sets the maximum code size to the value specified by maxCodeSize.
The maximum code size gives the maximum number of bits used when encoding a symbol in the raw data. This value is important in the encoding algorithm because, as soon as the dictionary has a number of entries equal to two to the power of maximum code size, a clear code is added to the compressed data stream and the dictionary is reset.| maxCodeSize | the maximum code size to use when encoding and decoding data |
GetMaxCodeSize
Returns the current maximum code size used by the encoding and decoding methods.Returns
- the currently used maximum code size
Encode
This method compresses the raw data given through input using the variable code size LZW encoding algorithm.
If the calcInitCodeSize parameter is set to true, the initial code size is calculated prior to
compressing the data, using the calculateInitCodeSize method.
Assuming that sample is a string and coder is an instance of the VariableLZW class, the following line encodes sample and stores the result in the compressed string:
std::string compressed = coder.encode(sample);
| input | the raw data |
| calcInitCodeSize | specifies whether the initial code size should be calculated prior to encoding the data; this parameter is set to true by default |
Returns
- the compressed data
Decode
This method uncompresses the compressed data given through input using the variable code size LZW decoding algorithm.
Make sure that the alphabet and the initial code size are initialized prior to calling this method.
Assuming that compressed is a string resulted from an encoding operation, and coder is an instance of the VariableLZW class, the following line decodes the compressed string and stores the result in the uncompressed string:
std::string uncompressed = coder.decode(compressed);
Returns
- the uncompressed data