Development: Compression Methods

From WxWiki

Jump to: navigation, search

Listed below are various compression methods related to some of the compression streams implemented in wxWidgets, or in the works.

Everything here will refer to a file read in one byte at a time. In general, encoding tries to guess what's next and make it smaller.

Contents

[edit] Encodings

[edit] RLE (Run Length Encoding)

  • Compact run length sequences.
  • Format in all archive formats I've seen -
  • CHARACTER DLE #OFREPEATS
    • CHARACTER - value of byte
    • DLE - usually 0x90 (No idea what DLE means :\)
    • #OFREPEATS - number of times CHARACTER repeats before another character is met

[edit] Static Huffman

Sometimes called "entropy coding".

[edit] Canonocal Static Huffman

[edit] Dynamic Huffman

Rather than making two runs of the file for character count and encoding, Dynamic Huffman readjusts the tree while scanning the file, so there's no need to run through a file twice.

There are two algorithms for Dynamic Huffman - FGK & V. V is FGK with more strictly enforced rules, and is preferred over FGK.

[edit] FGK

[edit] V

[edit] Arithmatic Encoding

[edit] Shannon-Fado

[edit] Burrows-Wheeler Transform

An article defining this compression method can be found here.

[edit] Implementations

  • Probably the best overall implementation is Karl Malbrain's.
    • Note that this is (exactly?) the same as bzip2, except for CRC checking and magic bytes. A local copy is available at Karl Marlbrian BWT.
  • One for embedded machines that uses a low amount of memory.
    • Uses a whole slew of different compression methods, even some rare ones. A local copy is available at Embedded BWT.
  • The most popular implementation is bzip2 (zlib license).

[edit] Splay

[edit] LZ78

[edit] LZW

[edit] LZC

Version of LZW that begins with a specified number for the "code size".

  • I.E. number of bytes in the hash table. 255 - table clear
  • Zip Squash - Like LZC only partial clearing and compiler increases code size
    • 255, 1 Code Size Increase... 255, 2 Partial Clear
      • PARTIAL CLEAR - confusing, best explained with some psuedo-code
        • loop through hashes
          • Mark as unused (or with a mask)
        • loop through hashes
          • In use? Unor the unused value...
        • loop through hashes
          • Not in use? Clear this hash index.

[edit] LZAM

[edit] LZB

[edit] LZ77

[edit] LZSS

[edit] LZH

[edit] LZAR

[edit] Various ARC Compression Types

For crunching, distilling, etc.

[edit] Implementations

People have devised some bizarre variants over the years, and given them trendy codenames.

[edit] Crunching (Rev Uncrunching)

Optional Packing followed by either LZW or LZC

[edit] Crushing (Rev ???)

[edit] Deflating (Rev Inflating)

[edit] Distilling (Rev ???)

[edit] Imploding (Rev Exploding)

[edit] Shrinking (Rev Expanding)

Like LZW only partial clearing and compiler increases code size

  • Code 255 then Code 1 - Size Increase...
  • 255, 2 Partial Clear
    • PARTIAL CLEAR - confusing, best explained with some psuedo-code
      • loop through hashes
        • Mark as unused (or with a mask)
      • loop through hashes
        • In use (uses another hash index)? Unor the unused value...
      • loop through hashes
        • Not in use? Clear this hash index.

[edit] Squashing (Rev Unsquashing)

[edit] Squeezing (Rev Unsqueezing)

Canonical Static Huffman

[edit] Dumb Alg

Origin: Origin

Theory: Theory

Algorithm: Algorithm

Encoding:

Decoding:

[edit] References

  • Look for version 1.3 (Bold because most places only have 1.2) of the LDS (lossless datacompression sources) kit net - it contains the stuff that everyone uses and almost all the stuff inside is public domain! Be careful - LZAH.c is NOT public domain.
  • DataCompression.info - Links and resources for many algorithms.
  • Karl Malbrain - Public domain implementations of a few algorithms.
  • Andy McFadden's Data Compression Page - Some of the best tutorials (MUST READ if new to compression).
    • Lacks explanation on Huffman and others, unfortunately.
  • RocketAware: Data Compression - Listing of various RFCs, etc.
  • Data compression - Information compiled on Wikipedia.
Personal tools