# Compression Techniques

maths, coding-theory

The Data Compression course covers a variety of compression techniques that must be learned. Some are simple, and some are complicated, but all are not as hard as learning how computers actually work.

# Lossless Techniques

### Shannon Coding

Possibly the simplest, this is purely for research and isnt really used anywhere. We will start with the following properties:

$$ A={x_1,\ x_2,\ x_3} $$ $$ P={\frac{1}{2},\frac{3}{8},\frac{1}{8}} $$

Now we start the steps:

- Using the probabilities in $P$ create the cumulative probabilities list $P_C$, starting at $0$: $$ P_C={0,\frac{1}{2},\frac{7}{8}}={0,\ 0.5,\ 0.875} $$ This will be used, once converted to binary, to be a the representations of our words. This will require us to know how many bits of the binary form to include however, that is step 2.
- To find the number of bits each word requires we need to find the self-information of the word. This is done using the self-information function: $$ I(a_i) = \lceil-log_2(p_i)\rceil $$ This gives: $$ I(A)={1,2,3} $$ Which represents the lengths of each of the expansions for the probabilities found in step 1.
- Now we can convert $P_C$ to binary up to the length given in $I(A)$, this can be done any way you want. I use the multiplying expansion technique (not shown). This gives:

$$ 0\rightarrow 0\quad | \quad 0.5\rightarrow 10,\quad | \quad 0.875\rightarrow 111 $$

These are the expansions for the respective codewords: $$ x_1=0\quad | \quad x_2=10,\quad | \quad x_3=111 $$ This gives a uniquely decodable prefix code but it may not be optimal, the average compression length is: $$ L_{avg}=\sum^{|A|}_{i=1} l_i p_i = (1\cdot \frac{1}{2})+(2\cdot \frac{3}{8})+(2\cdot \frac{1}{8}) = \frac{13}{8} = 1.625 $$

### Shannon-Fano Coding

Shannon-Fano coding is very hard to describe mathematically, It leverages the property that binary trees will create prefix codes if the leaves represent words. we will start with a longer code alphabet than before: $$ A={a,\ b,\ c,\ d,\ e,\ f} $$ $$ P={0.05,\ 0.1,\ 0.12,\ 0.13,\ 0.17,\ 0.43} $$

- The first step is to sort the alphabet by the probabilities: $$ A={f,\ e,\ d,\ c,\ b,\ a s} $$ $$ P={0.43,\ 0.17,\ 0.13,\ 0.12,\ 0.1,\ 0.05 } $$
- Now the probabilities are sorted, make this the "root" of the tree, then split the list in half by weighting the probabilities so they're equal on both sides. Now append 0 to the left groups members, and 1 to the right. $$ 0:[f:0.43,e:0.17]\ |\ 1:[d:0.13,c:0.12,b:0.1,a:0.05] $$
- Repeat step 2 for each of the groups in until you get groups of order 1, this is essentially constructing a tree: $$ 00:[f:0.43] |\ 01:[e:0.17]\ |\ 10:[d:0.13,c:0.12]\ |\ 11:[b:0.1,a:0.05] $$ $$ 00:[f:0.43]\ |\ 01:[e:0.17]\ |\ 100:[d:0.13]\ |\ 101:[c:0.12]\ |\ 110:[b:0.1]\ |\ 111:[a:0.05] $$ or, as a tree:

This gives a uniquely decodable prefix code but it may not be optimal, the average compression length is: $$ L_{avg}=\sum^{|A|}_{i=1} l_i p_i = (2\cdot 0.43)+(2\cdot 0.17)+(3\cdot 0.13)+(3\cdot 0.12)+(3\cdot 0.1)+(3\cdot 0.05)= 2.4 $$

### Hufman Coding

Hufman coding will always give an optimal tree, but again is hard to describe mathematically. It uses forrests of nodes that are joined in certain orders to create a tree the same as Shanon-Fano coding. We will use the same code as in the Shanon-Fano example: $$ A={a,\ b,\ c,\ d,\ e,\ f} $$ $$ P={0.05,\ 0.1,\ 0.12,\ 0.13,\ 0.17,\ 0.43} $$

- A node has 2 properties; the contained elements & the cumulative probability of the elemens in the node. For all elements in the alphabet, create a node: $$ (a,\ 0.05)\quad (b,\ 0.1)\quad (c,\ 0.12)\quad (d,\ 0.13)\quad (e,\ 0.17)\quad (f,\ 0.43) $$
- Create a single new node, this is the parent of the 2 nodes in your set with the lowest cumulative probabilities: $$ (ab:(a,\ 0.05)\ (b,\ 0.1),0.15)\quad (c,\ 0.12)\quad (d,\ 0.13)\quad (e,\ 0.17)\quad (f,\ 0.43) $$
- Repeat step 2 over and over until you have a tree: $$ (ab:(a,\ 0.05)\ (b,\ 0.1),0.15)\quad (cd:(c,\ 0.12)\ (d,\ 0.13),0.25)\quad (e,\ 0.17)\quad (f,\ 0.43) $$ $$ (abe:(ab:(a,\ 0.05)\ (b,\ 0.1),0.15)\ (e,\ 0.17),0.32)\quad (cd:(c,\ 0.12)\ (d,\ 0.13),0.25)\quad \quad (f,\ 0.43) $$ $$ \ldots $$
- add numbers to the tree branches and append these numbers until reaching a node. these are the codewords.

It is easier to see as a gif:

once the final tree is given numbers can be applied to the vertacies and the codewords found. Because of the optional 0-1 choice this will not produce unique codes. We can produce the following coded alphabet: $$ a=0101,\ b=0100,\ c=001,\ d=000,\ e=011,\ f=1 $$ (it can be seen by replacing 1s with 0s in the above that this is non unique). This wll also produce an optimal code: $$ L_{avg}=\sum^{|A|}_{i=1} l_i p_i = (4\cdot 0.05)+(4\cdot 0.1)+(3\cdot 0.12)+(3\cdot 0.13)+(3\cdot 0.17)+(1\cdot 0.43) = 2.29 $$