info lzip

5.2 The coding contexts

These contexts (‘Bit_model’ in the source) are integers or arrays of integers representing the probability of the corresponding bit being 0.
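
As a rough illustration of what one such context can look like, the following sketch models a single adaptive probability. The 11-bit scale and the adaptation shift of 5 are assumptions borrowed from the usual LZMA conventions, and the helper ‘update’ is a hypothetical name; check the values against the source before relying on them.

```cpp
// Sketch of one coding context: an integer holding the estimated
// probability that the next bit is 0, on an assumed 11-bit scale.
enum { bit_model_total_bits = 11,                    // assumed scale
       bit_model_total = 1 << bit_model_total_bits,  // 2048
       bit_model_move_bits = 5 };                    // assumed adaptation rate

struct Bit_model
  {
  int probability;                                   // P(bit == 0) * 2048
  Bit_model() : probability( bit_model_total / 2 ) {}   // start at 50%
  };

// Hypothetical helper: move the estimate toward the bit just decoded.
inline void update( Bit_model & bm, const bool bit )
  {
  if( !bit )    // a 0 was seen: raise the probability of 0
    bm.probability += ( bit_model_total - bm.probability ) >> bit_model_move_bits;
  else          // a 1 was seen: lower the probability of 0
    bm.probability -= bm.probability >> bit_model_move_bits;
  }
```

An array of contexts is then simply an array of ‘Bit_model’, selected by one or more of the indices described in this section.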

The indices used in these arrays are:

‘state’: A state machine (‘State’ in the source) with 12 states (0 to 11), coding the latest 2 to 4 types of sequences processed. The initial state is 0.
‘pos_state’: Value of the 2 least significant bits of the current position in the decoded data.
‘literal_state’: Value of the 3 most significant bits of the latest byte decoded.
‘len_state’: Coded value of the length (length - 2), limited to a maximum of 3. The resulting value is therefore in the range 0 to 3.
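
The three computed indices above can be sketched as small helper functions. The function and constant names are illustrative and not necessarily those used in the source; only the arithmetic follows the descriptions given in the list.

```cpp
#include <algorithm>
#include <cstdint>

enum { pos_state_bits = 2,
       pos_state_mask = ( 1 << pos_state_bits ) - 1,
       literal_context_bits = 3,
       min_match_len = 2,
       len_states = 4 };

// Value of the 2 least significant bits of the current position.
inline int get_pos_state( const unsigned long long data_position )
  { return data_position & pos_state_mask; }

// Value of the 3 most significant bits of the latest byte decoded.
inline int get_lit_state( const uint8_t prev_byte )
  { return prev_byte >> ( 8 - literal_context_bits ); }

// (length - 2), limited to the range 0 to 3.
inline int get_len_state( const int len )
  { return std::min( len - min_match_len, len_states - 1 ); }
```

For example, at position 13 with a previous byte of 0xE5 and a match length of 5, these helpers give a ‘pos_state’ of 1, a ‘literal_state’ of 7 and a ‘len_state’ of 3.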

In the following table, ‘!literal’ is any sequence except a literal byte. ‘rep’ is any one of ‘rep0’, ‘rep1’, ‘rep2’ or ‘rep3’. The types of previous sequences corresponding to each state are:

State	Types of previous sequences
0	literal, literal, literal
1	match, literal, literal
2	(rep or shortrep), literal, literal
3	literal, shortrep, literal, literal
4	match, literal
5	(rep or shortrep), literal
6	literal, shortrep, literal
7	literal, match
8	literal, rep
9	literal, shortrep
10	!literal, match
11	!literal, (rep or shortrep)
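
The table above can be turned into a small state machine. The following is a sketch derived from the table, not a copy of the source: the member function names are assumptions, and the transition array simply encodes "append a literal to the history" for each of the 12 states.

```cpp
// Sketch of the 12-state machine described by the table above.
class State
  {
  int st;
public:
  enum { states = 12 };
  State() : st( 0 ) {}                        // the initial state is 0
  int operator()() const { return st; }
  bool is_char() const { return st < 7; }     // latest sequence was a literal

  void set_char()                             // a literal was processed
    {
    static const int next[states] = { 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 4, 5 };
    st = next[st];
    }
  // For the other sequences only "was the previous one a literal" matters.
  void set_match()     { st = ( st < 7 ) ? 7 : 10; }
  void set_rep()       { st = ( st < 7 ) ? 8 : 11; }
  void set_short_rep() { st = ( st < 7 ) ? 9 : 11; }
  };
```

Starting from state 0, a match followed by three literals visits states 7, 4, 1 and 0, which matches rows 7, 4, 1 and 0 of the table.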

The contexts for decoding the type of coding sequence are:

Name	Indices	Used when
bm_match	state, pos_state	sequence start
bm_rep	state	after sequence 1
bm_rep0	state	after sequence 11
bm_rep1	state	after sequence 111
bm_rep2	state	after sequence 1111
bm_len	state, pos_state	after sequence 110
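
The order in which these contexts are consulted can be sketched as a decision tree. In this sketch ‘decode_bit’ is a hypothetical callback standing in for the range decoder, the array indices (‘state’, ‘pos_state’) are left out, and the mapping of bit sequences to sequence types is assumed from the description of the stream format elsewhere in the manual.

```cpp
#include <functional>

enum class Seq_type { literal, match, shortrep, rep0, rep1, rep2, rep3 };

// Names the context array consulted for each decision.
enum class Context { bm_match, bm_rep, bm_rep0, bm_rep1, bm_rep2, bm_len };

// 'decode_bit' is a hypothetical stand-in for the range decoder.
inline Seq_type decode_seq_type( const std::function<bool( Context )> & decode_bit )
  {
  if( !decode_bit( Context::bm_match ) ) return Seq_type::literal;   // 0
  if( !decode_bit( Context::bm_rep ) ) return Seq_type::match;       // 1 0
  if( !decode_bit( Context::bm_rep0 ) )                              // 1 1 0
    return decode_bit( Context::bm_len ) ? Seq_type::rep0            // 1 1 0 1
                                         : Seq_type::shortrep;       // 1 1 0 0
  if( !decode_bit( Context::bm_rep1 ) ) return Seq_type::rep1;       // 1 1 1 0
  return decode_bit( Context::bm_rep2 ) ? Seq_type::rep3             // 1 1 1 1 1
                                        : Seq_type::rep2;            // 1 1 1 1 0
  }
```

Each context is reached only after the prefix listed in the ‘Used when’ column, so at most five decisions are needed to identify the type of a sequence.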

The contexts for decoding distances are:

Name	Indices	Used when
bm_dis_slot	len_state, bit tree	distance start
bm_dis	reverse bit tree	after slots 4 to 13
bm_align	reverse bit tree	for distances >= 128, after fixed probability bits
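
How the slot determines which of these contexts code the remaining bits of the distance can be sketched as follows. The structure ‘Slot_layout’ and the function ‘slot_layout’ are hypothetical names used only for illustration, and the constants (a 4-bit ‘bm_align’, slots below 4 being the distance value itself) are assumptions consistent with the table above.

```cpp
enum { start_dis_model = 4, end_dis_model = 14, dis_align_bits = 4 };

struct Slot_layout
  {
  unsigned base;    // smallest distance value coded by this slot
  int tree_bits;    // low bits coded with 'bm_dis' (reverse bit tree)
  int fixed_bits;   // bits coded with fixed probability
  int align_bits;   // lowest bits coded with 'bm_align'
  };

inline Slot_layout slot_layout( const unsigned dis_slot )
  {
  Slot_layout s = { dis_slot, 0, 0, 0 };
  if( dis_slot < start_dis_model ) return s;   // slots 0 to 3: no extra bits
  const int direct_bits = ( dis_slot >> 1 ) - 1;
  s.base = ( 2 | ( dis_slot & 1 ) ) << direct_bits;
  if( dis_slot < end_dis_model )               // slots 4 to 13
    s.tree_bits = direct_bits;
  else                                         // slots 14 and above
    { s.fixed_bits = direct_bits - dis_align_bits; s.align_bits = dis_align_bits; }
  return s;
  }
```

Under these assumptions slot 13 covers the values 96 to 127 with 5 bits from ‘bm_dis’, while slot 14 starts at 128 and uses 2 fixed probability bits followed by 4 bits from ‘bm_align’, in agreement with the ‘>= 128’ limit given in the table.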

There are two separate sets of contexts for lengths (‘Len_model’ in the source): one for normal matches and the other for repeated matches. The contexts in each Len_model are (see ‘decode_len’ in the source):

Name	Indices	Used when
choice1	none	length start
choice2	none	after sequence 1
bm_low	pos_state, bit tree	after sequence 0
bm_mid	pos_state, bit tree	after sequence 10
bm_high	bit tree	after sequence 11
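
The selection among ‘bm_low’, ‘bm_mid’ and ‘bm_high’ can be sketched as below. This is not the ‘decode_len’ of the source: ‘next_bit’ is a hypothetical bit source replacing the range decoder and its contexts, the bit tree widths (3, 3 and 8 bits) are assumed, and the minimum match length of 2 is added inside the function for simplicity.

```cpp
#include <functional>

enum { min_match_len = 2,
       len_low_bits = 3, len_mid_bits = 3, len_high_bits = 8,   // assumed widths
       len_low_symbols = 1 << len_low_bits,
       len_mid_symbols = 1 << len_mid_bits };

typedef std::function<bool()> Bit_source;   // hypothetical stand-in for the range decoder

// Read 'num_bits' bits, most significant first, as a bit tree does.
inline int read_tree( const Bit_source & next_bit, const int num_bits )
  {
  int symbol = 0;
  for( int i = 0; i < num_bits; ++i ) symbol = ( symbol << 1 ) | next_bit();
  return symbol;
  }

inline int decode_len( const Bit_source & next_bit )
  {
  if( !next_bit() )                  // choice1 is 0: use bm_low, lengths 2 to 9
    return min_match_len + read_tree( next_bit, len_low_bits );
  if( !next_bit() )                  // choice2 is 0: use bm_mid, lengths 10 to 17
    return min_match_len + len_low_symbols + read_tree( next_bit, len_mid_bits );
  // choice2 is 1: use bm_high, lengths 18 to 273
  return min_match_len + len_low_symbols + len_mid_symbols +
         read_tree( next_bit, len_high_bits );
  }
```

With these widths the three ranges are contiguous, so every length from 2 to 273 has exactly one encoding.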

The context array ‘bm_literal’ is special. In principle it acts as a normal bit tree context, the one selected by ‘literal_state’. But if the previous decoded byte was not a literal, two other bit tree contexts are used, chosen for each bit by the value of the corresponding bit in ‘match_byte’ (the byte at the latest used distance), until a bit is decoded that differs from its corresponding bit in ‘match_byte’. After this first difference is found, the rest of the byte is decoded using the normal bit tree context (see ‘decode_matched’ in the source).
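
The switch between the three bit trees can be sketched as follows. The callback ‘decode_bit’ is a hypothetical replacement for the range decoder, and the layout of the contexts is an assumption: numbers 0 to 0xFF form the normal bit tree, 0x100 to 0x1FF the tree used while the bit of ‘match_byte’ is 0, and 0x200 to 0x2FF the tree used while it is 1.

```cpp
#include <functional>

// 'decode_bit( i )' is a hypothetical callback that decodes one bit using
// context number i of the contexts selected by 'literal_state'.
typedef std::function<bool( int )> Bit_decoder;

inline unsigned decode_matched( const Bit_decoder & decode_bit, unsigned match_byte )
  {
  unsigned symbol = 1;                                // root of the bit tree
  while( symbol < 0x100 )
    {
    match_byte <<= 1;
    const unsigned match_bit = match_byte & 0x100;    // 0 or 0x100
    // tree at 0x100 or at 0x200, depending on the bit of match_byte
    const bool bit = decode_bit( 0x100 + match_bit + symbol );
    symbol = ( symbol << 1 ) | bit;
    if( ( match_bit != 0 ) != bit )                   // first difference found
      {
      while( symbol < 0x100 )                         // rest: normal bit tree
        { const bool b = decode_bit( symbol ); symbol = ( symbol << 1 ) | b; }
      break;
      }
    }
  return symbol & 0xFF;
  }
```

For instance, with a ‘match_byte’ of 0xA0 and a decoded byte of 0xB1, the first four bits are decoded with the two match-dependent trees (the fourth bit is the first one that differs) and the last four with the normal tree.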

This document was generated on October 10, 2013 using texi2html 5.0.