LZW Encoding and Decoding: Implementation and Compression Efficiency Analysis

Latest recommended article published on 2024-11-27 14:12:33

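This article explains the principle, idea, and implementation of the LZW coding algorithm, walks through encoding and decoding on a worked example, and analyzes the coding efficiency. It provides a complete C implementation and, taking TXT files as the example, examines the compression efficiency for files of different formats.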

I. Encoding Principle

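The second class of dictionary-coding algorithms tries to build a "dictionary of phrases" from the input data, where a phrase can be any combination of characters. When the encoder meets a phrase that already appears in the dictionary, it outputs the index of that phrase in the dictionary instead of the phrase itself. LZW compression is an encoding method of this kind.
LZW outputs only code words that stand for strings in the dictionary. This means the dictionary cannot be empty at the start: it must contain every single character that may appear in the character stream, so that the encoder can always find at least a match of length 1.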

II. The Idea Behind the LZW Encoding Algorithm

1. Overall approach

[Figure: overall flow of the LZW algorithm]

2. Worked example

Suppose the input character sequence is: a a a b b b b b b a a b a a b a
The encoding proceeds as follows:

C	P + C	P + C in dictionary?	P (after this step)	Dictionary entry added	Output code
–	–	–	(empty)	a=0, b=1	–
a	a	Y	a	–	–
a	aa	N	a	aa=2	0
a	aa	Y	aa	–	–
b	aab	N	b	aab=3	2
b	bb	N	b	bb=4	1
b	bb	Y	bb	–	–
b	bbb	N	b	bbb=5	4
b	bb	Y	bb	–	–
b	bbb	Y	bbb	–	–
a	bbba	N	a	bbba=6	5
a	aa	Y	aa	–	–
b	aab	Y	aab	–	–
a	aaba	N	a	aaba=7	3
a	aa	Y	aa	–	–
b	aab	Y	aab	–	–
a	aaba	Y	aaba	–	7

The output code sequence is: 0 2 1 4 5 3 7
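
The table above can be reproduced with a small program. The following is only a minimal sketch, assuming the two-symbol alphabet of the example (a=0, b=1) and a flat array of strings searched linearly instead of the tree described in section III. The names `toy_find`, `toy_lzw_encode`, `TOY_MAX_ENTRIES`, and `TOY_MAX_LEN` are hypothetical and do not come from the article's lzw.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64   /* hypothetical capacity, ample for the example */
#define TOY_MAX_LEN     32   /* hypothetical maximum phrase length */

/* Return the index of phrase s in dict[0..n), or -1 if it is absent. */
static int toy_find(char dict[][TOY_MAX_LEN], int n, const char *s)
{
    for (int i = 0; i < n; i++)
        if (strcmp(dict[i], s) == 0)
            return i;
    return -1;
}

/* Encode input over the alphabet {a, b}; store the codes in out and
 * return how many codes were written. */
static int toy_lzw_encode(const char *input, int *out)
{
    char dict[TOY_MAX_ENTRIES][TOY_MAX_LEN] = { "a", "b" };  /* a=0, b=1 */
    int n = 2;                   /* next free code */
    int count = 0;
    char p[TOY_MAX_LEN] = "";    /* current prefix P, initially empty */
    char pc[TOY_MAX_LEN];        /* candidate phrase P + C */

    for (size_t i = 0; input[i] != '\0'; i++) {
        size_t len = strlen(p);
        assert(len + 1 < TOY_MAX_LEN);
        memcpy(pc, p, len);
        pc[len] = input[i];
        pc[len + 1] = '\0';

        if (toy_find(dict, n, pc) >= 0) {
            strcpy(p, pc);                        /* P + C is known: keep extending */
        } else {
            out[count++] = toy_find(dict, n, p);  /* emit the code of P */
            assert(n < TOY_MAX_ENTRIES);
            strcpy(dict[n++], pc);                /* add P + C as a new entry */
            p[0] = input[i];                      /* restart the prefix from C */
            p[1] = '\0';
        }
    }
    if (p[0] != '\0')
        out[count++] = toy_find(dict, n, p);      /* flush the last prefix */
    return count;
}
```

Calling `toy_lzw_encode("aaabbbbbbaabaaba", codes)` with an `int codes[32]` buffer follows the rows of the table one input character at a time. The linear search keeps the sketch short; a real encoder would use a faster lookup such as the tree in section III.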

Decoding:

PW	P	CW	C	P + C	Dictionary entry added	Output string
–	–	0	a	a	a=0, b=1	a
0	a	2	a	aa	aa=2	aa
2	aa	1	b	aab	aab=3	b
1	b	4	b	bb	bb=4	bb
4	bb	5	b	bbb	bbb=5	bbb
5	bbb	3	a	bbba	bbba=6	aab
3	aab	7	a	aaba	aaba=7	aaba
7	aaba	–	–	aaba	–	–

The decoded character stream is aaabbbbbbaabaaba, identical to the input character stream.

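Note: a CW may arrive that is not yet in the decoder's dictionary. The decoder's dictionary always lags the encoder's by one entry, so such a CW can only be the entry the encoder created in the step just before, namely P + X, where P is the previously decoded string and X is the first character of the string that CW stands for. Since that string is P + X itself, its first character is the first character of P, so X is the first character of P.
In short, if CW is not in the dictionary, take C to be the first character of P; the decoded string is then P + C.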
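
The decoding table, including the rule for a CW that is not yet in the dictionary, can be sketched as follows. This is an illustration under the same assumptions as the worked example (alphabet a=0, b=1) with a flat array of strings as the dictionary. The names `toy_lzw_decode`, `TOY_MAX_ENTRIES`, and `TOY_MAX_LEN` are hypothetical and are not taken from the article's lzw.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64   /* hypothetical capacity, ample for the example */
#define TOY_MAX_LEN     32   /* hypothetical maximum phrase length */

/* Decode n codes over the alphabet {a, b} into out (NUL-terminated).
 * The caller must provide a buffer large enough for the decoded text. */
static void toy_lzw_decode(const int *codes, int n, char *out)
{
    char dict[TOY_MAX_ENTRIES][TOY_MAX_LEN] = { "a", "b" };  /* a=0, b=1 */
    int next = 2;                 /* next free code */
    char p[TOY_MAX_LEN] = "";     /* previously decoded string P */
    char cur[TOY_MAX_LEN];        /* string for the current code CW */

    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        int cw = codes[i];
        assert(cw >= 0);
        if (cw < next) {
            strcpy(cur, dict[cw]);            /* normal case: CW is known */
        } else {
            /* CW is not in the dictionary yet: its string is
             * P followed by the first character of P. */
            size_t len = strlen(p);
            assert(cw == next && len > 0 && len + 1 < TOY_MAX_LEN);
            memcpy(cur, p, len);
            cur[len] = p[0];
            cur[len + 1] = '\0';
        }
        strcat(out, cur);

        if (p[0] != '\0') {
            /* Add P + C, where C is the first character of cur. */
            size_t len = strlen(p);
            assert(next < TOY_MAX_ENTRIES && len + 1 < TOY_MAX_LEN);
            memcpy(dict[next], p, len);
            dict[next][len] = cur[0];
            dict[next][len + 1] = '\0';
            next++;
        }
        strcpy(p, cur);
    }
}
```

With the code sequence 0 2 1 4 5 3 7 as input, codes 2, 4, 5, and 7 all take the special branch, which matches the rows of the decoding table where the new dictionary entry and the output string coincide.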

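III. Implementing the LZW Algorithm

1. Data structure analysis

Taking the same string as the example, suppose the input character sequence is aaabbbbbbaabaaba.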

[Figure: dictionary tree built for the input aaabbbbbbaabaaba]

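The tree is built dynamically, and each node may have any number of children. The data structure should therefore let a node have arbitrarily many children without reserving space for them in advance.
The tree resides in an array of nodes. Each node has at least two fields: a character and a pointer to its parent node.
A node holds no array of child pointers. Instead, moving down the tree from a node to its children goes through a first-child link and a next-sibling link, so each node carries four fields: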

suffix: the trailing character
parent: the parent node
firstchild: the first child node
nextsibling: the next sibling node

The tree is stored in the array dict[], indexed by pointer, so dict[pointer] denotes one node (the code below names this array dictionary[]):
dict[pointer].suffix
dict[pointer].parent
dict[pointer].firstchild
dict[pointer].nextsibling
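
The parent field is what lets a decoder turn a code back into its string: starting from the node for the code, follow parent links up to a root and collect the suffix characters in reverse. The sketch below illustrates this under stated assumptions. The type `toy_node` and the function `toy_phrase` are hypothetical names, and roots are assumed to have parent equal to -1, as in the initialization code shown later in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout mirroring the four fields listed above. */
struct toy_node {
    int suffix;       /* last character of the phrase */
    int parent;       /* phrase without its last character, -1 for roots */
    int firstchild;   /* first child, -1 if none */
    int nextsibling;  /* next sibling, -1 if none */
};

/* Write the phrase for `code` into buf (capacity cap, NUL-terminated)
 * and return its length. */
static size_t toy_phrase(const struct toy_node *dict, int code,
                         char *buf, size_t cap)
{
    size_t len = 0;
    size_t i;

    /* First pass: count the nodes on the path up to the root. */
    for (int p = code; p >= 0; p = dict[p].parent)
        len++;
    assert(len < cap);

    /* Second pass: fill the buffer from the back, since walking
     * toward the root yields the characters in reverse order. */
    buf[len] = '\0';
    i = len;
    for (int p = code; p >= 0; p = dict[p].parent)
        buf[--i] = (char)dict[p].suffix;
    return len;
}
```

For a four-node tree holding a=0, b=1, aa=2, and aab=3, calling `toy_phrase` on code 3 visits nodes 3, 2, and 0 and fills the buffer with aab. Walking the path twice avoids a separate reversal step at the cost of one extra traversal.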

2. Main functional modules

Initializing the dictionary

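void InitDictionary(void)
{
    int i;

    /* Codes 0..255 are the single-character roots of the tree. */
    for (i = 0; i < 256; i++) {
        dictionary[i].suffix = i;
        dictionary[i].parent = -1;          /* roots have no parent */
        dictionary[i].firstchild = -1;      /* no children yet */
        dictionary[i].nextsibling = i + 1;  /* roots are chained as siblings */
    }
    dictionary[255].nextsibling = -1;       /* end of the root chain */
    next_code = 256;                        /* first free code */
}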

Checking whether a string is in the dictionary

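/* Return the code of (string_code + character), or -1 if it is not in the dictionary. */
int InDictionary(int character, int string_code)
{
    int sibling;

    /* Empty prefix: a single character is its own code. */
    if (string_code < 0) return character;
    /* Scan the children of string_code for a matching suffix. */
    sibling = dictionary[string_code].firstchild;
    while (sibling > -1) {
        if (character == dictionary[sibling].suffix) return sibling;
        sibling = dictionary[sibling].nextsibling;
    }
    return -1;
}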

Adding a new string to the dictionary

/* Append (string_code + character) to the dictionary as code next_code.
 * The caller must ensure that next_code does not exceed MAX_CODE. */
void AddToDictionary(int character, int string_code)
{
    int firstsibling, nextsibling;

    if (string_code < 0) return;  /* single characters are already present */
    dictionary[next_code].suffix = character;
    dictionary[next_code].parent = string_code;
    dictionary[next_code].nextsibling = -1;
    dictionary[next_code].firstchild = -1;
    firstsibling = dictionary[string_code].firstchild;
    if (firstsibling > -1) {
        /* The parent already has children: append to the end of the sibling list. */
        nextsibling = firstsibling;
        while (dictionary[nextsibling].nextsibling > -1)
            nextsibling = dictionary[nextsibling].nextsibling;
        dictionary[nextsibling].nextsibling = next_code;
    } else {
        /* No child yet: the new node becomes the first child. */
        dictionary[string_code].firstchild = next_code;
    }
    next_code++;
}

IV. Complete Code

lzw.c

#include <stdlib.h>
#include <stdio.h>
#include "bitio.h"
#define MAX_CODE 65535
#pragma warning(disable:4996)

struct 
{
    int suffix;
    int parent, firstchild, nextsibling;
} dictionary[MAX_CODE+1]