
/*****

repost 8-22-96

entropy : 6.039

*****/

/*****

Hi all,

this is C source code; it compiles and runs.

I am posting this to demonstrate the kind of traps
that people can fall into when trying to create a
compressor without really understanding the theory
of communication.

I'm sure the experts here will get a grin out of
this; it's worth a gander.

This file contains a WORKING compressor which will
pack random bytes to an average of about 6 bits each
(the "entropy" the program reports).  Yes, it does
work.  Does it violate the "laws of compression"?
No.  An explanation can be found at the bottom of
this file.

*********/


/***

C-Code for a Compressor:

compressor & decompressor are the codec

the Com_ routines are the generalized
communication interface between the
coder and the decoder

main() is a stub for testing

***/

#include <stdlib.h>
#include <stdio.h>

/*** protos **/

void Com_WriteBits(int val,int len);
int Com_SizeOfInput(void);
int Com_ReadBits(int len);

/******* the codec *******/

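/* Codes b as an offset within a length class: class n covers the
   2^n values starting at (2^n - 1), so only the offset is written
   and the class is implied by the number of bits sent.
   Returns that number of bits. */
int compressor( int b ) {
	int curbase,numbits;

	curbase = 0;	/* first value of the current class */
	numbits = 0;	/* bits used by the current class */

	for(;;) {

		if ( (b - curbase) < (1<<numbits) ) {
			/* b falls in this class: send its offset */
			Com_WriteBits( b - curbase , numbits );
			return numbits;
		}
		/* skip past this class and try the next longer one */
		curbase += (1<<numbits);
		numbits ++;
	}
}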

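/* Inverse of compressor(): the length reported by the channel
   selects the class, and the bits read are the offset within it. */
int decompressor(void) {
	int ret;
	int numbits;
	int base,i;
	numbits = Com_SizeOfInput();	/* the length comes from the channel, not the data */
	base = 0;
	for(i=0;i<numbits;i++) base += (1<<i);	/* base = 2^numbits - 1 */
	ret = Com_ReadBits(numbits);

	return( base+ret);
}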

/*** cheezy fake Com_ routines ***/

int stored_val = -1;
int stored_len = -1;

void Com_WriteBits(int val,int len) {
	printf("Com_WriteBits(%i,%i);\n",val,len);
	stored_val = val;
	stored_len = len;
}

int Com_SizeOfInput(void) {
	return(stored_len);
}

int Com_ReadBits(int len) {
	(void)len;	/* unused: the fake channel holds exactly one value */
	return(stored_val);
}

/********** main ************/

int main(void) {
	int byte,got;
	float Entropy;

	puts("Trying all bytes:");

	for(byte=0;byte<256;byte++) {
		compressor(byte);
		got = decompressor();

		printf("%i -- %i\n",byte,got);

		if ( got != byte ) {
			printf("failed on byte : %02X , got : %02X\n",byte,got);
			return 10;
		}
	}

	puts("Succeeded.");

	puts("\nComputing Entropy of random data, with our codec");

	Entropy = 0.0;
	for(byte=0;byte<256;byte++) {
		Entropy += compressor(byte);
	}
	Entropy /= 256; /* average code length over all 256 byte values */

	printf("Entropy = %f bits\n",Entropy);

	puts("Done");
	return 0;
}

/******************

The explanation:

How does this coder successfully send 8 bits of
data in an average of 6 bits?

It makes illegal assumptions about the Com_
communication channel.

In compression, you are not allowed to assume
that the Com_ channel preserves length.

Why not?  Well, many channels do not.  Almost
none preserve the length in bits (instead they
round to bytes), so that alone would kill this
coder.  But what if you have access to a channel
that preserves the length in bits?  Then, yes,
you can use this coder; you are not violating
entropy, you are simply using the bandwidth that
the channel provides in the length transmission,
and in that case this is actually an OK thing
to do.

You might ask: why not write a coder like
this that treats the whole file as one huge
number and, instead of doing numbits++ in
each loop, does numbits += 8, so that we
write byte-aligned output?  Yes, that would
work, as long as your transmission channel
preserved length.  Note, however, that
you must also specify the length of the input,
or you will confuse a low-valued long file
with a high-valued short file, and it takes
more bits to specify the length of the input
than the number of bits saved.

******************/
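
/******************

A sketch of the trap described above, under stated
assumptions.  The sketch_ routines below are hypothetical
and were not part of the original post.  They model a
plain bitstream channel: each byte's code is appended to
one shared bit string, and the individual lengths are
lost.  Two different inputs then pack to the same bits,
so no decoder could tell them apart.

******************/

/* Hypothetical helper: the same class search as compressor(),
   but it hands back the value and length instead of calling
   Com_WriteBits(). */
void sketch_encode(int b, int *val, int *len) {
	int curbase = 0, numbits = 0;

	while ( (b - curbase) >= (1<<numbits) ) {
		curbase += (1<<numbits);
		numbits ++;
	}
	*val = b - curbase;
	*len = numbits;
}

/* Hypothetical helper: concatenate the codes of n bytes into
   *bits, first byte in the high-order position, and return the
   total bit count.  Assumes n <= 4, so that the result fits in
   the 32 bits an unsigned long is guaranteed to have. */
int sketch_pack(const int *bytes, int n, unsigned long *bits) {
	unsigned long acc = 0;
	int total = 0;
	int i, val, len;

	for(i=0;i<n;i++) {
		sketch_encode(bytes[i], &val, &len);
		acc = (acc << len) | (unsigned long)val;
		total += len;
	}
	*bits = acc;
	return total;
}

/* Returns 1 if the byte pair {1,1} and the single byte {3}
   pack to the identical bit string.  Each 1 codes as the one
   bit "0", and 3 codes as the two bits "00". */
int sketch_streams_collide(void) {
	int pair[2] = { 1, 1 };
	int single[1] = { 3 };
	unsigned long bits_a, bits_b;
	int len_a, len_b;

	len_a = sketch_pack(pair, 2, &bits_a);
	len_b = sketch_pack(single, 1, &bits_b);
	return ( len_a == len_b && bits_a == bits_b );
}

/* Usage: sketch_streams_collide() returns 1.  Byte 0 is an
   even starker case: its code has length 0, so any run of
   zero bytes packs to nothing at all. */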

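
/******************

A sketch of where the "saved" bits go, assuming all 256
byte values are equally likely.  The sketch_ routines
below are hypothetical and were not part of the original
post.  They add up two things: the payload bits the coder
writes, and the information the decoder gets from the
length the channel hands it.  The shortcut
8 - log2(count) is exact here only because every length
class of this coder holds a power-of-two number of byte
values.

******************/

/* Hypothetical helper: the number of bits compressor()
   would write for byte b. */
int sketch_payload_bits(int b) {
	int curbase = 0, numbits = 0;

	while ( (b - curbase) >= (1<<numbits) ) {
		curbase += (1<<numbits);
		numbits ++;
	}
	return numbits;
}

/* Hypothetical helper: floor(log2(n)) for n >= 1. */
int sketch_ilog2(int n) {
	int r = 0;

	while ( n > 1 ) {
		n >>= 1;
		r ++;
	}
	return r;
}

/* Average payload over all 256 byte values. */
double sketch_avg_payload_bits(void) {
	int b, total = 0;

	for(b=0;b<256;b++) total += sketch_payload_bits(b);
	return total / 256.0;
}

/* Average information carried by the length, in bits.  A class
   holding count of the 256 equally likely bytes has probability
   count/256, so learning the class is worth 8 - log2(count) bits,
   and the class occurs count times out of 256. */
double sketch_avg_length_bits(void) {
	int count[9] = { 0 };
	int b, k, total = 0;

	for(b=0;b<256;b++) count[sketch_payload_bits(b)] ++;
	for(k=0;k<=8;k++) {
		if ( count[k] > 0 )
			total += count[k] * (8 - sketch_ilog2(count[k]));
	}
	return total / 256.0;
}

/* Usage: sketch_avg_payload_bits() is 6.0390625 and
   sketch_avg_length_bits() is 1.9921875, for a total of
   8.03125 bits per byte.  Once the length is counted as
   part of the message, nothing has dropped below 8 bits. */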