this article comes from hugi issue 12 ú released in september 1998 ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ IMAGE COMPRESSION: BTC (BLOCK TRUNCATION CODE) DAVE ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ BTC (Block Truncation Code) ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ Doc Written by David Palomar (DAVE) email: dave@bglinks.com 1. INTRODUCTION ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ This doc explains the BTC scheme for compressing image data. Recently I was speaking with Biki and Aslan in #coders channel. We were talking about digital image compression. Biki and Aslan (as other coders) told me they were looking for docs explaining algorithms to compress image data... and they found NOTHING... I'm a lover of image compression algorithms and I know a lot of them... so I promised these coders to help with these algos. I think this DOC will be helpful for coders... I hope :) 2. BTC ÄÄÄÄÄÄ First of all, BTC is a (new?) image compression scheme that was proposed by Delp and Mitchell, whose perfomance is comparable to the DTC, but which has a much simpler hardware implmentation. In Block Truncation Code, an image is, firstly, segmented into NxN blocks of pixels. Well, in the most cases the size is 4x4, but we can choose another size, like 8x8... Sencondly, a two level output is choosen for every block and bitmap is also encoded using 1 bit for every pixel. The main idea is that the two levels are selected independently for every NxN block. So, encoding the block is a simple binarization process because every pixel of the block is truncated to 1 or 0 binary representation choosing the reconstruction level(the two level output). Example: encoding 16 bytes (4x4 pixels block) of an image, we should compute an 'statistics' of this block (the two level output) and then, we should binarize this block with our two level output assigning a 0 or 1 binary... the results (if we're working with 256 grays levels) compressed: 2 bytes indicating the two levels output 2 bytes indicating the encoded bitmap (2bytes = 8+8bits = 16bits) Decoding is very simple because we only place the appropiate reconstruction level for every bit indicating 0 or 1. Consider the following 4x4 block of pixels: ³ 79 80 75 77 ³ X = ³ 77 79 79 79 ³ <ÄÄÄ This is our original image ³ 72 72 79 80 ³ ³ 77 79 79 87 ³ Think that our "quantizer" computes the mean of the block and the two levels output ('a' and 'b'). The mean is 78 (in the block). So we can choose the two levels output by pure inspection (looking at the original image). To be realistic, the two levels are also computed using an algorithm. Look the next block and tell me if it is representative... Note that 1=a and 0=b. ³ 1 1 0 0 ³ Y = ³ 0 1 1 1 ³ <ÄÄÄ This is our 'encoded' image ³ 0 0 1 1 ³ ³ 0 1 1 1 ³ ... where 1 is indicating that the pixel is greater than the mean and 0 indicates that it's below the mean. Finally, take a look the reconstructed block: ³ 80 80 73 73 ³ X'= ³ 73 80 80 80 ³ <ÄÄÄ This is our 'decoded' image ³ 73 73 80 80 ³ ³ 73 80 80 80 ³ These are the statistics of the block (this was really computed): Mean = 78 Variance = 3.351772 Q = 10 Level 1(a) = 73 Level 2(b) = 80 As I said before, we have 256 possibles values (grays), 256=2^8... this means we have 8 bits to represent every pixel... this tells us that the total bit rate is: (8+8+16)/16 = 2.0 bits / pixel ==> 2.00 bpp And it also tells us that a 64k bitmap can be encoded to 16k only... yup :) ... but you should note that this basic BTC will introduce what I call artifacts (something similar to noise, but different), but don't worry because the high frequencies of the image (SHARP) will be encoded with high fidelity. Other algorithms I know typically blur sharp edges. Well, it's time to speak about Quantizer Design... let's go! 2.1 QUANTIZER DESIGN ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ In our example we made the thresholding process (to choose 2 levels output) by pure inspection. The quantizer we want to made MUST preserve the 1st and 2nd moments of a particular block minimizing the mean-squared reconstruction error. In fact, there are A LOT of ways to make a quantizer. Most BTC algos I know use 'moment preserving' quantizers. This means these quantizers preserve a number of moments in the block (i.e: the mean and the variance, desviation...). These quantizers are designed for finding the most important parameters in the image, the most important visual information... ya know: the MEAN. There is no problem to use the MEDIAN instead of the MEAN... this would be another quantizer that, probably, will work better than the algo. that uses the MEAN. Well, at this point a basic MATH will be used. I hope you dont worry for this... in fact, REAL PROGRAMMERS AREN'T AFRAID OF MATH. So here are the equations to preserve the MEAN and VARIANCE of a particular nxn block of a image: m = ný X1, X2, X3, ..., Xm are the pixel values Ä m X = (1/m) ä Xi i=1 ÄÄ m Xý = (1/m) ä Xiý i=1 Ä /Ä\ ý åý = Xý - ( X ) ³ \ / <ÄÄÄÄÄ This is the MEAN, and THEN SQUARE ³ ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ First SQUARE, then MEAN Now, i'll define the threshold value (Xt) to binarize later: ³ a, if Xi < Xth Xi = ³ ³ b, if Xi >= Xth where i= 1, 2, ..., m To preserve the firsts moments, where q is the number of Xi's greater than or equal (>=) to the threshold (the mean)... in other words: Ä mX = (m-q)a + qb ÄÄ mXý = (m-q)aý + qbý ... and solving this system for 'a' and 'b' we have: (note that sqrt = square root) Ä a = X - åsqrt(q/(m-q)) Ä b = X + åsqrt((m-q)/q) Now, we must send the nxn bitmap saying which pixels to reconstruct with the 'a' outlevel and which with the 'b' outlevel and also we have to include some info saying the two levels output: 'a' and 'b'. At this point, we send 'a' and 'b' directly using 8 bits for everyone (if we're using 256 grays levels). As I said before, you'll obtain a 4:1 factor compression (2.00 bpp) because 8 bits are used for 'a' and 'b' and also 16 bits for the bitmap = 8+8+16 = 32 bits = 4 bytes. A 64K image will be compressed (ALWAYS) to 16K only. Ok, this is ALL you need to make a simple BTC encoder... it will work well, but... continue reading and you'll learn a lot MUCH MORE about BTC fascinating scheme for digital compression image. :) 2.2. DESIGNING A FAST QUANTIZER ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ I think that currently you have noticed that squaring and square root operations are required to implement the quantizer... and I think that you know that these kind of operations are slow (...and you would use the FPU... and this is NOT what one desires). An approach that avoids these operations is what is called as AMBTC, which stands for Absolute Moment Block Truncation Code. The quantizer is designed to preserve ABSOULUTE moments. I'll give you the equs: m = nxn = ný Ä m X = (1/m) ä Xi i=1 m Ä à = (1/m) ä ³Xi - X ³ i=1 Ä a = X - (mà)/2(m-q) Ä b = X + (mà)/2q Remember that q is the number of Xi's greater than or equal (>=) to the threshold (the mean). 3. INCREASING THE PERFOMANCE OF THE BTC ALGO. ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ We'll see some non standard BTC scheme variations... these variations, of course, will increase the perfomance of our encoder (in bpp), but also will increases the costs and the complexity of our algorithm, naturally. A very simple technique to reduce the bit rate is to use fewer than 8 bits for the 2 level output (or the mean and the variance)... for example using only 6 bits for each. This will result lower quality in the reconstructed image. 3.1. BITMAP OMISSION ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ This is a very simple way to reduce bit rate... this is based on the fact that in some images certain regions (blocks) exist, where the variance is very little (of no meaning for our eyes)... these regions are zones of the images where is not a considerable color change. In other words, in some blocks of the image, 'a' and 'b' are very approximated (variance is little) and, for this reason, there is NO need to send 'a' and 'b' together, we should only send 'a' or 'b' with the block bitmap encoded. These techniques will work very well with images with large uniform zones. 3.2. VQ FOR THE BITMAP ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ VQ stands for Vector Quantization, and this is another digital image compression scheme. I assume you know this scheme, if you dont know email me, maybe i'll write another DOC explaining this scheme. This method will consider the bitmap as a m-dimensional vector. I remind you that m=NxN. For example for 4x4 blocks, we have a 16-dimensional vector. It will contain 16 values (0 or 1): b0, b1, b2, b3, ..., b15. Well, this technique is based in considering that, in fact, when we're encoding the bitmap (the 16 bit vector encoded) we should note that in the whole image, these bitmaps will be repeated (or very approximated). This means you CAN GENERATE a CODEBOOK with a training set of bitmap vectors. Using this method you'll achieve 1.5 bpp, which I think is a good bit rate. I think this method (combined with my series calculus algorithm) is better than the others schemes i'll comment here. J&I proposed me to send the 'a' and 'b' of each block of the whole image first and then send the bitmaps of the whole image because doing it we would make a simple Huffman encoding for the bitmaps and the bit rate could be reduced a lot. Think that bitmaps could be repeated A LOT in the encoding process: 52,64, 23,54, 52,64, 23,67, 23,54................. ^ ^ repeated repeated Now, I'll propose a simple BTC combined with VQ to make the text more easy to understand it. Example: we're compressing a 64K image. We want to use BTC combined with VQ, so we start encoding the bitmap and also finding 'a' 'b' and when we have encoded the bitmap, then we'll compare this encoded bitmap with all of our bitmaps that are in our particular codebook (containing a 256 m-vectors, 8 bits). Finally, we'll select the BEST aproximation (usually this will be a 8 bits index indicating the bitmap in codebook). We'll send 'a' 'b' and the 8 bit index. Got the idea? Very easy, I think. 3.3 JOINT QUANTIZATION ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ I'll NOT explain extensively this method... but, take a look: this is based on the fact that exact value of the mean isn't important in a high variance area (when variance varies a lot), but it's important when variance does NOT vary. In other words, when variance is LITTLE we have a couple of possible values for the mean, but when variance is GREATER we have only a few values for the mean. Using this technique, the mean and the variance is jointly quantized using only 10 bits for both. Got the idea? I dislike this method.... using it you'll obtain a bit rate of 1.63 bpp. 4. ABS-BTC ÄÄÄÄÄÄÄÄÄÄ ABS-BTC stands for Adaptive Block Size - Block Truncation Code and, how the name says, this scheme consists on considering the block size when we're encoding the bitmap... coz fixing the block size to 4x4, for example, will work well... but it's also certain that some specific images could compress much more than using a fixed nxn for the entire bitmap. As I said before, the High Freq. zones in the image (sharp) will require more info encoding (precission) and the Low freq. zones in the images can be encoded which lower precission. This means we can vary the size of the block depending on what we're encoding. These are the steps to encode using ABS-BTC: þ Segment the image in 4x4 blocks. þ Compute 'a' and 'b'. þ Now, every block will be classified according with the 'a' and 'b' levels output. We'll use 3 mode operations: Mode 1³ ÄÄÄÄÄÄÙ Th1, where Th1 is a threshold if ³a-b³ <= Th1 We'll skip the bitmap and we send only the mean of the 4x4 block. Mode used in Low Freq. zones in the image. Mode 2³ ÄÄÄÄÄÄÙ Th2, Th3, where Th2 and Th3 are both thresholds but different if: Th1 < ³a-b³ < Th2 ... AND... ÚÄÄ ÄÄ¿ ³ ³ ³ ä ³Xi-a³ + ä³Xi-b³ ³ ³ Ä Ä ³ ³ zz Xi< X zz Xi >= X ³ <= Th3 ³ ³ ÀÄÄ ÄÄÙ where zz = for all the possible values. Then we'll send 'a', 'b' and the bitmap. Mode 3³ ÄÄÄÄÄÄÙ We'll select this mode when Mode 1 and Mode 3 are NOT satisfying. We'll segment the 4x4 block to 2x2 block, then we'll send 'a', 'b' and the bitmap for the two segmented blocks. (Used in the High Freq. zones in the image, we need precission.) ... and don't forget to send a header for every block indicating the mode that was selected... I suggest you to use 2 bits field into the bitmap header (you'll NOT introduce considerable artifacts in the resulting reconstructed image... :). If your encoded bitmap is: 1101000100101110 = 16 bits = 2 bytes and you selected mode 1 then: 0101000100101110 = 16 bits = 2 bytes Ä¿ ³ ÀÄ> 01 = 1 (field) ... and NO considerable artifacts were introduced (we changed only a bit). 5. SOME BASIC CODE? ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ I'll write some basic code for help you to get the basic idea... so don't expect so much. I'll write the standard BTC. I'm assuming we have read a file, and have stored it in a bitmap pointer with reserved mem. I also assuming our bitmap is 320x200 (64k). I put C language here coz many people know it, but I prefer Assembler, of coz ;) unsigned char *bitmap; unsigned int x,y,mean,a,b,q; float temp,a,b,sig; btc() { mean=0; es1=0; es2=0; for(y=0;y<4;y++){ for(x=0;x<4;x++){ mean+=bitmap[(y*320)+x]; es1=es1+bitmap[(y*320)+x]; es2=es2+pow(bitmap[(y*320)+x],2); } } mean=mean/16; es1=es1/16; es2=es2/16; sig=sqrt(es2-pow(es1,2)); /* Compute q */ q=0; for(y=0;y<4;y++){ for(x=0;x<4;x++){ if(bitmap[(y*320)+x]>=es1){ q++; } } } /* compute a and b output levels */ if(q!=0){ if(q!=16){ temp=sqrt(q/(float)(16-q)); } else { temp=4; } temp=temp*sig; a=(int)(es1-temp); temp=sqrt((float)(16-q)/q); temp=temp*sig; b=(int)(es1+temp); } /* Now, you have the 'a' and the 'b' output levels. So I leave for you to send the 'a' and the 'b' and also the encoded bitmap (2 bytes) to another buffer (or the same buffer, bitmap, if you want). */ } 6. LAST NOTES ÄÄÄÄÄÄÄÄÄÄÄÄÄ Well, at this moment I'm very tired... If you have a doubt or want to ask something, then email me: email: dave@bglinks.com or dave_es@yahoo.com If more than 10 coders email me answering for VQ or other known / unknown digital image compression schemes, I promise to release more DOCS... ...but, please, dont ask me about my series calculus algo. for compressing images coz for this moments it's a bit secret (and, of course, currently I'm working in it). I'm a regular on #coders, so maybe you'll find me in that channel for talking about it. 7. GREETS ÄÄÄÄÄÄÄÄÄ Special Greetings go to: Gandalf, Shock, Gyr, Kalms, Ravian, Altair Yoghurt, Legend, Biki, Aslan, DjBlaze Maza, Funktion, HarriH, ... ... and, of course, greets for all regular people in #coders and #trax channels. P.S.: Hey J&I! When We start a REAL project (a game, for example :). X-Static: I hope you'll make some cool tunes and graphics for next Euskal Party. ;) - DAVE/PhyMoSys dave@bglinks.com