Without further ado, here is the program zpipe.c:
/* zpipe.c: example of proper use of zlib's inflate() and deflate()
   Not copyrighted -- provided to the public domain
   Version 1.4  11 December 2005  Mark Adler */

/* Version history:
   1.0  30 Oct 2004  First version
   1.1   8 Nov 2004  Add void casting for unused return values
                     Use switch statement for inflate() return values
   1.2   9 Nov 2004  Add assertions to document zlib guarantees
   1.3   6 Apr 2005  Remove incorrect assertion in inf()
   1.4  11 Dec 2005  Add hack to avoid MSDOS end-of-line conversions
                     Avoid some compiler warnings for input and output buffers
 */

#include <stdio.h>
#include <string.h>
#include <assert.h>
#include "zlib.h"

#if defined(MSDOS) || defined(OS2) || defined(WIN32) || defined(__CYGWIN__)
#  include <fcntl.h>
#  include <io.h>
#  define SET_BINARY_MODE(file) setmode(fileno(file), O_BINARY)
#else
#  define SET_BINARY_MODE(file)
#endif

#define CHUNK 16384

/* Compress from file source to file dest until EOF on source.
   def() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_STREAM_ERROR if an invalid compression
   level is supplied, Z_VERSION_ERROR if the version of zlib.h and the
   version of the library linked do not match, or Z_ERRNO if there is
   an error reading or writing the files. */
int def(FILE *source, FILE *dest, int level)
{
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
deflateInit() is called with a pointer to the structure to be initialized and the compression level, which is an integer in the range of -1 to 9. Lower compression levels result in faster execution, but less compression. Higher levels result in greater compression, but slower execution. The zlib constant Z_DEFAULT_COMPRESSION, equal to -1, provides a good compromise between compression and speed and is equivalent to level 6. Level 0 actually does no compression at all, and in fact expands the data slightly to produce the zlib format (it is not a byte-for-byte copy of the input). More advanced applications of zlib may use deflateInit2() here instead. Such an application may want to reduce how much memory will be used, at some price in compression. Or it may need to request a gzip header and trailer instead of a zlib header and trailer, or raw encoding with no header or trailer at all.
We must check the return value of deflateInit() against the zlib constant Z_OK to make sure that it was able to allocate memory for the internal state, and that the provided arguments were valid. deflateInit() will also check that the version of zlib that the zlib.h file came from matches the version of zlib actually linked with the program. This is especially important for environments in which zlib is a shared library.
Note that an application can initialize multiple, independent zlib streams, which can operate in parallel. The state information maintained in the structure allows the zlib routines to be reentrant.
    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, level);
    if (ret != Z_OK)
        return ret;

    /* compress until end of file */
    do {
If there is an error in reading from the input file, the process is aborted with deflateEnd() being called to free the allocated zlib state before returning the error. We wouldn't want a memory leak, now would we? deflateEnd() can be called at any time after the state has been initialized. Once that's done, deflateInit() (or deflateInit2()) would have to be called to start a new compression process. There is no point here in checking the deflateEnd() return code. The deallocation can't fail.
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&strm);
            return Z_ERRNO;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;

        /* run deflate() on input until output buffer not full, finish
           compression if all of source has been read in */
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
The parameters to deflate() are a pointer to the strm structure containing the input and output information and the internal compression engine state, and a parameter indicating whether and how to flush data to the output. Normally deflate() will consume several kilobytes of input data before producing any output (except for the header), in order to accumulate statistics on the data for optimum compression. It will then put out a burst of compressed data, and proceed to consume more input before the next burst. Eventually, deflate() must be told to terminate the stream, complete the compression with provided input data, and write out the trailer check value. deflate() will continue to compress normally as long as the flush parameter is Z_NO_FLUSH. Once the Z_FINISH parameter is provided, deflate() will begin to complete the compressed output stream. However, depending on how much output space is provided, deflate() may have to be called several times until it has provided the complete compressed stream, even after it has consumed all of the input. The flush parameter must continue to be Z_FINISH for those subsequent calls.
There are other values of the flush parameter that are used in more advanced applications. You can force deflate() to produce a burst of output that encodes all of the input data provided so far, even if it wouldn't have otherwise, for example to control data latency on a link with compressed data. You can also ask that deflate() do that as well as erase any history up to that point so that what follows can be decompressed independently, for example for random access applications. Both requests will degrade compression by an amount depending on how often such requests are made.
deflate() has a return value that can indicate errors, yet we do not check it here. Why not? Well, it turns out that deflate() can do no wrong here. Let's go through deflate()'s return values and dispense with them one by one. The possible values are Z_OK, Z_STREAM_END, Z_STREAM_ERROR, or Z_BUF_ERROR. Z_OK is, well, ok. Z_STREAM_END is also ok and will be returned for the last call of deflate(). This is already guaranteed by calling deflate() with Z_FINISH until it has no more output. Z_STREAM_ERROR is only possible if the stream is not initialized properly, but we did initialize it properly. There is no harm in checking for Z_STREAM_ERROR here, for example to check for the possibility that some other part of the application inadvertently clobbered the memory containing the zlib state. Z_BUF_ERROR will be explained further below, but suffice it to say that this is simply an indication that deflate() could not consume more input or produce more output. deflate() can be called again with more output space or more available input, which it will be in this code.
            ret = deflate(&strm, flush);    /* no bad return value */
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */

            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&strm);
                return Z_ERRNO;
            }
The way we tell that deflate() has no more output is by seeing that it did not fill the output buffer, leaving avail_out greater than zero. However, suppose that deflate() has no more output, but just so happened to exactly fill the output buffer! avail_out is zero, and we can't tell that deflate() has done all it can. As far as we know, deflate() has more output for us. So we call it again. But now deflate() produces no output at all, and avail_out remains unchanged as CHUNK. That deflate() call wasn't able to do anything, either consume input or produce output, and so it returns Z_BUF_ERROR. (See, I told you I'd cover this later.) However, this is not a problem at all. Now we finally have the desired indication that deflate() is really done, and so we drop out of the inner loop to provide more input to deflate().
With flush set to Z_FINISH, this final set of deflate() calls will complete the output stream. Once that is done, subsequent calls of deflate() would return Z_STREAM_ERROR if the flush parameter is not Z_FINISH, and do no more processing until the state is reinitialized.
Some applications of zlib have two loops that call deflate() instead of the single inner loop we have here. The first loop would call without flushing and feed all of the data to deflate(). The second loop would call deflate() with no more data and the Z_FINISH parameter to complete the process. As you can see from this example, that can be avoided by simply keeping track of the current flush state.
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);     /* all input will be used */

        /* done when last data in file processed */
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);        /* stream will be complete */

    /* clean up and return */
    (void)deflateEnd(&strm);
    return Z_OK;
}

/* Decompress from file source to file dest until stream ends or EOF.
   inf() returns Z_OK on success, Z_MEM_ERROR if memory could not be
   allocated for processing, Z_DATA_ERROR if the deflate data is
   invalid or incomplete, Z_VERSION_ERROR if the version of zlib.h and
   the version of the library linked do not match, or Z_ERRNO if there
   is an error reading or writing the files. */
int inf(FILE *source, FILE *dest)
{
    int ret;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];
Here avail_in is set to zero and next_in is set to Z_NULL to indicate that no input data is being provided.
    /* allocate inflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    strm.avail_in = 0;
    strm.next_in = Z_NULL;
    ret = inflateInit(&strm);
    if (ret != Z_OK)
        return ret;

    /* decompress until deflate stream ends or end of file */
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)inflateEnd(&strm);
            return Z_ERRNO;
        }
        if (strm.avail_in == 0)
            break;
        strm.next_in = in;

        /* run inflate() on input until output buffer not full */
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
Advanced applications may use deflateSetDictionary() to prime deflate() with a set of likely data to improve the first 32K or so of compression. This is noted in the zlib header, so inflate() requests that that dictionary be provided before it can start to decompress. Without the dictionary, correct decompression is not possible. For this routine, we have no idea what the dictionary is, so the Z_NEED_DICT indication is converted to a Z_DATA_ERROR.
inflate() can also return Z_STREAM_ERROR, which should not be possible here, but could be checked for as noted above for def(). Z_BUF_ERROR does not need to be checked for here, for the same reasons noted for def(). Z_STREAM_END will be checked for later.
            ret = inflate(&strm, Z_NO_FLUSH);
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
            switch (ret) {
            case Z_NEED_DICT:
                ret = Z_DATA_ERROR;     /* and fall through */
            case Z_DATA_ERROR:
            case Z_MEM_ERROR:
                (void)inflateEnd(&strm);
                return ret;
            }

            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)inflateEnd(&strm);
                return Z_ERRNO;
            }
        } while (strm.avail_out == 0);

        /* done when inflate() says it's done */
    } while (ret != Z_STREAM_END);

    /* clean up and return */
    (void)inflateEnd(&strm);
    return ret == Z_STREAM_END ? Z_OK : Z_DATA_ERROR;
}
zerr() is used to interpret the possible error codes from def() and inf(), as detailed in their comments above, and print out an error message. Note that these are only a subset of the possible return values from deflate() and inflate().
/* report a zlib or i/o error */
void zerr(int ret)
{
    fputs("zpipe: ", stderr);
    switch (ret) {
    case Z_ERRNO:
        if (ferror(stdin))
            fputs("error reading stdin\n", stderr);
        if (ferror(stdout))
            fputs("error writing stdout\n", stderr);
        break;
    case Z_STREAM_ERROR:
        fputs("invalid compression level\n", stderr);
        break;
    case Z_DATA_ERROR:
        fputs("invalid or incomplete deflate data\n", stderr);
        break;
    case Z_MEM_ERROR:
        fputs("out of memory\n", stderr);
        break;
    case Z_VERSION_ERROR:
        fputs("zlib version mismatch!\n", stderr);
    }
}

/* compress or decompress from stdin to stdout */
int main(int argc, char **argv)
{
    int ret;

    /* avoid end-of-line conversions */
    SET_BINARY_MODE(stdin);
    SET_BINARY_MODE(stdout);

    /* do compression if no arguments */
    if (argc == 1) {
        ret = def(stdin, stdout, Z_DEFAULT_COMPRESSION);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* do decompression if -d specified */
    else if (argc == 2 && strcmp(argv[1], "-d") == 0) {
        ret = inf(stdin, stdout);
        if (ret != Z_OK)
            zerr(ret);
        return ret;
    }

    /* otherwise, report usage */
    else {
        fputs("zpipe usage: zpipe [-d] < source > dest\n", stderr);
        return 1;
    }
}