From: Shuxin Yang
Date: Sun, 20 Apr 2014 22:50:33 +0000 (-0700)
Subject: Minor enhancement to put_short() macro. This change saw marginal speedup
X-Git-Tag: 1.9.9-b1~933
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=666581bbc17eaea62acb47cf47ab35a037e9b9d0;p=thirdparty%2Fzlib-ng.git

Minor enhancement to put_short() macro. This change saw a marginal speedup
(about 0% to 3% depending on the compression level and input).
I guess the speedup likely arises from the following facts:

 1) "s->pending" is now loaded once and stored once. In the original
    implementation it has to be loaded and stored twice, because the
    compiler cannot prove that "s->pending" and "s->pending_buf[]"
    do not alias.

 2) better code generation:
   2.1) no instructions are needed for extracting two bytes from a short.
   2.2) fewer registers are needed.
   2.3) stores to adjacent bytes are merged into a single store, albeit
        at the cost of a potential unaligned-access penalty.

Conflicts:
	trees.c
---

diff --git a/deflate.h b/deflate.h
index 5cf2ce126..28140d2ec 100644
--- a/deflate.h
+++ b/deflate.h
@@ -291,6 +291,30 @@ typedef enum {
  */
 #define put_byte(s, c) {s->pending_buf[s->pending++] = (c);}
 
+/* ===========================================================================
+ * Output a short LSB first on the stream.
+ * IN assertion: there is enough room in pendingBuf.
+ */
+#if defined(__x86_64) || defined(__i386__)
+/* Compared to the else-clause's implementation, there are a few advantages:
+ *  - s->pending is loaded only once (the else-clause's implementation needs
+ *    to load s->pending twice due to possible aliasing between s->pending
+ *    and s->pending_buf[]).
+ *  - no instructions for extracting bytes from a short.
+ *  - needs fewer registers.
+ *  - stores to adjacent bytes are merged into a single store, albeit at the
+ *    cost of a potential unaligned-access penalty.
+ */
+#define put_short(s, w) { \
+    s->pending += 2; \
+    *(ush*)(&s->pending_buf[s->pending - 2]) = (w) ; \
+}
+#else
+#define put_short(s, w) { \
+    put_byte(s, (uch)((w) & 0xff)); \
+    put_byte(s, (uch)((ush)(w) >> 8)); \
+}
+#endif
 
 #define MIN_LOOKAHEAD (MAX_MATCH+MIN_MATCH+1)
 /* Minimum amount of lookahead, except at the end of the input file.
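
The two store strategies compared in this patch can be checked outside zlib with a minimal sketch like the one below. It is an illustration under stated assumptions, not the patch's code: `demo_state`, `put_short_bytes`, `put_short_merged` and `host_is_little_endian` are hypothetical names, and the merged variant uses `memcpy` instead of the patch's pointer cast. That substitution is an assumption made here to sidestep alignment and strict-aliasing concerns; optimizing compilers typically lower such a fixed-size `memcpy` to a single 16-bit store on x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef unsigned char uch;

/* Hypothetical stand-in for the two deflate_state fields the macro touches. */
typedef struct {
    uch pending_buf[16];
    unsigned long pending;
} demo_state;

/* Byte-wise form (the #else branch): two stores, LSB first.
 * s->pending is incremented between the stores, so it is updated twice. */
static void put_short_bytes(demo_state *s, unsigned w) {
    s->pending_buf[s->pending++] = (uch)(w & 0xff);
    s->pending_buf[s->pending++] = (uch)((w >> 8) & 0xff);
}

/* Merged form: s->pending is updated once and both bytes are written
 * together. The output is LSB first only on a little-endian host, which
 * is why the patch restricts this path to x86. memcpy is an assumption
 * of this sketch; the patch itself stores through a ush pointer. */
static void put_short_merged(demo_state *s, unsigned w) {
    uint16_t v = (uint16_t)w;
    s->pending += 2;
    memcpy(&s->pending_buf[s->pending - 2], &v, sizeof v);
}

/* Runtime byte-order probe, used only to decide whether the two
 * variants are expected to produce identical bytes. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uch first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Writing the same values through both functions and comparing the buffers with `memcmp` should show identical bytes on a little-endian host; on a big-endian host the merged variant would emit the high byte first, so only the byte-wise form is valid there.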