Niels Möller [Tue, 8 Feb 2011 20:07:06 +0000 (21:07 +0100)]
* gcm.c (gcm_gf_shift): Added a separate result argument.
(gcm_gf_mul): Compile bitwise version only when GCM_TABLE_BITS ==
0. Simplified interface with just two arguments pointing to
complete blocks.
(gcm_gf_shift_4, gcm_gf_shift_8): Renamed table-based functions, from...
(gcm_gf_shift_chunk): ... old name.
(gcm_gf_mul): Renamed both table-based versions and made the
argument types compatible with the bitwise gcm_gf_mul.
(gcm_gf_mul_chunk): ... the old name.
(gcm_set_key): Initialize the table using adds and shifts only.
When GCM_TABLE_BITS > 0, this eliminates the only use of the
bitwise multiplication.
(gcm_hash): Simplified, now that we have the same interface for
gcm_gf_mul, regardless of table size.
Niels Möller [Tue, 8 Feb 2011 11:20:38 +0000 (12:20 +0100)]
* gcm.c (GHASH_POLYNOMIAL): Use unsigned long for this constant.
(gcm_gf_shift_chunk): Fixed bugs for the big endian 64-bit case,
e.g., sparc64. For both 4-bit and 8-bit tables.
Niels Möller [Mon, 7 Feb 2011 20:33:10 +0000 (21:33 +0100)]
* gcm.c (gcm_gf_mul_chunk): Special case first and last iteration.
(gcm_gf_add): New function, a special case of memxor. Use it for
all memxor calls with word-aligned 16 byte blocks. Improves
performance to 152 cycles/byte with no tables, 28 cycles per byte
with 4-bit tables and 10.5 cycles per byte with 8-bit tables.
Niels Möller [Sun, 6 Feb 2011 21:02:16 +0000 (22:02 +0100)]
Introduced 4-bit tables. Gives gmac performance of 45 cycles per
byte (still on intel x86_64).
* gcm.c (gcm_gf_shift): Renamed. Tweaked little-endian masks.
(gcm_rightshift): ... old name.
(gcm_gf_mul): New argument for the output. Added length argument
for one of the inputs (implicitly padding with zeros).
(shift_table): New table (in 4-bit and 8-bit versions), generated
by gcmdata.
(gcm_gf_shift_chunk): New function shifting 4 bits at
a time.
(gcm_gf_mul_chunk): New function processing 4 bits at a time.
(gcm_set_key): Generation of 4-bit key table.
(gcm_hash): Use tables, when available.
Niels Möller [Sun, 6 Feb 2011 17:15:04 +0000 (18:15 +0100)]
* gcm.c (gcm_rightshift): Moved the reduction of the shifted out
bit here.
(gcm_gf_mul): Updated for gcm_rightshift change. Improves gmac
performance to 181 cycles/byte.
Niels Möller [Sun, 6 Feb 2011 14:41:09 +0000 (15:41 +0100)]
(gcm_gf_mul): Rewrote. Still uses the bitwise algorithm from the
specification, but with separate byte and bit loops. Improves gmac
performance a bit further, to 227 cycles/byte.
Niels Möller [Sun, 6 Feb 2011 14:07:33 +0000 (15:07 +0100)]
(gcm_rightshift): Complete rewrite, to use word rather
than byte operations. Improves gmac performance from 830 cycles /
byte to (still poor) 268 cycles per byte on intel x86_64.
Niels Möller [Fri, 26 Nov 2010 22:25:02 +0000 (23:25 +0100)]
Reapplied optimizations (150% speedup on x86_32) and other fixes,
relicensing them as LGPL.
* blowfish.c (do_encrypt): Renamed, to...
(encrypt): ...new name.
(F): Added context argument. Shift input explicitly, instead of
reading individual bytes via memory.
(R): Added context argument.
(encrypt): Deleted a bunch of local variables. Using the context
pointer for everything should consume less registers.
(decrypt): Likewise.
(initial_ctx): Arrange constants into a struct, to simplify key setup.
(blowfish_set_key): Some simplification.
Niels Möller [Wed, 6 Oct 2010 20:53:59 +0000 (22:53 +0200)]
(memxor3): Optimized.
(memxor3_common_alignment): New function.
(memxor3_different_alignment_b): New function.
(memxor3_different_alignment_ab): New function.
(memxor3_different_alignment_all): New function.
Niels Möller [Wed, 6 Oct 2010 19:19:23 +0000 (21:19 +0200)]
(overhead): New global variable.
(time_function): Compensate for call overhead.
(bench_nothing, time_overhead): New functions.
(time_memxor): Tweaked src size, making it an integral number of
words.
(main): Call time_overhead.
(memxor_common_alignment): New function.
(memxor_different_alignment): New function.
(memxor): Optimized to do word-operations rather than byte
operations.
Partial revert of 2010-09-20 changes.
* camellia-set-encrypt-key.c (camellia_set_encrypt_key):
Reintroduce CAMELLIA_F_HALF_INV, for 32-bit machines.
* camellia-crypt-internal.c (CAMELLIA_ROUNDSM): Two variants,
differing in where addition of the key is done.
* x86/camellia-crypt-internal.asm: Moved addition of key.
(BENCH_INTERVAL): Reduced to 0.1 s.
(struct bench_memxor_info): New struct.
(bench_memxor): New function.
(time_memxor): New function.
(main): Use time_memxor. Added optional argument used to limit the
algorithms being benchmarked.
(BENCH_INTERVAL): Changed unit to
seconds.
(time_function): Use clock_gettime with CLOCK_PROCESS_CPUTIME_ID,
if available. This gives better accuracy, at least on recent
linux.
(CAMELLIA_F_HALF_INV): Deleted macro.
(camellia_set_encrypt_key): Deleted the CAMELLIA_F_HALF_INV
operations intended for moving the key xor into the middle of the
round.