Source: gnutls28 Version: 3.3.8-6+deb8u1 Severity: normal Tags: patch Dear Maintainer,
After upgrading a server with a VIA C3 (Nehemiah) processor to jessie, exim4 started to crash on pretty much every connection that negotiated AES-128-CBC or AES-256-CBC on TLS: > 2015-05-31T16:06:43.641909+02:00 info kernel: [ 3466.879332] exim4[2381]: > segfault at 4aeb2453 ip b6a4a5e2 sp bf7bfde0 error 4 in > libgnutls-deb0.so.28.41.0[b6996000+13a000] There's nothing in the exim4 log; it crashes before anything is logged. The problem is not limited to exim4: I could reproduce the crash with gnutls-cli. This is also how I discovered AES-CBC was to blame. It turns out that lib/accelerated/x86/elf/e_padlock-x86.s segfaults in the function padlock_cbc_encrypt() because it has not been written to handle the case where it is requested to encrypt 0 bytes of data; it expects a strictly positive number. Yet, it is called with a length of 0. The function overwrites 512 bytes on the top of its stack, and then dereferences a pointer in the overwritten data. The attached patch simply checks for a 0 length in the C functions that call the handwritten assembly PadLock AES encryption routines, so the call is avoided. The patch fixed the issue on my system. All the functions padlock_{ecb,cbc,cfb,ofb,ctr32}_encrypt() share almost all their code. The AES-CGM ciphers in GnuTLS don't seem to call the function with a 0 length, but my patch also checks for 0 in that cipher mode on the basis "once bitten, twice shy". The rest of the mail is a detailed description of the problem with length 0. I don't include a "steps to reproduce this" because it's likely you don't have access to a system with a VIA PadLock engine. But it's trivial to reproduce with gnutls-cli and an SMTP server to connect to. To my surprise, e_padlock-x86.s is a generated file[4]; the source is not included with GnuTLS. The source is the file engines/asm/e_padlock-x86.pl in openssl; from the Git repository for GnuTLS version 3.3.8, it can be concluded that the source is commit 34ccd24 in the OpenSSL Git. First, a small aside on return values. OpenSSL checks the return value (an int) of padlock_cbc_encrypt(), and indeed the function can abort if the struct padlock_cipher_data is not aligned on a 16-byte boundary, or if len is not a multiple of 16 (AES block size). But there's something odd: /if/ the function aborts, the return value (EAX) is not set; it is kept at the value it was when the function was called. On a succesful return, it's set to 1. GnuTLS never checks the return value. If it is possible that the length is not a multiple of 16, GnuTLS will not notice that the padlock_cbc_encrypt() call did nothing (unless it does some checks on the data later). I analysed the behaviour of padlock_cbc_encrypt() with GDB. During my debugging, the input and output data were never aligned on a 16-byte boundary. padlock_cbc_encrypt() checks for this because not all VIA processors can cope with this, or only at a tremendous speed penalty. To fix the alignment, an aligned block of room is allocated on the stack, and the data is copied there, encrypted in place, and copied to the destination. This part of the program is the problematic part: File openssl/engines/asm/e_padlock-x86.pl, line 204: ----------------------- 8< ------------------ >8 ----------------------- 204: &cmp ($len,$chunk); 205: &cmovc ($chunk,$len); # chunk=len>PADLOCK_CHUNK?PADLOCK_CHUNK:len 206: &and ("eax",$chunk); # out_misaligned?chunk:0 207: &mov ($chunk,$len); 208: &neg ("eax"); 209: &and ($chunk,$PADLOCK_CHUNK-1); # chunk=len%PADLOCK_CHUNK 210: &lea ("esp",&DWP(0,"eax","ebp")); # alloca 211: &mov ("eax",$PADLOCK_CHUNK); 212: &cmovz ($chunk,"eax"); # chunk=chunk?:PADLOCK_CHUNK ----------------------- 8< ------------------ >8 ----------------------- [1] At the start, EAX is all ones (binary), signifying the output is not aligned on a 16-byte boundary. Room is allocated on the stack for the data in line 210. Either `len` bytes or PADLOCK_CHUNK (512), whichever is less. The purpose of `chunk` is to take up to PADLOCK_CHUNK bytes from `len` to process. If `len` is not a multiple of 512, the remainder is processed first. After the remainder is processed, either a multiple of PADLOCK_CHUNK still needs to be processed, or we're done. To this end, it ANDs `len` with 511 in line 209. If `len` is a multiple of 512, the result is zero, so a full PADLOCK_CHUNK is processed (line 212). But if `len` was already zero to begin with, this test produces the wrong result, and results in `chunk` also being set to 512, to process 512 bytes of data. The confusion comes now. The code has just set aside 0 bytes on the stack for copying data. Then it copies 512 bytes of data from the input pointer to the stack (in line 265 of the source[2]), thereby overwriting the other information it is keeping on the stack. For good measure, this is then encrypted as well :). And here is where the SIGSEGV occurs: File openssl/engines/asm/e_padlock-x86.pl, line 279: ----------------------- 8< ------------------ >8 ----------------------- 279: &mov ($out,&DWP(0,"ebp")); # restore parameters 280: &mov ($chunk,&DWP(12,"ebp")); [...] 292: &test ($out,0x0f); 293: &jz (&label("${mode}_out_aligned")); 294: &mov ($len,$chunk); 295: &lea ($inp,&DWP(0,"esp")); 296: &shr ($len,2); 297: &data_byte(0xf3,0xa5); # rep movsl ----------------------- 8< ------------------ >8 ----------------------- [3] `out` and `chunk` are loaded with bogus data; it is from the part of the stack that was overwritten and subsequently encrypted. In line 297, the 'rep movsl' copies `len` = `chunk` bytes of data from the stack to the address `out`, but `chunk` and `out` are both garbage, resulting in a segmentation fault. That concludes my analysis. With regards, Peter. -- System Information: Debian Release: 8.1 APT prefers stable-updates APT policy: (500, 'stable-updates'), (500, 'stable') Architecture: i386 (i686) Kernel: Linux 3.16.0-4-586 Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) $ cat /proc/cpuinfo processor : 0 vendor_id : CentaurHauls cpu family : 6 model : 9 model name : VIA Nehemiah stepping : 10 cpu MHz : 999.583 cache size : 64 KB fdiv_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr cx8 sep mtrr pge cmov pat mmx fxsr sse rng rng_en ace ace_en bogomips : 1999.16 clflush size : 32 cache_alignment : 32 address sizes : 32 bits physical, 32 bits virtual power management: [1] https://git.openssl.org/?p=openssl.git;a=blob;f=engines/asm/e_padlock-x86.pl;h=4148468c41de695751e8731369a948dff171c1ca;hb=34ccd24d0e6#l204 [2] https://git.openssl.org/?p=openssl.git;a=blob;f=engines/asm/e_padlock-x86.pl;h=4148468c41de695751e8731369a948dff171c1ca;hb=34ccd24d0e6#l265 [3] https://git.openssl.org/?p=openssl.git;a=blob;f=engines/asm/e_padlock-x86.pl;h=4148468c41de695751e8731369a948dff171c1ca;hb=34ccd24d0e6#l279 [4] In my opinion, this is not in the spirit of the stipulation "the preferred form of the work for making modifications to it" of the (L)GPL license GnuTLS is released under. For one thing, all comments explaining the code are not in the generated output. -- I use the GNU Privacy Guard (GnuPG) in combination with Enigmail. You can send me encrypted mail if you want some privacy. My key is available at <http://digitalbrains.com/2012/openpgp-key-peter>
--- gnutls28-3.3.8.orig/lib/accelerated/x86/aes-gcm-padlock.c +++ gnutls28-3.3.8/lib/accelerated/x86/aes-gcm-padlock.c @@ -52,6 +52,9 @@ static void padlock_aes_encrypt(void *_c struct padlock_ctx *ctx = _ctx; struct padlock_cipher_data *pce; + if (length == 0) + return; + pce = ALIGN16(&ctx->expanded_key); padlock_ecb_encrypt(dst, src, pce, length); --- gnutls28-3.3.8.orig/lib/accelerated/x86/aes-padlock.c +++ gnutls28-3.3.8/lib/accelerated/x86/aes-padlock.c @@ -130,6 +130,9 @@ padlock_aes_cbc_encrypt(void *_ctx, cons struct padlock_ctx *ctx = _ctx; struct padlock_cipher_data *pce; + if (src_size == 0) + return 0; + pce = ALIGN16(&ctx->expanded_key); padlock_cbc_encrypt(dst, src, pce, src_size); @@ -145,6 +148,9 @@ padlock_aes_cbc_decrypt(void *_ctx, cons struct padlock_ctx *ctx = _ctx; struct padlock_cipher_data *pcd; + if (src_size == 0) + return 0; + pcd = ALIGN16(&ctx->expanded_key); padlock_cbc_encrypt(dst, src, pcd, src_size);