Re: [PATCH v2 4/6] gzip: refactor state tracking

Andrew Cooper Wed, 17 Apr 2024 11:45:58 -0700

On 17/04/2024 3:37 pm, Daniel P. Smith wrote:
> diff --git a/xen/common/gzip/gunzip.c b/xen/common/gzip/gunzip.c
> index 1b448d6e3655..8178a05a0190 100644
> --- a/xen/common/gzip/gunzip.c
> +++ b/xen/common/gzip/gunzip.c
> @@ -4,18 +4,25 @@
>  #include <xen/lib.h>
>  #include <xen/mm.h>
>  
> -static unsigned char *__initdata window;
> -
>  #define WSIZE           0x80000000U
>  
> -static unsigned char *__initdata inbuf;
> -static unsigned int __initdata insize;
> +struct gzip_state {
> +    unsigned char *window;
> +
> +    unsigned char *inbuf;
> +    unsigned int insize;
> +
> +    /* Index of next byte to be processed in inbuf: */
> +    unsigned int inptr;


perform_gunzip() passes in an unsigned long size, which means it's
truncated in this state.

Please at least make these size_t.  Life is too short to deal with the
fallout of this going wrong.

>  
> -/* Index of next byte to be processed in inbuf: */
> -static unsigned int __initdata inptr;
> +    /* Bytes in output buffer: */
> +    unsigned int outcnt;

See later, but I think the comment for outcnt is wrong.

>  
> -/* Bytes in output buffer: */
> -static unsigned int __initdata outcnt;
> +    long bytes_out;

See later, but I don't think this can be signed.  It's only used to
cross-check the header-reported length, which is done as an unsigned long.

> +
> +    unsigned long bb;      /* bit buffer */
> +    unsigned bk;           /* bits in bit buffer */

As this is effectively new code, "unsigned int" please.

> @@ -27,7 +34,7 @@ typedef unsigned char   uch;
>  typedef unsigned short  ush;
>  typedef unsigned long   ulg;
>  
> -#define get_byte()      (inptr < insize ? inbuf[inptr++] : fill_inbuf())
> +#define get_byte()     (s->inptr < s->insize ? s->inbuf[s->inptr++] : 
> fill_inbuf())

This is a bit weird:
> $ git grep -w get_byte common/gzip
> common/gzip/gunzip.c:40:#define get_byte()     (s->inptr < s->insize ?
> s->inbuf[s->inptr++] : fill_inbuf())
> common/gzip/inflate.c:224:#define NEXTBYTE(s)  ({ int v = get_byte();
> if (v < 0) goto underrun; (uch)v; })
> common/gzip/inflate.c:1131:    flags  = (uch)get_byte();

NEXTBYTE() gets an s parameter, but doesn't pass it to get_byte() and
instead relies on state always having the name 's' in scope.

I'd suggest turning get_byte() into a proper static function.  It will
be more readable that throwing extra ()'s around s in the existing macro.

Except...  fill_inbuf() is a fatal error anyway, so that can be folded
together to simplify the result.


> @@ -72,16 +78,16 @@ static __init void flush_window(void)
>      unsigned int n;
>      unsigned char *in, ch;
>  
> -    in = window;
> -    for ( n = 0; n < outcnt; n++ )
> +    in = s->window;
> +    for ( n = 0; n < s->outcnt; n++ )
>      {
>          ch = *in++;
>          c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
>      }
>      crc = c;
>  
> -    bytes_out += (unsigned long)outcnt;
> -    outcnt = 0;
> +    s->bytes_out += (unsigned long)s->outcnt;

I can't figure out why this was written this way, even originally.

AFAICT, outcnt doesn't even match it's comment.

/* Bytes in output buffer: */

It looks like it's the number of bytes in window.  This also matches the
"wp" define which I guess is "window position".

Anyway, irrespective of naming, the cast is useless so lets drop it
while modifying the line.

> diff --git a/xen/common/gzip/inflate.c b/xen/common/gzip/inflate.c
> index 512d9bf0ee2e..5735bbcf7eb4 100644
> --- a/xen/common/gzip/inflate.c
> +++ b/xen/common/gzip/inflate.c
> @@ -119,7 +119,7 @@ static char rcsid[] = "#Id: inflate.c,v 0.14 1993/06/10 
> 13:27:04 jloup Exp #";
>  
>  #endif /* !__XEN__ */
>  
> -#define slide window
> +#define slide s->window

Given only 8 uses, I'd expand this in place and drop the macro.

But, there's an entire comment block discussing "slide" which I think is
wrong for Xen's implementation.  That wants to go too, I think.

>  
>  /*
>   * Huffman code lookup table entry--this entry is four bytes for machines
> @@ -143,12 +143,12 @@ struct huft {
>  static int huft_build OF((unsigned *, unsigned, unsigned,
>                            const ush *, const ush *, struct huft **, int *));
>  static int huft_free OF((struct huft *));
> -static int inflate_codes OF((struct huft *, struct huft *, int, int));
> -static int inflate_stored OF((void));
> -static int inflate_fixed OF((void));
> -static int inflate_dynamic OF((void));
> -static int inflate_block OF((int *));
> -static int inflate OF((void));
> +static int inflate_codes OF((struct gzip_state *, struct huft *, struct huft 
> *, int, int));
> +static int inflate_stored OF((struct gzip_state *));
> +static int inflate_fixed OF((struct gzip_state *));
> +static int inflate_dynamic OF((struct gzip_state *));
> +static int inflate_block OF((struct gzip_state *, int *));
> +static int inflate OF((struct gzip_state *));

These are the only users of OF, and it turns out it's a no-op macro. 
This code would be far-less WTF-worthy if it was expanded as part of
patch 1.

>  
>  /*
>   * The inflate algorithm uses a sliding 32 K byte window on the uncompressed
> @@ -162,8 +162,8 @@ static int inflate OF((void));
>   * must be in unzip.h, included above.
>   */
>  /* unsigned wp;             current position in slide */
> -#define wp outcnt
> -#define flush_output(w) (wp=(w),flush_window())
> +#define wp s->outcnt
> +#define flush_output(s, w) (wp=(w),flush_window(s))

The combination of these two isn't safe, I don't think, especially as
one caller passes wp into flush_output().

There are very few users of wp, so expand it in line, and turn
flush_output() into a static function.  In particular, it will make ...

> @@ -560,8 +557,8 @@ static int __init inflate_codes(
>  
>  
>      /* make local copies of globals */
> -    b = bb;                       /* initialize bit buffer */
> -    k = bk;
> +    b = s->bb;                   /* initialize bit buffer */
> +    k = s->bk;
>      w = wp;                       /* initialize window position */

... this look less weird.  I'd suggest dropping the remark about "globals".


> @@ -1157,18 +1157,18 @@ static int __init gunzip(void)
>          error("Input has invalid flags");
>          return -1;
>      }
> -    NEXTBYTE(); /* Get timestamp */
> -    NEXTBYTE();
> -    NEXTBYTE();
> -    NEXTBYTE();
> +    NEXTBYTE(s); /* Get timestamp */
> +    NEXTBYTE(s);
> +    NEXTBYTE(s);
> +    NEXTBYTE(s);
>  
> -    (void)NEXTBYTE();  /* Ignore extra flags for the moment */
> -    (void)NEXTBYTE();  /* Ignore OS type for the moment */
> +    (void)NEXTBYTE(s);  /* Ignore extra flags for the moment */
> +    (void)NEXTBYTE(s);  /* Ignore OS type for the moment */

I'd drop these (void) casts. They're not different to the timestamp above.

> @@ -1224,13 +1224,13 @@ static int __init gunzip(void)
>          error("crc error");
>          return -1;
>      }
> -    if (orig_len != bytes_out) {
> +    if (orig_len != s->bytes_out) {
>          error("length error");
>          return -1;
>      }
>      return 0;
>  
> - underrun:   /* NEXTBYTE() goto's here if needed */
> + underrun:   /* NEXTBYTE(s) goto's here if needed */
>      error("out of input data");
>      return -1;

Just for completeness, this is a dead path because fill_inbuf() is
implemented as panic().

Fixing this wants putting on a todo list somewhere.

~Andrew

Re: [PATCH v2 4/6] gzip: refactor state tracking

Reply via email to