On 2021/11/01 10:36, Scott Cheloha wrote:
> How did you generate this input?  Is it just ten million lines with a
> single 'z' character?  `jot -bz 10000000`?

That one was lots of copies of ports/infrastructure/bsd.port.mk catted together.

> Updated patch.
> 
> I screwed up.  We don't need to free(3) prevline when we print a line.
> We can just swap the pointers (and now, sizes) as we did before and
> let getline(3) handle reallocation.

Aha! That's a lot better and restores performance to how it was before.
On a different machine so times aren't directly comparable with the
previous, but:

$  hyperfine 'uniq /tmp/z' 'obj/uniq /tmp/z'
Benchmark 1: uniq /tmp/z
  Time (mean ± σ):      2.564 s ±  0.105 s    [User: 2.380 s, System: 0.173 s]
  Range (min … max):    2.423 s …  2.748 s    10 runs
 
Benchmark 2: obj/uniq /tmp/z
  Time (mean ± σ):      2.575 s ±  0.141 s    [User: 2.350 s, System: 0.214 s]
  Range (min … max):    2.478 s …  2.940 s    10 runs
 
Summary
  'uniq /tmp/z' ran
    1.00 ± 0.07 times faster than 'obj/uniq /tmp/z'

Which I think is pretty much a noop in terms of time.

OK sthen@



> Index: uniq.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/uniq/uniq.c,v
> retrieving revision 1.27
> diff -u -p -r1.27 uniq.c
> --- uniq.c    31 Jul 2018 02:55:57 -0000      1.27
> +++ uniq.c    1 Nov 2021 15:30:55 -0000
> @@ -45,8 +45,6 @@
>  #include <wchar.h>
>  #include <wctype.h>
>  
> -#define      MAXLINELEN      (8 * 1024)
> -
>  int cflag, dflag, iflag, uflag;
>  int numchars, numfields, repeats;
>  
> @@ -59,10 +57,10 @@ __dead void       usage(void);
>  int
>  main(int argc, char *argv[])
>  {
> -     char *t1, *t2;
> +     char *prevline, *t1, *t2, *thisline;
>       FILE *ifp = NULL, *ofp = NULL;
> +     size_t prevsize, thissize, tmpsize;
>       int ch;
> -     char *prevline, *thisline;
>  
>       setlocale(LC_CTYPE, "");
>  
> @@ -133,15 +131,18 @@ main(int argc, char *argv[])
>       if (pledge("stdio", NULL) == -1)
>               err(1, "pledge");
>  
> -     prevline = malloc(MAXLINELEN);
> -     thisline = malloc(MAXLINELEN);
> -     if (prevline == NULL || thisline == NULL)
> -             err(1, "malloc");
> -
> -     if (fgets(prevline, MAXLINELEN, ifp) == NULL)
> +     prevline = NULL;
> +     prevsize = 0;
> +     if (getline(&prevline, &prevsize, ifp) == -1) {
> +             free(prevline);
> +             if (ferror(ifp))
> +                     err(1, "getline");
>               exit(0);
> -
> -     while (fgets(thisline, MAXLINELEN, ifp)) {
> +     }
> +     
> +     thisline = NULL;
> +     thissize = 0;
> +     while (getline(&thisline, &thissize, ifp) != -1) {
>               /* If requested get the chosen fields + character offsets. */
>               if (numfields || numchars) {
>                       t1 = skip(thisline);
> @@ -157,11 +158,20 @@ main(int argc, char *argv[])
>                       t1 = prevline;
>                       prevline = thisline;
>                       thisline = t1;
> +                     tmpsize = prevsize;
> +                     prevsize = thissize;
> +                     thissize = tmpsize;
>                       repeats = 0;
>               } else
>                       ++repeats;
>       }
> +     free(thisline);
> +     if (ferror(ifp))
> +             err(1, "getline");
> +
>       show(ofp, prevline);
> +     free(prevline);
> +
>       exit(0);
>  }
>  
> 

Reply via email to