On Sat, Jun 06, 2020 at 03:51:58PM -0700, Jordan Geoghegan wrote:
> Hello,
> 
> I was hoping the fine folks here could give me a quick sanity check. I'm by
> no means an awk guru, so I'm likely missing something obvious. I wanted to
> ask here quickly before I started flapping my gums on bugs@.
> 
> I'm working on a simple awk snippet to convert the IP range data listed in
> the Extended Delegation Statistics data from ARIN [1] into CIDR blocks. I
> have a snippet that works perfectly fine on mawk and gawk,
> but not on the base system awk. I'm 99% sure I'm not using any GNUisms, as
> when I break the command up into two parts, it works perfectly.
> 
> The snippet below does not work with base awk, but does work with gawk and
> mawk (running on a 6.6-stable system):
> 
>   awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4,
> 32-log($5)/log(2))}' delegated-arin-extended-latest.txt
> 
> 
> The command does output data, but it also throws errors for certain lines:
> 
>   awk: log result out of range
>   input record number 94027, file delegated-arin-extended-latest.txt
>   source line number 1
> 
> Most CIDR blocks are calculated correctly, but about 10% of them have errors
> (i.e. something that should be calculated as a /24 is instead calculated as
> a /30).
> 
> However, when I break it up into two parts, it produces the expected output:
> 
>   awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") print($4, $5)}'
> delegated-arin-extended-latest.txt | awk  '{printf("%s/%d\n", $1,
> 32-log($2)/log(2)) }'
> 
> As you can see, the same number of lines are printed, but the hashes are
> different.
> 
>   luna$ gawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n",
> $4, 32-log($5)/log(2))}' delegated-*-latest.txt | wc -l
>      56446
>   luna$ mawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n",
> $4, 32-log($5)/log(2))}' delegated-*-latest.txt | wc -l
>      56446
>   luna$ awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4,
> 32-log($5)/log(2))}' delegated-*-latest.txt 2>/dev/null | wc -l
>      56446
> 
>   luna$ awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4,
> 32-log($5)/log(2))}' delegated-arin-extended-latest.txt 2>/dev/null | md5
>     6f549bbc0799bc202c12695f8530d1df
>   luna$ gawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n",
> $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt 2>/dev/null |
> md5
>     40c28b8ebfd2796e1ae15d9f6401c0c1
>   luna$ mawk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n",
> $4, 32-log($5)/log(2))}' delegated-arin-extended-latest.txt 2>/dev/null |
> md5
>     40c28b8ebfd2796e1ae15d9f6401c0c1
> 
> 
> Example of the differences:
> 
> --- mawk.txt    Sat Jun  6 18:43:30 2020
> +++ awk.txt     Sat Jun  6 18:43:38 2020
> @@ -29,7 +29,7 @@
>  9.64.0.0/10
>  9.128.0.0/9
>  11.0.0.0/8
> -12.0.0.0/8
> +12.0.0.0/30
>  13.0.0.0/11
>  13.32.0.0/12
>  13.48.0.0/14
> @@ -415,7 +415,7 @@
>  23.90.64.0/20
>  23.90.80.0/21
>  23.90.88.0/22
> -23.90.92.0/22
> +23.90.92.0/30
>  23.90.96.0/19
>  23.91.0.0/19
>  23.91.32.0/19
> @@ -545,8 +545,8 @@
>  23.133.224.0/24
>  23.133.240.0/24
>  23.134.0.0/24
> -23.134.16.0/24
> -23.134.17.0/24
> +23.134.16.0/30
> +23.134.17.0/30
> 
> 
> Any insight or advice would be much appreciated.
> 
> Regards,
> 
> Jordan
> 
> [1] https://ftp.arin.net/pub/stats/arin/delegated-arin-extended-latest
> 
> 

I have no idea what is going on, but FWIW I can reproduce this on
i386 6.7-stable and amd64 6.7-current (well, current-ish, #232).
Truncating the file to a single offending line produces the same result:
log($5) is out of range.
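
Since the error is reported from the log() call, one way to narrow things
down is a variant that computes the prefix length without log() at all.
What follows is only a minimal sketch: to_cidr and prefixlen are
hypothetical names, it assumes field 5 is the address count as in the
original snippet, and whether it avoids the error on base awk is untested
here. It counts how often the address count can be halved, which also
avoids relying on a floating-point quotient being truncated by %d.

```shell
#!/bin/sh
# to_cidr: hypothetical helper; reads delegation records on stdin and
# prints US IPv4 entries as address/prefix.
to_cidr() {
    awk -F '|' '
    # prefixlen: hypothetical helper; count how many times the address
    # count can be halved (a log2 rounded up) and subtract from 32.
    function prefixlen(n,    bits) {
        bits = 0
        while (n > 1) { n = n / 2; bits++ }
        return 32 - bits
    }
    $3 == "ipv4" && $2 == "US" { printf("%s/%d\n", $4, prefixlen($5)) }
    '
}

# Example with the offending record from this thread.
printf '%s\n' \
  'arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30' |
  to_cidr
# prints 216.250.144.0/20
```

For counts that are powers of two this gives the same prefix as the
log()-based expression; for other counts it rounds towards the smaller
prefix number.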

It appears to have something to do with the last field.  Removing it or
changing some of its characters makes the error go away, e.g.:

arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30
arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5a58386636aa775c2106140445cf2c30
                                                    ^
awk fails on the first line but works on the second.
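
For anyone who wants to try this without downloading the full file, here
is a small reproduction sketch. repro_lines and run_repro are hypothetical
helper names, and the AWK environment variable is an assumption of this
sketch (not something awk itself reads); it only selects which awk binary
to run. The output depends on the awk implementation, so none is shown.

```shell
#!/bin/sh
# repro_lines: hypothetical helper; emits the two records quoted above,
# which differ only in the second character of the last field.
repro_lines() {
    printf '%s\n' \
      'arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30' \
      'arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5a58386636aa775c2106140445cf2c30'
}

# run_repro: feed both records to the original one-liner.
# AWK (assumed variable) picks the implementation, defaulting to awk.
run_repro() {
    repro_lines | "${AWK:-awk}" -F '|' \
      '{ if ($3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, 32-log($5)/log(2)) }'
}

run_repro
```

Saved as repro.sh (a hypothetical file name), it can be run as
"AWK=gawk sh repro.sh" and "AWK=mawk sh repro.sh" to compare against the
base awk.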
