Bug#891086: grep: Previous versions took -a as default if files were binary or partly binary.

Caronte Estigia Fri, 23 Feb 2018 00:58:16 -0800

Good morning Santiago.
You're right, I upgraded from jessie to stretch, and the grep package is:
ii  grep  2.27-2  amd64  GNU grep, egrep and fgrep
I have a text file, identified as an HTML document by the "file" command, which
contains only text characters (from what I can see in the file). The file
contains numerous strings with "2018", but when I use grep to find that string
I get:
          <li><a href="/diario_boe/calendarios.php?a=2018">Calendario</a></li>
<li><a href="/boe/dias/2018/02/21/index.php?s=1"><span 
class="linkBack">anterior</span></a></li>
<li><a href="/boe/dias/2018/02/23/index.php?s=1"><span 
class="linkFwd">siguiente</span></a></li>
                  <p><strong>Sumario</strong> BOE-S-2018-47:</p>
                    <a href="/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf" 
title="BOE-S-2018-47 en formato PDF firmado " onclick="javascript: 
pageTracker._trackPageview('/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf');">PDF</a>
                    <a href="/diario_boe/xml.php?id=BOE-S-20180222" 
title="Sumario jueves 22 de febrero de 2018 como documento XML">XML</a>
                <li><a 
href="../../../../../boe_n/dias/2018/02/22/index.php?d=47&amp;s=N">Notificaciones</a></li>
--->Coincidencia en el fichero binario ayer.html<---- (in English: "Binary file ayer.html matches")


With the previous grep version all the strings were found, but now, if I want
grep to work as before, I need to use "grep -a".
I guess the previous version of grep took the "-a" behaviour as the default,
treating all files as text unless told otherwise (which in my opinion is the
right way to go). I can't see what the security issues in this behaviour are,
nor how they disappear if I pass the "-a" option. It looks to me (without
having reviewed grep's code) as though grep tries to identify what kind of file
it is dealing with while searching it (a couple of lines are found before the
binary message), and I guess it shouldn't do that. I think it should just treat
files as text unless told otherwise with the --binary-files option.
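
The observation above (a few matching lines are printed, then the binary message appears) can be illustrated with a toy model. This is not grep's actual implementation. It is a minimal sketch that assumes grep prints matching lines until it reaches a matching line that is not valid text in the current locale's encoding. The function name `toy_grep` and the sample bytes are made up for illustration.

```python
def toy_grep(data: bytes, pattern: bytes, name: str, encoding: str = "utf-8"):
    """Toy model only: collect matching lines until one is not valid text."""
    out = []
    for raw in data.splitlines():
        if pattern not in raw:
            continue
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            # A matching line contains bytes that are invalid in this
            # encoding: report the file as binary and stop printing lines.
            out.append(f"Binary file {name} matches")
            break
        out.append(text)
    return out


# Hypothetical sample: the second line holds a Latin-1 byte (0xED),
# which is not valid UTF-8 when followed by an ASCII letter.
sample = (
    b'<a href="/boe/dias/2018/02/21/index.php">anterior</a>\n'
    b"<p>Bolet\xedn de 2018</p>\n"
    b"<p>2018 again</p>\n"
)

for line in toy_grep(sample, b"2018", "ayer.html"):
    print(line)
```

In this model the first match is printed as text, and the Latin-1 line triggers the binary message, which hides the third match.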
Regards,
Francisco
 

    On Thursday, 22 February 2018 at 15:33, Santiago R.R. 
<santiag...@riseup.net> wrote:
 

 On 22/02/18 at 11:18, rodrifra wrote:
> Package: grep
> Version: 2.27-2
> Severity: normal
> 
> Dear Maintainer,
> 
> 
>    * What led up to the situation?
> 
>    Scripts working with grep stopped working after the update. No patterns
> were detected and the message reporting a match in the binary file
> was displayed. The file is a downloaded HTML file and the "file" command returns:
>    
>    selecc.html: HTML document, ISO-8859 text, with CRLF, LF line terminators
> 
>    * What exactly did you do (or not do) that was effective (or
>      ineffective)?
> 
>    Explicitly telling grep to treat the file as text solved the problem:
> "grep -a ...."
> 
> -- System Information:
> Debian Release: 9.3
>  APT prefers stable-updates
>  APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
> 
> Kernel: Linux 4.9.0-5-amd64 (SMP w/1 CPU core)
> Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8), 
> LANGUAGE=es_ES.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: sysvinit (via /sbin/init)
> 

I suppose you upgraded from jessie to stretch.
I am not sure I fully understand your message. Could you please
clarify which version of grep didn't detect the patterns?

Anyway, as far as I understand from upstream's comments, grep's previous
behaviour when detecting "binary files" was not suitable.  The change
was made to avoid security issues, or undefined behaviour, that
could be related to invalid characters. In your case, the .html file
could contain invalid characters at the beginning, or its encoding may be
wrong.
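
One way to check the "invalid characters" hypothesis is to look for lines that cannot be decoded in the locale's encoding (UTF-8 in the report, while "file" describes the document as ISO-8859 text). The following is a minimal sketch, not part of grep. The helper name `find_invalid_lines` and the sample bytes are assumptions made for illustration.

```python
def find_invalid_lines(data: bytes, encoding: str = "utf-8"):
    """Return (line_number, offset_in_line, offending_byte) for each
    line that cannot be decoded with the given encoding."""
    problems = []
    for lineno, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode(encoding)
        except UnicodeDecodeError as err:
            # err.start is the index of the first undecodable byte.
            problems.append((lineno, err.start, raw[err.start:err.start + 1]))
    return problems


# Hypothetical sample with one Latin-1 byte (0xED) on the second line.
sample_bytes = b"plain ascii\nBolet\xedn Oficial\nmore text\n"
print(find_invalid_lines(sample_bytes))
```

To inspect a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function. An empty result would suggest that the encoding is not the cause.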

This is probably not a bug.

Cheers,

 -- Santiago
