On 23/02/18 at 08:42, Caronte Estigia wrote:
> I have a text file, identified as an HTML document by the "file" command, which
> only contains (as far as I can see) text characters. In that file there

Could you share that file (privately, if you prefer)?

> are numerous strings containing "2018", but when I use grep to find that
> string I get:
> 
>           <li><a href="/diario_boe/calendarios.php?a=2018">Calendario</a></li>
> <li><a href="/boe/dias/2018/02/21/index.php?s=1"><span class="linkBack">anterior</span></a></li>
> <li><a href="/boe/dias/2018/02/23/index.php?s=1"><span class="linkFwd">siguiente</span></a></li>
>                   <p><strong>Sumario</strong> BOE-S-2018-47:</p>
>                     <a href="/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf" title="BOE-S-2018-47 en formato PDF firmado " onclick="javascript:pageTracker._trackPageview('/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf');">PDF</a>
>                     <a href="/diario_boe/xml.php?id=BOE-S-20180222" title="Sumario jueves 22 de febrero de 2018 como documento XML">XML</a>
>                 <li><a href="../../../../../boe_n/dias/2018/02/22/index.php?d=47&amp;s=N">Notificaciones</a></li>
> --->Coincidencia en el fichero binario ayer.html<---- ("Binary file ayer.html matches")

Could you try grepping the file after setting LC_ALL=C (and without the -a
option)?
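Something like this should show the difference. It is only a sketch under some assumptions: `sample.html` is a hypothetical file whose second line contains a Latin-1 encoded "ñ" (byte 0xF1), which is not valid UTF-8, and the `C.UTF-8` locale is assumed to be installed (check with `locale -a`). With GNU grep 2.23 or later, the UTF-8 run may print the first match and then report the file as binary, while the C locale run should print both lines.

```shell
#!/bin/sh
# Hypothetical test file: the second line has a Latin-1 "ñ" (byte 0xF1),
# which is an encoding error in any UTF-8 locale.
printf 'a=2018\nEspa\361a 2018\n' > sample.html

# UTF-8 locale (assumed installed): grep may print "a=2018" and then a
# "binary file matches" message instead of the second line.
LC_ALL=C.UTF-8 grep 2018 sample.html

# C locale: every byte counts as a valid character, so both lines are printed.
LC_ALL=C grep 2018 sample.html
```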

What is the output of `locale -a`?

> Using the previous grep version all strings were found, but now, if I want grep
> to work as before, I need to use "grep -a".
> 
> I guess the previous version of grep took "-a" behaviour as the default one,

That is not exactly right. Take a look at the changes made in versions 2.21 and
2.23 in /usr/share/doc/grep/NEWS.gz; you will find some explanations there.

> which treated all files as text unless specified otherwise (which, in my
> opinion, is the right way to go). I can't see the security issues in this
> behaviour, nor how those security issues disappear if I specify the "-a"
> parameter. It looks to me (without reviewing grep's code) as if it is trying to
> identify what kind of file it is checking while searching the file (a couple of
> lines are found before the binary message), and I don't think it should do
> that. I think it just has to treat files as text unless specified otherwise
> with the --binary-files parameter.
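If the page really is ISO-8859-1 encoded (an assumption, since the file was not shared), this can be checked and worked around from the shell. The `is_text` function below is a hypothetical helper and only a rough approximation of GNU grep's test (NUL bytes, or bytes that are invalid in the current locale's encoding, here assumed to be UTF-8), not grep's own code; `sample.html` is again a hypothetical file. `-a` is the same as `--binary-files=text`, and converting the file to UTF-8 with `iconv` avoids needing either.

```shell
#!/bin/sh
# Hypothetical sample: one ASCII line plus a Latin-1 "ñ" (byte 0xF1).
printf 'a=2018\nEspa\361a 2018\n' > sample.html

# Rough approximation of grep's binary test (not grep's own code):
# "text" only if the file has no NUL bytes and is valid UTF-8.
is_text() {
    tr -d '\000' < "$1" | cmp -s - "$1" &&
        iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

if is_text sample.html; then echo "text"; else echo "binary"; fi

# Work-around 1: force text mode (-a is short for --binary-files=text).
grep -a 2018 sample.html
grep --binary-files=text 2018 sample.html

# Work-around 2: convert the file, assuming it really is ISO-8859-1.
iconv -f ISO-8859-1 -t UTF-8 sample.html > sample.utf8.html
if is_text sample.utf8.html; then echo "text"; else echo "binary"; fi
```

With the sample above, the first check should print "binary" and the second "text".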
