On 23/02/18 at 08:42, Caronte Estigia wrote:
> I have a text file, identified as an HTML document by the "file" command,
> which only contains (from what I can see in the file) text characters.
> In that file there

Could you share that file (privately if you prefer)?

> are numerous strings containing "2018", but when I use grep to find that
> string I get:
>
> <li><a href="/diario_boe/calendarios.php?a=2018">Calendario</a></li>
> <li><a href="/boe/dias/2018/02/21/index.php?s=1"><span class="linkBack">
> anterior</span></a></li>
> <li><a href="/boe/dias/2018/02/23/index.php?s=1"><span class="linkFwd">
> siguiente</span></a></li>
> <p><strong>Sumario</strong> BOE-S-2018-47:</p>
> <a href="/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf"
> title
> ="BOE-S-2018-47 en formato PDF firmado " onclick="javascript:
> pageTracker._trackPageview('/boe/dias/2018/02/22/pdfs/BOE-S-2018-47.pdf');">PDF
> </a>
> <a href="/diario_boe/xml.php?id=BOE-S-20180222" title=
> "Sumario jueves 22 de febrero de 2018 como documento XML">XML</a>
> <li><a href="../../../../../boe_n/dias/2018/02/22/index.php?d=
> 47&s=N">Notificaciones</a></li>
> --->Coincidencia en el fichero binario ayer.html<----
> [English: "Binary file ayer.html matches"]

Could you try grepping the file after first setting LC_ALL='C' (and without
the -a option)? What is the output of `locale -a`?

> Using the previous grep version all strings were found, but now if I want
> grep to work as before I need to use "grep -a".
>
> I guess the previous version of grep took the "-a" behaviour as the default,

That is not quite accurate. Take a look at /usr/share/doc/grep/NEWS.gz, at
the changes made in versions 2.21 and 2.23. You will find some explanations
there.

> which treated all files as text unless specified otherwise (which in my
> opinion is the right way to go). I can't see the security issues in this
> behaviour, or how those security issues disappear if I specify the "-a"
> parameter. It looks to me (without reviewing grep's code) as if it is
> trying to identify what kind of file it is checking while searching the
> file (a couple of lines are found before the binary message), and I guess
> it shouldn't do that. I think it just has to treat files as text unless
> specified otherwise with the --binary-files parameter.
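
For reference, the checks suggested above can be sketched roughly as follows. This is only an illustration under assumptions: the sample file is hypothetical (it is not the reporter's ayer.html), and it assumes that a stray non-UTF-8 byte is what makes grep consider the file binary, which has not been confirmed for the real file.

```shell
#!/bin/sh
# Hypothetical sample: three lines containing "2018"; the second line
# carries a lone 0xF1 byte (Latin-1 "ñ"), which is not valid UTF-8.
sample=$(mktemp)
printf 'BOE-S-2018-47\n'  >  "$sample"
printf 'a\361o 2018\n'    >> "$sample"
printf 'Sumario 2018\n'   >> "$sample"

# Check suggested in the reply: search in the C locale, without -a.
LC_ALL=C grep 2018 "$sample"

# Workaround from the original report: force text mode with -a.
count_text=$(grep -a -c 2018 "$sample")
echo "lines matched with -a: $count_text"

# Locate lines holding bytes that are neither printable ASCII nor
# whitespace; these are candidates for what grep sees as binary data.
LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$sample"
bad_count=$(LC_ALL=C grep -a -c '[^[:print:][:space:]]' "$sample")
echo "lines with non-ASCII bytes: $bad_count"

# Locales installed on the system, as asked in the reply.
locale -a || echo "locale command not available"

rm -f "$sample"
```

If the last grep reports lines in the real file, comparing that file's encoding with the locale in use would be a reasonable next step.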