Warren Young wrote:
>On Dec 25, 2014, at 11:41 AM, Thomas Wolff <t...@towo.net> wrote:
>
>> In any case the argument is quite artificial since the new behaviour
>> hits many files that are in fact text files.
>
>Please define the term "text file" in a way that allows a C programmer
>to write a program that automatically does the correct thing for all
>members of the class "text file" without involving locales, or an
>equivalent mechanism.
...
>If grep runs into a byte sequence that makes it think it is not legal
>for your current locale, it must treat the file as raw bytes, unless you
>give it -a.
>
>If you don't like this behavior, say "alias grep='grep -a'" in your
>~/.bashrc, and forget the change ever happened. It'll be on you when
>some non-text file gets treated as text and grep spams your terminal
>with binary garbage, though.

It's better to use the "alias grep='LC_ALL=C grep'" method. It keeps the old way of detecting binaries (for example, it still detects an .EXE as binary) while allowing you to match mostly-ASCII files that contain a few characters from a mismatched locale.

The definition you ask for is already in the code. For us non-English speakers, detecting what is "mostly ASCII" gives the right answer most of the time, at least in interactive use.

I actually ran into this. I keep a list of my directories, and it is in CP1252 so that it can interface with CMD.EXE. Suddenly grep couldn't match it. I figured something was up, set my locale to CP1252, and then it worked.
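
A rough sketch of the LC_ALL=C approach, for anyone who wants to try it before touching their ~/.bashrc. The sample file and its contents are invented for illustration, and the behaviour assumes a reasonably recent GNU grep, where the C locale treats every byte as a valid character:

```shell
# Create a throwaway file: plain ASCII plus one CP1252 byte
# (octal 351 = 0xE9, e-acute). Name and contents are hypothetical.
sample=$(mktemp)
printf 'dirs/Andr\351/Documents\n' > "$sample"

# Run grep under the C locale for this one command only.
# The 0xE9 byte is not an encoding error there, so the matching
# line is printed rather than a "Binary file ... matches" notice.
out=$(LC_ALL=C grep 'Documents' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

The alias form does the same thing for every interactive grep call. Note that scripts do not expand aliases by default, so a script that needs this behaviour would have to set LC_ALL=C itself.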