* Paul Wise <p...@debian.org>, 2015-09-06, 09:26:
For source code, we might use ack(1).
ack seems to use only extension and first-line matching; I think I
would prefer to use only file/magic.
I find file(1) unreliable for programming languages.
For example, this is how it classifies *.py files in my
/usr/lib/python2.7:
   4649 text/x-python
   1725 text/plain
    218 text/x-c++
    204 inode/x-empty
     12 text/x-c
     12 text/troff
     10 text/x-tex
     10 text/html
      3 application/octet-stream
      2 text/x-ruby
Almost 30% of detected types are wrong. :\
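
For anyone who wants to repeat the count on their own tree, here is a
rough sketch of how such a tally could be produced. It assumes file(1)
is on PATH and supports --brief and --mime-type; the function names are
made up for illustration, and this is not necessarily how the numbers
above were obtained.

```python
import subprocess
from collections import Counter
from pathlib import Path


def file_mime_type(path):
    """Ask file(1) for the MIME type of one path (assumes file is on PATH)."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", "--", str(path)],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def tally_mime_types(paths, classify=file_mime_type):
    """Count how many paths fall into each MIME type reported by classify."""
    return Counter(classify(p) for p in paths)


def print_tally(root, pattern="*.py"):
    """Print 'count type' lines, most common type first."""
    files = (p for p in Path(root).rglob(pattern) if p.is_file())
    for mime, n in tally_mime_types(files).most_common():
        print(f"{n:7d} {mime}")
```

Calling print_tally("/usr/lib/python2.7") should give output in the
same shape as the list above. The classify parameter exists only so
that the counting can be tried without running file(1) at all.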
For binary files, I came up with this monster:
find . -type f -exec sh -c 'file -i --print0 "$1" | cut -d "" -f 2 | grep -q ": image/png;" &&
printf "%s\0" "$1"' sh {} \;
*sigh* I've just realized that it doesn't do the right thing for
filenames with embedded \n.
Probably you want these arguments to file instead of -i?
--mime-type --no-pad --no-buffer --separator ""
Life is too short to read the whole file(1) manpage. :-P
--no-pad and --no-buffer would make no difference in this case, because
I run one file(1) per file.
I'll probably write a new tool, so that you can write:
find . -type f -print0 | mimegrep -0 image/png
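
To make the idea concrete, a minimal sketch of what such a tool might
look like. mimegrep does not exist yet, so everything here (the
function names, the "-0 MIME-TYPE" interface) is an assumption. It
runs one file(1) per name with --brief, so no filename ever has to be
parsed back out of file's output, and names are kept as bytes so that
embedded newlines survive.

```python
import subprocess
import sys


def file_mime_type(path):
    """Ask file(1) for the MIME type of one path given as bytes."""
    result = subprocess.run(
        [b"file", b"--brief", b"--mime-type", b"--", path],
        check=True, capture_output=True,
    )
    return result.stdout.decode("ascii", "replace").strip()


def split_nul(data):
    """Split NUL-terminated input (as from find -print0) into names."""
    return [name for name in data.split(b"\0") if name]


def mimegrep(wanted, names, classify=file_mime_type):
    """Yield the names whose MIME type equals wanted."""
    for name in names:
        if classify(name) == wanted:
            yield name


def main(argv):
    """Hypothetical command line: mimegrep -0 MIME-TYPE < names."""
    if len(argv) != 3 or argv[1] != "-0":
        sys.exit("usage: mimegrep -0 MIME-TYPE")
    names = split_nul(sys.stdin.buffer.read())
    for name in mimegrep(argv[2], names):
        # Output stays NUL-terminated so it can feed xargs -0.
        sys.stdout.buffer.write(name + b"\0")
```

With a final main(sys.argv) call added and the script installed as
mimegrep, the pipeline above would work as written. Spawning one
file(1) per name is slow, but it keeps the sketch simple.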
I think we would want the file tool to do this?
find . -type f -print0 | xargs -0 file -0 --find 'image/png' --find 'text/*'
This way you can paste the command into chroots too.
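
Whichever tool ends up doing it, matching several patterns such as
'image/png' and 'text/*' is the easy part. A small sketch, assuming
shell-style wildcards are what is wanted; --find is only a proposal in
this thread, and the helper names below are invented for illustration.

```python
from fnmatch import fnmatchcase


def matches_any(mime_type, patterns):
    """Return True if mime_type matches at least one shell-style pattern."""
    return any(fnmatchcase(mime_type, pattern) for pattern in patterns)


def select(typed_names, patterns):
    """Keep the names from (name, mime_type) pairs whose type matches."""
    return [name for name, mime in typed_names if matches_any(mime, patterns)]
```

fnmatchcase is used rather than fnmatch so that matching is
case-sensitive on every platform.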
I don't mind, although it won't be me who writes the code for file or
persuades file upstream to implement it for us.
--
Jakub Wilk