* Paul Wise <p...@debian.org>, 2015-09-06, 09:26:
For source code, we might use ack(1).

ack seems to use only extension and first-line matching; I think I would prefer to use only file/magic.

I find file(1) unreliable for programming languages.

For example, this is how it classifies *.py files in my /usr/lib/python2.7:

  4649  text/x-python
  1725  text/plain
   218  text/x-c++
   204  inode/x-empty
    12  text/x-c
    12  text/troff
    10  text/x-tex
    10  text/html
     3  application/octet-stream
     2  text/x-ruby

Almost 30% of detected types are wrong (1992 of 6845, not counting the empty files). :\
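
A tally like the one above can be reproduced with something along these lines. This is only a sketch, not necessarily the command that produced those numbers; the function name mime_tally is made up for illustration, and the counts will differ between file(1) versions.

```shell
# mime_tally DIR: count the MIME types that file(1) reports
# for the *.py files under DIR, most frequent type first.
mime_tally() {
    # --brief suppresses the filename, so each output line is just a MIME type.
    find "$1" -type f -name '*.py' -exec file --brief --mime-type -- {} + |
        sort | uniq -c | sort -rn
}
```

Usage: mime_tally /usr/lib/python2.7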

For binary files, I came up with this monster:

find . -type f -exec sh -c 'file -i --print0 "$1" | cut -d "" -f 2 | grep -q ": image/png;" && 
printf "%s\0" "$1"' sh {} \;

*sigh* Now I realize that it doesn't do the right thing for filenames with embedded \n.
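
One way to sidestep the newline problem is to never parse filenames out of file's output at all. The sketch below assumes that comparing the bare MIME type is enough (so the "; charset=..." part printed by -i is not needed); the function name png_files_0 is hypothetical.

```shell
# png_files_0 DIR: print, NUL-terminated, the files under DIR
# that file(1) reports as image/png.
png_files_0() {
    find "$1" -type f -exec sh -c '
        for f in "$@"; do
            # --brief omits the filename, so only the MIME type is captured
            # and odd characters in the name never reach a text filter.
            if [ "$(file --brief --mime-type -- "$f")" = image/png ]; then
                printf "%s\0" "$f"
            fi
        done
    ' sh {} +
}
```

Usage: png_files_0 . | xargs -0 ls -l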

Probably you want these arguments to file instead of -i?

--mime-type --no-pad --no-buffer --separator ""

Life is too short to read the whole file(1) manpage. :-P

--no-pad and --no-buffer would make no difference in this case, because I run one file(1) per file.

I'll probably write a new tool, so that you can write:

find . -type f -print0 | mimegrep -0 image/png
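
mimegrep does not exist yet, so the following is only a rough sketch of the behaviour I have in mind, written as a shell function. It assumes an xargs that supports -0 (GNU or BSD), treats the argument as a shell glob matched against the MIME type, and accepts only NUL-separated input; a real tool would likely differ.

```shell
# mimegrep [-0] PATTERN: read NUL-separated filenames on stdin and print,
# NUL-terminated, those whose MIME type matches the shell glob PATTERN.
mimegrep() {
    # -0 is accepted for compatibility with the usage above;
    # this sketch supports NUL-separated input only.
    if [ "$1" = -0 ]; then
        shift
    fi
    xargs -0 sh -c '
        pattern=$1
        shift
        for f in "$@"; do
            mime=$(file --brief --mime-type -- "$f")
            case $mime in
                $pattern) printf "%s\0" "$f" ;;
            esac
        done
    ' sh "$1"
}
```

Since the pattern is a glob, something like mimegrep -0 'text/*' would work as well.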

I think we would want the file tool to do this?

find . -type f -print0 | xargs -0 file -0 --find 'image/png' --find 'text/*'

This way you can paste the command into chroots too.

I don't mind, although it won't be me who writes the code for file, or persuades file upstream to implement it for us.

--
Jakub Wilk
