Re: Detecting Binary content in files

2009-04-01 Thread John Machin
On Apr 2, 8:39 am, John Machin wrote:
> On Apr 1, 4:59 pm, Dennis Lee Bieber wrote:
> >
> > > On Tue, 31 Mar 2009 14:26:08 -0700 (PDT), ritu
> > declaimed the following in
> > gmane.comp.python.general:
> >
> > > if ( ( -B $filename ||
> > >            $filename =~ /\.pdf$/ ) &&
> > >          -s

Re: Detecting Binary content in files

2009-04-01 Thread John Machin
On Apr 1, 4:59 pm, Dennis Lee Bieber wrote:
> On Tue, 31 Mar 2009 14:26:08 -0700 (PDT), ritu
> declaimed the following in
> gmane.comp.python.general:
>
> >
> > if ( ( -B $filename ||
> >            $filename =~ /\.pdf$/ ) &&
> >          -s $filename > 0 ) {
> >         return(1);
> >     }
> >
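
For readers who want the same check in Python: the Perl condition quoted above treats a file as binary when it is non-empty and either passes Perl's -B test or has a .pdf extension. A rough sketch follows. The function names (looks_binary, is_binary_or_pdf), the 512-byte block size and the exact set of "text" bytes are assumptions of this sketch. Perl's documentation describes -B only loosely (a null byte in the first block, or more than roughly 30% odd characters), so this is an approximation rather than a port.

```python
import os

# Bytes treated as "texty" in this sketch: printable ASCII plus a few
# common whitespace/control codes. The exact set is an assumption.
_TEXT_BYTES = bytes(range(0x20, 0x7F)) + b"\t\n\r\f\b"


def looks_binary(block, threshold=0.30):
    """Heuristic on one block of bytes, loosely modelled on Perl's -B."""
    if not block:
        return False          # an empty block gives no evidence either way
    if b"\x00" in block:
        return True           # a NUL byte is taken as a strong binary signal
    # bytes.translate(None, delete) removes every byte listed in delete,
    # leaving only the "odd" bytes behind.
    odd = block.translate(None, _TEXT_BYTES)
    return len(odd) / len(block) > threshold


def is_binary_or_pdf(filename, blocksize=512):
    """Rough analogue of the quoted Perl test: non-empty AND (binary OR .pdf)."""
    if os.path.getsize(filename) == 0:
        return False
    if filename.endswith(".pdf"):
        return True
    with open(filename, "rb") as fh:
        return looks_binary(fh.read(blocksize))
```

Only the first block is inspected, so a file that starts with a long text header and turns binary later would be reported as text by this sketch.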

Re: Detecting Binary content in files

2009-03-31 Thread Steven D'Aprano
On Tue, 31 Mar 2009 09:23:05 -0700, ritu wrote:

> Hi,
>
> I'm wondering if Python has a utility to detect binary content in files?

Define binary content.

> Or if anyone has any ideas on how that can be accomplished?

Step one: read the file.
Step two: does any of the data you have read match

Re: Detecting Binary content in files

2009-03-31 Thread ritu
On Mar 31, 10:19 am, Josh Dukes wrote:
> There might be another way but off the top of my head:
>
> #!/usr/bin/env python
>
> def isbin(filename):
>    fd=open(filename,'rb')
>    for b in fd.read():
>        if ord(b) > 127:
>            fd.close()
>            return True
>    fd.close()
>    re

Re: Detecting Binary content in files

2009-03-31 Thread Christian Heimes
Josh Dukes wrote:
> Of course this would detect unicode files as being binary and maybe
> that's not what you want. How are you thinking about doing it in
> perl exactly?

There is no such thing as a unicode file. You most likely mean UTF-8 or UTF-16 coded text.

Christian
--
http://mail.python.o
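
To illustrate the point: a file holds bytes, and the same Unicode text produces different bytes under different encodings. A small Python 3 sketch (the sample string is arbitrary and chosen only for illustration):

```python
text = "na\u00efve"  # the word "naive" with a diaeresis: five code points

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(utf8)    # b'na\xc3\xafve'
print(utf16)   # b'n\x00a\x00\xef\x00v\x00e\x00'

# Both byte strings decode back to the same text...
assert utf8.decode("utf-8") == utf16.decode("utf-16-le") == text
# ...but a naive "any byte > 127" check flags both, and the UTF-16
# form contains NUL bytes even for the plain ASCII letters.
assert any(b > 127 for b in utf8)
assert any(b > 127 for b in utf16)
assert b"\x00" in utf16
```

Any byte-level "is it binary?" test therefore has to decide which encodings it is willing to count as text.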

Re: Detecting Binary content in files

2009-03-31 Thread Grant Edwards
On 2009-03-31, ritu wrote:

> I'm wondering if Python has a utility to detect binary content in
> files?

Yes, check the file size. If it's non-zero, then it has binary content.

> Or if anyone has any ideas on how that can be accomplished? I
> haven't been able to find any useful information to

Re: Detecting Binary content in files

2009-03-31 Thread Dave Angel
All files are binary, but probably by binary you mean non-text. There are lots of ways to decide if a file is non-text, but I don't know of any "standard" way. You can detect a file as not-ascii by simply searching for any character greater than 0x7f. But that doesn't handle a UTF-8 file, wh

Re: Detecting Binary content in files

2009-03-31 Thread Dave Angel
There are lots of ways to decide if a file is non-text, but I don't know of any "standard" way. You can detect a file as not-ascii by simply searching for any character greater than 0x7f. But that doesn't handle a UTF-8 file, which is an 8bit text file representing Unicode. The way I've see
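
A minimal Python 3 sketch of the check described here, and of why it misfires on UTF-8 text. The function names are made up for illustration, and trying a strict UTF-8 decode is just one possible refinement, not necessarily the approach the (truncated) message goes on to describe.

```python
def is_ascii(data):
    """True if no byte is greater than 0x7f."""
    return all(b <= 0x7F for b in data)


def is_utf8_text(data):
    """True if the bytes form valid UTF-8 (ASCII is a subset of UTF-8)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "caf\u00e9\n".encode("utf-8")   # "cafe" with an acute accent, as UTF-8
print(is_ascii(sample))       # False: the bytes 0xc3 0xa9 are above 0x7f
print(is_utf8_text(sample))   # True: still perfectly good text
```

Passing a UTF-8 decode does not rule out control characters such as NUL, so a practical checker would probably combine a decode attempt with a test for unwanted control bytes.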

Re: Detecting Binary content in files

2009-03-31 Thread Josh Dukes
or rather:

#!/usr/bin/env python
import string

def isbin(filename):
   fd=open(filename,'rb')
   for b in fd.read():
       if not b in string.printable and b not in string.whitespace:
           fd.close()
           return True
   fd.close()
   return False

for f in ['/bin/bash', '/etc/passwd']:
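
The posted code targets Python 2, where iterating over the result of read() yields one-character strings. Under Python 3, reading in 'rb' mode yields integers, so the membership test against string.printable no longer works as written. A possible Python 3 sketch of the same idea follows; the name has_nonprintable and the chunked reading are assumptions of this sketch, not part of the original post.

```python
import string

# string.printable already includes the whitespace characters, so a single
# set of allowed byte values covers both tests from the original.
_PRINTABLE = frozenset(string.printable.encode("ascii"))


def has_nonprintable(data):
    """True if any byte falls outside ASCII printable/whitespace."""
    return any(b not in _PRINTABLE for b in data)


def isbin(filename, chunk_size=64 * 1024):
    """Scan a file chunk by chunk and stop at the first non-printable byte."""
    with open(filename, "rb") as fd:   # 'with' closes the file on every path
        while True:
            chunk = fd.read(chunk_size)
            if not chunk:
                return False           # reached end of file without a hit
            if has_nonprintable(chunk):
                return True
```

Like the original, this treats every non-ASCII byte as binary, so UTF-8 encoded text containing accented characters is still reported as binary.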

Re: Detecting Binary content in files

2009-03-31 Thread Josh Dukes
s/if ord(b) > 127/if ord(b) > 127 or ord(b) < 32/

On Tue, 31 Mar 2009 10:19:44 -0700 Josh Dukes wrote:
> There might be another way but off the top of my head:
>
> #!/usr/bin/env python
>
> def isbin(filename):
>    fd=open(filename,'rb')
>    for b in fd.read():
>        if ord(b) > 127:
>

Re: Detecting Binary content in files

2009-03-31 Thread Josh Dukes
There might be another way but off the top of my head:

#!/usr/bin/env python

def isbin(filename):
   fd=open(filename,'rb')
   for b in fd.read():
       if ord(b) > 127:
           fd.close()
           return True
   fd.close()
   return False

for f in ['/bin/bash', '/etc/passwd']:
   print "%

Re: Detecting Binary content in files

2009-03-31 Thread Benjamin Kaplan
On Tue, Mar 31, 2009 at 12:23 PM, ritu wrote:

> Hi,
>
> I'm wondering if Python has a utility to detect binary content in
> files? Or if anyone has any ideas on how that can be accomplished? I
> haven't been able to find any useful information to accomplish this
> (my other option is to fire off

Re: Detecting Binary content in files

2009-03-31 Thread Matt Nordhoff
ritu wrote:
> Hi,
>
> I'm wondering if Python has a utility to detect binary content in
> files? Or if anyone has any ideas on how that can be accomplished? I
> haven't been able to find any useful information to accomplish this
> (my other option is to fire off a perl script from within my python