Checking Common File Types

2013-12-01 Thread jade
Hello, 
I'm trying to create a script that checks all the files in my 'downloaded' 
directory against common file types and then tells me how many of the files in 
that directory aren't either a GIF or a JPG file. I'm familiar with basic 
Python but this is the first time I've attempted anything like this and I'm 
looking for a little help or a point in the right direction? 

file_sigs = {'\xFF\xD8\xFF':('JPEG','jpg'),  '\x47\x49\x46':('GIF','gif')}

def readFile():
    filename = r'c:/temp/downloads'
    fh = open(filename, 'r')
    file_sig = fh.read(4)
    print '[*] check_sig() File:',filename #, 'Hash Sig:', binascii.hexlify(file_sig)

Regards
Jade



 
  -- 
https://mail.python.org/mailman/listinfo/python-list


RE: Checking Common File Types

2013-12-01 Thread jade


> To: [email protected]
> From: [email protected]
> Subject: Re: Checking Common File Types
> Date: Sun, 1 Dec 2013 18:23:22 -0500
> 
> On Sun, 1 Dec 2013 18:27:16 +, jade  declaimed the
> following:
> 
> >Hello, 
> >I'm trying to create a script that checks all the files in my 'downloaded' 
> >directory against common file types and then tells me how many of the files 
> >in that directory aren't either a GIF or a JPG file. I'm familiar with basic 
> >Python but this is the first time I've attempted anything like this and I'm 
> >looking for a little help or a point in the right direction? 
> >
> >file_sigs = {'\xFF\xD8\xFF':('JPEG','jpg'),  '\x47\x49\x46':('GIF','gif')}
> 
>   Apparently you presume the file extensions are inaccurate, as you are
> digging into the files for signatures.
> 
> >def readFile():
> >    filename = r'c:/temp/downloads'
> >    fh = open(filename, 'r')
> >    file_sig = fh.read(4)
> >    print '[*] check_sig() File:',filename #, 'Hash Sig:', binascii.hexlify(file_sig)
> 
>   Note: if you are hardcoding forward slashes, you don't need the raw
> indicator...
> 
>   That said, what is "c:/temp/downloads"? You apparently are opening IT
> as the file to be examined. Is it supposed to be a directory containing
> many files, a file containing a list of files, ???
> 
>   What is "check_sig" -- it looks like a function you haven't defined --
> but it's inside the quotes making a string literal that will never be
> called anyway.
> 
>   If you are just concerned with one directory of files, you might want
> to read the help file on the glob module, along with os.path
> (join/splitext/etc). Or just string methods...
> 
> >>> import glob
> >>> import os.path
> >>> TARGET = os.path.join(os.environ["USERPROFILE"],
> ...   "documents/BW-conversion/*")
> >>> files = glob.glob(TARGET)
> >>> for fn in files:
> ...   fp, fx = os.path.splitext(fn)
> ...   print "File %s purports to be of type %s" % (fn, fx.upper())
> ... 
> File C:\Users\Wulfraed\documents/BW-conversion\BW-1.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BW-2.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BW-3.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BW-4.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BWConv.html purports to be
> of type .HTML
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b1.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b2.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b3.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b4.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b5.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b6.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_col.jpg purports to be
> of type .JPG
> >>> 
> -- 
>   Wulfraed Dennis Lee Bieber AF6VN
> [email protected]    http://wlfraed.home.netcom.com/
> 
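
The extension listing in the quoted reply and the signature table from the original post can be combined to flag files whose extension disagrees with their contents. What follows is only a minimal sketch, assuming Python 3 rather than the thread's Python 2. The names `SIGNATURES`, `EXTENSIONS`, `sniff_type` and `extension_matches` are hypothetical and do not come from the thread, and the extension lists are an assumption.

```python
import os.path

# Leading "magic" bytes for the two formats discussed in the thread.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"GIF": "GIF",  # the same bytes as '\x47\x49\x46'
}

# Extensions conventionally used for each type (an assumption for this sketch).
EXTENSIONS = {"JPEG": (".jpg", ".jpeg"), "GIF": (".gif",)}


def sniff_type(path):
    """Return 'JPEG', 'GIF', or None, judged from the first three bytes."""
    with open(path, "rb") as fh:  # binary mode: bytes are read unchanged
        head = fh.read(3)
    return SIGNATURES.get(head)


def extension_matches(path):
    """Return True if the file's extension agrees with its signature."""
    kind = sniff_type(path)
    ext = os.path.splitext(path)[1].lower()
    return kind is not None and ext in EXTENSIONS[kind]
```

A file named `photo.jpg` that actually starts with the GIF bytes would be reported by `sniff_type` as 'GIF', and `extension_matches` would return False for it.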



Hi, thanks for all your replies. I realised pretty soon after I asked for help 
that I was trying to read the wrong number of bytes, and set about completely 
rewriting my code (after a coffee break):

import sys, os, binascii

def readfile():
    dictionary = {'474946':('GIF', 'gif'), 'ffd8ff':('JPEG', 'jpeg')}
    try:
        files = os.listdir('C:\\Temp\\downloads')
        for item in files:
            f = open('C:\\Temp\\downloads\\'+ item, 'r')
            file_sig = f.read(3)
            file_sig_hex = binascii.hexlify(file_sig)
            if file_sig_hex in dictionary:
                print item + ' is a image file, it is a ' + file_sig
            else:
                print item + ' is not an image file, it is' + file_sig
            print file_sig_hex
    except:
        print 'Error. Try again'
    finally:
        if 'f' in locals():
            f.close()

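
For comparison, here is a minimal sketch of the count the original question asked for: how many files in a directory are neither GIF nor JPEG. It assumes Python 3, opens each file in binary mode so the signature bytes are read unchanged, and uses a `with` block so every file is closed. The function name `count_non_images` and its `directory` parameter are hypothetical and not part of the posted code.

```python
import binascii
import os

# Hex signatures, keyed the same way as the posted dictionary.
SIGNATURES = {"474946": "GIF", "ffd8ff": "JPEG"}


def count_non_images(directory):
    """Count regular files whose first three bytes match no known signature."""
    not_images = 0
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        with open(path, "rb") as fh:
            # hexlify returns bytes in Python 3, so decode to a str key
            sig_hex = binascii.hexlify(fh.read(3)).decode("ascii")
        if sig_hex not in SIGNATURES:
            not_images += 1
    return not_images
```

Calling `count_non_images('C:\\Temp\\downloads')` would return the number of files that are not GIF or JPEG by signature. An empty file yields an empty signature and is therefore counted as a non-image.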
def