> To: [email protected]
> From: [email protected]
> Subject: Re: Checking Common File Types
> Date: Sun, 1 Dec 2013 18:23:22 -0500
>
> On Sun, 1 Dec 2013 18:27:16 +, jade declaimed the
> following:
>
> >Hello,
> >I'm trying to create a script that checks all the files in my 'downloaded'
> >directory against common file types and then tells me how many of the files
> >in that directory aren't either a GIF or a JPG file. I'm familiar with basic
> >Python but this is the first time I've attempted anything like this and I'm
> >looking for a little help or a point in the right direction?
> >
> >file_sigs = {'\xFF\xD8\xFF':('JPEG','jpg'), '\x47\x49\x46':('GIF','gif')}
>
> Apparently you presume the file extensions are inaccurate, as you are
> digging into the files for signatures.
>
> >def readFile():
> >    filename = r'c:/temp/downloads'
> >    fh = open(filename, 'r')
> >    file_sig = fh.read(4)
> >    print '[*] check_sig() File:', filename #, 'Hash Sig:', binascii.hexlify(file_sig)
>
> Note: if you are hardcoding forward slashes, you don't need the raw
> indicator...
>
> That said, what is "c:/temp/downloads"? You apparently are opening IT
> as the file to be examined. Is it supposed to be a directory containing
> many files, a file containing a list of files, ???
>
> What is "check_sig" -- it looks like a function you haven't defined --
> but it's inside the quotes making a string literal that will never be
> called anyway.
>
> If you are just concerned with one directory of files, you might want
> to read the help file on the glob module, along with os.path
> (join/splitext/etc). Or just string methods...
>
> >>> import glob
> >>> import os.path
> >>> TARGET = os.path.join(os.environ["USERPROFILE"],
> ... "documents/BW-conversion/*")
> >>> files = glob.glob(TARGET)
> >>> for fn in files:
> ...     fp, fx = os.path.splitext(fn)
> ...     print "File %s purports to be of type %s" % (fn, fx.upper())
> ...
> File C:\Users\Wulfraed\documents/BW-conversion\BW-1.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BW-2.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BW-3.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BW-4.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\BWConv.html purports to be
> of type .HTML
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b1.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b2.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b3.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b4.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b5.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_b6.jpg purports to be of
> type .JPG
> File C:\Users\Wulfraed\documents/BW-conversion\roo_col.jpg purports to be
> of type .JPG
> >>>
> --
> Wulfraed Dennis Lee Bieber AF6VN
> [email protected]    http://wlfraed.home.netcom.com/
>
> --
> https://mail.python.org/mailman/listinfo/python-list
Hi, thanks for all your replies. I realised pretty soon after I asked for help
that I was trying to read the wrong number of bytes, and set about completely
rewriting my code (after a coffee break):
import os, binascii

def readfile():
    # Map the hex form of the first three bytes to (type name, extension).
    dictionary = {'474946': ('GIF', 'gif'), 'ffd8ff': ('JPEG', 'jpeg')}
    folder = 'C:\\Temp\\downloads'
    try:
        for item in os.listdir(folder):
            # Binary mode, so Windows does not translate any of the bytes.
            f = open(os.path.join(folder, item), 'rb')
            try:
                file_sig_hex = binascii.hexlify(f.read(3)).decode('ascii')
            finally:
                f.close()
            if file_sig_hex in dictionary:
                print(item + ' is an image file, it is a ' + dictionary[file_sig_hex][0])
            else:
                print(item + ' is not an image file, signature ' + file_sig_hex)
    except (IOError, OSError):
        print('Error. Try again')
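For the original goal (how many files in the directory are neither GIF nor JPG), the same signature check can be turned into a counter. What follows is only a rough sketch under a few assumptions: the names `SIGNATURES`, `sniff_type` and `count_non_images` are made up for illustration, only the two three-byte signatures from this thread are recognised, and subdirectories are skipped rather than searched.

```python
import os

# Leading bytes ("magic numbers") for the two formats discussed in the thread.
SIGNATURES = {
    b'\xff\xd8\xff': 'JPEG',
    b'GIF': 'GIF',  # the same three bytes as \x47\x49\x46
}


def sniff_type(path):
    """Return 'JPEG', 'GIF', or None, judged by the first three bytes of the file."""
    with open(path, 'rb') as fh:  # binary mode: bytes are read untranslated
        head = fh.read(3)
    return SIGNATURES.get(head)


def count_non_images(folder):
    """Count the regular files in folder that are neither GIF nor JPEG."""
    count = 0
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # os.listdir also returns subdirectories; skip anything that is not a file.
        if os.path.isfile(path) and sniff_type(path) is None:
            count += 1
    return count
```

Calling `count_non_images('C:\\Temp\\downloads')` should then return the number of files whose first three bytes match neither signature. Empty files count as non-images, since they have no signature at all.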
def