Extracting multiple zip files in a directory
I've been working on this code somewhat successfully; however, I'm unable
to get it to iterate through all the zip files in the directory. As of
now it only extracts the first one it finds. If anyone could lend some
tips on how my iteration scheme should look, it would be hugely
appreciated. Here is what I have so far:
import zipfile, glob, os
from os.path import isfile

fname = filter(isfile, glob.glob('*.zip'))
for fname in fname:
    zf = zipfile.ZipFile(fname, 'r')
    for file in zf.namelist():
        newFile = open(file, "wb")
        newFile.write(zf.read(file))
        newFile.close()
    zf.close()
Re: Extracting multiple zip files in a directory
Thanks John, this works great! I was wondering what your reasoning was behind replacing "filter" with the "x for x" list comprehension? Appreciate the help, thanks again.

Lorn
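For reference, a small sketch of the two equivalent spellings (assuming Python 2, where filter() returns a list):

import glob
from os.path import isfile

# filter() applies the isfile predicate to every glob match
zipnames = filter(isfile, glob.glob('*.zip'))

# the list comprehension expresses the same filtering inline
zipnames = [x for x in glob.glob('*.zip') if isfile(x)]

Both produce the same list of zip file names; the comprehension is often preferred because the condition reads in place and extends naturally (for example, transforming x in the same expression).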
Re: Extracting multiple zip files in a directory
Ok, I probably should have seen this coming. Working with small zip files is no problem with the above script. However, when opening a 120+ MB compressed file that uncompresses to over 1 GB, I unfortunately get memory errors. Is this because Python is holding the extracted file in memory, as opposed to spooling it to disk, before writing? Does anyone know a way around this? As of now, I'm out of ideas.
Memory errors with large zip files
Is there a limitation in Python's zipfile module that limits the
size of a file that can be extracted? I'm currently trying to extract
125 MB zip files containing files that uncompress to over 1 GB and am
receiving memory errors. Indeed, my RAM gets maxed out during extraction and
then the script quits. Is there a way to spool to disk on the fly, or
is it necessary for Python to hold the entire file in memory before writing? The code
below iterates through a directory of zip files and extracts them
(thanks John!), however for testing I've just been using one file:
import zipfile, glob
from os.path import isfile

zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
for zipname in zipnames:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        newFile = open(zfilename, "wb")
        newFile.write(zf.read(zfilename))
        newFile.close()
    zf.close()
Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.
Best regards,
Lorn
Re: Memory errors with large zip files
Ok, I'm not sure if this helps any, but in debugging it a bit I see the script stalls on:

    newFile.write(zf.read(zfilename))

The memory error generated references line 357 of zipfile.py, at the point of decompression:

elif zinfo.compress_type == ZIP_DEFLATED:
    if not zlib:
        raise RuntimeError, \
              "De-compression requires the (missing) zlib module"
    # zlib compress/decompress code by Jeremy Hylton of CNRI
    dc = zlib.decompressobj(-15)
    bytes = dc.decompress(bytes)  ### <-- right here

Is there any way to modify how my code is approaching this, or perhaps how the zipfile code is handling it, or do I need to just invest in more RAM? I currently have 512 MB and thought that would be plenty; perhaps I was wrong :-(. If anyone has any ideas it would truly be very helpful.

Lorn
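One way around holding each whole member in memory is to decompress in chunks and write the chunks to disk as they arrive. A minimal sketch, assuming a Python new enough to have ZipFile.open() (added in 2.6); older versions would need to drive zlib.decompressobj by hand:

import zipfile, shutil, glob
from os.path import isfile

for zipname in [x for x in glob.glob('*.zip') if isfile(x)]:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        src = zf.open(zfilename)                 # file-like object; decompresses lazily
        dst = open(zfilename, "wb")
        shutil.copyfileobj(src, dst, 64 * 1024)  # copy in 64 KB chunks
        dst.close()
        src.close()
    zf.close()

With this approach memory use stays near the chunk size rather than the full uncompressed size of each member.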
String manipulations
I'm trying to work on a dataset that has its primary numbers saved as floats in string format. I'd like to work with them as integers with an implied decimal to the hundredth. The problem is that the current precision is variable. For instance, some numbers have 4 decimal places while others have 2, etc. (10.7435 vs 1074.35)... all numbers are of fixed length.

I have some ideas of how to do this, but I'm wondering if there's a better way. My current approach is to brute-force search for where the decimal point is by slicing and then cut off the extraneous digits; however, it would be nice to stay away from a bunch of if-thens. Does anyone have any ideas on how to do this more efficiently?

Many thanks,
Lorn
Re: String manipulations
Yes, that would get rid of the decimals... but it wouldn't get rid of the extraneous precision. Unfortunately, the precision out to the ten-thousandth is noise. I don't need to round it either, as the numbers are artifacts of an integer-to-float conversion. Basically, I need to know how many decimal places there are and then make the necessary deletions before I can normalize by adding zeros, multiplying, etc. Thanks for your suggestion, though.
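The solution the thread settled on isn't quoted in this digest, but a minimal sketch of one way to do it, truncating (not rounding) anything past two decimal places (the helper name is just for illustration):

def to_hundredths(s):
    whole, frac = s.split('.')    # every value in this data has a decimal point
    frac = (frac + '00')[:2]      # keep exactly two digits: pad short ones, drop the noise
    return int(whole) * 100 + int(frac)

to_hundredths('10.7435')   # -> 1074
to_hundredths('1074.35')   # -> 107435

Because everything is done on the string, no float conversion (and so no rounding) ever happens.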
Re: String manipulations
Thank you Elliot, this solution is the one I was trying to come up with. Thank you for your help, and thank you to everyone for their suggestions.

Best regards,
Lorn
Dynamic Lists, or...?
I'm trying to figure out a way to create dynamic lists, or possibly another solution, for the following problem. I have multiple lines in a text file (every line is the same format) that are iterated over and which need to be compared to previous lines in the file in order to perform some simple math. Each line contains 3 fields: a descriptor and two integers. Here is an example:

rose, 1, 500
lilac, 1, 300
lilly, 1, 400
rose, 0, 100

The idea is that the 0/1 values are there to let the program know whether to add or subtract the second integer value for a specific descriptor (flower in this case). So the program comes upon rose, adds the 500 to an empty list, waits for the next appearance of the rose descriptor and then (in this case) subtracts 100 from 500 and prints the value. If the next rose was a 1 then it would have added 100.

I'm uncertain how to approach this, though. My idea was to somehow create lists dynamically upon each new occurrence of a descriptor that currently has no list, and then perform the calculations from there. Unfortunately, the list of descriptors is potentially infinite, so I'm unable to create lists with the descriptor names ahead of time. Could anyone give any suggestions on how to best approach this problem? Hopefully I've been clear enough. Any help would be very greatly appreciated.

Best regards,
Lorn
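The replies aren't quoted here, but the usual answer to "a list per descriptor, created on the fly" is a dictionary keyed by the descriptor, so nothing has to exist ahead of time. A minimal sketch (the file name is made up for illustration):

running = {}
for line in open('flowers.txt'):
    name, flag, value = [field.strip() for field in line.split(',')]
    amount = int(value)
    if flag == '0':
        amount = -amount
    # dict.get() supplies 0 the first time a descriptor is seen,
    # so no per-descriptor list or key needs to be created in advance
    running[name] = running.get(name, 0) + amount
    print name, running[name]

With the example data this prints a running total per flower: rose 500, lilac 300, lilly 400 and finally rose 400 (500 - 100).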
Working with Huge Text Files
Hi there, I'm a Python newbie hoping for some direction in working with text files that range from 100 MB to 1 GB in size. Basically certain rows, sorted by the first (primary) field and maybe the second (date), need to be copied and written to their own file, and some string manipulations need to happen as well.

An example of the current format:

XYZ,04JAN1993,9:30:27,28.87,7600,40,0,Z,N
XYZ,04JAN1993,9:30:28,28.87,1600,40,0,Z,N
(followed by something like a million rows similar to the above, with incrementing date and time, and then on to the next primary field)
ABC,04JAN1993,9:30:27,28.875,7600,40,0,Z,N
(etc.; there are usually 10-20 values of the first field per file, so there's a lot of repetition going on)

The export would ideally look like this, where the first field would be written as the name of the file (XYZ.txt):

19930104, 93027, 2887, 7600, 40, 0, Z, N

Pretty ambitious for a newbie? I really hope not. I've been looking at SimpleParse, but it's a bit intense at first glance... not sure where to start, or even if I need to go that route. Any help from you guys on what direction to go or how to approach this would be hugely appreciated.

Best regards,
Lorn
Re: Working with Huge Text Files
Thank you all very much for your suggestions and input... they've been
very helpful. I found the easiest approach, as a beginner to this, was
working with Chirag's code. Thanks Chirag, I was actually able to read
and make some edits to the code and then use it... woohooo!
My changes are annotated with ##:
data_file = open('G:\pythonRead.txt', 'r')
data_file.readline()  ## this was to skip the first line
months = {'JAN':'01', 'FEB':'02', 'MAR':'03', 'APR':'04', 'MAY':'05',
          'JUN':'06', 'JUL':'07', 'AUG':'08', 'SEP':'09', 'OCT':'10',
          'NOV':'11', 'DEC':'12'}
output_files = {}
for line in data_file:
    fields = line.strip().split(',')
    length = len(fields[3])  ## check how long the field is
    N = 'P', 'N'
    filename = fields[0]
    if filename not in output_files:
        output_files[filename] = open(filename + '.txt', 'w')
    if (fields[8] == 'N' or 'P') and (fields[6] == '0' or '1'):
        ## This line above doesn't work, can't figure out how to structure it
        fields[1] = fields[1][5:] + months[fields[1][2:5]] + fields[1][:2]
        fields[2] = fields[2].replace(':', '')
        if length == 6:  ## check for 6, if not add a 0
            fields[3] = fields[3].replace('.', '')
        else:
            fields[3] = fields[3].replace('.', '') + '0'
        print >>output_files[filename], ', '.join(fields[1:5])
for filename in output_files:
    output_files[filename].close()
data_file.close()
The main changes were to create a check for the length of fields[3]; I
wanted to normalize it at 6 digits. The potential problem I can see with it
is if I come across lengths < 5, but I have some ideas to fix that. The
other change I attempted was a criterion for what to print based on the
values of fields[8] and fields[6]. It didn't work so well. I'm a little
confused about how to structure booleans like that... I come from a little
experience in a Pascal-type scripting language where "x and y" would entail
both having to be true before continuing and "x or y" would mean either
could be true before continuing. Python, unless I'm misunderstanding (very
possible), doesn't organize it as such. I thought of perhaps using a set of
if, elif, else statements for processing the fields, but didn't think that
would be the most elegant/efficient solution.
Anyway, any critiques/ideas are welcome... they'll most definitely help
me understand this language a bit better. Thank you all again for your
great replies and thank you Chirag for getting me up and going.
Lorn
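On the boolean question: Python's "and"/"or" behave the same way as Pascal's, but each side has to be a complete comparison on its own. "fields[8] == 'N' or 'P'" groups as "(fields[8] == 'N') or ('P')", and a non-empty string like 'P' always counts as true, so that test passes for every row. A small sketch of two ways the condition could be written (not necessarily how Chirag or the other repliers would phrase it):

fields = ['XYZ', '04JAN1993', '9:30:27', '28.87', '7600', '40', '0', 'Z', 'N']

# spelling out both comparisons works the way the Pascal-style reading expects
if (fields[8] == 'N' or fields[8] == 'P') and (fields[6] == '0' or fields[6] == '1'):
    print 'keep this row'

# a membership test says the same thing more compactly
if fields[8] in ('N', 'P') and fields[6] in ('0', '1'):
    print 'keep this row'

The fields[3] length worry could be handled the same way as in the earlier string-manipulation thread: split on the '.', then pad or truncate the fractional part to a fixed number of digits, independent of the overall length.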
