Extracting multiple zip files in a directory

2005-05-18 Thread Lorn
I've been working on this code somewhat successfully; however, I'm unable
to get it to iterate through all the zip files in the directory. As of
now it only extracts the first one it finds. If anyone could lend some
tips on how my iteration scheme should look, it would be hugely
appreciated. Here is what I have so far:

import zipfile, glob, os

from os.path import isfile
fname = filter(isfile, glob.glob('*.zip'))
for fname in fname:
    zf = zipfile.ZipFile(fname, 'r')
    for file in zf.namelist():
        newFile = open(file, "wb")
        newFile.write(zf.read(file))
        newFile.close()
    zf.close()

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Extracting multiple zip files in a directory

2005-05-18 Thread Lorn
Thanks John, this works great!

I was wondering: what is your reasoning behind replacing "filter" with
the "x for x" list comprehension?
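For reference, a side-by-side sketch of the two forms being discussed (the plain-data example at the bottom uses made-up values, since the actual directory contents aren't shown): both build the same list, but the comprehension reads as plain Python and avoids passing a function object around.

```python
import glob
from os.path import isfile

# The two spellings being swapped between; both keep only existing files.
# (In Python 3, filter() returns a lazy iterator, hence the list() call.)
filtered = list(filter(isfile, glob.glob('*.zip')))
comprehended = [x for x in glob.glob('*.zip') if isfile(x)]

# The same equivalence on hypothetical plain data:
nums = ['1', 'a', '22', 'b3']
kept_filter = list(filter(str.isdigit, nums))
kept_comp = [x for x in nums if x.isdigit()]
```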

Appreciate the help, thanks again.

Lorn



Re: Extracting multiple zip files in a directory

2005-05-19 Thread Lorn
Ok, I probably should have seen this coming. The script above works fine
with small zip files. However, when opening a 120+ MB compressed file
that uncompresses to over 1 GB, I unfortunately get memory errors. Is
this because Python holds the extracted file in memory, rather than
spooling it to disk, before writing? Does anyone know a way around
this? As of now, I'm out of ideas :(



Memory errors with large zip files

2005-05-20 Thread Lorn
Is there a limitation in Python's zipfile module on the size of file
that can be extracted? I'm currently trying to extract 125 MB zip files
whose contents uncompress to over 1 GB, and am receiving memory errors.
Indeed, my RAM gets maxed out during extraction and then the script
quits. Is there a way to spool to disk on the fly, or is it necessary
for Python to open the entire file before writing? The code below
iterates through a directory of zip files and extracts them (thanks
John!), though for testing I've just been using one file:

zipnames = [x for x in glob.glob('*.zip') if isfile(x)]
for zipname in zipnames:
    zf = zipfile.ZipFile(zipname, 'r')
    for zfilename in zf.namelist():
        newFile = open(zfilename, "wb")
        newFile.write(zf.read(zfilename))
        newFile.close()
    zf.close()


Any suggestions or comments on how I might be able to work with zip
files of this size would be very helpful.

Best regards,
Lorn



Re: Memory errors with large zip files

2005-05-20 Thread Lorn
Ok, I'm not sure if this helps any, but in debugging it a bit I see the
script stalls on:

newFile.write (zf.read (zfilename))

The memory error generated references line 357 of the zipfile.py
module, at the point of decompression:

elif zinfo.compress_type == ZIP_DEFLATED:
    if not zlib:
        raise RuntimeError, \
              "De-compression requires the (missing) zlib module"
    # zlib compress/decompress code by Jeremy Hylton of CNRI
    dc = zlib.decompressobj(-15)
    bytes = dc.decompress(bytes)  ### <-- right here

Is there any way to modify how my code approaches this, or perhaps how
the zipfile code handles it, or do I need to just invest in more RAM?
I currently have 512 MB and thought that would be plenty... perhaps I
was wrong :-(. If anyone has any ideas it would truly be very helpful.
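One possible way around holding the whole decompressed member in memory is to stream it to disk in chunks. This is only a sketch, and it assumes a Python version where ZipFile.open() exists (2.6 or later); the function name extract_zips is my own:

```python
import glob
import os
import shutil
import zipfile

def extract_zips(directory):
    """Extract every member of every .zip in `directory`, copying each
    member to disk in 64 KB chunks instead of reading it whole."""
    for zipname in glob.glob(os.path.join(directory, '*.zip')):
        zf = zipfile.ZipFile(zipname, 'r')
        for zfilename in zf.namelist():
            source = zf.open(zfilename)  # file-like, decompressed lazily
            target = open(os.path.join(directory, zfilename), 'wb')
            shutil.copyfileobj(source, target, 64 * 1024)
            target.close()
            source.close()
        zf.close()
```

The key difference from zf.read() is that only one 64 KB chunk is ever in memory at a time, so peak usage no longer tracks the size of the uncompressed file.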

Lorn



String manipulations

2005-05-28 Thread Lorn
I'm trying to work on a dataset that has its primary numbers saved as
floats in string format. I'd like to work with them as integers with an
implied decimal at the hundredth. The problem is that the current
precision is variable. For instance, some numbers have 4 decimal places
while others have 2, etc. (10.7435 vs 1074.35)... all numbers are of
fixed length.

I have some ideas of how to do this, but I'm wondering if there's a
better way. My current way is to brute force search where the decimal
is by slicing and then cutoff the extraneous numbers, however, it would
be nice to stay away from a bunch of if then's.

Does anyone have any ideas on how to do this more efficiently?
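One way to do it without a chain of ifs (a sketch; the function name and the two-digit target are assumptions taken from the description above): split on the decimal point, then pad or truncate the fractional part to exactly two digits before recombining.

```python
def to_hundredths(s):
    """Turn a float-in-a-string like '10.7435' or '1074.35' into an
    integer with an implied two-place decimal, truncating (not
    rounding) any extra noise digits."""
    parts = s.split('.')
    whole = parts[0]
    frac = parts[1] if len(parts) > 1 else ''
    # Pad short fractions with zeros, cut long ones off at two digits.
    frac = (frac + '00')[:2]
    return int(whole) * 100 + int(frac)
```

So '10.7435' becomes 1074 and '1074.35' becomes 107435, with no float conversion and no per-width branching.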

Many Thanks,
Lorn



Re: String manipulations

2005-05-28 Thread Lorn
Yes, that would get rid of the decimals... but it wouldn't get rid of
the extraneous precision. Unfortunately, the precision out to the ten
thousandth is noise... I don't need to round it either as the numbers
are artifacts of an integer to float conversion. Basically, I need to
know how many decimal places there are and then make the necessary
deletions before I can normalize by adding zeros, multiplying, etc.

Thanks for your suggestion, though.



Re: String manipulations

2005-05-28 Thread Lorn
Thank you Elliot, this solution is the one I was trying to come up
with. Thank you for your help and thank you to everyone for their
suggestions.

Best regards,
Lorn



Dynamic Lists, or...?

2005-06-11 Thread Lorn
I'm trying to figure out a way to create dynamic lists, or possibly
another solution, for the following problem. I have multiple lines in a
text file (every line is the same format) that are iterated over and
need to be compared to previous lines in the file in order to perform
some simple math. Each line contains 3 fields: a descriptor and two
integers. Here is an example:

rose, 1, 500
lilac, 1, 300
lilly, 1, 400
rose, 0, 100

The idea is that the 0/1 values are there to let the program know
whether to add or subtract the second integer value for a specific
descriptor (flower in this case). So the program comes upon rose, adds
the 500 to an empty list, waits for the next appearance of the rose
descriptor and then (in this case) subtracts 100 from 500 and prints
the value. If the next rose was a 1 then it would have added 100.

I'm uncertain how to approach doing this, though. My idea was to
somehow create lists dynamically upon each new occurrence of a
descriptor that currently has no list, and then perform the
calculations from there. Unfortunately, the set of descriptors is
potentially infinite, so I'm unable to create lists with the
descriptor names ahead of time. Could anyone give any suggestions on
how best to approach this problem? Hopefully I've been clear enough.
Any help would be very greatly appreciated.
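A dictionary keyed by descriptor may be a better fit than per-descriptor lists, since entries spring into existence the first time a name is seen, with no need to know the names in advance. A sketch (the function name and the returned (name, value) pairs are my own assumptions about the desired output):

```python
def running_totals(lines):
    totals = {}   # descriptor -> running total; grows as new names appear
    results = []  # (descriptor, value) emitted on each repeat sighting
    for line in lines:
        name, flag, value = [part.strip() for part in line.split(',')]
        delta = int(value) if flag == '1' else -int(value)
        if name in totals:
            totals[name] += delta
            results.append((name, totals[name]))
        else:
            totals[name] = delta
    return results
```

With the four example lines above, this would emit ('rose', 400) when the second rose line arrives.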

Best regards,
Lorn



Working with Huge Text Files

2005-03-18 Thread Lorn Davies
Hi there, I'm a Python newbie hoping for some direction in working with
text files that range from 100MB to 1G in size. Basically, certain
rows, sorted by the first (primary) field and maybe the second (date),
need to be copied and written to their own file, and some string
manipulations need to happen as well. An example of the current format:

XYZ,04JAN1993,9:30:27,28.87,7600,40,0,Z,N
XYZ,04JAN1993,9:30:28,28.87,1600,40,0,Z,N
 |
 | followed by like a million rows similar to the above, with
 | incrementing date and time, and then on to next primary field
 |
ABC,04JAN1993,9:30:27,28.875,7600,40,0,Z,N
 |
 | etc., there are usually 10-20 of the first field per file
 | so there's a lot of repetition going on
 |

The export would ideally look like this where the first field would be
written as the name of the file (XYZ.txt):

19930104, 93027, 2887, 7600, 40, 0, Z, N

Pretty ambitious for a newbie? I really hope not. I've been looking at
simpleParse, but it's a bit intense at first glance... not sure where
to start, or even if I need to go that route. Any help from you guys in
what direction to go or how to approach this would be hugely
appreciated.
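As a rough sketch of just the per-line rewrite (assuming the field layout shown above; the MONTHS table and the transform name are made up for illustration):

```python
MONTHS = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
          'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
          'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}

def transform(line):
    """Return (filename_key, rewritten_row) for one input line."""
    f = line.strip().split(',')
    date = f[1][5:] + MONTHS[f[1][2:5]] + f[1][:2]  # 04JAN1993 -> 19930104
    time = f[2].replace(':', '')                    # 9:30:27 -> 93027
    price = f[3].replace('.', '')                   # 28.87 -> 2887
    return f[0], ', '.join([date, time, price] + f[4:])
```

The returned key could then be used to pick which open output file (XYZ.txt, ABC.txt, ...) the rewritten row is appended to, so the whole file is handled in one streaming pass.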

Best regards,
Lorn



Re: Working with Huge Text Files

2005-03-19 Thread Lorn Davies
Thank you all very much for your suggestions and input... they've been
very helpful. I found the easiest approach, as a beginner to this, was
working with Chirag's code. Thanks Chirag, I was actually able to read
and make some edits to the code and then use it... woohooo!

My changes are annotated with ##:

data_file = open('G:\pythonRead.txt', 'r')
data_file.readline()  ## this was to skip the first line
months = {'JAN': '01', 'FEB': '02', 'MAR': '03', 'APR': '04',
          'MAY': '05', 'JUN': '06', 'JUL': '07', 'AUG': '08',
          'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}
output_files = {}
for line in data_file:
    fields = line.strip().split(',')
    length = len(fields[3])  ## check how long the field is
    N = 'P', 'N'
    filename = fields[0]
    if filename not in output_files:
        output_files[filename] = open(filename + '.txt', 'w')
    if (fields[8] == 'N' or 'P') and (fields[6] == '0' or '1'):
        ## This line above doesn't work, can't figure out how to struct?
        fields[1] = fields[1][5:] + months[fields[1][2:5]] + fields[1][:2]
    fields[2] = fields[2].replace(':', '')
    if length == 6:  ## check for 6, if not add a 0
        fields[3] = fields[3].replace('.', '')
    else:
        fields[3] = fields[3].replace('.', '') + '0'
    print >>output_files[filename], ', '.join(fields[1:5])
for filename in output_files:
    output_files[filename].close()
data_file.close()

The main changes were to create a check for the length of fields[3]
(I wanted to normalize it at 6 digits; the problem I can see with it is
if I come across lengths < 5, but I have some ideas to fix that) and to
attempt a criterion for what to print based on the values of fields[8]
and fields[6]. That second change didn't work so well. I'm a little
confused about how to structure booleans like that... I come from a
little experience in a Pascal-type scripting language where "x and y"
would entail both having to be true before continuing and "x or y"
would mean either could be true before continuing. Python, unless I'm
misunderstanding (very possible), doesn't organize it as such. I
thought of perhaps using a set of if/elif/else statements for
processing the fields, but didn't think that would be the most
elegant/efficient solution.
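Python's "and"/"or" do work the Pascal-style way; the catch is grouping. `fields[8] == 'N' or 'P'` parses as `(fields[8] == 'N') or 'P'`, and since the non-empty string 'P' is always true, the whole test always passes. Two spellings that do what was intended, shown on hypothetical sample values:

```python
field8, field6 = 'Z', '0'  # hypothetical sample values

# Spell out each comparison explicitly...
ok = (field8 == 'N' or field8 == 'P') and (field6 == '0' or field6 == '1')

# ...or use a membership test, which reads closer to the intent.
ok2 = field8 in ('N', 'P') and field6 in ('0', '1')
```

Both expressions are false for 'Z' (which the original always-true version would have let through) and true only when both fields match.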

Anyway, any critiques/ideas are welcome... they'll most definitely help
me understand this language a bit better. Thank you all again for your
great replies and thank you Chirag for getting me up and going.

Lorn
