Generator Expressions and CSV

2009-07-17 Thread Zaki
Hey all,

I'm really new to Python and this may seem like a really dumb
question, but basically, I wrote a script to do the following; however,
the processing time/memory usage is not what I'd like it to be. Any
suggestions?


Outline:
1. Read tab-delimited files from a directory; the files are of 3 types:
install, update, and q. All 3 types contain ID values that are the
only part of interest.
2. Using set() and set.add(), generate a list of unique IDs from
install and update files.
3. Using the set created in (2), check the q files to see if there are
matches for IDs. Keep all matches, and add any non-matches (which only
occur once in the q file) to a queue of lines to be removed from the q
files.
4. Remove those lines from each q file. (I haven't quite written
the code for this, but I was going to implement it using csv.writer,
rewriting all the lines in the file except for the ones in the
removal queue.)

Now, I've tried running this and it takes much longer than I'd like. I
was wondering if there might be a better way to do things (I thought
generator expressions might be a good way to attack this problem, as
you could generate the set, and then check to see if there's a match,
and write each line that way).
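
(Roughly what I had in mind, as a minimal sketch rather than tested code;
the file lists, the tab delimiter, and the ID sitting in the third column
are assumptions about my data:)

import csv

# Build the set of known IDs from the install/update files.
ids = set()
for name in install_and_update_files:        # hypothetical list of file paths
    for row in csv.reader(open(name), delimiter='\t'):
        ids.add(row[2])                      # ID assumed to be the third column

# Then, for each q file, keep only the rows whose ID is in the set,
# feeding csv.writer with a generator expression.
for name in q_files:                         # hypothetical list of file paths
    reader = csv.reader(open(name), delimiter='\t')
    writer = csv.writer(open(name + '.filtered', 'w'), delimiter='\t')
    writer.writerows(row for row in reader if row[2] in ids)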




Re: Generator Expressions and CSV

2009-07-17 Thread Zaki
On Jul 17, 2:49 pm, MRAB  wrote:
> Zaki wrote:
> > Hey all,
>
> > I'm really new to Python and this may seem like a really dumb
> > question, but basically, I wrote a script to do the following, however
> > the processing time/memory usage is not what I'd like it to be. Any
> > suggestions?
>
> > Outline:
> > 1. Read tab delim files from a directory, files are of 3 types:
> > install, update, and q. All 3 types contain ID values that are the
> > only part of interest.
> > 2. Using set() and set.add(), generate a list of unique IDs from
> > install and update files.
> > 3. Using the set created in (2), check the q files to see if there are
> > matches for IDs. Keep all matches, and add any non matches (which only
> > occur once in the q file) to a queue of lines to be removed from teh q
> > files.
> > 4. Remove the lines in the q for each file. (I haven't quite written
> > the code for this, but I was going to implement this using csv.writer
> > and rewriting all the lines in the file except for the ones in the
> > removal queue).
>
> > Now, I've tried running this and it takes much longer than I'd like. I
> > was wondering if there might be a better way to do things (I thought
> > generator expressions might be a good way to attack this problem, as
> > you could generate the set, and then check to see if there's a match,
> > and write each line that way).
>
> Why are you checking and removing lines in 2 steps? Why not copy the
> matching lines to a new q file and then replace the old file with the
> new one (or, maybe, delete the new q file if no lines were removed)?
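
(My reading of that suggestion, as a rough sketch; the ID column, the tab
delimiter, and the filename handling are assumptions, and it ignores the
"keep IDs seen twice in the q logs" wrinkle for the moment:)

import csv
import os

def filter_q_file(path, IDs):
    # Copy only the matching rows to a new file, then swap it in.
    infile = open(path)
    outfile = open(path + '.new', 'w')
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t')
    removed = 0
    for row in reader:
        if row[2] in IDs:
            writer.writerow(row)
        else:
            removed += 1
    infile.close()
    outfile.close()
    if removed:
        os.remove(path)                 # needed on Windows before renaming over it
        os.rename(path + '.new', path)  # replace the old q file with the new one
    else:
        os.remove(path + '.new')        # nothing removed, keep the original as-is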

That's what I've done now.

Here is the final code that I have running. It's very much 'hack'-type
code, not at all efficient or optimized, and any help in optimizing it
would be greatly appreciated.

import csv
import sys
import os
import time

begin = time.time()

#Check minutes elapsed
def timeElapsed():
    current = time.time()
    elapsed = current-begin
    return round(elapsed/60)


#USAGE: python logcleaner.py <inputdir> <outputdir>

inputdir = sys.argv[1]
outputdir = sys.argv[2]

logfilenames = os.listdir(inputdir)



IDs = set() #IDs from update and install logs
foundOnceInQuery = set()
#foundTwiceInQuery = set()
#IDremovalQ = set()  # Note: unnecessary, duplicate of foundOnceInQuery;
#   queue of IDs to remove from query logs (IDs found only once in query
#   logs)

#Generate Filename Queues For Install/Update Logs, Query Logs
iNuQ = []
queryQ = []

for filename in logfilenames:
    if filename.startswith("par1.install") or filename.startswith("par1.update"):
        iNuQ.append(filename)
    elif filename.startswith("par1.query"):
        queryQ.append(filename)

totalfiles = len(iNuQ) + len(queryQ)
print "Total # of Files to be Processed:" , totalfiles
print "Install/Update Logs to be processed:" , len(iNuQ)
print "Query logs to be processed:" , len(queryQ)

#Process install/update queue to generate list of valid IDs
currentfile = 1
for file in iNuQ:
    print "Processing", currentfile, "install/update log out of", len(iNuQ)
    print timeElapsed()
    reader = csv.reader(open(inputdir+file), delimiter='\t')
    for row in reader:
        IDs.add(row[2])
    currentfile+=1

print "Finished processing install/update logs"
print "Unique IDs found:" , len(IDs)
print "Total Time Elapsed:", timeElapsed()

currentfile = 1
for file in queryQ:
    print "Processing", currentfile, "query log out of", len(queryQ)
    print timeElapsed()
    reader = csv.reader(open(inputdir+file), delimiter='\t')
    # Open the output file in write mode before handing it to csv.writer,
    # and keep the output tab-delimited like the input
    outputfile = csv.writer(open(outputdir+file, 'w'), delimiter='\t')
    for row in reader:
        if row[2] in IDs:
            outputfile.writerow(row)
        else:
            if row[2] in foundOnceInQuery:
                # Second time this ID turns up in the query logs: keep the row
                foundOnceInQuery.remove(row[2])
                outputfile.writerow(row)
                #IDremovalQ.remove(row[2])
                #foundTwiceInQuery.add(row[2])
            else:
                # First time this ID turns up: remember it, don't write it yet
                foundOnceInQuery.add(row[2])
                #IDremovalQ.add(row[2])
    currentfile+=1

print "Finished processing query logs and writing new files"
print "# of Query log entries removed:" , len(foundOnceInQuery)
print "Total Time Elapsed:", timeElapsed()




Re: Generator Expressions and CSV

2009-07-17 Thread Zaki
On Jul 17, 5:31 pm, MRAB  wrote:
> Zaki wrote:
> > On Jul 17, 2:49 pm, MRAB  wrote:
> >> Zaki wrote:
> >>> Hey all,
> >>> I'm really new to Python and this may seem like a really dumb
> >>> question, but basically, I wrote a script to do the following, however
> >>> the processing time/memory usage is not what I'd like it to be. Any
> >>> suggestions?
> >>> Outline:
> >>> 1. Read tab delim files from a directory, files are of 3 types:
> >>> install, update, and q. All 3 types contain ID values that are the
> >>> only part of interest.
> >>> 2. Using set() and set.add(), generate a list of unique IDs from
> >>> install and update files.
> >>> 3. Using the set created in (2), check the q files to see if there are
> >>> matches for IDs. Keep all matches, and add any non matches (which only
> >>> occur once in the q file) to a queue of lines to be removed from teh q
> >>> files.
> >>> 4. Remove the lines in the q for each file. (I haven't quite written
> >>> the code for this, but I was going to implement this using csv.writer
> >>> and rewriting all the lines in the file except for the ones in the
> >>> removal queue).
> >>> Now, I've tried running this and it takes much longer than I'd like. I
> >>> was wondering if there might be a better way to do things (I thought
> >>> generator expressions might be a good way to attack this problem, as
> >>> you could generate the set, and then check to see if there's a match,
> >>> and write each line that way).
> >> Why are you checking and removing lines in 2 steps? Why not copy the
> >> matching lines to a new q file and then replace the old file with the
> >> new one (or, maybe, delete the new q file if no lines were removed)?
>
> > That's what I've done now.
>
> > Here is the final code that I have running. It's very much 'hack' type
> > code and not at all efficient or optimized and any help in optimizing
> > it would be greatly appreciated.
>
> > import csv
> > import sys
> > import os
> > import time
>
> > begin = time.time()
>
> > #Check minutes elapsed
> > def timeElapsed():
> >     current = time.time()
> >     elapsed = current-begin
> >     return round(elapsed/60)
>
> > #USAGE: python logcleaner.py  
>
> > inputdir = sys.argv[1]
> > outputdir = sys.argv[2]
>
> > logfilenames = os.listdir(inputdir)
>
> > IDs = set() #IDs from update and install logs
> > foundOnceInQuery = set()
> > #foundTwiceInQuery = set()
> > #IDremovalQ = set() Note: Unnecessary, duplicate of foundOnceInQuery;
> > Queue of IDs to remove from query logs (IDs found only once in query
> > logs)
>
> > #Generate Filename Queues For Install/Update Logs, Query Logs
> > iNuQ = []
> > queryQ = []
>
> > for filename in logfilenames:
> >     if filename.startswith("par1.install") or filename.startswith
> > ("par1.update"):
>
>       if filename.startswith(("par1.install", "par1.update")):
>
> >         iNuQ.append(filename)
> >     elif filename.startswith("par1.query"):
> >         queryQ.append(filename)
>
> > totalfiles = len(iNuQ) + len(queryQ)
> > print "Total # of Files to be Processed:" , totalfiles
> > print "Install/Update Logs to be processed:" , len(iNuQ)
> > print "Query logs to be processed:" , len(queryQ)
>
> > #Process install/update queue to generate list of valid IDs
> > currentfile = 1
> > for file in iNuQ:
>
>  >     print "Processing", currentfile, "install/update log out of", len
>  > (iNuQ)
>  >     print timeElapsed()
>  >     reader = csv.reader(open(inputdir+file),delimiter = '\t')
>  >     for row in reader:
>  >         IDs.add(row[2])
>  >     currentfile+=1
>
> Best not to call it 'file'; that's a built-in name.
>
> Also you could use 'enumerate', and joining filepaths is safer with
> os.path.join().
>
> for currentfile, filename in enumerate(iNuQ, start=1):
>      print "Processing", currentfile, "install/update log out of", len(iNuQ)
>      print timeElapsed()
>      current_path = os.path.join(inputdir, filename)
>      reader = csv.reader(open(current_path), delimiter = '\t')
>      for row in reader:
>          IDs.add(row[2])

Re: Generator Expressions and CSV

2009-07-17 Thread Zaki
On Jul 17, 6:40 pm, Jon Clements  wrote:
> On 17 July, 21:08, Zaki  wrote:
>
>
>
> > On Jul 17, 2:49 pm, MRAB  wrote:
>
> > > Zaki wrote:
> > > > Hey all,
>
> > > > I'm really new to Python and this may seem like a really dumb
> > > > question, but basically, I wrote a script to do the following, however
> > > > the processing time/memory usage is not what I'd like it to be. Any
> > > > suggestions?
>
> > > > Outline:
> > > > 1. Read tab delim files from a directory, files are of 3 types:
> > > > install, update, and q. All 3 types contain ID values that are the
> > > > only part of interest.
> > > > 2. Using set() and set.add(), generate a list of unique IDs from
> > > > install and update files.
> > > > 3. Using the set created in (2), check the q files to see if there are
> > > > matches for IDs. Keep all matches, and add any non matches (which only
> > > > occur once in the q file) to a queue of lines to be removed from teh q
> > > > files.
> > > > 4. Remove the lines in the q for each file. (I haven't quite written
> > > > the code for this, but I was going to implement this using csv.writer
> > > > and rewriting all the lines in the file except for the ones in the
> > > > removal queue).
>
> > > > Now, I've tried running this and it takes much longer than I'd like. I
> > > > was wondering if there might be a better way to do things (I thought
> > > > generator expressions might be a good way to attack this problem, as
> > > > you could generate the set, and then check to see if there's a match,
> > > > and write each line that way).
>
> > > Why are you checking and removing lines in 2 steps? Why not copy the
> > > matching lines to a new q file and then replace the old file with the
> > > new one (or, maybe, delete the new q file if no lines were removed)?
>
> > That's what I've done now.
>
> > Here is the final code that I have running. It's very much 'hack' type
> > code and not at all efficient or optimized and any help in optimizing
> > it would be greatly appreciated.
>
> > import csv
> > import sys
> > import os
> > import time
>
> > begin = time.time()
>
> > #Check minutes elapsed
> > def timeElapsed():
> >     current = time.time()
> >     elapsed = current-begin
> >     return round(elapsed/60)
>
> > #USAGE: python logcleaner.py  
>
> > inputdir = sys.argv[1]
> > outputdir = sys.argv[2]
>
> > logfilenames = os.listdir(inputdir)
>
> > IDs = set() #IDs from update and install logs
> > foundOnceInQuery = set()
> > #foundTwiceInQuery = set()
> > #IDremovalQ = set() Note: Unnecessary, duplicate of foundOnceInQuery;
> > Queue of IDs to remove from query logs (IDs found only once in query
> > logs)
>
> > #Generate Filename Queues For Install/Update Logs, Query Logs
> > iNuQ = []
> > queryQ = []
>
> > for filename in logfilenames:
> >     if filename.startswith("par1.install") or filename.startswith
> > ("par1.update"):
> >         iNuQ.append(filename)
> >     elif filename.startswith("par1.query"):
> >         queryQ.append(filename)
>
> > totalfiles = len(iNuQ) + len(queryQ)
> > print "Total # of Files to be Processed:" , totalfiles
> > print "Install/Update Logs to be processed:" , len(iNuQ)
> > print "Query logs to be processed:" , len(queryQ)
>
> > #Process install/update queue to generate list of valid IDs
> > currentfile = 1
> > for file in iNuQ:
> >     print "Processing", currentfile, "install/update log out of", len
> > (iNuQ)
> >     print timeElapsed()
> >     reader = csv.reader(open(inputdir+file),delimiter = '\t')
> >     for row in reader:
> >         IDs.add(row[2])
> >     currentfile+=1
>
> > print "Finished processing install/update logs"
> > print "Unique IDs found:" , len(IDs)
> > print "Total Time Elapsed:", timeElapsed()
>
> > currentfile = 1
> > for file in queryQ:
> >     print "Processing", currentfile, "query log out of", len(queryQ)
> >     print timeElapsed()
> >     reader = csv.reader(open(inputdir+file), delimiter = '\t')
> >     outputfile = csv.writer(open(outputdir+file), 'w')
> >     for row in reader:

Re: python setup problems

2019-01-11 Thread Enas Ahmed Zaki
I would like help solving this problem, please.
Thanks,
Enas

On Fri, Jan 11, 2019 at 2:36 PM Enas Ahmed Zaki  wrote:

> Dear sir,
> When I set up Python, there is a problem, shown in the attached file. I hope
> to find a solution for it.
> Thanks for your attention.
> Eng. Enas Ahmed Zaky
>