---- John Gunderman <[EMAIL PROTECTED]> wrote:
> I am parsing the output of mork.pl, which is a Mork (the Mozilla format)
> parser. I don't know Perl, so I decided to write a Python script to do what I
> wanted, which is basically to create a dictionary listing each site and its
> corresponding values instead of outputting plaintext. Unfortunately, the
> output of mork.pl is 5000+ lines, so reading the whole document wouldn't be
> that efficient.
OK, I looked briefly at mork.pl. You should be able to process its output
line by line with something like this:

for line in history_file:
    if not line.strip():
        continue  # skip blank lines; may not be needed
    time, count, url = line.split()
    # do something with time, count, url

Kent

> Currently it uses:
>     for line in history_file.readlines():
> but I don't know if this has to read all lines before it goes through them.
> If it does, then would it be more efficient to use
>     while line != '\t':
>         line = history_file.readline()
> I was thinking of just appending each character to the string until it sees
> '\t', and then using int() on the string, but is there an easier way?
>
> John
>
> ----- Original Message ----
> From: Kent Johnson <[EMAIL PROTECTED]>
> To: John Gunderman <[EMAIL PROTECTED]>
> Cc: tutor@python.org
> Sent: Saturday, February 23, 2008 3:43:44 AM
> Subject: Re: [Tutor] how to parse a multiple character words from plaintext
>
> John Gunderman wrote:
> > I am looking to parse a plaintext from a document. However, I am
> > confused about the actual methodology of it. This is because some of the
> > words will be multiple digits or characters. However, I don't know the
> > length of the words before the parse. Is there a way to somehow have
> > open() grab something until it sees a \t or ' '? I was thinking I could
> > have it count ahead the number of spaces till the stopping point and
> > then parse till that point using read(), but that seems sort of
> > inefficient. Is there a better way to pull this off? Thanks in advance.
>
> How big is the file? Can you just read the whole document and parse the
> resulting string? Or read by lines?
>
> Depending on how complex your parsing is, you might want to use
> pyparsing or one of the other Python parser libraries.
> http://pyparsing.wikispaces.com/
> http://nedbatchelder.com/text/python-parsers.html
>
> Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
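
On the readlines() question in the thread: readlines() builds a list of every
line before the loop starts, while iterating over the file object directly
fetches lines as the loop asks for them. Below is a minimal sketch of the
dictionary John describes, assuming each non-blank line holds three
whitespace-separated fields (time, count, url) as in the loop above, and that
time and count are integers. The name build_history and the sample lines are
made up for illustration; they are not taken from mork.pl.

```python
import io


def build_history(lines):
    """Map each URL to its (time, count) pair, one line at a time.

    `lines` can be any iterable of strings, including an open file object.
    """
    history = {}
    for line in lines:
        fields = line.split()  # splits on any run of spaces or tabs
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        time, count, url = fields
        history[url] = (int(time), int(count))
    return history


# Made-up sample data standing in for the real mork.pl output (assumption).
sample = io.StringIO(
    "1203752624\t3\thttp://www.python.org/\n"
    "\n"
    "1203752700\t1\thttp://mail.python.org/mailman/listinfo/tutor\n"
)
history = build_history(sample)
print(history["http://www.python.org/"])
```

With a real file, the call would be build_history(history_file) on the opened
file object. Because split() already separates the fields on tabs or spaces,
there is no need to collect characters one by one before calling int().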