Re: [Tutor] parse text file

Dave Angel Tue, 02 Feb 2010 04:28:54 -0800

Norman Khine wrote:

thanks denis,


On Tue, Feb 2, 2010 at 9:30 AM, spir <denis.s...@free.fr> wrote:

On Mon, 1 Feb 2010 16:30:02 +0100
Norman Khine <nor...@khine.net> wrote:

On Mon, Feb 1, 2010 at 1:19 PM, Kent Johnson <ken...@tds.net> wrote:

On Mon, Feb 1, 2010 at 6:29 AM, Norman Khine <nor...@khine.net> wrote:

thanks, what about the whitespace problem?

\s* will match any amount of whitespace, including newlines.
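Here is a quick illustration of Kent's point in Python 3, using made-up coordinate text. `\s` matches spaces, tabs and newlines alike, so `\s*` can bridge a line break between two numbers. The same pattern without it fails on that text.

```python
import re

# \s matches any whitespace character, including '\n', so \s* can span
# the line break between the two numbers (sample text is hypothetical).
pattern = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")

sample = "GLatLng(43.604652,\n        1.444209)"
match = pattern.search(sample)
print(match.groups())  # ('43.604652', '1.444209')

# Without \s*, the same pattern finds nothing in the multi-line text.
strict = re.compile(r"GLatLng\((-?\d+\.\d*),(-?\d+\.\d*)\)")
print(strict.search(sample))  # None
```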

thank you, this worked well.

here is the code:

###
import re
file = open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )

block = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
b = block.findall(data)
block_list = []
for html in b:
    namespace = {}
    t = re.compile(r"""<strong>(.*)<\/strong>""")
    title = t.findall(html)
    for item in title:
        namespace['title'] = item
    u = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
    url = u.findall(html)
    for item in url:
        namespace['url'] = item
    g = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
    lat = g.findall(html)
    for item in lat:
        namespace['LatLng'] = item
    block_list.append(namespace)

###
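The patterns spell newlines and tabs as `\\n` and `\\t` because the code searches `repr()` of the decoded text. In that representation a real newline becomes two characters: a backslash followed by `n`. Below is a Python 3 sketch with an invented fragment. In Python 3, `repr()` of a str needs no `.decode()` step.

```python
import re

text = "GLatLng(43.6,\n  1.44)"  # hypothetical fragment with a real newline
escaped = repr(text)            # the newline is now a backslash followed by 'n'

print("\n" in escaped)   # False: no real newline left
print("\\n" in escaped)  # True: literal backslash-n

# In a raw-string pattern, \\n matches a literal backslash followed by 'n'.
pattern = re.compile(r"GLatLng\((-?\d+\.\d*),\\n\s*(-?\d+\.\d*)\)")
print(pattern.findall(escaped))  # [('43.6', '1.44')]
print(pattern.findall(text))     # []: the unescaped text has a real newline
```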

can this be made better?

The 3 regex patterns are constants: they can be put out of the loop.

You may also rename b to blocks, and find a more accurate name for 
block_list, e.g. block_records, where record = set of (named) fields.

A short desc and/or example of the overall and partial data formats can greatly 
help later review, since regex patterns alone are hard to decode.
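Following Denis's suggestion, here is a sketch of the patterns compiled once at module level. Each one is documented with a sample of the text it targets. The sample fragments are invented, since the real file's layout is not shown in this thread, and the names GET_TITLE, GET_URL and GET_LATLNG are hypothetical. Two details also differ from Norman's patterns:

- Non-greedy `(.*?)` is used, so that two tags on one line are not swallowed into a single match.
- `\s*` separates the coordinates, because this version runs on the plain text rather than on `repr()` output.

```python
import re

# Patterns compiled once, at module level, since they never change.
# Each comment shows a hypothetical fragment the pattern is meant to match;
# the fragments are illustrations, not copied from the real data file.

# '<strong>Domaine Example</strong>'                   -> 'Domaine Example'
GET_TITLE = re.compile(r"<strong>(.*?)</strong>")

# '<a href="/producteurs/example">En savoir plus</a>'  -> 'producteurs/example'
GET_URL = re.compile(r'a href="/(.*?)">En savoir plus')

# 'GLatLng(43.604652,<newline>  1.444209)'             -> ('43.604652', '1.444209')
GET_LATLNG = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")

fragment = ('<strong>Domaine Example</strong>'
            '<a href="/producteurs/example">En savoir plus</a>'
            ' GLatLng(43.604652,\n  1.444209)')

print(GET_TITLE.findall(fragment))   # ['Domaine Example']
print(GET_URL.findall(fragment))     # ['producteurs/example']
print(GET_LATLNG.findall(fragment))  # [('43.604652', '1.444209')]
```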


here are the changes:

import re
file = open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )

get_record = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
get_title = re.compile(r"""<strong>(.*)<\/strong>""")
get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")

records = get_record.findall(data)
block_record = []
for record in records:
        namespace = {}
        titles = get_title.findall(record)
        for title in titles:
                namespace['title'] = title
        urls = get_url.findall(record)
        for url in urls:
                namespace['url'] = url
        latlngs = get_latlng.findall(record)
        for latlng in latlngs:
                namespace['latlng'] = latlng
        block_record.append(namespace)

print block_record
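One detail of the revised code is worth knowing. `get_latlng` has two capturing groups, so `findall()` returns a list of (lat, lng) tuples of strings, and that tuple is what lands in namespace['latlng']. Below is a small Python 3 sketch on a made-up string. It uses `\s*` in place of `\\n\s*` because it does not run on `repr()` output.

```python
import re

get_latlng = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")

record = "GLatLng(43.604652, 1.444209) ... GLatLng(-12.5, 130.8)"  # hypothetical
latlngs = get_latlng.findall(record)
print(latlngs)  # [('43.604652', '1.444209'), ('-12.5', '130.8')]

# With two groups, each findall item is a (lat, lng) tuple of strings;
# convert to floats if the numbers are needed for arithmetic.
lat, lng = (float(v) for v in latlngs[-1])
print(lat, lng)  # -12.5 130.8
```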

The def of "namespace" would be clearer imo in a single line:
   namespace = {title:t, url:url, lat:g}


i am not sure how this will fit into the code!

This also reveals a kind of name confusion, doesn't it?
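Here is a sketch of how the one-line dict could fit into the loop. It assumes each record matches each pattern exactly once; missing matches are dealt with further down the thread. The keys are quoted strings. The values are the matched text rather than the compiled patterns t, u and g, which is presumably the name confusion Denis is pointing at. The sample records are invented.

```python
import re

get_title = re.compile(r"""<strong>(.*)<\/strong>""")
get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")

records = ['<strong>A</strong><a href="/a">En savoir plus</a>',
           '<strong>B</strong><a href="/b">En savoir plus</a>']  # hypothetical

block_record = []
for record in records:
    # Values are the matched strings, not the pattern objects.
    # Assumes exactly one match per record; [-1] raises IndexError if none.
    title = get_title.findall(record)[-1]
    url = get_url.findall(record)[-1]
    block_record.append({'title': title, 'url': url})

print(block_record)
# [{'title': 'A', 'url': 'a'}, {'title': 'B', 'url': 'b'}]
```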


Denis

Your variable 'file' is hiding a built-in name for the file type. No harm in this example, but it's a bad habit to get into.
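Here is a sketch of the file handling without the shadowing, assuming Python 2.6+ or Python 3, since `io.open` exists in both. In Python 2, `file` is the built-in file type; Python 3 dropped that built-in, but a neutral name like `infile` is still clearer. A `with` block also closes the file. Passing `encoding` makes the read return decoded text, so no separate `.decode('utf-8')` is needed. The temporary file is a hypothetical stand-in for producers_google_map_code.txt.

```python
import io
import os
import tempfile

# Hypothetical stand-in for producers_google_map_code.txt.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(path, 'w', encoding='utf-8') as out:
    out.write(u'<strong>Caf\u00e9 Example</strong>\n')

# 'infile' does not hide a built-in name, 'with' closes the file even on
# errors, and io.open decodes UTF-8 while reading.
with io.open(path, 'r', encoding='utf-8') as infile:
    data = infile.read()

os.remove(path)
print(data)
```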

What did you intend to happen if the number of titles, urls, and latlngs is not exactly one each? As you have it now, if there's more than one, you spend time adding them all to the dictionary, but only the last one survives. And if there aren't any, you don't make an entry in the dictionary.

If that's the exact behavior you want, then you could replace the loop with an if statement: (untested)


        if titles:
                namespace['title'] = titles[-1]
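Here is a small Python 3 sketch to check that the `if` really matches the loop's behavior. It compares the two on three hypothetical findall() results: several matches, exactly one, and none. The function names are made up for the comparison.

```python
def with_loop(titles):
    namespace = {}
    for title in titles:
        namespace['title'] = title      # each pass overwrites the previous one
    return namespace

def with_if(titles):
    namespace = {}
    if titles:
        namespace['title'] = titles[-1]  # keep only the last match
    return namespace

# Hypothetical findall results: several matches, exactly one, none.
cases = [['First', 'Second'], ['Only'], []]
for titles in cases:
    print(with_loop(titles), with_if(titles))
# {'title': 'Second'} {'title': 'Second'}
# {'title': 'Only'} {'title': 'Only'}
# {} {}
```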

On the other hand, if you want a None in your dictionary for missing information, then something like: (untested)


for record in records:
        titles = get_title.findall(record)
        title = titles[-1] if titles else None
        urls = get_url.findall(record)
        url = urls[-1] if urls else None
        latlngs = get_latlng.findall(record)
        latlng = latlngs[-1] if latlngs else None
        block_record.append( {'title':title, 'url':url, 'latlng':latlng} )
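The same idea can be folded into a small helper, so that each field is one call. This is a Python 3 sketch, and `last_or_none` is a hypothetical name. The records are invented. Because the sketch runs on plain text rather than `repr()` output, the coordinate pattern uses `\s*` where Norman's has `\\n\s*`.

```python
import re

get_title = re.compile(r"""<strong>(.*)<\/strong>""")
get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\s*(\-?\d+\.\d*)\)""")

def last_or_none(pattern, text):
    """Return the last findall() result for pattern in text, or None."""
    matches = pattern.findall(text)
    return matches[-1] if matches else None

# Hypothetical records; the second one has no coordinates.
records = [
    '<strong>A</strong> <a href="/a">En savoir plus</a> GLatLng(43.6,\n 1.44)',
    '<strong>B</strong> <a href="/b">En savoir plus</a>',
]

block_record = [
    {'title': last_or_none(get_title, r),
     'url': last_or_none(get_url, r),
     'latlng': last_or_none(get_latlng, r)}
    for r in records
]
print(block_record)
```

Every record then has all three keys, and None marks the fields that were missing.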


DaveA
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
