Norman Khine wrote:
thanks denis,
On Tue, Feb 2, 2010 at 9:30 AM, spir <denis.s...@free.fr> wrote:
On Mon, 1 Feb 2010 16:30:02 +0100
Norman Khine <nor...@khine.net> wrote:
On Mon, Feb 1, 2010 at 1:19 PM, Kent Johnson <ken...@tds.net> wrote:
On Mon, Feb 1, 2010 at 6:29 AM, Norman Khine <nor...@khine.net> wrote:
thanks, what about the whitespace problem?
\s* will match any amount of whitespace, including newlines.
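A quick way to check this, as a minimal sketch: the sample string below is made up for illustration and is not taken from the real data file, and the pattern is a simplified variant of the coordinate pattern used in this thread.

```python
import re

# Hypothetical sample: two numbers separated by a comma, a real newline and spaces.
sample = "GLatLng(44.5,\n      3.25)"

# \s* absorbs the newline and the indentation between the two numbers.
pattern = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")

match = pattern.search(sample)
coords = match.groups()
```

If the \s* is removed, the search returns None for this sample, because nothing in the pattern can consume the newline.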
thank you, this worked well.
here is the code:
###
import re
file = open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )
block = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
b = block.findall(data)
block_list = []
for html in b:
    namespace = {}
    t = re.compile(r"""<strong>(.*)<\/strong>""")
    title = t.findall(html)
    for item in title:
        namespace['title'] = item
    u = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
    url = u.findall(html)
    for item in url:
        namespace['url'] = item
    g = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
    lat = g.findall(html)
    for item in lat:
        namespace['LatLng'] = item
    block_list.append(namespace)
###
can this be made better?
The 3 regex patterns are constants: they can be moved out of the loop.
You may also rename b to blocks, and find a more accurate name for
block_list; e.g. block_records, where record = set of (named) fields.
A short description and/or example of the overall and partial data formats can greatly
help later review, since regex patterns alone are hard to decode.
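One possible way to attach such a description to the pattern itself is re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch: the sample fragment is hypothetical, and the pattern is the title pattern from this thread without the optional backslash escape.

```python
import re

# Hypothetical fragment in the shape the title pattern expects.
sample_record = '<strong>Some Producer</strong>'

# With re.VERBOSE the pattern can carry its own description.
get_title = re.compile(r"""
    <strong>     # opening tag that precedes the title
    (.*)         # the title itself (greedy, as in the original pattern)
    </strong>    # closing tag
    """, re.VERBOSE)

titles = get_title.findall(sample_record)
```

Note that in verbose mode a literal space in the pattern has to be written as "\ " or "[ ]", so a pattern such as the one containing "En savoir plus" would need that adjustment.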
here are the changes:
import re
file = open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )
get_record = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
get_title = re.compile(r"""<strong>(.*)<\/strong>""")
get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
records = get_record.findall(data)
block_record = []
for record in records:
    namespace = {}
    titles = get_title.findall(record)
    for title in titles:
        namespace['title'] = title
    urls = get_url.findall(record)
    for url in urls:
        namespace['url'] = url
    latlngs = get_latlng.findall(record)
    for latlng in latlngs:
        namespace['latlng'] = latlng
    block_record.append(namespace)
print block_record
The def of "namespace" would be clearer imo in a single line:
namespace = {title:t, url:url, lat:g}
i am not sure how this will fit into the code!
This also reveals a kind of name confusion, doesn't it?
Denis
Your variable 'file' is hiding a built-in name for the file type. No
harm in this example, but it's a bad habit to get into.
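A minimal sketch of one way to avoid the name clash (in Python 2, file is the built-in file type). The helper name read_text and the throwaway demo file are assumptions made for illustration; io.open decodes while reading, so no separate .decode('utf-8') call is needed.

```python
import io
import os
import tempfile

def read_text(path, encoding='utf-8'):
    """Return the decoded contents of path (read_text is a hypothetical helper)."""
    # 'source' does not hide any built-in name, and 'with' closes the file.
    with io.open(path, 'r', encoding=encoding) as source:
        return source.read()

# Demonstration on a throwaway file rather than the real data file.
handle, demo_path = tempfile.mkstemp(suffix='.txt')
os.close(handle)
with io.open(demo_path, 'w', encoding='utf-8') as out:
    out.write(u'En savoir plus\n')
demo_text = read_text(demo_path)
os.remove(demo_path)
```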
What did you intend to happen if the number of titles, urls, and latlngs
are not each exactly one? As you have it now, if there's more than one,
you spend time adding them all to the dictionary, but only the last one
survives. And if there aren't any, you don't make an entry in the
dictionary.
If that's the exact behavior you want, then you could replace the loop
with an if statement: (untested)
if titles:
    namespace['title'] = titles[-1]
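To see that the loop and the if statement behave the same way, here is a small sketch that runs both on zero, one and two matches. The function names and the sample lists are made up for the comparison.

```python
def last_by_loop(matches):
    # The loop from the posted code: every match overwrites the previous one.
    namespace = {}
    for item in matches:
        namespace['title'] = item
    return namespace

def last_by_if(matches):
    # The suggested replacement: take the last match directly, if there is one.
    namespace = {}
    if matches:
        namespace['title'] = matches[-1]
    return namespace

cases = [[], ['a'], ['a', 'b']]
results = [(last_by_loop(case), last_by_if(case)) for case in cases]
```

In every case the two dictionaries are equal: no 'title' key for an empty list, and the last item otherwise.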
On the other hand, if you want a None in your dictionary for missing
information, then something like: (untested)
for record in records:
    titles = get_title.findall(record)
    title = titles[-1] if titles else None
    urls = get_url.findall(record)
    url = urls[-1] if urls else None
    latlngs = get_latlng.findall(record)
    latlng = latlngs[-1] if latlngs else None
    block_record.append( {'title':title, 'url':url, 'latlng':latlng} )
DaveA
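A runnable sketch of the "None for missing information" variant, under a few assumptions: last_or_none and build_record are hypothetical helper names, the sample strings are invented, and only the title and url fields are shown (latlng would follow the same pattern). The two regexes are the ones from the thread with the optional backslash escapes dropped.

```python
import re

get_title = re.compile(r"<strong>(.*)</strong>")
get_url = re.compile(r'a href="/(.*)">En savoir plus')

def last_or_none(pattern, text):
    # Last match if there is at least one, otherwise None.
    matches = pattern.findall(text)
    return matches[-1] if matches else None

def build_record(text):
    # The whole record is defined in a single dictionary literal.
    return {'title': last_or_none(get_title, text),
            'url': last_or_none(get_url, text)}

# Hypothetical inputs: one with both fields, one with neither.
sample = '<strong>Some Producer</strong> <a href="/some-producer">En savoir plus</a>'
record = build_record(sample)
empty = build_record('nothing to see here')
```

With a helper like this, the body of the main loop reduces to block_record.append(build_record(record)).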
_______________________________________________
Tutor maillist - Tutor@python.org