duikboot wrote:
> Hello,
>
> I am trying to extract a list of strings from a text. I have been looking
> at it for hours now, and googling didn't help either.
> Could you please help me?
>
> >>> import re
> >>> s = """
> ... \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> >>> L = regex.findall(s)
> >>> print L
> ['<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>']
>
> I expected:
> [('organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>), (<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</
> organisatie')]
>
> I must be missing something very obvious.
As for the regexp, Jason gave you the answer. Another point is that, when
dealing with XML, it's sometimes better to use an XML parser.
Quick and dirty (the fragment has no single root element, so wrap it in
one first):
>>> from xml.etree import ElementTree as ET
>>> s = "<root>" + s + "</root>"
>>> tree = ET.fromstring(s)
>>> tree
<Element root at b795b2ac>
>>> tree.findall("organisatie/Profiel_Id")
[<Element Profiel_Id at b795b32c>, <Element Profiel_Id at b795b3ec>]
>>> _[0].text
'28996'
>>> [it.text for it in tree.findall("organisatie/Profiel_Id")]
['28996', '28997']
>>>
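
On the regexp side, here is a minimal sketch of why the original pattern
returns a single match. It assumes the fix pointed out earlier in the thread
was a non-greedy quantifier (that answer is not quoted here), and the names
greedy and lazy are only for illustration:

```python
import re

# Same data as in the question, written as one plain string.
s = (
    "\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>"
    "\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"
)

# Greedy: '.*' runs on to the LAST '</organisatie>', so there is one match
# spanning both elements.
greedy = re.compile(r'<organisatie.*</organisatie>', re.S)

# Non-greedy: '.*?' stops at the FIRST '</organisatie>' after each opening
# tag, so each element is matched separately.
lazy = re.compile(r'<organisatie.*?</organisatie>', re.S)

greedy_matches = greedy.findall(s)
lazy_matches = lazy.findall(s)

print(len(greedy_matches))  # 1
print(len(lazy_matches))    # 2
```

This is fine for a flat fragment like this one, but it breaks as soon as
elements nest, which is one more reason to prefer the parser.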
HTH
--
http://mail.python.org/mailman/listinfo/python-list