setting the length of the backtrace output in pdb
Hi,
this might be a rather silly question, but I cannot figure out how to make pdb give me more than 10 lines of output whenever I issue a backtrace command. Is there an easy way to do this?
thanks
matt
--
http://mail.python.org/mailman/listinfo/python-list
Help with regular expression in python
Hi,
I am sorry if this doesn't quite match the subject of the list. If someone
takes offense please point me to where this question should go. Anyway, I have
a problem using regular expressions. I would like to match the line:
1.002000e+01 2.037000e+01 2.128000e+01 1.908000e+01 1.871000e+01 1.914000e+01
2.007000e+01 1.664000e+01 2.204000e+01 2.109000e+01 2.209000e+01 2.376000e+01
2.158000e+01 2.177000e+01 2.152000e+01 2.267000e+01 1.084000e+01 1.671000e+01
1.888000e+01 1.854000e+01 2.064000e+01 2.00e+01 2.20e+01 2.139000e+01
2.137000e+01 2.178000e+01 2.179000e+01 2.123000e+01 2.201000e+01 2.15e+01
2.15e+01 2.199000e+01 : (instance: 0) : some description
The number of floats can vary (in this example there are 32). So what I thought
I'd do is the following:
instance_linetype_pattern_str = '([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?){32}'
instance_linetype_pattern = re.compile(instance_linetype_pattern_str)
Basically the expression in the first major set of parentheses matches a
number in scientific format. The '{32}' is supposed to repeat that group 32
times. However, it doesn't, and I can't figure out why. I'd
really like to understand it if someone can shed light on it.
thanks
matt
Re: Help with regular expression in python
Hi guys,
thanks for the suggestions. I had tried the whitespace before as well (to no
avail). So here is the expression I am using (based on suggestions), but still
no success:
instance_linetype_pattern_str =\
r'(([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+))?\s+){32}(.+)'
instance_linetype_pattern = re.compile(instance_linetype_pattern_str)
results = instance_linetype_pattern.findall(line)
print "results: "; print results
The match I get is:
results:
[('2.199000e+01 ', '2.199000', '.199000', 'e+01', ': (instance: 0)\t:\tsome description')]
btw: The line to be matched is ONE line. There are no line
breaks (even though my email client adds them).
matt
On Thursday, August 18, 2011, Vlastimil Brom wrote:
> Hi,
> the already suggested handling of whitespace with \s+ etc. at the end
> of the parenthesised pattern should help;
> furthermore, if you are using this pattern in the Python source, you
> should either double all backslashes or use a raw string for the
> pattern - with r prepended before the opening quotation mark:
> pattern_str = r"..."
> in order to handle backslashes literally and not as escape characters.
> hth,
> vbr
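Both points can be checked in a few lines. This is a minimal sketch that assumes Python 3 (`re.fullmatch` needs 3.4+). `NUMBER` is a hypothetical name for the scientific-notation pattern above, rewritten with non-capturing `(?:...)` groups, and the three-number line is a shortened sample.

```python
import re

# Scientific-notation number, using non-capturing (?:...) groups.
NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"

line = "1.002000e+01 2.037000e+01 2.128000e+01"

# "{3}" with no separator asks for three numbers glued together,
# so the repetition can never get past the first space.
glued = re.compile(r"(?:%s){3}" % NUMBER)
print(glued.fullmatch(line))

# One number, then two repetitions of "whitespace + number".
separated = re.compile(r"%s(?:\s+%s){2}" % (NUMBER, NUMBER))
print(separated.fullmatch(line) is not None)

# Raw strings matter once an escape also means something to Python:
# "\b" is a one-character backspace, r"\b" is the regex word boundary.
print(len("\b"), len(r"\b"))
```

For the 32-number line, the same construction becomes `%s(?:\s+%s){31}`, followed by whatever should match the trailing description.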
Re: Help with regular expression in python
Hi Josh,
thanks for the reply. I am no expert so please bear with me:
I thought that the {32} was supposed to match the previous expression 32
times?
So how can I have all matches accessible to me?
matt
On Thursday, August 18, 2011, Josh Benner wrote:
> If a group matches multiple times, only the last match is accessible. The
> matches returned represent the inner groupings of the last match found.
>
> JB-)
Re: Help with regular expression in python
Hi,
thanks for the suggestion. I guess I had found another way around the
problem as well. But I really wanted to match the line exactly and I
wanted to know why it doesn't work. That is less for the purpose of
getting the thing to work and more because it greatly annoys me that
I can't figure out why it doesn't work, i.e. why the expression is not
matched {32} times. I just don't get it.
Anyway, thanks though
matt
On 8/19/2011 8:41 AM, Jason Friedman wrote:
>> Hi Josh,
>> thanks for the reply. I am no expert so please bear with me:
>> I thought that the {32} was supposed to match the previous expression 32
>> times?
>>
>> So how can I have all matches accessible to me?
> $ python
> Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
> [GCC 4.4.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> data
> '1.002000e+01 2.037000e+01 2.128000e+01 1.908000e+01 1.871000e+01
> 1.914000e+01 2.007000e+01 1.664000e+01 2.204000e+01 2.109000e+01
> 2.209000e+01 2.376000e+01 2.158000e+01 2.177000e+01 2.152000e+01
> 2.267000e+01 1.084000e+01 1.671000e+01 1.888000e+01 1.854000e+01
> 2.064000e+01 2.00e+01 2.20e+01 2.139000e+01 2.137000e+01
> 2.178000e+01 2.179000e+01 2.123000e+01 2.201000e+01 2.15e+01
> 2.15e+01 2.199000e+01 : (instance: 0) : some
> description'
> >>> import re
> >>> re.findall(r"\d\.\d+e\+\d+", data)
> ['1.002000e+01', '2.037000e+01', '2.128000e+01', '1.908000e+01',
> '1.871000e+01', '1.914000e+01', '2.007000e+01', '1.664000e+01',
> '2.204000e+01', '2.109000e+01', '2.209000e+01', '2.376000e+01',
> '2.158000e+01', '2.177000e+01', '2.152000e+01', '2.267000e+01',
> '1.084000e+01', '1.671000e+01', '1.888000e+01', '1.854000e+01',
> '2.064000e+01', '2.00e+01', '2.20e+01', '2.139000e+01',
> '2.137000e+01', '2.178000e+01', '2.179000e+01', '2.123000e+01',
> '2.201000e+01', '2.15e+01', '2.15e+01', '2.199000e+01']
Re: Help with regular expression in python
On Friday, August 19, 2011, Alain Ketterlin wrote:
> Matt Funk writes:
> > thanks for the suggestion. I guess I had found another way around the
> > problem as well. But I really wanted to match the line exactly and I
> > wanted to know why it doesn't work. That is less for the purpose of
> > getting the thing to work and more because it greatly annoys me that
> > I can't figure out why it doesn't work, i.e. why the expression is not
> > matched {32} times. I just don't get it.
>
> Because a line is not 32 times a number, it is a number followed by 31
> times "a space followed by a number". Using Jason's regexp, you can
> build the regexp step by step:
>
> number = r"\d\.\d+e\+\d+"
> numbersequence = r"%s( %s){31}" % (number,number)
That didn't work either. Using the modified expression (where the (.+) matches the end of
the line):
number = r"\d\.\d+e\+\d+"
numbersequence = r"%s( %s){31}(.+)" % (number,number)
instance_linetype_pattern = re.compile(numbersequence)
The results obtained are:
results:
[(' 2.199000e+01', ' : (instance: 0)\t:\tsome description')]
so this matches the last number plus the string at the end of the line, but does not
retain the previous numbers.
Anyway, I think at this point I will go another route. Not sure where the
issue lies at this point.
thanks for all the help
matt
>
> There are better ways to build your regexp, but I think this one is
> convenient to answer your question. You still have to append what will
> match the end of the line.
>
> -- Alain.
>
> P/S: please do not top-post
Re: Help with regular expression in python
On Friday, August 19, 2011, jmfauth wrote:
> On 19 août, 19:33, Matt Funk wrote:
> > The results obtained are:
> > results:
> > [(' 2.199000e+01', ' : (instance: 0)\t:\tsome description')]
> > so this matches the last number plus the string at the end of the line,
> > but does not retain the previous numbers.
> >
> > Anyway, I think at this point I will go another route. Not sure where the
> > issue lies at this point.
>
> Seen on this list:
>
> And always keep this in mind:
> 'Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.'
> --Jamie Zawinski, comp.lang.emacs
>
>
> I proposed a solution which seems to correspond to your problem
> if it were better formulated...
Agreed, and I will probably take your proposed route or a similar one.
However, I still won't know WHY it didn't work. I would really LIKE to know
why, simply because it tickles me.
matt
>
> jmf
Re: Help with regular expression in python
On Friday, August 19, 2011, Carl Banks wrote:
> On Friday, August 19, 2011 10:33:49 AM UTC-7, Matt Funk wrote:
> > number = r"\d\.\d+e\+\d+"
> > numbersequence = r"%s( %s){31}(.+)" % (number,number)
> > instance_linetype_pattern = re.compile(numbersequence)
> >
> > The results obtained are:
> > results:
> > [(' 2.199000e+01', ' : (instance: 0)\t:\tsome description')]
> > so this matches the last number plus the string at the end of the line,
> > but does not retain the previous numbers.
> >
> > Anyway, I think at this point I will go another route. Not sure where the
> > issue lies at this point.
>
> I think the problem is that repeat counts don't actually repeat the
> groupings; they just repeat the matchings. Take this expression:
>
> r"(\w+\s*){2}"
I see
>
> This will match exactly two words separated by whitespace. But the match
> result won't contain two groups; it'll only contain one group, and the
> value of that group will match only the very last thing repeated:
>
> Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
> [GCC 4.5.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> import re
> >>> m = re.match(r"(\w+\s*){2}","abc def")
> >>> m.group(1)
> 'def'
>
> So you see, the regular expression is doing what you think it is, but the
> way it forms groups is not.
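If the goal is to match the whole line exactly *and* keep every number, one way around the one-slot-per-group behaviour is to put the repetition inside a single group. That group then captures the whole run of numbers, and `re.findall` pulls out the individual values in a second pass. The following is a sketch under those assumptions (Python 3; `NUMBER`, `LINE` and `parse_line` are hypothetical names):

```python
import re

# One number in scientific notation, with non-capturing (?:...) groups.
NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"

# The repetition sits INSIDE the "numbers" group, so that group captures
# the whole run of numbers rather than only the last repetition.
# To require exactly 32 numbers, replace "*" with "{31}".
LINE = re.compile(
    r"\s*(?P<numbers>%s(?:\s+%s)*)\s*:\s*(?P<rest>.*)" % (NUMBER, NUMBER)
)


def parse_line(line):
    m = LINE.fullmatch(line)
    if m is None:
        raise ValueError("line does not have the expected shape: %r" % line)
    # Second pass: split the captured run into individual numbers.
    values = [float(x) for x in re.findall(NUMBER, m.group("numbers"))]
    return values, m.group("rest")


values, rest = parse_line("1.002000e+01 2.00e+01 2.15e+01 : (instance: 0) : some description")
print(values)
print(rest)
```

Lines read from a file still carry their trailing newline, which `.*` does not match, so strip them before calling `parse_line`.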
>
>
> Just a little advice (I know you've found a different method, and that's
> good, this is for the general reader).
>
> The functions re.findall and re.finditer could have helped here, they find
> all the matches in a string and let you iterate through them. (findall
> returns the strings matched, and finditer returns the sequence of match
> objects.) You could have done something like this:
I did use findall, but when I tried to match everything (including the 'some
description' part) it did not work. But I think the explanation you gave above
matches this case and explains why it did not.
>
> row = [ float(x) for x in re.findall(r'\d+\.\d+e\+\d+',line) ]
>
> And regexp matching is often overkill for a particular problem; this may be
> one of them. line.split() could have been sufficient:
>
> row = [ float(x) for x in line.split() ]
>
> Of course, these solutions don't account for the case where you have lines,
> some of which aren't 32 floating-point numbers. You need extra error
> handling for that, but you get the idea.
thanks
matt
>
>
> Carl Banks
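Carl's `line.split()` version assumes the line contains nothing but numbers. On the line from this thread, `float()` would fail on the `:` and the description. The sketch below splits off the description first and adds the count check he mentions (Python 3; `parse_row` and its `expected` parameter are hypothetical):

```python
def parse_row(line, expected=None):
    """Parse '<numbers> : <description>' without regular expressions."""
    # partition() splits at the FIRST ':' only, so colons inside the
    # description (e.g. "(instance: 0)") stay in the description.
    numbers_part, sep, description = line.partition(":")
    if not sep:
        raise ValueError("no ':' separator in line: %r" % line)
    # float() raises ValueError on anything that is not a number.
    row = [float(x) for x in numbers_part.split()]
    if expected is not None and len(row) != expected:
        raise ValueError("expected %d numbers, got %d" % (expected, len(row)))
    return row, description.strip()


row, description = parse_row(
    "2.15e+01 2.199000e+01 : (instance: 0) : some description", expected=2
)
print(row, description)
```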
Re: help with regex matching multiple %e
Thanks, works great.
matt
On 3/3/2011 10:53 AM, MRAB wrote:
> On 03/03/2011 17:33, [email protected] wrote:
>> Hi,
>>
>> I have a line that looks something like:
>> 2.234e+04 3.456e+02 7.234e+07 1.543e+04: some description
>>
>> I would like to extract all the numbers. From the Python website I got the
>> following expression for matching what in C is %e (i.e. scientific
>> format):
>> (see http://docs.python.org/library/re.html)
>> [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
>> And when I apply the pattern (using extra parentheses around the whole
>> expression) it does match the first number in the line.
>>
>> Is there any way to repeat this pattern to get me all the numbers in the
>> line?
>> I thought the following might work, but it doesn't:
>> ([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?){numToRepeat}
>>
> You're forgetting that the numbers are separated by a space.
>
>> Or will I have to split the line first, then iterate and then apply
>> the match?
>>
>> Any help is greatly appreciated.
>>
> Use re.findall to find all the matches.
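MRAB's suggestion has one catch worth knowing. When the pattern contains capturing groups, `re.findall` returns the groups (as tuples) instead of the whole match. This is also why the `findall` results earlier in this archive come back as tuples. The sketch below uses the line from the question (Python 3; the name `number` is hypothetical):

```python
import re

line = "2.234e+04 3.456e+02 7.234e+07 1.543e+04: some description"

# The pattern from the docs, as quoted above: it has three capturing groups,
# so findall returns a tuple of those groups for each match.
docs_pattern = r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?"
print(re.findall(docs_pattern, line)[0])

# The same pattern with non-capturing (?:...) groups: findall now returns
# the whole matched text. Caveat: a digit in the description would match too.
number = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
print([float(x) for x in re.findall(number, line)])

# Or keep the groups and use finditer, taking group(0) from each match object.
print([m.group(0) for m in re.finditer(docs_pattern, line)])
```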
question about endswith()
Hi,
I have a list of files, some of which end with .hdf and one of which ends with .hdf5. I want to pick out the hdf5 file. Therefore I set extensions: hdf5
I try to filter as below:
if (any(filename.endswith(x) for x in extensions)):
The problem is that it lets all files through rather than just the hdf5 file. Is there anything I am doing wrong?
thanks
matt
Re: question about endswith()
Hi Grant,
first of all sorry for the many typos in my previous email. To clarify, I have a Python list of file names called 'files'. Every single filename has the extension '.hdf' except for one file, which has a '.hdf5' extension. When I do (and yes, this is pasted):
for filename in files:
    if (any(filename.endswith(x) for x in extensions)):
        print filename
it prints all the files in the list 'files' (that is, the files with the '.hdf' extension as well). My question is why it doesn't just print the filename with the '.hdf5' extension.
thanks
matt
On 3/3/2011 4:50 PM, Grant Edwards wrote:
> On 2011-03-03, Matt Funk wrote:
>
>> i have a list of files, some of which end with .hdf and one of them end
>> with hdf5. I want to filter the hdf5 file. Thereforei set extensions: hdf5
>> I try to filter as below:
>> if (any(filename.endswith(x) for x in extensions)):
>>
>> The problem is that i let's all files though rather than just the hdf5
>> file. Is there anything i am doing wrong?
> Yes, you are doing something wrong.
>
> But, in order for somebody to tell you what you're doing wrong, you'll
> have to post some actual, runnable code and tell us 1) what you
> expect it to do, 2) what you see it do.
>
> IMPORTANT: Do _not_ retype code, input or output into your posting.
> Cut/paste both code and input/output into your posting.
Re: question about endswith()
Hi,
thanks guys. This is it. The following code will match both hdf and hdf5
for reasons explained in the email from Ethan.
extensions = 'hdf5' #doesn't work
files = ('MOD03.A2010002.1810.005.2010258062733.hdf','MOD03.A2010002.1950.005.2010258063105.hdf','MOD03.A2010002.1950.005.2010258063105.hdf5')
for filename in files:
    if (any(filename.endswith(x) for x in extensions)):
        print filename
The following code will work:
extensions = ['hdf5'] #works
files = ('MOD03.A2010002.1810.005.2010258062733.hdf','MOD03.A2010002.1950.005.2010258063105.hdf','MOD03.A2010002.1950.005.2010258063105.hdf5')
for filename in files:
    if (any(filename.endswith(x) for x in extensions)):
        print filename
thanks for the help
matt
On 3/4/2011 2:26 AM, Tom Zych wrote:
> Ethan Furman wrote:
>> What is extensions? A string or a tuple? I'm guessing a string,
>> because then you're looking at:
>>
>> --> filename.endswith(x) for x in 'hdf5'
>>
>> which is the same as
>>
>> --> filename.endswith('h') or filename.endswith('d') or
>> filename.endswith('f') or filename.endswith('5')
>>
>> and then both .hdf and .hdf5 files will get matched.
> Score:5, Insightful.
>
> "Oh, I'm sorry, this is Python. Slashdot is room 12A, next door."
>
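Ethan's explanation can be reproduced directly. In addition, `str.endswith` accepts a tuple of suffixes, which removes the need for the `any()` loop. The sketch below uses two of the file names from the thread (Python 3):

```python
files = [
    "MOD03.A2010002.1810.005.2010258062733.hdf",
    "MOD03.A2010002.1950.005.2010258063105.hdf5",
]

# Iterating over a string yields its characters...
extensions = "hdf5"
print(list(extensions))
# ...so the .hdf file "ends with" one of them ('f'):
print(any(files[0].endswith(x) for x in extensions))

# str.endswith accepts a tuple of suffixes. Note the trailing comma:
# ("hdf5") is just a string, ("hdf5",) is a one-element tuple.
wanted = (".hdf5",)
print([f for f in files if f.endswith(wanted)])
```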
using python ftp
Hi,
I was wondering whether someone can tell me if the following already exists.
I want to connect to a server, download various files (whose names I want to be able to match with a wildcard), and store those files in a given location on the hard drive. If a file already exists locally, I do not want to download it.
This seems fairly trivial and I would assume that there should be some sort of implementation that does this easily, but I didn't find anything googling it.
Otherwise I was going to do it "by hand" using ftplib:
1) connect to server
2) change to directory on server
3) get listing
4) match the file pattern I want to the listing
5) check if file already exists
6) download file if matched and doesn't exist
Can anyone offer any advice on whether this is already done somewhere?
thanks
matt
Re: using python ftp
Hi,
thanks for the response. I was kind of thinking along those lines.
The thing is, though, that 'glob' appears to work on the local
directory only (and not on the remote one, which I need).
Anyway, I think I'll just do it by iterating through the remote
directory listing and then matching the names against a pattern.
Not sure if this is the best/most elegant way, but it should work.
thanks
matt
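The six steps from the original question can be sketched with the standard library:

- `FTP.nlst()` provides the remote listing.
- `fnmatch.filter()` applies a shell-style wildcard to those names (the role `glob` plays locally).
- `os.path.exists()` handles the skip check.
- `FTP.retrbinary()` performs the download.

This is a minimal sketch that assumes Python 3 and a server whose `nlst()` returns bare file names (some servers return paths instead). `download_matching`, the host, the credentials and the directories are all hypothetical.

```python
import fnmatch
import os


def download_matching(ftp, pattern, local_dir):
    """Download remote files matching a shell-style wildcard into local_dir,
    skipping files that already exist there.

    `ftp` is assumed to be a logged-in ftplib.FTP already cwd'd into the
    remote directory (steps 1 and 2); anything with nlst() and retrbinary()
    works. Returns the names that were actually downloaded."""
    downloaded = []
    for name in fnmatch.filter(ftp.nlst(), pattern):   # steps 3 and 4
        target = os.path.join(local_dir, name)
        if os.path.exists(target):                     # step 5
            continue
        with open(target, "wb") as fh:                 # step 6
            ftp.retrbinary("RETR " + name, fh.write)
        downloaded.append(name)
    return downloaded


# Usage sketch (host, credentials and paths are placeholders):
# from ftplib import FTP
# ftp = FTP("ftp.example.com")
# ftp.login("user", "password")
# ftp.cwd("/remote/dir")
# print(download_matching(ftp, "*.hdf", "/local/dir"))
# ftp.quit()
```

One design caveat: a download that fails halfway leaves a partial file behind, which this sketch would then skip on the next run. Writing to a temporary name and renaming it on success avoids that.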
On 12/23/2010 1:46 AM, Octavian Rasnita wrote:
> Can this lib also work with ftps?
>
> Thanks.
>
> Octavian
>
> - Original Message -
> From: "Anurag Chourasia"
> To: "Matt Funk"
> Cc:
> Sent: Thursday, December 23, 2010 4:12 AM
> Subject: Re: using python ftp
>
>
>> Hi Matt,
>>
>> I have a snippet to "upload" files (that match a particular search
>> pattern) to a remote server.
>>
>> Variable names are self explanatory. You could tweak this a little to
>> "download" files instead.
>>
>> from ftplib import FTP
>> import glob
>> ftp = FTP(hostname)
>> ftp.login(user_id,passwd)
>> ftp.cwd(remote_directory)
>> files_list=glob.glob(file_search_pattern)
>> for file in files_list:
>>     try:
>>         ftp.storlines('STOR ' + file, open(file))
>>     except Exception, e:
>>         print ('Failed to FTP file: %s' %(file))
>> ftp.close()
>>
>> Regards,
>> Anurag
numpy/matlab compatibility
Hi,
I am fairly new to Python. I was wondering if the following is doable in Python:
1) a = rand(10,1)
2) Y = a
3) mask = Y > 100;
4) Y(mask) = 100;
5) a = a+Y
Basically I am getting stuck on line 4). Is this possible with Python at all?
(The above is working Matlab code.)
thanks
matt
Re: numpy/matlab compatibility
Hi,
thank you Andrea. That is exactly what I was looking for. Great.
Andrea explained below what the Matlab code does. Sorry about the confusion. I was under the impression that numpy leaned very heavily on Matlab for its syntax, and thus I assumed that Matlab was familiar to most people using numpy.
Andrea: you are right about the value 100. It should have been 0.5. The original code has a different vector which is tested against 100; I tried to simply reproduce the functionality with a random vector, hence the confusion.
Again, thanks for the input.
matt
On 1/25/2011 2:36 PM, Andrea Ambu wrote:
> On Tue, Jan 25, 2011 at 9:13 PM, Matt Funk wrote:
>
> 1) a = rand(10,1)
> 2) Y = a
> 3) mask = Y > 100;
> 4) Y(mask) = 100;
> 5) a = a+Y
>
> No. Not like that.
>
> You do literally:
> a = rand(10, 1)
> Y = a
> mask = Y>100
> Y = where(mask, 100, Y)
> a = a+Y
>
> More Pythonically:
> a = rand(10, 1)
> a = where(a > 100, a + 100, a + a)
>
> For those who don't speak Matlab:
>
> 1) a = rand(10,1) ; generates a 10x1 matrix of random numbers 0 < n < 1
> 2) Y = a
> 3) mask = Y > 100; similar to: mask = [i>100 for i in Y]
> 4) Y(mask) = 100; sets to 100 the elements of Y with index i for which
> mask[i] = True
> 5) a = a+Y ; sums the two matrices element by element (like you do in
> linear algebra)
>
> Anyway... rand generates numbers from 0 up to 1 (both in Python and
> Matlab)... when are they > 100?
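For the record, numpy supports the Matlab line 4) almost verbatim, as boolean-mask assignment `Y[mask] = ...`. The one trap when translating is line 2). In Matlab, `Y = a` copies the data, whereas in numpy it only binds a second name to the same array, so a masked assignment through `Y` would also change `a`. The sketch below assumes numpy is installed and uses the 0.5 threshold from the follow-up above:

```python
import numpy as np

threshold = 0.5                  # per the follow-up; the original post used 100

a = np.random.rand(10, 1)        # numpy counterpart of rand(10,1); values in [0, 1)
Y = a.copy()                     # Matlab "Y = a" copies; numpy "Y = a" would alias a
mask = Y > threshold             # boolean array with the same shape as Y
Y[mask] = threshold              # numpy spelling of Matlab's Y(mask) = threshold
result = a + Y                   # element-wise sum, i.e. Matlab's a = a + Y

# Same result without the temporary Y, in the style of the where() reply above.
assert np.allclose(result, np.where(a > threshold, a + threshold, a + a))
```

`a + Y` builds a new array, so only the masked assignment needs the explicit copy.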
python and parsing an xml file
Hi,
I was wondering if someone had some advice:
I want to create a set of XML input files to my code that look as follows:
Alg1 ./Alg1.in c:\tmp 1
So there are comments, whitespace etc. in it.
I would like to be able to put everything into some sort of structure such that I can access it as:
structure['Algorithm']['Type'] == Alg1
I was wondering if there is something out there that does this. I found and tried a few things:
1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
It simply doesn't work. I get the following error:
raise exception
xml.sax._exceptions.SAXParseException: :1:2: not well-formed (invalid token)
But I removed everything from the file except: and I still got the error.
Anyway, I looked at ElementTree, but that errors out with:
xml.parsers.expat.ExpatError: junk after document element: line 19, column 0
Anyway, if anyone can give me advice or point me somewhere, I'd greatly appreciate it.
thanks
matt
Re: python and parsing an xml file
Hi Terry,
On 2/21/2011 11:22 AM, Terry Reedy wrote:
> On 2/21/2011 12:30 PM, Matt Funk wrote:
>> Hi,
>> I was wondering if someone had some advice:
>> I want to create a set of xml input files to my code that look as
>> follows:
>
> Why?
Hmm, I'm not sure how to answer this question exactly. I guess it's a design decision. I am not saying that it is the best one, but it seemed suitable to me. I am certainly open to suggestions. But here are some requirements:
1) My boss needs to be able to read the input and make sense of it. XML seems fairly self-explanatory, at least when you choose suitable names for the properties/tags etc.
2) I want reproducibility of a given run without changes to the code, i.e. all the inputs need to be stored externally to the code such that the state of the run is captured entirely by the input files.
>
> ...
>> So there are comments, whitespace etc ... in it.
>> I would like to be able to put everything into some sort of structure
>> such that I can access it as:
>> structure['Algorithm']['Type'] == Alg1
>
> Unless I have a very good reason otherwise, I would just put
> everything in nested dicts in a .py file to begin with.
That is certainly a possibility and it should work. However, eventually the input files might be written by a web interface (likely PHP). Even though what you are suggesting is still possible, it just feels a little awkward (i.e. to use PHP to write a Python file).
> Or if I needed cross language portability, a JSON file, which is close
> to the same thing.
>
Again, I am certainly not infallible and open to suggestions if there is a better solution.
thanks
matt
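Terry's JSON suggestion maps directly onto the `structure['Algorithm']['Type']` access from the question. JSON can also be written from most languages, including the PHP web interface mentioned above. The sketch below uses Python's standard `json` module. Apart from `Algorithm`, `Type` and the values taken from the question, the key names are hypothetical.

```python
import json

# Hypothetical input file; only "Algorithm", "Type" and the values come from the question.
text = r"""
{
    "Algorithm": {"Type": "Alg1", "InputFile": "./Alg1.in"},
    "Paths": {"WorkDir": "c:\\tmp"}
}
"""

structure = json.loads(text)            # nested dicts, lists, strings and numbers
print(structure["Algorithm"]["Type"])
print(structure["Paths"]["WorkDir"])    # backslashes must be escaped in JSON source

# Writing the same structure back out, e.g. to archive the inputs of a run.
print(json.dumps(structure, indent=4, sort_keys=True))
```

One trade-off: standard JSON has no comment syntax. This matters here, since the intended input files contain comments.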
Re: python and parsing an xml file
Hi Stefan,
thank you for your advice.
I am running into an issue though (which is likely a newbie problem):
My XML file looks like this (I got it from the internet):
Gambardella, Matthew XML Developer's Guide Computer 44.95 2000-10-01 An in-depth look at creating applications with XML.
Then I try to access it as:
from lxml import etree
from lxml import objectify
inputfile="../input/books.xml"
parser = etree.XMLParser(ns_clean=True)
parser = etree.XMLParser(remove_comments=True)
root = objectify.parse(inputfile,parser)
I try to access elements by (for example):
print root.catalog.book.author.text
But it errors out with:
AttributeError: 'lxml.etree._ElementTree' object has no attribute 'catalog'
So I guess I don't understand why this is. Also, how can I print all the attributes of the root, for example, or obtain keys?
I really appreciate your help
thanks
matt
On 2/21/2011 10:43 AM, Stefan Behnel wrote:
> Matt Funk, 21.02.2011 18:30:
>> I want to create a set of xml input files to my code that look as
>> follows:
>>
>> Alg1
>>
>> ./Alg1.in
>>
>> c:\tmp
>>
>> 1
>
> That's not XML. XML documents have exactly one root element, i.e. you
> need an enclosing element around these two tags.
>
>> So there are comments, whitespace etc ... in it.
>> I would like to be able to put everything into some sort of structure
>
> Including the comments or without them? Note that ElementTree will
> ignore comments.
>
>> such that I can access it as:
>> structure['Algorithm']['Type'] == Alg1
>
> Have a look at lxml.objectify. It allows you to write
>
> alg_type = root.Algorithm.Type.text
>
> and a couple of other niceties.
>
> http://lxml.de/objectify.html
>
>> I was wondering if there is something out there that does this.
>> I found and tried a few things:
>> 1)
>> http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
>> It simply doesn't work.
>> I get the following error:
>> raise exception
>> xml.sax._exceptions.SAXParseException::1:2: not well-formed
>> (invalid token)
>
> "not well formed" == "not XML".
>
>> But I removed everything from the file except:> encoding="UTF-8"?>
>> and I still got the error.
>
> That's not XML, either.
>
>> Anyway, I looked at ElementTree, but that errors out with:
>> xml.parsers.expat.ExpatError: junk after document element: line 19,
>> column 0
>
> In any case, ElementTree is preferable over a SAX based solution, both
> for performance and maintainability reasons.
>
> Stefan
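Stefan's point about the single root element also explains the `junk after document element` error from ElementTree quoted in this message. The sketch below shows this with the standard library's `xml.etree.ElementTree` (Python 3). The `Output` and `config` element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input with two top-level elements: not a well-formed document.
two_roots = "<Algorithm><Type>Alg1</Type></Algorithm><Output>1</Output>"

try:
    ET.fromstring(two_roots)
except ET.ParseError as err:
    print("ParseError:", err)

# Wrapping everything in a single root element makes it parse.
root = ET.fromstring("<config>" + two_roots + "</config>")
print(root.find("Algorithm/Type").text)
```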
Re: python and parsing an xml file
Hi Stefan,
I don't mean to be annoying, so sorry if I am. Following your instructions I do:
parser = objectify.makeparser(ns_clean=True, remove_comments=True)
root = objectify.parse(inputfile,parser).getroot()
print root.catalog.book.author.text
which still gives the following error:
AttributeError: no such child: catalog
matt
On 2/21/2011 3:28 PM, Stefan Behnel wrote:
> Matt Funk, 21.02.2011 23:07:
>> thank you for your advice.
>> I am running into an issue though (which is likely a newbie problem):
>>
>> My xml file looks like (which I got from the internet):
>>
>> Gambardella, Matthew
>> XML Developer's Guide
>> Computer
>> 44.95
>> 2000-10-01
>> An in-depth look at creating applications
>> with XML.
>>
>> Then I try to access it as:
>>
>> from lxml import etree
>> from lxml import objectify
>>
>> inputfile="../input/books.xml"
>> parser = etree.XMLParser(ns_clean=True)
>> parser = etree.XMLParser(remove_comments=True)
>
> Change that to
>
> parser = objectify.makeparser(ns_clean=True, remove_comments=True)
>
>> root = objectify.parse(inputfile,parser)
>
> Change that to
>
> root = objectify.parse(inputfile,parser).getroot()
>
> Stefan
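The remaining `AttributeError` arises because `getroot()` already returns the `<catalog>` element, so attribute paths start one level below it. With `lxml.objectify`, that presumably means writing `root.book.author.text`. The same behaviour is shown below with the standard library's ElementTree, which is swapped in so that the sketch needs no third-party package. The one-book document is a hypothetical minimal version of the example file.

```python
import xml.etree.ElementTree as ET

doc = "<catalog><book><author>Gambardella, Matthew</author></book></catalog>"

# fromstring() returns the root Element; ET.parse(path) returns an
# ElementTree, whose .getroot() gives the same root Element.
root = ET.fromstring(doc)
print(root.tag)                          # the root IS <catalog>
print(root.find("catalog"))              # so it has no <catalog> child
print(root.find("book/author").text)     # paths start below the root
```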
Re: python and parsing an xml file
Hi,
first of all thanks everyone for the (at least to me) valuable discussion about XML and its usage domain. Also thanks for all the hints and suggestions.
In terms of my problems, from what I can tell right now ConfigObj4 (see: http://www.voidspace.org.uk/python/configobj.html#reading-a-config-file) will suit my needs.
Again, thanks for the time and effort you put into answering my questions (and, in Stefan's case, for writing tools and making them available to everyone) and for pointing me in a better direction.
matt
On 2/22/2011 4:01 AM, Ian wrote:
> On 21/02/2011 22:08, Matt Funk wrote:
>>> Why?
>> mmmh. not sure how to answer this question exactly. I guess it's a
>> design decision. I am not saying that it is the best one, but it seemed
>> suitable to me. I am certainly open to suggestions. But here are some
>> requirements:
>> 1) My boss needs to be able to read the input and make sense out of it.
>> XML seems fairly self explanatory, at least when you choose suitable
>> names for the properties/tags etc ...
>> 2) I want reproducibility of a given run without changes to the code.
>> I.e. all the inputs need to be stored external to the code such that the
>> state of the run is captured from the input files entirely.
>>
> Hi Mark,
>
> Having tried XML for something similar, I would strongly advise
> against it. It has been nothing but a nightmare.
>
> XML is acceptable for machine to machine communication where the two
> sides cannot agree a common language in advance or they can't
> coordinate format changes. Even then it is slow and verbose.
>
> Use the config module if the configuration is simple to moderately
> complex.
>
> Consider JSON or Python (source) if your requirements are really
> complicated.
>
> Regards
>
> Ian
Re: python and parsing an xml file
One thing I forgot, in case anyone is at this point: the reason I chose ConfigObj over ConfigParser is that it allows subsections.
matt
