Regular expression that skips single line comments?

2009-01-19 Thread martinjamesevans
I am trying to use RE to parse a set of files that have a simple syntax.
I'm interested in counting '$' expansions in the files, with one
minor consideration: a line becomes a comment if the first
non-whitespace character is a semicolon.

For example, tests 1 and 2 should be ignored:

sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
   $6
  test7 $7 test7
"""

Required output: ['$3', '$3', '$3', '$5', '$6', '$7']


The following RE works fine but does not deal with the commented
lines:

re.findall(r"(\$.)", sInput, re.I)

which gives: ['$1', '$2', '$3', '$3', '$3', '$5', '$6', '$7']


My attempts to use (?!;)-style lookahead expressions keep failing.
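A negative lookahead can do the job if it is anchored at the start of each line with re.MULTILINE and allowed to skip leading whitespace, instead of being tested as a bare (?!;) wherever the scanner happens to be. A minimal sketch, assuming Python 3 and the sample input from the post; the helper name count_expansions is illustrative only:

```python
import re

# Sample input from the post.
sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
   $6
  test7 $7 test7
"""

# With MULTILINE, ^ and $ match at every line boundary. The lookahead
# rejects any line whose first non-blank character is ';'.
NON_COMMENT_LINE = re.compile(r"^(?![ \t]*;).*$", re.MULTILINE)


def count_expansions(text):
    """Return every '$x' expansion found on non-comment lines."""
    found = []
    for line in NON_COMMENT_LINE.findall(text):
        found.extend(re.findall(r"\$.", line))
    return found


print(count_expansions(sInput))
# ['$3', '$3', '$3', '$5', '$6', '$7']
```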

I'm not convinced this is suitable for a single expression, so I have
also tried first blanking out any commented lines with a find-replace,
without much luck.
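On whether it fits a single expression: one common trick is to let one alternative of the regex consume comment lines whole, while a second alternative captures the '$' expansions, and then discard the empty captures. A minimal sketch, assuming Python 3 and the sample input from the post; the names PATTERN and expansions are illustrative:

```python
import re

# Sample input from the post.
sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
   $6
  test7 $7 test7
"""

# Alternative 1 swallows a whole comment line without capturing anything.
# Alternative 2 captures one '$' expansion. findall() returns group 1 for
# every match, so comment-line matches come back as '' and are dropped.
PATTERN = re.compile(r"^[ \t]*;.*|(\$.)", re.MULTILINE)


def expansions(text):
    return [m for m in PATTERN.findall(text) if m]


print(expansions(sInput))
# ['$3', '$3', '$3', '$5', '$6', '$7']
```

Because the comment alternative is tried first at each line start, the scanner jumps past the entire comment line, so '$' characters inside it are never seen by the capturing alternative.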

e.g. re.sub(r"^[\t ]*?;.*?$", r"", sInput, re.I+re.M)
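A likely reason the substitution above appears to do nothing: the fourth positional argument of re.sub() is count, not flags (re.sub() only gained a flags argument in Python 2.7), so re.I+re.M is silently used as a replacement count and re.M never takes effect. Without re.M, ^ matches only at the very start of the string, which in sInput is a newline, so nothing is replaced. A sketch that compiles the pattern with re.MULTILINE instead, assuming Python 3 (re.compile() with flags also works in older versions); IGNORECASE is dropped because neither '$' nor ';' has a case, and strip_comments is an illustrative name:

```python
import re

# Sample input from the post.
sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4
$5 test5
   $6
  test7 $7 test7
"""

# MULTILINE makes ^ and $ match at each line, not just the ends of the string.
COMMENT_LINE = re.compile(r"^[ \t]*;.*$", re.MULTILINE)


def strip_comments(text):
    # Blank out comment lines, leaving their newlines in place.
    return COMMENT_LINE.sub("", text)


print(re.findall(r"\$.", strip_comments(sInput)))
# ['$3', '$3', '$3', '$5', '$6', '$7']
```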


Any suggestions would be appreciated. Thanks

Martin
--
http://mail.python.org/mailman/listinfo/python-list


Re: Regular expression that skips single line comments?

2009-01-19 Thread martinjamesevans

Firstly, a huge thanks to all for the solutions!  Just what I was
looking for.



> (Aside: why are you doing a case-insensitive match for a non-letter? Are
> there different upper- and lower-case dollar signs?)

As you can probably imagine, I had simplified the problem slightly,
the language uses a couple of different introducers and also uses both
numbers and letters (but only single characters).

I was going to go with a similar idea of parsing per line but was
trying to give RE another chance. I've used RE often in the past but
for some reason this one had got under my skin.
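A minimal per-line sketch, assuming Python 3. The real language's introducers are not named in the thread, so INTRODUCERS below ('$' and '%') is a hypothetical placeholder; each expansion is taken to be one introducer followed by a single letter or digit, as described above:

```python
import re

# Hypothetical introducer set; substitute the language's real ones.
INTRODUCERS = "$%"
# One introducer followed by exactly one ASCII letter or digit.
EXPANSION = re.compile("[" + re.escape(INTRODUCERS) + "][A-Za-z0-9]")


def find_expansions(text):
    found = []
    for line in text.splitlines():
        # A line is a comment if its first non-whitespace character is ';'.
        if line.lstrip().startswith(";"):
            continue
        found.extend(EXPANSION.findall(line))
    return found


sample = "; $1 ignored\ntest %a $B\n   $6\n"
print(find_expansions(sample))
# ['%a', '$B', '$6']
```

The explicit [A-Za-z0-9] class covers both cases of letters directly, so no case-insensitive flag is needed.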

I found this to be quite an interesting little tool:
http://www.gskinner.com/RegExr/

Martin



EBCDIC <--> ASCII

2008-12-04 Thread martinjamesevans
I'm having a problem trying to use the codecs package to aid me in
converting some bytes from EBCDIC into ASCII.

I have some 8bit text that is in mixed format. I extract the bytes
that are coded for EBCDIC and would like to display them correctly.
The EBCDIC bytes could have any value from 0-255. I'm only really
interested in the printable characters and would be happy to show
the rest as dots.

I've tried starting with something like this, but I assume it is
expecting the source to be in unicode already?

e.g. (pretend the second half are EBCDIC characters)

sAll = "This bit is ASCII, "
sSource = sAll[19:]

sEBCDIC = unicode(sSource, 'cp500', 'ignore')
sASCII = sEBCDIC.encode('ascii')

Obviously I could just knock up a 255 character lookup table and do it
myself; I was just trying to be a little more Pythonic and use that
built-in table.
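A minimal sketch of the "printable as-is, everything else as a dot" idea, assuming Python 3 and the stdlib cp500 codec (whether cp500 is the right EBCDIC variant depends on where the data comes from). In Python 3 the EBCDIC bytes decode straight to str, so no intermediate unicode() call is needed; ebcdic_to_ascii is an illustrative name:

```python
def ebcdic_to_ascii(data: bytes, codec: str = "cp500") -> str:
    """Decode EBCDIC bytes, keeping printable ASCII and showing the rest as '.'."""
    # Each byte decodes to one character; errors="replace" guards against
    # any byte value the chosen code page might leave undefined.
    text = data.decode(codec, errors="replace")
    return "".join(ch if " " <= ch <= "~" else "." for ch in text)


print(ebcdic_to_ascii(b"\x81\x82\x83\xf1\xf2\xf3\x00\x25"))
# abc123..
```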

Thanks,

Martin


Re: EBCDIC <--> ASCII

2008-12-08 Thread martinjamesevans
On Dec 5, 2:13 pm, Michael Ströder <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On Dec 4, 4:45 pm, Michael Ströder <[EMAIL PROTECTED]> wrote:
> >> [EMAIL PROTECTED] wrote:
> >>> I'm having a problem trying to use the codecs package to aid me in
> >>> converting some bytes from EBCDIC into ASCII.
> >> Which EBCDIC variant?
>
> >>> sEBCDIC = unicode(sSource, 'cp500', 'ignore')
> >> Are you sure CP500 is the EBCDIC variant for the language you want?
>
> >> http://www.ietf.org/rfc/rfc1345.txt lists it as:
>
> >>    &charset IBM500
> >>    &rem source: IBM NLS RM Vol2 SE09-8002-01, March 1990
> >>    &alias CP500
> >>    &alias ebcdic-cp-be
> >>    &alias ebcdic-cp-ch
>
> >>> Obviously I could just knock up a 255 character lookup table and do it
> >>> myself, I was just trying to be a little more Pythonic and use that
> >>> built in table.
> >> It's pythonic to implement a Unicode codec for unknown character tables.
> >> I've put these two on my web site:
>
> >> http://www.stroeder.com/pylib/encodings/ebcdicatde.py http://www.stroe...ebcdicatde)
>
> > Thanks for the tables, ebcdicatde.py does look more suitable.
>
> > My problem appears to be that my source is a byte string. In a
> > nutshell I need "\x81\x82\x83\xf1\xf2\xf3" to become "abc123" in a
> > byte string.
>
> Python 2.5.2 (r252:60911, Aug  1 2008, 00:43:38)
>  >>> import ebcdicatde
>  >>> "\x81\x82\x83\xf1\xf2\xf3".decode('ebcdic-at-de').encode('ascii')
> 'abc123'
>  >>>
>
> Ciao, Michael.


Many thanks for all your posts!
Just what I needed.
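For the "byte string in, byte string out" requirement, a stdlib-only option in Python 3 is to build the 256-entry lookup table (one entry per byte value 0-255) from a codec instead of by hand: decode every byte value once, then use bytes.translate(). This sketch assumes cp500; the ebcdicatde.py codec from the thread is third-party and is not used here, though both map this particular sample to abc123. The helper name build_ebcdic_table is illustrative:

```python
def build_ebcdic_table(codec: str = "cp500", fill: bytes = b".") -> bytes:
    """Return a 256-byte table for bytes.translate(): EBCDIC byte -> ASCII byte."""
    decoded = bytes(range(256)).decode(codec, errors="replace")
    # Keep printable ASCII; map everything else (controls, accented
    # letters, replacement characters) to the fill byte.
    return bytes(ord(ch) if " " <= ch <= "~" else fill[0] for ch in decoded)


EBCDIC_TO_ASCII = build_ebcdic_table()

data = b"\x81\x82\x83\xf1\xf2\xf3"
print(data.translate(EBCDIC_TO_ASCII))
# b'abc123'
```

Building the table once and reusing it keeps the per-call work to a single translate() pass over the bytes.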