Regular expression that skips single line comments?
I am trying to parse a set of files that have a simple syntax using RE. I'm interested in counting '$' expansions in the files, with one minor consideration: a line becomes a comment if the first non-whitespace character is a semicolon. E.g. tests 1 and 2 should be ignored:
sInput = """
; $1 test1
; test2 $2
test3 ; $3 $3 $3
test4 $5
test5 $6
test7 $7 test7
"""
Required output: ['$3', '$3', '$3', '$5', '$6', '$7']
The following RE works fine but does not deal with the commented lines:
re.findall(r"(\$.)", sInput, re.I)
e.g. ['$1', '$2', '$3', '$3', '$3', '$5', '$6', '$7']
My attempts at trying to use (?!;) type expressions keep failing. I'm not convinced this is suitable for a single expression, so I have also attempted to first find-replace any commented lines out, without much luck, e.g.:
re.sub(r"^[\t ]*?;.*?$", r"", sInput, re.I+re.M)
Any suggestions would be appreciated.
Thanks
Martin
--
http://mail.python.org/mailman/listinfo/python-list
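A minimal Python 3 sketch of two regex-only approaches. One detail in the re.sub attempt is worth noting: re.sub's fourth positional parameter is count, not flags, so re.I+re.M is taken as a replacement count and re.MULTILINE never takes effect. Compiling the pattern with its flags (or passing flags= on Python 2.7 and later) avoids that. The sample text is modelled on the post's test input; the indented comment line is an assumption added to exercise the "first non-whitespace character" rule.

```python
import re

# Sample modelled on the post; the indented "; test2" line is an assumption.
sInput = """
; $1 test1
    ; test2 $2
test3 ; $3 $3 $3
test4 $5
test5 $6
test7 $7 test7
"""

# Option 1: blank out comment lines first, then search what is left.
# The flags are compiled into the pattern instead of being passed
# positionally to re.sub (where they would be read as `count`).
COMMENT_LINE = re.compile(r"^[ \t]*;.*$", re.MULTILINE)
result_strip = re.findall(r"\$.", COMMENT_LINE.sub("", sInput))
print(result_strip)  # ['$3', '$3', '$3', '$5', '$6', '$7']

# Option 2: a single expression. A comment line is consumed whole by the
# first alternative, which has no group, so findall() yields '' for it;
# those empty strings are then filtered out.
TOKEN_OR_COMMENT = re.compile(r"^[ \t]*;.*$|(\$.)", re.MULTILINE)
result_single = [m for m in TOKEN_OR_COMMENT.findall(sInput) if m]
print(result_single)  # ['$3', '$3', '$3', '$5', '$6', '$7']
```

The second form works because, with re.MULTILINE, ^ only matches at the start of a line, so a ';' in the middle of a line (as in "test3 ; $3 $3 $3") never triggers the comment branch.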
Re: Regular expression that skips single line comments?
Firstly, a huge thanks to all for the solutions! Just what I was looking for.
> (Aside: why are you doing a case-insensitive match for a non-letter? Are
> there different upper- and lower-case dollar signs?)
As you can probably imagine, I had simplified the problem slightly: the language uses a couple of different introducers and also uses both numbers and letters (but only single characters).
I was going to go with a similar idea of parsing per line but was trying to give RE another chance. I've used RE often in the past, but for some reason this one had got under my skin.
I found this to be quite an interesting little tool: http://www.gskinner.com/RegExr/
Martin
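The per-line idea mentioned above can be sketched roughly as follows (Python 3). The actual introducer characters are not named in the thread, so '$' and '%' in EXPANSION are hypothetical placeholders, and count_expansions is an illustrative name. Because the character class lists both letter cases explicitly, no re.I flag is needed.

```python
import re

# Hypothetical introducers '$' and '%', each followed by exactly one
# letter or digit. Inside a character class, '$' is a literal character.
EXPANSION = re.compile(r"[$%][A-Za-z0-9]")


def count_expansions(text):
    """Return the expansions found on non-comment lines, in order.

    A line is a comment when its first non-whitespace character is ';'.
    Note that str.lstrip() strips any whitespace, not only spaces and tabs.
    """
    found = []
    for line in text.splitlines():
        if line.lstrip().startswith(";"):
            continue  # the whole line is a comment
        found.extend(EXPANSION.findall(line))
    return found


sample = "\n; $1 test1\n    ; test2 $2\ntest3 ; $3 $3 $3\ntest4 $5\ntest5 %a $b\n"
print(count_expansions(sample))  # ['$3', '$3', '$3', '$5', '%a', '$b']
```

Handling one line at a time keeps the comment rule as plain string logic, so the regex only has to describe a single expansion.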
EBCDIC <--> ASCII
I'm having a problem trying to use the codecs package to aid me in
converting some bytes from EBCDIC into ASCII.
I have some 8bit text that is in mixed format. I extract the bytes
that are coded for EBCDIC and would like to display them correctly.
The bytes that are EBCDIC could have any value from 0-255; I'm only really
interested in the printable portions and could, say, leave the rest as
dots.
I've tried starting with something like this, but I assume it is
expecting the source to be in unicode already?
e.g. (pretend the second half are EBCDIC characters)
sAll = "This bit is ASCII, "
sSource = sAll[19:]
sEBCDIC = unicode(sSource, 'cp500', 'ignore')
sASCII = sEBCDIC.encode('ascii')
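In Python 2, unicode(sSource, 'cp500', 'ignore') decodes an 8-bit byte string, so the source does not need to be Unicode already. The step more likely to fail is the strict .encode('ascii'), which raises UnicodeEncodeError whenever the decoded text contains a non-ASCII character (cp500 includes accented Latin-1 letters). A minimal Python 3 sketch of the "printable or dot" conversion follows; ebcdic_to_ascii is a hypothetical helper name, and "printable" is assumed to mean the ASCII range from space to '~'.

```python
def ebcdic_to_ascii(data, codepage="cp500"):
    """Decode EBCDIC bytes and return an ASCII str with non-printables as dots.

    Every decoded character outside printable ASCII (space through '~'),
    including control codes and accented letters, is replaced by '.'.
    """
    text = data.decode(codepage, errors="replace")
    return "".join(ch if " " <= ch <= "~" else "." for ch in text)


# 'Hello 1' in EBCDIC (cp500), followed by a NUL byte.
sample = b"\xc8\x85\x93\x93\x96\x40\xf1\x00"
print(ebcdic_to_ascii(sample))  # Hello 1.
```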
Obviously I could just knock up a 255 character lookup table and do it
myself, I was just trying to be a little more Pythonic and use that
built in table.
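A lookup table and the built-in codec need not be alternatives: decoding every byte value once through the codec yields the table (byte values 0-255 mean 256 entries), and bytes.translate then maps a whole byte string in one call. A minimal Python 3 sketch; EBCDIC_TO_ASCII and ebcdic_bytes_to_ascii are hypothetical names, and cp500 is assumed to be the right code page for the data.

```python
# Decode all 256 byte values once; errors="replace" keeps one character
# per byte even if the code page left some value undefined.
_decoded = bytes(range(256)).decode("cp500", errors="replace")

# Entry i is the ASCII byte that EBCDIC byte i becomes, or '.' when the
# decoded character is not printable ASCII.
EBCDIC_TO_ASCII = bytes(
    ord(ch) if " " <= ch <= "~" else ord(".") for ch in _decoded
)


def ebcdic_bytes_to_ascii(data):
    # bytes.translate maps every input byte through the 256-byte table.
    return data.translate(EBCDIC_TO_ASCII)


print(ebcdic_bytes_to_ascii(b"\x81\x82\x83\xf1\xf2\xf3"))  # b'abc123'
```

Building the table once makes each later conversion a single C-level pass, which may matter when many records have to be converted.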
Thanks,
Martin
Re: EBCDIC <--> ASCII
On Dec 5, 2:13 pm, Michael Ströder <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > On Dec 4, 4:45 pm, Michael Ströder <[EMAIL PROTECTED]> wrote:
> >> [EMAIL PROTECTED] wrote:
> >>> I'm having a problem trying to use the codecs package to aid me in
> >>> converting some bytes from EBCDIC into ASCII.
> >> Which EBCDIC variant?
>
> >>> sEBCDIC = unicode(sSource, 'cp500', 'ignore')
> >> Are you sure CP500 is the EBCDIC variant for the language you want?
>
> >> http://www.ietf.org/rfc/rfc1345.txt lists it as:
>
> >> &charset IBM500
> >> &rem source: IBM NLS RM Vol2 SE09-8002-01, March 1990
> >> &alias CP500
> >> &alias ebcdic-cp-be
> >> &alias ebcdic-cp-ch
>
> >>> Obviously I could just knock up a 255 character lookup table and do it
> >>> myself, I was just trying to be a little more Pythonic and use that
> >>> built in table.
> >> It's pythonic to implement a Unicode codec for unknown character tables.
> >> I've put these two on my web site:
>
> >> http://www.stroeder.com/pylib/encodings/ebcdicatde.py
> >> http://www.stroe...ebcdicatde)
>
> > Thanks for the tables, ebcdicatde.py does look more suitable.
>
> > My problem appears to be that my source is a byte string. In a
> > nutshell I need "\x81\x82\x83\xf1\xf2\xf3" to become "abc123" in a
> > byte string.
>
> Python 2.5.2 (r252:60911, Aug 1 2008, 00:43:38)
> >>> import ebcdicatde
> >>> "\x81\x82\x83\xf1\xf2\xf3".decode('ebcdic-at-de').encode('ascii')
> 'abc123'
> >>>
>
> Ciao, Michael.
Many thanks for all your posts!
Just what I needed.
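For Python 3 readers (the session above is Python 2.5, where a plain string literal was a byte string), the source has to be a bytes object, which is decoded to str and then encoded to ASCII. The sketch below uses the built-in cp500 codec, plus cp273 (IBM273, German/Austrian EBCDIC, bundled with Python 3.4 and later) as a stand-in for the third-party ebcdicatde module; cp273 is not claimed to match that module's table exactly. It also checks that the RFC 1345 aliases quoted above resolve to Python's cp500 codec.

```python
import codecs

raw = b"\x81\x82\x83\xf1\xf2\xf3"

# Python 3: bytes -> str (decode), then str -> ASCII bytes (encode).
print(raw.decode("cp500").encode("ascii"))  # b'abc123'

# cp273 used as a stand-in for ebcdicatde, not as an exact equivalent.
print(raw.decode("cp273").encode("ascii"))  # b'abc123'

# The RFC 1345 names for IBM500 map to the same built-in codec.
for alias in ("CP500", "ebcdic-cp-be", "ebcdic-cp-ch"):
    print(alias, "->", codecs.lookup(alias).name)  # cp500
```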
