Re: Behavior of re.split on empty strings is unexpected

2010-08-05 Thread jhermann
On Aug 2, 7:34 pm, John Nagle wrote:
> >>> s2 = "   HELLO   THERE  "
> >>> kresplit4 = re.compile(r'\W+', re.UNICODE)
> >>> kresplit4.split(s2)
> ['', 'HELLO', 'THERE', '']
>
> I still get empty strings.

>>> re.findall(r"\w+", " a b c ")
['a', 'b', 'c']
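The contrast between the two calls in this message can be checked side by side. The following is only a minimal sketch reusing the `s2` string and the `\W+` / `\w+` patterns quoted above; the variable names `split_result` and `findall_result` are illustrative and do not come from the thread.

```python
import re

s2 = "   HELLO   THERE  "

# Splitting on runs of non-word characters: the string begins and ends
# with a separator, so an empty string appears at each end of the result.
split_result = re.compile(r'\W+', re.UNICODE).split(s2)

# Matching the words themselves instead of the gaps between them
# never produces empty strings.
findall_result = re.findall(r'\w+', s2)

print(split_result)
print(findall_result)
```

Whether `findall` is an acceptable substitute depends on the pattern: it works here because the complement of the separator class is easy to write.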

Re: Behavior of re.split on empty strings is unexpected

2010-08-03 Thread John Nagle
On 8/2/2010 5:53 PM, samwyse wrote: On Aug 2, 12:34 pm, John Nagle wrote: The regular expression "split" behaves slightly differently than string split: I'm going to argue that it's the string split that's behaving oddly. I tend to agree. It doesn't seem to be possible to get the sam

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread rantingrick
On Aug 2, 7:53 pm, samwyse wrote:
> It's the same results; however many people don't like these results
> because they feel that whitespace occupies a privileged role. People
> generally agree that a string of consecutive commas means missing
> values, but a string of consecutive spaces just mea

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread samwyse
On Aug 2, 12:34 pm, John Nagle wrote: > The regular expression "split" behaves slightly differently than string > split: I'm going to argue that it's the string split that's behaving oddly. To see why, let's first look at some simple CSV values: cat,dog ,missing,,values, How many fields are on e
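The preview is cut off before the answer, but the two CSV values it quotes can be split directly to see the field counts. This is a hedged sketch, not samwyse's own code; the names `lines`, `str_fields` and `re_fields` are assumptions made for illustration.

```python
import re

# The two sample CSV values quoted in the message.
lines = ["cat,dog", ",missing,,values,"]

# With an explicit separator, str.split keeps empty fields, so leading,
# trailing and doubled commas each yield an empty string.
str_fields = [line.split(",") for line in lines]

# re.split with the same single-character separator behaves the same way.
re_fields = [re.split(",", line) for line in lines]

for line, fields in zip(lines, str_fields):
    print(repr(line), len(fields), fields)
```

Under this reading the second line has five fields, three of them empty, and both split functions agree on that.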

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread Thomas Jollans
On 08/02/2010 11:22 PM, John Nagle wrote:
>> [s for s in rexp.split(long_s) if s]
>
> Of course I can discard the blank strings afterward, but
> is there some way to do it in the "split" operation? If
> not, then the default case for "split()" is too non-standard.
>
> (Also, "if s" won't work;

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread Benjamin Kaplan
On Mon, Aug 2, 2010 at 2:22 PM, John Nagle wrote: > On 8/2/2010 12:52 PM, Thomas Jollans wrote: > >> On 08/02/2010 09:41 PM, John Nagle wrote: >> >>> On 8/2/2010 11:02 AM, MRAB wrote: >>> John Nagle wrote: > The regular expression "split" behaves slightly differently than > stri

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread John Nagle
On 8/2/2010 12:52 PM, Thomas Jollans wrote: On 08/02/2010 09:41 PM, John Nagle wrote: On 8/2/2010 11:02 AM, MRAB wrote: John Nagle wrote: The regular expression "split" behaves slightly differently than string split: occurrences of pattern", which is not too helpful. It's the plain str.spl

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread Thomas Jollans
On 08/02/2010 09:41 PM, John Nagle wrote: > On 8/2/2010 11:02 AM, MRAB wrote: >> John Nagle wrote: >>> The regular expression "split" behaves slightly differently than >>> string split: > occurrences of pattern", which is not too helpful. >>> >> It's the plain str.split() which is unusual in that:

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread John Nagle
On 8/2/2010 11:02 AM, MRAB wrote: John Nagle wrote: The regular expression "split" behaves slightly differently than string split: occurrences of pattern", which is not too helpful. It's the plain str.split() which is unusual in that: 1. it splits on sequences of whitespace instead of one p

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread Peter Otten
John Nagle wrote:
> The regular string split operation doesn't yield empty strings:
>
> >>> " HELLO THERE ".split()
> ['HELLO', 'THERE']

Note that invocation without a separator argument (or with None as the separator) is special in that respect:

>>> " hello there ".split(" ")
['', 'hello', 'ther
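The special case described here can be reproduced in a few lines. This is a small sketch using the same sample string as the message; the variable names are illustrative only.

```python
text = " hello there "

# No separator (or None): split on runs of whitespace and drop
# empty strings at the start and end.
no_sep = text.split()
none_sep = text.split(None)

# Explicit single-space separator: every space is a boundary,
# so the leading and trailing spaces produce empty strings.
space_sep = text.split(" ")

print(no_sep)
print(none_sep)
print(space_sep)
```

Only the no-argument form strips the empty strings; any explicit separator, including a single space, follows the same rule as `re.split`.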

Re: Behavior of re.split on empty strings is unexpected

2010-08-02 Thread MRAB
John Nagle wrote:
> The regular expression "split" behaves slightly differently than string split:
>
> >>> import re
> >>> kresplit = re.compile(r'[^\w\&]+', re.UNICODE)
> >>> kresplit.split(" HELLO THERE ")
> ['', 'HELLO', 'THERE', '']
> >>> kresplit.split("VERISIGN INC.")
> ['VERISIGN', 'IN

Behavior of re.split on empty strings is unexpected

2010-08-02 Thread John Nagle
The regular expression "split" behaves slightly differently than string split:

>>> import re
>>> kresplit = re.compile(r'[^\w\&]+', re.UNICODE)
>>> kresplit.split(" HELLO THERE ")
['', 'HELLO', 'THERE', '']
>>> kresplit.split("VERISIGN INC.")
['VERISIGN', 'INC', '']

I'd thought that