This pattern splits tokens *only* where whitespace adjoins a parenthesis, and keeps the parentheses with the tokens:
(?<=\))\s+|\s+(?=\()

So you'll get this kind of behavior:

Tottenham Hotspur (London)
F.C. Internationale (milan)
FC Midtjylland (Herning) (Ikast)

to

Tottenham Hotspur
(London)
F.C. Internationale
(milan)
FC Midtjylland
(Herning)
(Ikast)

Steve

> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, April 15, 2011 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Split token
>
> What you've shown would be handled with WhitespaceTokenizer, but you'd have to
> prevent filters from stripping the parens. If you have to handle things like
> blah ( stuff )
> WhitespaceTokenizer wouldn't work.
>
> PatternTokenizerFactory might work for you, see:
> http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html
>
> Best
> Erick
>
> On Tue, Apr 12, 2011 at 6:02 AM, roySolr <royrutten1...@gmail.com> wrote:
>
> > Hello,
> >
> > I want to split my string when it contains "(". Example:
> >
> > spurs (London)
> > Internationale (milan)
> >
> > to
> >
> > spurs
> > (london)
> > Internationale
> > (milan)
> >
> > What tokenizer can i use to fix this problem?
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Split-token-tp2810772p2810772.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
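
P.S. If you want to sanity-check the pattern at the top of this message before wiring it into a tokenizer, here is a minimal sketch in Python. It only exercises the regular expression itself with Python's re module, not Solr's PatternTokenizerFactory, and the helper name split_on_parens is made up for illustration. It assumes each line above is a separate field value.

```python
import re

# The pattern from this message: split on whitespace that follows ")"
# or on whitespace that precedes "(". The lookarounds keep the
# parentheses themselves attached to the tokens.
PAREN_SPLIT = re.compile(r"(?<=\))\s+|\s+(?=\()")


def split_on_parens(value):
    """Hypothetical helper: split one field value with the pattern above."""
    return PAREN_SPLIT.split(value)


if __name__ == "__main__":
    samples = [
        "Tottenham Hotspur (London)",
        "F.C. Internationale (milan)",
        "FC Midtjylland (Herning) (Ikast)",
        "blah ( stuff )",  # Erick's case with spaces inside the parens
    ]
    for sample in samples:
        print(split_on_parens(sample))
```

Whitespace inside the parentheses is left alone, so "blah ( stuff )" comes out as two tokens, "blah" and "( stuff )", rather than four.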