Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-07 Thread Otis Gospodnetic

Regular expressions won't work well for sentence boundary detection. If you want something free, you could plug in OpenNLP or GATE. Or LingPipe, but that's not free.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

- Original Message
> From: Caleb Land
> To: solr-u
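To illustrate the point about regular expressions and sentence boundaries, here is a minimal Python sketch. The pattern, the helper name naive_sentences, and the sample text are all assumptions made up for illustration; none of them come from the thread or from Solr's regex fragmenter.

```python
import re

# Naive rule (an assumption for this sketch): a sentence ends at '.', '!' or '?'
# followed by whitespace.
NAIVE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_sentences(text):
    """Split text wherever the naive boundary pattern matches."""
    return [piece for piece in NAIVE_BOUNDARY.split(text) if piece]


# Two real sentences, but with abbreviations that also end in a period.
text = "Dr. Smith paid $3.50 for it. He left at 5 p.m. on Jan. 6."
sentences = naive_sentences(text)

for piece in sentences:
    print(repr(piece))
```

The sample contains two sentences, yet the naive pattern also breaks after "Dr.", "p.m." and "Jan.". Patching the pattern for each abbreviation quickly becomes a list of exceptions, which is roughly the job a dedicated sentence detector takes over.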

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-07 Thread Caleb Land

On Wed, Jan 6, 2010 at 4:30 PM, Erick Erickson wrote:
> Hmmm, I'll have to defer to the highlighter experts here
>

I've looked at the source code for the highlighter, and I think I know what's going on. I haven't had time to play with this yet, so I could be wrong, but this is my impression.

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Erick Erickson

Hmmm, I'll have to defer to the highlighter experts here

Erick

On Wed, Jan 6, 2010 at 3:23 PM, Caleb Land wrote:
> I've looked at the docs/source for WordDelimiterFilter, and I understand
> what it does now.
>
> Here is my configuration:
>
> http://gist.github.com/270590
>
> I've tried the

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Caleb Land

I've looked at the docs/source for WordDelimiterFilter, and I understand what it does now.

Here is my configuration:

http://gist.github.com/270590

I've tried the StandardTokenizerFactory instead of the WhitespaceTokenizerFactory, but I get the same problem as before, the period from the previo

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-06 Thread Erick Erickson

Hmmm, the name WordDelimiterFilterFactory might be leading you astray. Its purpose isn't to break things up into "words" that have anything to do with grammatical rules. Rather, its purpose is to break up strings of funky characters into searchable stuff. See: http://wiki.apache.org/solr/Analyzers
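As a rough picture of what "breaking up strings of funky characters" means, here is a toy Python approximation. It is not Solr's implementation: the function name toy_word_delimiter and its rules (split on non-alphanumeric characters, on lower-to-upper case changes, and on letter/digit changes) are simplifying assumptions, and the real filter has options that this sketch ignores.

```python
import re


def toy_word_delimiter(token):
    """Toy approximation only: split one token into sub-word parts."""
    parts = []
    # Non-alphanumeric characters act as delimiters and are dropped.
    for chunk in re.split(r"[^A-Za-z0-9]+", token):
        # Within a chunk, split on case changes and letter/digit changes.
        parts.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+|[0-9]+", chunk))
    return parts


for token in ["Wi-Fi", "PowerShot", "SD500", "sentence."]:
    print(token, "->", toy_word_delimiter(token))
```

Note that in this toy model the trailing period of "sentence." does not survive as part of any token, which fits the observation elsewhere in the thread that the period seems to be stripped from the token stream.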

Re: Basic sentence parsing with the regex highlighter fragmenter

2010-01-05 Thread Caleb Land

I've tracked this problem down to the fact that I'm using the WordDelimiterFilter. I don't quite understand what's happening, but if I add preserveOriginal="1" as an option, everything looks fine. I think it has to do with the period being stripped in the token stream.

On Tue, Jan 5, 2010 at 2:05
