I have made progress on this by writing my own Analyzer. It chains together the TokenFilters that each of the Solr factory classes creates, in the same order as in the schema. I had to copy and paste the WordDelimiterFilter source into my own project because, of course, the class is package-private.
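For anyone following along, the shape of such a chain can be sketched with plain Java stand-ins. This is not the Lucene or Solr API: the class name `AnalyzerChainSketch` and its methods are hypothetical, and each stage is a toy. In a real Analyzer, the actual classes (WhitespaceTokenizer, the copied WordDelimiterFilter, LowerCaseFilter, SnowballFilter) would take the place of these methods, and their constructor signatures depend on the Lucene/Solr version. The sketch only shows that each stage wraps the previous one, in schema order. The word-delimiter stand-in models catenation only; it does not emit separate number parts, split mixed letter/digit tokens, stem, or honor protwords.txt.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model of the analysis chain; hypothetical names, not Lucene classes.
public class AnalyzerChainSketch {

    // Stage 1: split the input on runs of whitespace.
    static Stream<String> whitespaceTokens(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                     .filter(t -> !t.isEmpty());
    }

    // Stage 2 (simplified): remove delimiter characters inside a token so
    // its parts are joined, e.g. "Wi-Fi" becomes "WiFi". Tokens made up
    // only of delimiters are dropped.
    static Stream<String> catenateParts(Stream<String> in) {
        return in.map(t -> t.replaceAll("[^\\p{L}\\p{N}]+", ""))
                 .filter(t -> !t.isEmpty());
    }

    // Stage 3: lowercase every token.
    static Stream<String> lowerCase(Stream<String> in) {
        return in.map(t -> t.toLowerCase(Locale.ROOT));
    }

    // Each stage wraps the one before it, mirroring the order of the
    // <tokenizer> and <filter> elements in the field type.
    public static List<String> analyze(String text) {
        return lowerCase(catenateParts(whitespaceTokens(text)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("Wi-Fi  Router")); // prints [wifi, router]
    }
}
```

The point to carry over is that the index and query sides must use the same chain in the same order; otherwise the terms produced at search time will not match what Solr wrote to the index.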
On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch <ihas...@gmail.com> wrote:

> Hi,
> I asked this question a month ago on lucene-user and was referred here.
>
> I have content being analyzed in Solr using these tokenizers and filters:
>
> <fieldType name="text_standard" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="0" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="English"
>             protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> Basically I want to be able to search against this index in Lucene with one
> of my background searching applications.
>
> My main reason for using Lucene over Solr for this is that I use the
> highlighter to keep track of exactly which terms were found which I use for
> my own scoring system and I always collect the whole set of found
> documents. I've messed around with using Boosts but it wasn't fine grained
> enough and I wasn't able to effectively create a score threshold (would
> creating my own scorer be a better idea?)
>
> Is it possible to use this analyzer from Lucene, or at least re-create it
> in code?
>
> Thanks.