I guess I missed the init() method. I was looking at the factory and thought I saw config-loading stuff (like getInt), which I assumed meant it needed to have schema.xml available.
Thanks!
-Max

On Tue, Oct 5, 2010 at 2:36 PM, Mathias Walter <mathias.wal...@gmx.net> wrote:

> Hi Max,
>
> why don't you use WordDelimiterFilterFactory directly? I'm doing the same
> stuff inside my own analyzer:
>
>     final Map<String, String> args = new HashMap<String, String>();
>
>     args.put("generateWordParts", "1");
>     args.put("generateNumberParts", "1");
>     args.put("catenateWords", "0");
>     args.put("catenateNumbers", "0");
>     args.put("catenateAll", "0");
>     args.put("splitOnCaseChange", "1");
>     args.put("splitOnNumerics", "1");
>     args.put("preserveOriginal", "1");
>     args.put("stemEnglishPossessive", "0");
>     args.put("language", "English");
>
>     wordDelimiter = new WordDelimiterFilterFactory();
>     wordDelimiter.init(args);
>     stream = wordDelimiter.create(stream);
>
> --
> Kind regards,
> Mathias
>
> > -----Original Message-----
> > From: Max Lynch [mailto:ihas...@gmail.com]
> > Sent: Tuesday, October 05, 2010 1:03 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Using Solr Analyzers in Lucene
> >
> > I have made progress on this by writing my own Analyzer. I basically
> > added the TokenFilters that are under each of the Solr factory classes.
> > I had to copy and paste the WordDelimiterFilter because, of course, it
> > was package protected.
> >
> > On Mon, Oct 4, 2010 at 3:05 PM, Max Lynch <ihas...@gmail.com> wrote:
> >
> > > Hi,
> > > I asked this question a month ago on lucene-user and was referred here.
> > >
> > > I have content being analyzed in Solr using these tokenizers and
> > > filters:
> > >
> > > <fieldType name="text_standard" class="solr.TextField"
> > >     positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >         generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > >         protected="protwords.txt"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >         generateWordParts="0" generateNumberParts="1" catenateWords="1"
> > >         catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> > >         protected="protwords.txt"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Basically I want to be able to search against this index in Lucene
> > > with one of my background searching applications.
> > >
> > > My main reason for using Lucene over Solr for this is that I use the
> > > highlighter to keep track of exactly which terms were found, which I
> > > use for my own scoring system, and I always collect the whole set of
> > > found documents. I've messed around with using boosts, but that wasn't
> > > fine-grained enough, and I wasn't able to effectively create a score
> > > threshold (would creating my own scorer be a better idea?)
> > >
> > > Is it possible to use this analyzer from Lucene, or at least
> > > re-create it in code?
> > >
> > > Thanks.
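[For the archive: putting Mathias's suggestion together with the schema above, here is a rough sketch of a custom Analyzer that mirrors the index-time chain by driving the Solr factories from plain Lucene code. This is untested and assumes Lucene/Solr 1.4-era jars on the classpath; constructor signatures changed across versions, and the `protected="protwords.txt"` list is not wired up here, so treat it as a starting point, not a drop-in.]

```java
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.solr.analysis.WordDelimiterFilterFactory;

// Mirrors the "text_standard" index analyzer from schema.xml:
// WhitespaceTokenizer -> WordDelimiterFilter (built via its factory,
// since the filter class itself is package-private) -> LowerCaseFilter
// -> SnowballFilter. Note: protwords.txt handling is omitted.
public class SchemaMirrorAnalyzer extends Analyzer {

    private final WordDelimiterFilterFactory wordDelimiter;

    public SchemaMirrorAnalyzer() {
        // Same attributes as the <filter> element in schema.xml.
        Map<String, String> args = new HashMap<String, String>();
        args.put("generateWordParts", "0");
        args.put("generateNumberParts", "1");
        args.put("catenateWords", "1");
        args.put("catenateNumbers", "1");
        args.put("catenateAll", "0");
        args.put("splitOnCaseChange", "1");
        wordDelimiter = new WordDelimiterFilterFactory();
        wordDelimiter.init(args);
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader);
        stream = wordDelimiter.create(stream);
        stream = new LowerCaseFilter(stream);
        stream = new SnowballFilter(stream, "English");
        return stream;
    }
}
```

Since the index and query analyzers in the schema are identical, the same class can be passed to both IndexWriter and the query parser; if they ever diverge, the query-side chain would need its own tokenStream() variant.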