Another thing you might try is setting preserveOriginal="1" (just saw this in another thread). Which one is "better" usually depends on your problem space...

Best
Erick

On Mon, Aug 23, 2010 at 9:16 AM, Scottie <scot...@live.com> wrote:

>
> Nikolas, thanks a lot for that, I've just given it a quick test and it
> definitely seems to work for the examples I've given.
>
> Thanks again,
>
> Scott
>
>
> From: Nikolas Tautenhahn [via Lucene]
> Sent: Monday, August 23, 2010 3:14 PM
> To: Scottie
> Subject: Re: Tokenising on Each Letter
>
> Hi Scottie,
>
> > Could you elaborate about N gram for me, based on my schema?
>
> just a quick reply:
>
> <fieldType name="textNGram" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     -->
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="0" catenateWords="1"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             splitOnNumerics="0" preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" side="front"
>             minGramSize="2" maxGramSize="30"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="0" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             splitOnNumerics="0" preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> This will produce front-edge n-grams of 2 up to 30 characters; for more
> info, check
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
>
> Be sure to adjust those sizes (minGramSize/maxGramSize) so that
> maxGramSize is big enough to keep the whole original serial number/model
> number and minGramSize is not so small that you fill your index with
> useless information.
>
> Best regards,
> Nikolas Tautenhahn
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Tokenising-on-Each-Letter-tp1247113p1294586.html
> Sent from the Solr - User mailing list archive at Nabble.com.
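
To make the minGramSize/maxGramSize advice a bit more concrete, here is a rough Python sketch of what a front-edge n-gram step does to a single, already lowercased token. It is only an illustration, not Solr's actual implementation: the function name `edge_ngrams` and the serial number "ab1234" are made up for the example, and the real filter chain (word delimiter, duplicate removal) does more than this.

```python
def edge_ngrams(token, min_gram=2, max_gram=30):
    """Return the prefixes of `token` whose length is between
    min_gram and max_gram (inclusive), shortest first.

    Illustrative helper only; the name and signature are assumptions,
    not part of Solr.
    """
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


serial = "ab1234"  # hypothetical serial number

# With maxGramSize large enough, the full token is among the grams.
print(edge_ngrams(serial, min_gram=2, max_gram=30))

# With maxGramSize too small, the full serial number is never emitted.
print(edge_ngrams(serial, min_gram=2, max_gram=4))
```

With max_gram=30 the last gram is the complete "ab1234", so a query for the whole serial number can still match. With max_gram=4 the list stops at "ab12", which is the situation the advice above warns about. A smaller min_gram adds more short prefixes per token, which is where the index bloat comes from.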