Is this a general question or a specific one? You can handle specific cases by using synonyms.
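Here is a minimal sketch of the synonym approach for the microsoft / "micro soft" case. It assumes the `syn/index_synonyms.txt` file that the index analyzer in the quoted schema already loads. The synonym line shown in the comment is a hypothetical entry, not one taken from that file. The fragment is trimmed: it leaves out the stop, n-gram and stemming filters from the quoted schema so only the relevant steps remain. Multi-word synonyms are generally expanded at index time rather than query time. The reason is that the query parser splits the query on whitespace before the analyzer runs, so "micro soft" never reaches a query-time synonym filter as a single unit.

```xml
<!-- Sketch only: trimmed index analyzer showing where the synonym filter sits.
     Assumes syn/index_synonyms.txt contains this hypothetical line
     (comma-separated equivalents):
       microsoft, micro soft
     With expand="true", each spelling is indexed along with the other. -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Splits microsoft.com into parts, so a separate microsoft token is produced -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
          preserveOriginal="1"/>
  <!-- Matches the two-token sequence "micro soft" and adds microsoft (and vice versa) -->
  <filter class="solr.SynonymFilterFactory"
          synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
</analyzer>
```

With this entry, a dismax query for microsoft should match both documents, provided the word-delimiter step really does emit microsoft as its own token for microsoft.com, as the quoted settings suggest. The n-gram filters in the full schema may still change the outcome, so the admin analysis page is the place to confirm it.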
But the general case, that is, treating any two adjacent tokens as a single token, seems fraught with unintended consequences. Still, you know your problem space better than I do.

Best
Erick

On Sat, Apr 2, 2011 at 2:21 PM, Chris Fauerbach <chrisfauerb...@gmail.com> wrote:
> Good afternoon everyone!
> I am stumped, and I would love some help. I'm new to Solr/Lucene,
> but I have thrown myself into it, so I think I have a solid
> understanding. Using the analysis tool in the admin interface, I see
> these words stemmed and processed as I assume they would be, so I'm
> stuck.
>
> In my index, I have two documents, each with a text field. Here
> are example values:
>
> 1) microsoft.com
> 2) micro soft
>
> I want to search for microsoft or "micro soft" and find both.
> I'm using the dismax query parser, the fields are properly listed in the
> config, and I can find both records, but never at the same time.
> Here's the schema.xml definition for my text field. Any thoughts on what
> I can do to find these together?
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="front"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="back"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="front"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="back"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
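Erick's general case, gluing any two adjacent tokens into a single token, is roughly what Lucene's shingle filter does when its separator is empty. This is a swapped-in technique, not something proposed in the thread. The sketch below assumes a Solr version whose `ShingleFilterFactory` accepts the `tokenSeparator` attribute, so check the documentation for the version in use. It covers only the index side.

```xml
<!-- Sketch only: index-time shingling that joins adjacent token pairs.
     Assumes ShingleFilterFactory supports tokenSeparator in this Solr version. -->
<analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- maxShingleSize="2": join at most two adjacent tokens.
       tokenSeparator="": join them with no space, so "micro soft" yields
       micro, microsoft, soft.
       outputUnigrams="true": keep the single words as well. -->
  <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
          outputUnigrams="true" tokenSeparator=""/>
</analyzer>
```

At index time, "micro soft" would produce micro, microsoft and soft, so a search for microsoft could match it. The same joining applies to every adjacent pair, though, so unrelated text such as "car pet" would also be indexed as carpet. That is the kind of unintended consequence Erick warns about.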