Is this a general question or a specific one? You can handle specific cases
by using synonyms.
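Here is a minimal sketch of the synonym route for this one pair. It assumes the rule goes into the file that the index-time analyzer in the quoted schema already references (syn/index_synonyms.txt). Because the expansion happens at index time, the documents have to be reindexed before it takes effect.

```xml
<!-- Hypothetical rule to add to syn/index_synonyms.txt (plain text,
     one rule per line, not part of schema.xml):

       micro soft, microsoft

     With expand="true", the existing index-time filter below should then
     also index the token "microsoft" for a document containing "micro soft". -->
<filter class="solr.SynonymFilterFactory"
        synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
```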

But the general case, that is, treating any pair of adjacent tokens as a
single token, seems fraught with unintended consequences. Still, you know
your problem space better than I do.
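For illustration only, the general case could be sketched with a shingle filter that joins every pair of adjacent tokens with no separator. This assumes a Solr version whose ShingleFilterFactory supports the tokenSeparator attribute. The field type name text_shingle is hypothetical.

```xml
<!-- Hypothetical field type; the name text_shingle is illustrative. -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Emit each pair of adjacent tokens joined with no separator
         ("micro soft" also yields "microsoft"), and keep the single
         tokens ("micro", "soft") via outputUnigrams. -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="2" outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
</fieldType>
```

The same analyzer would also turn "car pet" into "carpet". That is the kind of unintended match the general approach risks.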

Best
Erick

On Sat, Apr 2, 2011 at 2:21 PM, Chris Fauerbach <chrisfauerb...@gmail.com> wrote:

> Good afternoon everyone!
> I am stumped, and I would love some help. I'm new to Solr/Lucene,
> but I have thrown myself into it, so I think I have a solid
> understanding. Using the analysis tool in the admin interface, I see
> these words stemmed and processed as I would expect, so I'm
> stuck.
>
> In my index, I have two documents, each with a text field. Here
> are example values:
>
> 1) microsoft.com
> 2) micro soft
>
> I want to search for either microsoft or "micro soft" and find both.
> I'm using the dismax query parser, the fields are properly listed in the
> config, and I can find each record, but never both at the same time.
> Here's the text field type from my schema.xml. Any thoughts on what I
> can do to find them together?
>
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="syn/index_synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="front"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="back"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="front"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>             maxGramSize="15" side="back"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>             words="stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
