We are moving from Lucene indexer and readers to a Hybrid solution where
we still use the Lucene Indexer but use Solr for querying the index.

I am indexing our content using Lucene 2.3 and have a field called
"contents" which is tokenized and stored. When I search the contents
field for "U2" using Luke the correct document turns up. However,
searching for U2 in solr returns nothing. 

Any ideas?

Lucene field is as follows:

doc.add(new Field("contents", body.replaceAll("  ", ""),
Field.Store.YES, Field.Index.TOKENIZED));

Lucene is using the following analyzer:

result = new ISOLatin1AccentFilter(result);
        result = new StandardFilter(result);
        result = new LowerCaseFilter(result);
        result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//,
stopTable);
        result = new PorterStemFilter(result);


In Solr I have the field mapped as a text field stored and indexed. The
text field uses

<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="1"
catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1" catenateWords="0"
catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory"
protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


Regards,
John




***********************************************************
The information in this e-mail is confidential and may be legally privileged.
It is intended solely for the addressee. Access to this e-mail by anyone else
is unauthorised. If you are not the intended recipient, any disclosure,
copying, distribution, or any action taken or omitted to be taken in reliance
on it, is prohibited and may be unlawful.
Please note that emails to, from and within RTÉ may be subject to the Freedom
of Information Act 1997 and may be liable to disclosure.
************************************************************

Reply via email to