We are moving from Lucene indexer and readers to a Hybrid solution where we still use the Lucene Indexer but use Solr for querying the index.
I am indexing our content using Lucene 2.3 and have a field called "contents" which is tokenized and stored. When I search the contents field for "U2" using Luke the correct document turns up. However, searching for U2 in solr returns nothing. Any ideas? Lucene field is as follows: doc.add(new Field("contents", body.replaceAll(" ", ""), Field.Store.YES, Field.Index.TOKENIZED)); Lucene is using the following analyzer: result = new ISOLatin1AccentFilter(result); result = new StandardFilter(result); result = new LowerCaseFilter(result); result = new StopFilter(result,StandardAnalyzer.STOP_WORDS);//, stopTable); result = new PorterStemFilter(result); In Solr I have the field mapped as a text field stored and indexed. The text field uses <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" /> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> </analyzer> </fieldType> Regards, John
*********************************************************** The information in this e-mail is confidential and may be legally privileged. It is intended solely for the addressee. Access to this e-mail by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution, or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Please note that emails to, from and within RTÉ may be subject to the Freedom of Information Act 1997 and may be liable to disclosure. ************************************************************