Hi Yetkin,

You are on the right track by examining the analysis page. How is your query
analyzed by the query analyzer?

According to what you pasted, q=CRD should return your example document.

Did you change something in schema.xml and forget to restart Solr and
re-index?

By the way, the lowercase tokenizer, which is based on the simple letter
tokenizer, seems a better fit for your use case. With it you don't have to
deal with WDF's parameters.

https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-LowerCaseTokenizer
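
As a rough sketch of what that could look like (the field type name
text_letter_lower is made up for illustration, and this assumes the same
analysis is acceptable at both index and query time):

```xml
<!-- Hypothetical field type; the name text_letter_lower is illustrative only. -->
<fieldType name="text_letter_lower" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer, used at both index time and query time. -->
  <analyzer>
    <!-- Splits on every non-letter character (including "_") and lowercases,
         so CRD_PROD becomes the two tokens crd and prod. -->
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The field definition would then point at this type, and the index has to be
rebuilt after the change. Note that this tokenizer also discards digits, so it
only fits values made of letters and separators.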

Ahmet

On Thursday, May 1, 2014 5:04 PM, Yetkin Ozkucur <yetkin.ozku...@asg.com> wrote:
Hello everyone,

I am new to Solr and this is my first post on this list.
I have been working on this problem for a couple of days. I tried everything
I found on Google, but it looks like I am missing something.

Here is my problem:
I have a field called: DBASE_LOCAT_NM_TEXT
It contains values like: CRD_PROD
The goal is to be able to search this field either by the exact string
"CRD_PROD" or by part of it (tokenized on "_"), like "CRD" or "PROD".

Currently:
This query returns results: q=DBASE_LOCAT_NM_TEXT:CRD_PROD
But this one does not: q=DBASE_LOCAT_NM_TEXT:CRD
I want to understand why the second query does not return any results.

Here is how I configured the field:
<field name="DBASE_LOCAT_NM_TEXT" type="text_general" indexed="true"
       stored="true" required="false" multiValued="false"/>

And here is how I configured the field type:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I am also using the analysis panel in the SOLR admin console. It shows this:
WT    CRD_PROD

WDF    CRD_PROD
    CRD
    PROD
    CRDPROD

SF    CRD_PROD
    CRD
    PROD
    CRDPROD

LCF    crd_prod
    crd
    prod
    crdprod

SKMF    crd_prod
    crd
    prod
    crdprod

RDTF    crd_prod
    crd
    prod
    crdprod


I am not sure whether it is related, but this index was created by a Java
program using the Lucene interface. It used StandardAnalyzer for writing, and
the field was configured as tokenized, indexed and stored. Does this affect
the Solr configuration?
    
Can you please help me understand what I am missing and how I can debug it?

Thanks,
Yetkin
