Re: Strange behavior on text field with number-text content

Erick Erickson Wed, 29 May 2013 04:13:01 -0700

Hmmm, there are two things you _must_ get familiar with when diagnosing
these kinds of problems <G>:


1> admin/analysis. That'll show you exactly what the analysis chain does,
     and it's not always obvious.
2> Add &debug=query to your URL and look at the parsed query results. For
     instance, this "name:4nSolution Inc." parses as
     name:4nSolution defaultfield:inc.
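
Both checks can also be run over HTTP. Here's a minimal sketch that only builds the two request URLs. The host, port and core name (localhost:8983, collection1) are assumptions, and so is the /analysis/field handler being registered in your solrconfig.xml, so check yours before relying on it:

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port and core name to your installation.
SOLR = "http://localhost:8983/solr/collection1"


def debug_query_url(q):
    """URL for a normal search that also returns the parsed query (debug=query)."""
    return SOLR + "/select?" + urlencode({"q": q, "debug": "query", "wt": "json"})


def field_analysis_url(field, index_text, query_text):
    """URL for the field analysis handler, which runs both analyzer chains."""
    params = {
        "analysis.fieldname": field,
        "analysis.fieldvalue": index_text,  # text as it would be indexed
        "analysis.query": query_text,       # text as it would be queried
        "wt": "json",
    }
    return SOLR + "/analysis/field?" + urlencode(params)


print(debug_query_url("name:4nSolution Inc."))
print(field_analysis_url("name", "4nSolution Inc.", "4nSolution"))
```

In the first response, look at the parsed query in the debug section; in the second, compare the index-time and query-time token lists for the field.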

That doesn't explain why name:4nSolution finds nothing, except...

your index chain has splitOnCaseChange=1 and your query chain has
splitOnCaseChange=0, which doesn't seem right.
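
To see why that setting matters, here's a toy model of the splitting step. It is NOT Solr's WordDelimiterFilter code, just a simplified sketch that assumes splits happen at letter/digit boundaries and, optionally, at lower-to-upper case changes. It ignores catenation, preserveOriginal, lowercasing, stemming and token positions. The function name split_parts is made up for illustration:

```python
import re


def split_parts(token, split_on_case_change):
    """Toy model: split at letter/digit boundaries, optionally at aB case changes."""
    # Separate runs of digits from runs of letters; anything else is a delimiter.
    parts = re.findall(r"[0-9]+|[A-Za-z]+", token)
    if not split_on_case_change:
        return parts
    out = []
    for part in parts:
        # Break wherever a lowercase letter is directly followed by an uppercase one.
        out.extend(re.sub(r"([a-z])([A-Z])", r"\1 \2", part).split())
    return out


index_parts = split_parts("4nSolution", split_on_case_change=True)
query_parts = split_parts("4nSolution", split_on_case_change=False)
print(index_parts)  # ['4', 'n', 'Solution']
print(query_parts)  # ['4', 'nSolution']
```

The two chains cut the same input into different pieces, so it's worth comparing both sides on the analysis page before assuming the query tokens line up with what was indexed.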

Best
Erick


On Tue, May 28, 2013 at 10:31 AM, Алексей Цой <alexey...@gmail.com> wrote:

> solr-user-unsubscribe <solr-user-unsubscr...@lucene.apache.org>
>
>
> 2013/5/28 Michał Matulka <michal.matu...@gowork.pl>
>
>>  Thanks for your responses, I must admit that after hours of trying I
>> made some mistakes.
>> So the most problematic phrase will now be:
>> "4nSolution Inc." which cannot be found using query:
>>
>> name:4nSolution
>>
>> or even
>>
>> name:4nSolution Inc.
>>
>> but can be found using the following queries:
>>
>> name:nSolution
>> name:4
>> name:inc
>>
>> Sorry for the mess, it turned out I didn't reindex after modifying the
>> schema, so I thought that the problem also applied to 300letters.
>>
>> The cause of all of this is the WordDelimiter filter, defined as follows:
>>
>> <fieldType name="text" class="solr.TextField">
>>       <analyzer type="index">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <!-- in this example, we will only use synonyms at query time
>>         <filter class="solr.SynonymFilterFactory"
>>                 synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>         -->
>>         <!-- Case insensitive stop word removal.
>>           add enablePositionIncrements=true in both the index and query
>>           analyzers to leave a 'gap' for more accurate phrase queries.
>>         -->
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1"
>>                 catenateWords="1" catenateNumbers="1" catenateAll="0"
>>                 splitOnCaseChange="1" preserveOriginal="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.SnowballPorterFilterFactory"
>>                 language="English" protected="protwords.txt"/>
>>       </analyzer>
>>       <analyzer type="query">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>                 ignoreCase="true" expand="true"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1"
>>                 catenateWords="0" catenateNumbers="0" catenateAll="1"
>>                 splitOnCaseChange="0" preserveOriginal="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.SnowballPorterFilterFactory"
>>                 language="English" protected="protwords.txt"/>
>>       </analyzer>
>>     </fieldType>
>>
>> and I still don't know why it behaves like that - after all, the
>> "preserveOriginal" attribute is set to 1...
>>
>> On 28.05.2013 14:21, Erick Erickson wrote:
>>
>> Hmmm, with 4.x I get much different behavior than you're
>> describing, what version of Solr are you using?
>>
>> Besides Alex's comments, try adding &debug=query to the url and see what
>> comes out from the query parser.
>>
>> A quick glance at the code shows that DefaultAnalyzer is used, which
>> doesn't do any analysis; here's the javadoc...
>>  /**
>>    * Default analyzer for types that only produces 1 verbatim token...
>>    * A maximum size of chars to be read must be specified
>>    */
>>
>> so it's much like the "string" type. Which means I'm totally perplexed by
>> your statement that 300 and letters return a hit. Have you perhaps changed
>> the field definition and not re-indexed?
>>
>> The behavior you're seeing really looks like somehow
>> WordDelimiterFilterFactory is getting into your analysis chain with
>> settings that don't mash the parts back together, i.e. you can set up WDDF
>> to split on letter/number transitions, index each and NOT index the
>> original, but I have no explanation for how that could happen with the
>> field definition you indicated....
>>
>> FWIW,
>> Erick
>>
>> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>
>>   What does the analysis screen say in the Web AdminUI when you try to do that?
>> Also, what are the tokens stored in the field (also in the Web AdminUI)?
>>
>> I think it is very strange to have TextField without a tokenizer chain.
>> Maybe you get a standard one assigned by default, but I don't know what the
>> standard chain would be.
>>
>> Regards,
>>
>>   Alex.
>> On 28 May 2013 04:44, "Michał Matulka" <michal.matu...@gowork.pl> wrote:
>>
>>
>>  Hello,
>>
>> I've got the following problem. I have a text type in my schema and a field
>> "name" of that type.
>> That field contains data; for example, there is a record that has
>> "300letters" as its name.
>>
>> Now field type definition:
>> <fieldType name="text" class="solr.TextField"></fieldType>
>>
>> And, of course, field definition:
>> <field name="name" type="text" indexed="true" stored="true"/>
>>
>> yes, that's all - there are no tokenizers.
>>
>> And now time for my question:
>>
>> Why do the following queries:
>>
>> name:300
>>
>> and
>>
>> name:letters
>>
>> return that record, but:
>>
>> name:300letters
>>
>> does not (0 results)?
>>
>> Best regards,
>> Michał Matulka
>>
>>
>>
>>
>> --
>>  Regards,
>> Michał Matulka
>>  Programmer
>>  michal.matu...@gowork.pl
>>
>>
>>  ul. Zielna 39
>>  00-108 Warszawa
>>  www.GoWork.pl
>>
>
>
