Re: Strange behavior on text field with number-text content

Алексей Цой Tue, 28 May 2013 07:32:28 -0700

solr-user-unsubscribe <[email protected]>


2013/5/28 Michał Matulka <[email protected]>

>  Thanks for your responses, I must admit that after hours of trying I
> made some mistakes.
> So the most problematic phrase is now
> "4nSolution Inc.", which cannot be found using the query:
>
> name:4nSolution
>
> or even
>
> name:4nSolution Inc.
>
> but can be found using the following queries:
>
> name:nSolution
> name:4
> name:inc
>
> Sorry for the mess; it turned out I didn't reindex the fields after modifying
> the schema, so I thought the problem also applied to "300letters".
>
> The cause of all of this is the WordDelimiter filter, defined as follows:
>
> <fieldType name="text" class="solr.TextField">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <!-- in this example, we will only use synonyms at query time
>         <filter class="solr.SynonymFilterFactory"
> synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>         -->
>         <!-- Case insensitive stop word removal.
>           add enablePositionIncrements=true in both the index and query
>           analyzers to leave a 'gap' for more accurate phrase queries.
>         -->
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
> preserveOriginal="1" />
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>       </analyzer>
>     </fieldType>
>
> and I still don't know why it behaves like that. After all, the
> "preserveOriginal" attribute is set to 1...
>
> On 28.05.2013 14:21, Erick Erickson wrote:
>
> Hmmm, with 4.x I get much different behavior than you're
> describing, what version of Solr are you using?
>
> Besides Alex's comments, try adding &debug=query to the url and see what comes
> out from the query parser.
>
> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't
> do any analysis, here's the javadoc...
>  /**
>    * Default analyzer for types that only produces 1 verbatim token...
>    * A maximum size of chars to be read must be specified
>    */
>
> so it's much like the "string" type. Which means I'm totally perplexed by your
> statement that 300 and letters return a hit. Have you perhaps changed the
> field definition and not re-indexed?
>
> The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
> is getting into your analysis chain with settings that don't mash the parts
> back together, i.e. you can set up WDDF to split on letter/number transitions,
> index each and NOT index the original, but I have no explanation for how that
> could happen with the field definition you indicated....
>
> FWIW,
> Erick
>
> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch <[email protected]> wrote:
>
>   What does the analysis screen say in the Web Admin UI when you try to do that?
> Also, what are the tokens stored in the field (also shown in the Web Admin UI)?
>
> I think it is very strange to have TextField without a tokenizer chain.
> Maybe you get a standard one assigned by default, but I don't know what the
> standard chain would be.
>
> Regards,
>
>   Alex.
> On 28 May 2013 04:44, "Michał Matulka" <[email protected]> wrote:
>
>
>  Hello,
>
> I've got the following problem. I have a text type in my schema and a field
> "name" of that type.
> That field contains data; for example, there is a record that has
> "300letters" as its name.
>
> Now field type definition:
> <fieldType name="text" class="solr.TextField"></fieldType>
>
> And, of course, field definition:
> <field name="name" type="text" indexed="true" stored="true"/>
>
> yes, that's all - there are no tokenizers.
>
> And now time for my question:
>
> Why do the following queries:
>
> name:300
>
> and
>
> name:letters
>
> return that record, but:
>
> name:300letters
>
> does not (0 results)?
>
> Best regards,
> Michał Matulka
>
>
>
>
> --
>  Best regards,
> Michał Matulka
>  Programmer
>  [email protected]
>
>  ul. Zielna 39
>  00-108 Warszawa
>  www.GoWork.pl
>
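
One experiment suggested by the configuration quoted above: the index and query analyzers configure WordDelimiterFilterFactory differently (splitOnCaseChange is 1 at index time but 0 at query time, and catenateAll is 0 versus 1), so "4nSolution" may be split into different token sequences on each side. The fragment below is only a sketch, not a fix confirmed anywhere in this thread. It reuses the quoted query analyzer and changes just those two attributes so that query-time splitting follows the index-time settings. Every class, attribute, and file name comes from the quoted schema; the choice of values is an assumption that should be checked on the analysis screen.

```xml
<!-- Sketch only: a query analyzer for the quoted "text" field type.
     Assumption (not confirmed in the thread): the index/query mismatch in
     splitOnCaseChange and catenateAll is worth ruling out first. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"/>
  <!-- Changed from the quoted config: splitOnCaseChange 0 -> 1 and
       catenateAll 1 -> 0, so queries are split the same way as indexed text. -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
          preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory"
          language="English" protected="protwords.txt"/>
</analyzer>
```

After any schema change like this, the documents need to be reindexed; comparing the index and query token streams on the analysis screen, or running the query with &debug=query as suggested above, should show whether the two sides now line up.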
