Thanks for your responses, I must admit
that after hours of trying I made some mistakes.
So the most problematic phrase is now "4nSolution Inc.", which cannot be found using the query:

    name:4nSolution

or even:

    name:4nSolution Inc.

but can be found using the following queries:

    name:nSolution
    name:4
    name:inc

Sorry for the mess. It turned out I didn't reindex the fields after modifying the schema, so I thought that the problem also applied to "300letters".

The cause of all of this is the WordDelimiter filter, defined as follows:

    <fieldType name="text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             add enablePositionIncrements=true in both the index and query
             analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

and I still don't know why it behaves like that. After all, the "preserveOriginal" attribute is set to 1...

On 28.05.2013 14:21, Erick Erickson wrote:

Hmmm, with 4.x I get much different behavior than you're describing, what version of Solr are you using?

Besides Alex's comments, try adding &debug=query to the URL and see what comes out from the query parser.

A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do any analysis, here's the javadoc...

    /**
     * Default analyzer for types that only produces 1 verbatim token...
     * A maximum size of chars to be read must be specified
     */

so it's much like the "string" type. Which means I'm totally perplexed by your statement that 300 and letters return a hit. Have you perhaps changed the field definition and not re-indexed?

The behavior you're seeing really looks like somehow WordDelimiterFilterFactory is getting into your analysis chain with settings that don't mash the parts back together, i.e. you can set up WDFF to split on letter/number transitions, index each and NOT index the original, but I have no explanation for how that could happen with the field definition you indicated....

FWIW,
Erick

On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:

What does the analyzer screen say in the Web Admin UI when you try to do that? Also, what are the tokens stored in the field (also in the Web Admin UI)?

I think it is very strange to have a TextField without a tokenizer chain. Maybe you get a standard one assigned by default, but I don't know what the standard chain would be.

Regards,
Alex.

On 28 May 2013 04:44, "Michał Matulka" <michal.matu...@gowork.pl> wrote:

Hello,

I've got the following problem. I have a text type in my schema and a field "name" of that type. That field contains data; there is, for example, a record that has "300letters" as its name.

Now the field type definition:

    <fieldType name="text" class="solr.TextField"></fieldType>

And, of course, the field definition:

    <field name="name" type="text" indexed="true" stored="true"/>

Yes, that's all - there are no tokenizers.

And now time for my question. Why do the following queries:

    name:300

and

    name:letters

return that result, but:

    name:300letters

does not (0 results)?

Best regards,
Michał Matulka

--
Best regards,
Michał Matulka
Programmer
ul. Zielna 39
00-108 Warszawa
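
For anyone following Erick's suggestion in the thread to add &debug=query to the URL, here is a minimal sketch of building such a request URL. The host, port, core name, and the helper name build_debug_url are assumptions for illustration only, not something taken from the thread; adjust them to your own installation.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are assumptions.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_debug_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL for `query` with debug=query appended."""
    # urlencode percent-encodes the colon and turns spaces into '+'.
    params = [("q", query), ("debug", "query")]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # The two queries from the thread that returned no results.
    for q in ("name:300letters", "name:4nSolution Inc."):
        print(build_debug_url(q))
```

Opening the printed URL in a browser should include the parsed query in the response, which shows how the query analyzer actually split the term.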