Hmmm, there are two things you _must_ get familiar with when diagnosing these <G>..
1> admin/analysis. That'll show you exactly what the analysis chain does, and it's not always obvious.
2> add &debug=query to your query URL and look at the parsed query results.

For instance, "name:4nSolution Inc." parses as
name:4nSolution defaultfield:inc
so only the first word is searched against the name field; "Inc." goes to the default field.

That doesn't explain why name:4nSolution finds nothing, except... your index chain has splitOnCaseChange=1 and your query chain has splitOnCaseChange=0, which doesn't seem right....

Best
Erick

On Tue, May 28, 2013 at 10:31 AM, Алексей Цой <alexey...@gmail.com> wrote:

> solr-user-unsubscribe <solr-user-unsubscr...@lucene.apache.org>
>
>
> 2013/5/28 Michał Matulka <michal.matu...@gowork.pl>
>
>> Thanks for your responses, I must admit that after hours of trying I
>> made some mistakes.
>> So the most problematic phrase will now be:
>> "4nSolution Inc." which cannot be found using query:
>>
>> name:4nSolution
>>
>> or even
>>
>> name:4nSolution Inc.
>>
>> but can be using following queries:
>>
>> name:nSolution
>> name:4
>> name:inc
>>
>> Sorry for the mess, it turned out I didn't reindex fields after modifying
>> schema so I thought that the problem also applies to 300letters .
>>
>> The cause of all of this is the WordDelimiter filter defined as follows:
>>
>> <fieldType name="text" class="solr.TextField">
>>   <analyzer type="index">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <!-- in this example, we will only use synonyms at query time
>>     <filter class="solr.SynonymFilterFactory"
>>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>     -->
>>     <!-- Case insensitive stop word removal.
>>          add enablePositionIncrements=true in both the index and query
>>          analyzers to leave a 'gap' for more accurate phrase queries.
>>     -->
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>             />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>>             preserveOriginal="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory"
>>             language="English" protected="protwords.txt"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>             />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
>>             preserveOriginal="1" />
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory"
>>             language="English" protected="protwords.txt"/>
>>   </analyzer>
>> </fieldType>
>>
>> and I still don't know why it behaves like that - after all there is
>> "preserveOriginal" attribute set to 1...
>>
>> On 28.05.2013 14:21, Erick Erickson wrote:
>>
>> Hmmm, with 4.x I get much different behavior than you're
>> describing, what version of Solr are you using?
>>
>> Besides Alex's comments, try adding &debug=query to the url and see what
>> comes out from the query parser.
>>
>> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't
>> do any analysis, here's the javadoc...
>> /**
>>  * Default analyzer for types that only produces 1 verbatim token...
>>  * A maximum size of chars to be read must be specified
>>  */
>>
>> so it's much like the "string" type.
>> Which means I'm totally perplexed by your
>> statement that 300 and letters return a hit. Have you perhaps changed the
>> field definition and not re-indexed?
>>
>> The behavior you're seeing really looks like somehow WordDelimiterFilterFactory
>> is getting into your analysis chain with settings that don't mash the parts
>> back together, i.e. you can set up WDDF to split on letter/number transitions,
>> index each and NOT index the original, but I have no explanation for how that
>> could happen with the field definition you indicated....
>>
>> FWIW,
>> Erick
>>
>> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>
>> What does analyzer screen say in the Web AdminUI when you try to do that?
>> Also, what are the tokens stored in the field (also in Web AdminUI)?
>>
>> I think it is very strange to have TextField without a tokenizer chain.
>> Maybe you get a standard one assigned by default, but I don't know what the
>> standard chain would be.
>>
>> Regards,
>>
>> Alex.
>> On 28 May 2013 04:44, "Michał Matulka" <michal.matu...@gowork.pl> wrote:
>>
>> Hello,
>>
>> I've got the following problem. I have a text type in my schema and a field
>> "name" of that type.
>> That field contains data; for example, there is a record that has
>> "300letters" as its name.
>>
>> Now field type definition:
>> <fieldType name="text" class="solr.TextField"></fieldType>
>>
>> And, of course, field definition:
>> <field name="name" type="text" indexed="true" stored="true"/>
>>
>> yes, that's all - there are no tokenizers.
>>
>> And now time for my question:
>>
>> Why do the following queries:
>>
>> name:300
>>
>> and
>>
>> name:letters
>>
>> return that result, but:
>>
>> name:300letters
>>
>> does not (0 results)?
>>
>> Best regards,
>> Michał Matulka
>>
>> --
>> Regards,
>> Michał Matulka
>> Programmer
>> michal.matu...@gowork.pl
>>
>> ul. Zielna 39
>> 00-108 Warszawa
>> www.GoWork.pl
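
One way to try out the splitOnCaseChange remark from the reply at the top of this thread is to make the flag agree between the index and query chains. A minimal sketch of the query-side filter, copied from the quoted fieldType with only splitOnCaseChange changed from 0 to 1. Whether this alone makes name:4nSolution match is an assumption, not something confirmed in this thread, so the result should be checked on admin/analysis and with &debug=query:

```xml
<!-- Query analyzer only: same WordDelimiterFilterFactory settings as quoted
     above, except splitOnCaseChange is set to 1 to match the index analyzer.
     This is an untested assumption, not a confirmed fix. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"
        preserveOriginal="1"/>
```

Only the query analyzer changes in this sketch, so this particular edit should not require a reindex; any change to the index analyzer would.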