2013/5/28 Michał Matulka <michal.matu...@gowork.pl>
> Thanks for your responses. I must admit that after hours of trying I
> made some mistakes.
> So the most problematic phrase is now:
> "4nSolution Inc.", which cannot be found using the query:
>
> name:4nSolution
>
> or even
>
> name:4nSolution Inc.
>
> but can be found using the following queries:
>
> name:nSolution
> name:4
> name:inc
>
> Sorry for the mess: it turned out I didn't reindex the fields after modifying
> the schema, so I thought the problem also applied to "300letters".
>
> The cause of all of this is the WordDelimiter filter, defined as follows:
>
> <fieldType name="text" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <!-- in this example, we will only use synonyms at query time
>     <filter class="solr.SynonymFilterFactory"
>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>     -->
>     <!-- Case insensitive stop word removal.
>          add enablePositionIncrements=true in both the index and query
>          analyzers to leave a 'gap' for more accurate phrase queries.
>     -->
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory"
>             ignoreCase="true"
>             words="stopwords.txt"
>             enablePositionIncrements="true"
>             />
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="1" splitOnCaseChange="0"
>             preserveOriginal="1" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English" protected="protwords.txt"/>
>   </analyzer>
> </fieldType>
>
> and I still don't know why it behaves like that; after all, the
> "preserveOriginal" attribute is set to 1...
>
> On 28.05.2013 14:21, Erick Erickson wrote:
>
> Hmmm, with 4.x I get much different behavior than you're
> describing. What version of Solr are you using?
>
> Besides Alex's comments, try adding &debug=query to the URL and see what comes
> out of the query parser.
>
> A quick glance at the code shows that DefaultAnalyzer is used, which doesn't do
> any analysis. Here's the javadoc...
> /**
>  * Default analyzer for types that only produces 1 verbatim token...
>  * A maximum size of chars to be read must be specified
>  */
>
> so it's much like the "string" type. Which means I'm totally perplexed by your
> statement that 300 and letters return a hit.
> Have you perhaps changed the field definition and not re-indexed?
>
> The behavior you're seeing really looks like WordDelimiterFilterFactory is
> somehow getting into your analysis chain with settings that don't mash the
> parts back together, i.e. you can set up WDF to split on letter/number
> transitions, index each part and NOT index the original, but I have no
> explanation for how that could happen with the field definition you
> indicated....
>
> FWIW,
> Erick
>
> On Tue, May 28, 2013 at 7:47 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> What does the analyzer screen say in the web admin UI when you try to do that?
> Also, what tokens are stored in the field (also in the web admin UI)?
>
> I think it is very strange to have a TextField without a tokenizer chain.
> Maybe you get a standard one assigned by default, but I don't know what the
> standard chain would be.
>
> Regards,
>
> Alex.
>
> On 28 May 2013 04:44, "Michał Matulka" <michal.matu...@gowork.pl> wrote:
>
> Hello,
>
> I've got the following problem. I have a text type in my schema and a field
> "name" of that type. That field contains data; for example, there is a record
> that has "300letters" as its name.
>
> Now the field type definition:
> <fieldType name="text" class="solr.TextField"></fieldType>
>
> And, of course, the field definition:
> <field name="name" type="text" indexed="true" stored="true"/>
>
> Yes, that's all; there are no tokenizers.
>
> And now for my question: why do the following queries:
>
> name:300
>
> and
>
> name:letters
>
> return that result, but:
>
> name:300letters
>
> does not (0 results)?
>
> Best regards,
> Michał Matulka
>
> --
> Regards,
> Michał Matulka
> Programmer
> michal.matu...@gowork.pl
>
> ul. Zielna 39
> 00-108 Warszawa
> www.GoWork.pl
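Erick's suggestion of appending &debug=query can be scripted so the same query is easy to re-run after each schema change and reindex. The sketch below only builds the request URL; the base URL (host, port 8983, no core name) is a hypothetical default and must be adjusted to your installation. In the response, the parsed query should appear in the debug section, which shows what the query-time analyzer actually produced for name:4nSolution.

```python
from urllib.parse import urlencode

# Hypothetical base URL: adjust host, port and core name to your installation.
SOLR_BASE = "http://localhost:8983/solr"


def debug_query_url(query: str, base: str = SOLR_BASE) -> str:
    """Build a /select URL that asks Solr to report how `query` was parsed."""
    params = {
        "q": query,        # e.g. name:4nSolution
        "debug": "query",  # only the query-parsing debug info, no scoring details
        "wt": "json",      # JSON response writer
    }
    # urlencode percent-encodes ':' and turns spaces into '+'
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(debug_query_url("name:4nSolution"))
```

Comparing the output for name:4nSolution with the output for name:nSolution should show how the two queries are rewritten differently by the query-time WordDelimiterFilterFactory settings.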
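Alex's question about the analyzer screen can also be answered without the admin UI by calling Solr's FieldAnalysisRequestHandler directly, which runs a value through the field's index-time chain and a query string through its query-time chain. This sketch assumes the handler is registered at /analysis/field in solrconfig.xml (as in the stock example configuration) and that the base URL below, which is hypothetical, points at your core; it only builds the URL.

```python
from urllib.parse import urlencode

# Hypothetical base URL: adjust host, port and core name to your installation.
SOLR_BASE = "http://localhost:8983/solr"


def field_analysis_url(field: str, value: str, query: str,
                       base: str = SOLR_BASE) -> str:
    """Build a FieldAnalysisRequestHandler URL that analyzes `value` with the
    field's index analyzer and `query` with its query analyzer."""
    params = {
        "analysis.fieldname": field,   # analyze using this field's type
        "analysis.fieldvalue": value,  # text as it would be indexed
        "analysis.query": query,       # text as it would be queried
        "analysis.showmatch": "true",  # mark index tokens matched by query tokens
        "wt": "json",
    }
    return f"{base}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    print(field_analysis_url("name", "4nSolution Inc.", "4nSolution"))
```

The per-filter token lists in the response make it possible to see where the index-time and query-time tokens for "4nSolution" stop lining up.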