Hi,

Does anybody have any thoughts about this? I'm really struggling with these problems; any hints would be very welcome!

Regards,
Dirceu

On Fri, Feb 10, 2012 at 4:45 PM, Dirceu Vieira <dirceu...@gmail.com> wrote:

> Hi Guys,
>
> Would someone have time to help me understand what's happening here?
>
> I have a dynamic field called *prMeta_service*, and the value
> *"EHT2011-2012"* is indexed for various documents.
>
> When I search for exactly the same value (*"EHT2011-2012"*), it ends up NOT
> matching the content.
> I have spent quite a lot of time lately trying to understand what happens,
> reading all the documentation I could find about the token filters used
> in this field, but I can't seem to find the answer.
>
> It seems to me that, for some reason, the parser is getting lost because
> the value contains both letters and numbers. I mention that because I have
> tried querying only for *"2011-2012"* and *"20112012"*, and then I get the
> expected results.
>
> I am using Solr 1.4, and I haven't tried any other version.
>
> Another interesting factor is that, for some reason, the
> SnowballPorterFilterFactory is removing a character from *"2011"*, so
> *"201"* is the value that is actually indexed.
> I don't believe that this last point is what actually causes
> my unsatisfactory results, but I wanted to know whether anybody has had any
> issues with Finnish language stemming.
>
> I would very much appreciate it if someone could spare some time to help me
> with this issue.
>
> My configuration looks like:
>
> *- Dynamic field:*
>
> <dynamicField name="prMeta_*" type="text" indexed="true" stored="true" multiValued="true"/>
>
> *- Field type:*
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
>   </analyzer>
> </fieldType>
>
> *- The field analysis gives me this response (one line per analysis stage):*
>
> EHT2011-2012
> EHT2011-2012
> EHT 2011 2012 20112012
> eht 2011 2012 20112012
> eht 2011 2012 20112012
> eht 2011 2012 20112012
> eht 201 2012 20112012
> e eh eht 2 20 201 2 20 201 2012 2 20 201 2011 20112 201120 2011201 20112012
>
> *- When I run the query in the admin interface in debug mode
> (&debugQuery=true), this is the result:*
>
> <str name="rawquerystring">prMeta_service:EHT2011-2012</str>
> <str name="querystring">prMeta_service:EHT2011-2012</str>
> <str name="parsedquery">PhraseQuery(prMeta_service:"eht 201 2012")</str>
> <str name="parsedquery_toString">prMeta_service:"eht 201 2012"</str>
>
> Thank you very much in advance!
>
> Best regards,
>
> --
> Dirceu Vieira Júnior
> -------------------------------------------------------------------
> +47 9753 2473
> dirceuvjr.blogspot.com
> twitter.com/dirceuvjr
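
For reference, the last stage of the field analysis can be reproduced outside Solr with a minimal sketch. It is plain Python, not Solr code: `edge_ngrams` is a hypothetical helper that only imitates front-edge n-gram expansion with the `minGramSize`/`maxGramSize` values from the configuration above, and the input tokens are copied from the analysis output. It then checks whether the three terms of the parsed phrase query exist among the indexed terms.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Front-edge n-grams of a token, shortest first (hypothetical helper)."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Tokens reported by the field analysis just before the EdgeNGram stage.
stemmed_tokens = ["eht", "201", "2012", "20112012"]

# Terms taken from the parsed query: PhraseQuery(prMeta_service:"eht 201 2012").
query_terms = ["eht", "201", "2012"]

# Expand every index-side token into its edge n-grams, in order.
indexed_terms = [gram for tok in stemmed_tokens for gram in edge_ngrams(tok)]

# Query terms that do not occur anywhere among the indexed terms.
indexed_set = set(indexed_terms)
missing = [term for term in query_terms if term not in indexed_set]

print(" ".join(indexed_terms))
print("query terms missing from the index:", missing)
```

If the list of missing terms comes out empty, each query term is present in the index on its own, which would suggest looking at token positions in the phrase query instead of at missing terms. The sketch does not model positions, so it cannot confirm that by itself.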