Query and index analysis are different: the word delimiter filters are set up differently, and the query chain has no n-gram filter.
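To make the differences concrete, here is a minimal Python sketch that lists them mechanically. The filter names and attributes are transcribed from the fieldType quoted below; the names INDEX_CHAIN, QUERY_CHAIN and diff_chains are mine, purely for illustration, and the sketch only compares configuration, it does not run any Solr analysis.

```python
# Filter chains transcribed from the quoted fieldType. The tokenizer is left
# out because both analyzers use solr.WhitespaceTokenizerFactory.
WDF = "WordDelimiterFilterFactory"

INDEX_CHAIN = [
    ("StopFilterFactory", {"ignoreCase": "true", "words": "stopwords.txt"}),
    (WDF, {"generateWordParts": "1", "generateNumberParts": "1",
           "catenateWords": "1", "catenateNumbers": "1", "catenateAll": "0"}),
    ("LowerCaseFilterFactory", {}),
    ("EnglishPorterFilterFactory", {"protected": "protwords.txt"}),
    ("RemoveDuplicatesTokenFilterFactory", {}),
    ("SnowballPorterFilterFactory", {"language": "Finnish"}),
    ("EdgeNGramFilterFactory", {"minGramSize": "1", "maxGramSize": "25"}),
]

QUERY_CHAIN = [
    ("SynonymFilterFactory", {"synonyms": "synonyms.txt",
                              "ignoreCase": "true", "expand": "true"}),
    ("StopFilterFactory", {"ignoreCase": "true", "words": "stopwords.txt"}),
    (WDF, {"generateWordParts": "1", "generateNumberParts": "1",
           "catenateWords": "0", "catenateNumbers": "0", "catenateAll": "0"}),
    ("LowerCaseFilterFactory", {}),
    ("EnglishPorterFilterFactory", {"protected": "protwords.txt"}),
    ("RemoveDuplicatesTokenFilterFactory", {}),
    ("SnowballPorterFilterFactory", {"language": "Finnish"}),
]


def diff_chains(index_chain, query_chain):
    """Describe filters and attributes that differ between the two chains.

    Assumes each filter class appears at most once per chain, which holds
    for the configuration in this thread.
    """
    index_filters = dict(index_chain)
    query_filters = dict(query_chain)
    report = []
    for name in index_filters:
        if name not in query_filters:
            report.append(f"index only: {name}")
    for name in query_filters:
        if name not in index_filters:
            report.append(f"query only: {name}")
    for name, index_params in index_filters.items():
        query_params = query_filters.get(name)
        if query_params is None:
            continue
        for key in sorted(set(index_params) | set(query_params)):
            index_value = index_params.get(key)
            query_value = query_params.get(key)
            if index_value != query_value:
                report.append(
                    f"{name}.{key}: index={index_value} query={query_value}")
    return report


if __name__ == "__main__":
    for line in diff_chains(INDEX_CHAIN, QUERY_CHAIN):
        print(line)
```

Run as a script, this reports the edge n-gram filter as index-only, the synonym filter as query-only, and the catenateWords and catenateNumbers settings as the two word delimiter attributes that differ.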
Look at the index and query output of the field analysis page (analysis.jsp); you should see which filter in the query chain fails to match. (Choose verbose output.)

On 13 February 2012 11:12, Dirceu Vieira <dirceu...@gmail.com> wrote:
> Hi Lee,
>
> Thanks for your reply!
>
> Yes, we actually need those filters. This dynamic field parses the
> metadata for each video, and the values may have different content.
> If I understand where you're going with your comment, you mean that I
> probably should plan it better and create field types that are more
> specific to the different field contents, correct?
>
> But still, that does not explain why I have indexed this specific value
> "EHT2011-2012" and the very same value does not match anything when I
> search for it.
>
> On Mon, Feb 13, 2012 at 11:28 AM, Lee Carroll
> <lee.a.carr...@googlemail.com> wrote:
>
>> Hi,
>>
>> You have a lot of language processing for a field which contains, at
>> least in your example, non-words.
>>
>> Do you need the synonyms, two lots of stemming, etc.?
>>
>> What is the field for?
>>
>> >> "I don't believe that this last point is what actually causes
>> >> my unsatisfactory results"
>>
>> It probably is.
>>
>> On 13 February 2012 10:02, Dirceu Vieira <dirceu...@gmail.com> wrote:
>> > Hi,
>> >
>> > Does anybody have any thoughts about this?
>> > I'm really struggling with these problems; any hints would be very
>> > welcome!
>> >
>> > Regards,
>> >
>> > Dirceu
>> >
>> > On Fri, Feb 10, 2012 at 4:45 PM, Dirceu Vieira <dirceu...@gmail.com>
>> > wrote:
>> >
>> >> Hi Guys,
>> >>
>> >> Would someone have time to help me understand what's happening here?
>> >>
>> >> I have a dynamic field called *prMeta_service*, and the value
>> >> *"EHT2011-2012"* is indexed for various documents.
>> >>
>> >> When I search for the exact same value (*"EHT2011-2012"*), it ends up
>> >> NOT matching the content.
>> >>
>> >> I have spent quite a lot of time lately trying to understand what
>> >> happens, reading all the documentation I could find about the token
>> >> filters used in this field, but I can't seem to find the answer.
>> >>
>> >> It seems to me that, for some reason, the parser is getting lost
>> >> because the value contains both letters and numbers. I mention that
>> >> because I have tried querying only for *"2011-2012"* and
>> >> *"20112012"*, and then I get the expected results.
>> >>
>> >> I am using Solr 1.4, and I haven't tried it in any other version.
>> >>
>> >> Another interesting factor is that, for some reason, the
>> >> SnowballPorterFilterFactory is removing a character from *"2011"*,
>> >> so *"201"* is the value that is actually indexed.
>> >> I don't believe that this last point is what actually causes
>> >> my unsatisfactory results, but I just wanted to know if anybody has
>> >> had any issue with the Finnish language stemming.
>> >>
>> >> I would very much appreciate it if someone could spare some time to
>> >> help me with this issue.
>> >>
>> >> My configuration looks like this:
>> >>
>> >> *- Dynamic field:*
>> >>
>> >> <dynamicField name="prMeta_*" type="text" indexed="true" stored="true"
>> >>               multiValued="true"/>
>> >>
>> >> *- Field type:*
>> >>
>> >> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>> >>   <analyzer type="index">
>> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >>             words="stopwords.txt"/>
>> >>     <filter class="solr.WordDelimiterFilterFactory"
>> >>             generateWordParts="1" generateNumberParts="1"
>> >>             catenateWords="1" catenateNumbers="1" catenateAll="0"/>
>> >>     <filter class="solr.LowerCaseFilterFactory"/>
>> >>     <filter class="solr.EnglishPorterFilterFactory"
>> >>             protected="protwords.txt"/>
>> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
>> >>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>> >>             maxGramSize="25"/>
>> >>   </analyzer>
>> >>   <analyzer type="query">
>> >>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> >>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >>             ignoreCase="true" expand="true"/>
>> >>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>> >>             words="stopwords.txt"/>
>> >>     <filter class="solr.WordDelimiterFilterFactory"
>> >>             generateWordParts="1" generateNumberParts="1"
>> >>             catenateWords="0" catenateNumbers="0" catenateAll="0"/>
>> >>     <filter class="solr.LowerCaseFilterFactory"/>
>> >>     <filter class="solr.EnglishPorterFilterFactory"
>> >>             protected="protwords.txt"/>
>> >>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>> >>     <filter class="solr.SnowballPorterFilterFactory" language="Finnish"/>
>> >>   </analyzer>
>> >> </fieldType>
>> >>
>> >> *- The field analysis gives me this response:*
>> >>
>> >> EHT2011-2012
>> >> EHT2011-2012
>> >> EHT 2011 2012 20112012
>> >> eht 2011 2012 20112012
>> >> eht 2011 2012 20112012
>> >> eht 2011 2012 20112012
>> >> eht 201 2012 20112012
>> >> e eh eht 2 20 201 2 20 201 2012 2 20 201 2011 20112 201120 2011201
>> >> 20112012
>> >>
>> >> *- When I run the query in the admin in debug mode
>> >> (&debugQuery=true), this is the result:*
>> >>
>> >> <str name="rawquerystring">prMeta_service:EHT2011-2012</str>
>> >> <str name="querystring">prMeta_service:EHT2011-2012</str>
>> >> <str name="parsedquery">
>> >>   PhraseQuery(prMeta_service:"eht 201 2012")
>> >> </str>
>> >> <str name="parsedquery_toString">
>> >>   prMeta_service:"eht 201 2012"
>> >> </str>
>> >>
>> >> Thank you very much in advance!
>> >>
>> >> Best regards,
>> >>
>> >> --
>> >> Dirceu Vieira Júnior
>> >> -------------------------------------------------------------------
>> >> +47 9753 2473
>> >> dirceuvjr.blogspot.com
>> >> twitter.com/dirceuvjr
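
For reference, the last row of the field analysis output in the original message reads like the edge n-grams of the stemmed tokens. Here is a minimal Python sketch, assuming the filter simply emits every leading prefix from minGramSize up to maxGramSize; the function name edge_ngrams is mine, not Solr's, and the sketch ignores token positions entirely.

```python
def edge_ngrams(token, min_gram=1, max_gram=25):
    """Return the leading-edge n-grams of token, shortest first."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# Tokens as they appear in the analysis output after the Finnish stemmer.
stemmed = ["eht", "201", "2012", "20112012"]
indexed = [gram for token in stemmed for gram in edge_ngrams(token)]
print(" ".join(indexed))

# Terms of the parsed phrase query from the debugQuery output.
query_terms = ["eht", "201", "2012"]
print(all(term in indexed for term in query_terms))
```

All three query terms are in that set, so the terms themselves do appear to be indexed. Since the parsed query is a PhraseQuery, the token positions shown in the verbose analysis output are probably the next thing to compare between the index and query chains.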