Ah, good. Good luck with the rest of your app! WordDelimiterFilterFactory is powerful, but tricky <G>...
Best
Erick

On Thu, Feb 3, 2011 at 9:51 AM, openvictor Open <openvic...@gmail.com> wrote:

> Dear Erick,
>
> You were totally right: I didn't use any space to separate the words,
> which caused Solr to concatenate them!
> Everything is solved now. Thank you very much for your help!
>
> Best regards,
> Victor Kabdebon
>
> 2011/2/3 Erick Erickson <erickerick...@gmail.com>
>
> > There are a couple of things going on here. First,
> > WordDelimiterFilterFactory is splitting things up on letter/number
> > boundaries. Take a look at:
> > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> >
> > for a list of *some* of the available tokenizers. You may want to just
> > use one of the others, or change the parameters to
> > WordDelimiterFilterFactory so that it does not split the way it does now.
> >
> > See the page http://localhost:8983/solr/admin/analysis.jsp and check the
> > "verbose" box to see what the effects of the various elements in your
> > analysis chain are. This is a very important page for understanding the
> > analysis part of the whole operation.
> >
> > Second, if you've been trying different things out, you may well have
> > some old stuff in your index. When you delete documents, the terms are
> > still in the index until an optimize. I'd advise starting with a clean
> > slate for your experiments each time. The cheap way to do this is to
> > stop your server and delete <solr_home>/data/index. Delete the index
> > directory too, not just the contents. So it's possible your
> > TermsComponent is returning data from previous attempts, because I sure
> > don't see how the concatenated terms would be in this index given the
> > definition you've posted.
> >
> > And if none of that works, well, we'll try something else <G>..
> >
> > Best
> > Erick
> >
> > On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open <openvic...@gmail.com>
> > wrote:
> >
> > > Dear Erick,
> > >
> > > Thank you for your answer, here is my fieldtype definition.
> > > I took the standard one because I don't need a better one for this
> > > field:
> > >
> > > <fieldType name="text" class="solr.TextField"
> > >            positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > >             words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             generateWordParts="1" generateNumberParts="1"
> > >             catenateWords="1" catenateNumbers="1"
> > >             catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory"
> > >             language="English" protected="protwords.txt"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > >             ignoreCase="true" expand="true"/>
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> > >             words="stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.WordDelimiterFilterFactory"
> > >             generateWordParts="1" generateNumberParts="1"
> > >             catenateWords="0" catenateNumbers="0"
> > >             catenateAll="0" splitOnCaseChange="1"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <filter class="solr.SnowballPorterFilterFactory"
> > >             language="English" protected="protwords.txt"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Now my field:
> > >
> > > <field name="p_field" type="text" indexed="true" stored="true"/>
> > >
> > > But I have a doubt now... Do I really put a space between the words, or
> > > is it just a comma? If I only put a comma, is the whole process going
> > > to be affected? What I don't really understand is that I find the
> > > separate words, but also their concatenation (but again in one
> > > direction only).
> > > Let me explain: if I have "man" "bear" "pig" I will find
> > > "manbearpig" and "bearpig", but never "pigman" or any other
> > > combination in a different order.
> > >
> > > Thank you very much
> > > Best Regards,
> > > Victor
> > >
> > > 2011/2/1 Erick Erickson <erickerick...@gmail.com>
> > >
> > > > Nope, this isn't what I'd expect. There are a couple of
> > > > possibilities:
> > > > 1> Check out what WordDelimiterFilterFactory is doing, although
> > > >    if you're really sending spaces that's probably not it.
> > > > 2> Let's see the <field> and <fieldType> definitions for the field
> > > >    in question. type="text" doesn't say anything about analysis,
> > > >    and that's where I'd expect you're having trouble. In particular
> > > >    if your analysis chain uses KeywordTokenizerFactory for instance.
> > > > 3> Look at the admin/schema browser page, look at your field and
> > > >    see what the actual tokens are. That'll tell you what
> > > >    TermsComponent is returning; perhaps the concatenation is
> > > >    happening somewhere else.
> > > >
> > > > Bottom line: Solr will not concatenate terms like this unless you
> > > > tell it to, so I suspect you're telling it to, you just don't
> > > > realize it <G>...
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open
> > > > <openvic...@gmail.com> wrote:
> > > >
> > > > > Dear Solr users,
> > > > >
> > > > > I am currently using Solr and TermsComponent to build an
> > > > > auto-suggest for my website.
> > > > >
> > > > > I have a field called p_field, indexed and stored, with
> > > > > type="text" in the schema XML. Nothing out of the usual.
> > > > > I feed Solr a set of words separated by a comma and a space, such
> > > > > as (for two documents):
> > > > >
> > > > > Document 1:
> > > > > word11, word12, word13. word14
> > > > >
> > > > > Document 2:
> > > > > word21, word22, word23. word24
> > > > >
> > > > > When I use my newly designed field I get, for the prefix "word1":
> > > > > word11, word12, word13. word14 word11word12 word11word13 etc...
> > > > > Is it normal to get the concatenation of words and not only the
> > > > > indexed words? Did I miss something about Terms?
> > > > >
> > > > > Thank you very much,
> > > > > Best regards all,
> > > > > Victor
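
For anyone who finds this thread with the same symptom, here is a minimal sketch of an analysis chain with concatenation switched off. It assumes the fieldType Victor posted in the thread and is trimmed to the tokenizer, WordDelimiterFilterFactory, and lowercasing for brevity. The fieldType name "text_suggest" is hypothetical, and the only changes to the WordDelimiterFilterFactory attributes are catenateWords and catenateNumbers, both set from 1 to 0. It is an untested illustration, not the fix Victor actually applied (he added spaces between the words).

```xml
<!-- Hypothetical fieldType name. The WordDelimiterFilterFactory attributes
     are the ones from the thread, with catenateWords and catenateNumbers
     changed from 1 to 0 so that no joined terms are generated. -->
<fieldType name="text_suggest" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With these settings the filter should still split a token such as "man,bear,pig" into its parts, but it should no longer add a joined term like "manbearpig". After a schema change like this, reindex into a clean index as Erick describes, then confirm the resulting tokens on the admin analysis page.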