Dear Erick,

Thank you for your answer. Here is my fieldType definition. I used the standard one because I don't need anything more specific for this field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

Now my field:

<field name="p_field" type="text" indexed="true" stored="true"/>

But I have a doubt now: do I really put a space between the words, or is it just a comma? If I only put a comma, is the whole analysis process going to be affected?

What I don't really understand is that I find the separate words, but also their concatenation (and again in one direction only). Let me explain: if I have "man", "bear", "pig", I will find "manbearpig" and "bearpig", but never "pigman" or any other combination in a different order.

Thank you very much.

Best regards,
Victor

2011/2/1 Erick Erickson <erickerick...@gmail.com>

> Nope, this isn't what I'd expect.
> There are a couple of possibilities:
> 1> check out what WordDelimiterFilterFactory is doing, although
>    if you're really sending spaces that's probably not it.
> 2> Let's see the <field> and <fieldType> definitions for the field
>    in question. type="text" doesn't say anything about analysis,
>    and that's where I'd expect you're having trouble. In particular
>    if your analysis chain uses KeywordTokenizerFactory for instance.
> 3> Look at the admin/schema browse page, look at your field and
>    see what the actual tokens are. That'll tell you what TermsComponents
>    is returning, perhaps the concatenation is happening somewhere
>    else.
>
> Bottom line: Solr will not concatenate terms like this unless you tell it to,
> so I suspect you're telling it to, you just don't realize it <G>...
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open <openvic...@gmail.com> wrote:
>
> > Dear Solr users,
> >
> > I am currently using Solr and TermsComponents to make an auto suggest for
> > my website.
> >
> > I have a field called p_field indexed and stored with type="text" in the
> > schema xml. Nothing out of the usual.
> > I feed to Solr a set of words separated by a comma and a space such as
> > (for two documents) :
> >
> > Document 1:
> > word11, word12, word13. word14
> >
> > Document 2:
> > word21, word22, word23. word24
> >
> >
> > When I use my newly designed field I get things for the prefix "word1" :
> > word11, word12, word13. word14 word11word12 word11word13 etc...
> > Is it normal to have the concatenation of words and not only the words
> > indexed ? Did I miss something about Terms ?
> >
> > Thank you very much,
> > Best regards all,
> > Victor
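
To make Erick's first point concrete, here is a minimal Python sketch of how a whitespace tokenizer followed by a word-delimiter step with catenateWords enabled could emit in-order concatenations when the input has commas but no spaces. It is a simplified model, not Solr's implementation: the function names are hypothetical, and it ignores stop words, synonyms, case-change and letter/number splitting, and stemming.

```python
import re


def whitespace_tokenize(text):
    """Split on whitespace only, so commas stay attached to the words."""
    return text.split()


def word_delimiter(token, catenate_words=True):
    """Simplified stand-in for a word-delimiter filter (an assumption,
    not Solr's code): split a token on non-alphanumeric characters and,
    if catenate_words is set, also emit the parts joined in their
    original order."""
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    terms = list(parts)
    if catenate_words and len(parts) > 1:
        terms.append("".join(parts))
    return terms


def analyze(text, catenate_words=True):
    """Whitespace tokenize, apply the delimiter step, then lowercase."""
    terms = []
    for token in whitespace_tokenize(text):
        terms.extend(t.lower() for t in word_delimiter(token, catenate_words))
    return terms


if __name__ == "__main__":
    print(analyze("man,bear,pig"))    # comma only: one whitespace token
    print(analyze("man, bear, pig"))  # comma and space: three tokens
```

In this model, "man,bear,pig" stays a single whitespace token, so the delimiter step emits the three parts plus their concatenation in the original order, and never a reversed combination. With a comma and a space, each word is its own token and nothing is concatenated. The schema browser page Erick mentions shows what the real analysis chain actually indexed.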