Re: Terms and termscomponent questions

openvictor Open Thu, 03 Feb 2011 06:52:15 -0800

Dear Erick,

You were right: I wasn't putting a space between the words, which caused
Solr to concatenate them.
Everything is solved now. Thank you very much for your help!
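
For anyone whose input really is comma-only and can't easily be changed, a
rough sketch of an alternative: split on the commas at tokenization time
instead of relying on whitespace. This assumes solr.PatternTokenizerFactory
is available in your Solr version; the fieldType name "text_csv" is made up
for illustration, and the fragment is untested here:

```xml
<!-- hypothetical fieldType: tokens are split on a comma plus any
     following whitespace, so "man,bear,pig" and "man, bear, pig"
     should both produce man / bear / pig -->
<fieldType name="text_csv" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The field has to be reindexed after a change like this before the terms
reflect it.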


Best regards,
Victor Kabdebon

2011/2/3 Erick Erickson <erickerick...@gmail.com>

> There are a couple of things going on here. First,
> WordDelimiterFilterFactory is splitting things up on letter/number
> boundaries. Take a look at:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
>
> for a list of *some* of the available tokenizers. You may want to just use
> one of the others, or change the parameters to WordDelimiterFilterFactory
> so that it doesn't split the way it does now.
>
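
A sketch of the second option, assuming the aim is to keep the individual
word parts while no longer emitting the joined forms at index time. The
attribute values just mirror the query-side filter in the fieldType quoted
further down in this thread; treat it as an illustration rather than a
verified fix:

```xml
<!-- index-time filter with catenation switched off: a token such as
     "man,bear,pig" would be expected to yield man / bear / pig,
     without the joined form "manbearpig" -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0"
        catenateAll="0" splitOnCaseChange="1"/>
```

The field needs to be reindexed after such a change, and the analysis page
mentioned below is the place to confirm what the chain actually produces.
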
> See the page: http://localhost:8983/solr/admin/analysis.jsp and check the
> "verbose" box to see what the effects of the various elements in your
> analysis chain are. This is a very important page for understanding the
> analysis part of the whole operation.
>
> Second, if you've been trying different things out, you may well have some
> old stuff in your index. When you delete documents, the terms are still in
> the index until an optimize. I'd advise starting with a clean slate for
> your experiments each time. The cheap way to do this is stop your server
> and delete <solr_home>/data/index. Delete the index directory too, not just
> the contents. So it's possible your TermsComponent is returning data from
> previous attempts, because I sure don't see how the concatenated terms
> would be in this index given the definition you've posted.
>
> And if none of that works, well, we'll try something else <G>..
>
> Best
> Erick
>
> On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open <openvic...@gmail.com> wrote:
>
> > Dear Erick,
> >
> > Thank you for your answer. Here is my fieldType definition. I took the
> > standard one because I don't need a better one for this field.
> >
> > <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >             catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >             ignoreCase="true" expand="true"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >             words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >             generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >             catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >             protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > Now my field:
> >
> > <field name="p_field" type="text" indexed="true" stored="true"/>
> >
> > But I have a doubt now... Do I really put a space between words, or is it
> > just a comma? If I only put a comma, is the whole process going to be
> > affected? What I don't really understand is that I find the separate
> > words, but also their concatenation (but again in one direction only).
> > Let me explain: if I have "man" "bear" "pig" I will find
> > "manbearpig" and "bearpig", but never "pigman" or any other combination
> > in a different order.
> >
> > Thank you very much
> > Best Regards,
> > Victor
> >
> > 2011/2/1 Erick Erickson <erickerick...@gmail.com>
> >
> > > Nope, this isn't what I'd expect. There are a couple of possibilities:
> > > 1> check out what WordDelimiterFilterFactory is doing, although
> > >     if you're really sending spaces that's probably not it.
> > > 2> Let's see the <field> and <fieldType> definitions for the field
> > >     in question. type="text" doesn't say anything about analysis,
> > >     and that's where I'd expect you're having trouble. In particular
> > >     if your analysis chain uses KeywordTokenizerFactory for instance.
> > > 3> Look at the admin/schema browse page, look at your field and
> > >     see what the actual tokens are. That'll tell you what
> > >     TermsComponent is returning, perhaps the concatenation is
> > >     happening somewhere else.
> > >
> > > Bottom line: Solr will not concatenate terms like this unless you tell
> > > it to, so I suspect you're telling it to, you just don't realize it <G>...
> > >
> > > Best
> > > Erick
> > >
> > > On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open <openvic...@gmail.com> wrote:
> > >
> > > > Dear Solr users,
> > > >
> > > > I am currently using Solr and TermsComponent to build an auto-suggest
> > > > for my website.
> > > >
> > > > I have a field called p_field, indexed and stored, with type="text" in
> > > > the schema XML. Nothing out of the ordinary.
> > > > I feed Solr a set of words separated by a comma and a space, such as
> > > > (for two documents):
> > > >
> > > > Document 1:
> > > > word11, word12, word13. word14
> > > >
> > > > Document 2:
> > > > word21, word22, word23. word24
> > > >
> > > >
> > > > When I use my newly designed field, I get these for the prefix "word1":
> > > > word11, word12, word13. word14 word11word12 word11word13 etc...
> > > > Is it normal to get the concatenation of words, and not only the words
> > > > that were indexed? Did I miss something about Terms?
> > > >
> > > > Thank you very much,
> > > > Best regards all,
> > > > Victor
> > > >
> > >
> >
>
