Hi Arcadius,

Thank you for your reply.

So this means that the de-duplication has to be done during indexing time,
and not during query time?

Yes, I'm currently building my suggestions on top of "search", as I
faced some issues with the suggestion components in Solr 5.1.0.
Will the suggestion components solve this issue of duplicate
suggestions?

There might also be cases where about 1/2 to 3/4 of the content of my
indexed documents is the same, and only the remaining 1/4 to 1/2 is
different. So the indexed documents will be different, but a search may
still return the part of the documents that is the same.
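
A minimal sketch of what the index-time route from the De-Duplication page
might look like in solrconfig.xml, with the query-time grouping alternative
underneath. The chain name "dedupe" and the field name "signature" are
assumptions (the field would have to be added to the schema as an indexed,
single-valued string). The sketch also assumes "textng" is sent with the
document rather than filled by a copyField, since copyField targets are not
yet populated when update processors run.

```xml
<!-- solrconfig.xml fragment (sketch). "dedupe" and "signature" are assumed names. -->

<!-- Index time: compute a hash of textng for every incoming document. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Assumed field; must exist in the schema as an indexed string. -->
    <str name="signatureField">signature</str>
    <!-- false keeps every document and only records the hash;
         true replaces an existing document that has the same hash. -->
    <bool name="overwriteDupes">false</bool>
    <!-- Fields the hash is computed from. -->
    <str name="fields">textng</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Route updates through the chain (or pass update.chain=dedupe per request). -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>

<!-- Query time: extra defaults for the existing /suggest handler, so that
     only one document per distinct signature is returned. -->
<str name="group">true</str>
<str name="group.field">signature</str>
<str name="group.main">true</str>
```

With overwriteDupes set to true the duplicates are replaced at index time, so
the grouping defaults would not be needed; with false, all documents are kept
and the grouping does the de-duplication per query. It would be worth checking
that the highlighting section follows the grouped result list. This signature
only matches identical "textng" values; solr.processor.TextProfileSignature is
the fuzzy option the manual lists for near-duplicates.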


Regards,
Edwin


On 23 August 2015 at 21:44, Arcadius Ahouansou <arcad...@menelic.com> wrote:

> Hi Edwin.
>
> What you are doing here is "search" as Solr has separate components for
> doing suggestions.
>
> About dedup,
>
> - have a look at the manual
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> - or simply do your dedup upfront before ingesting into Solr by assigning
> the same "id" to all docs with the same "textng" (may require a different
> index if you want to keep the existing data with duplicates for other
> purposes)
>
> - or you could use result grouping/field collapsing to group/dedup your
> results
>
> Hope this helps
>
> Arcadius.
>
>
> On 21 August 2015 at 06:41, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I would like to check, is there any way to remove duplicate suggestions in
> > Solr?
> > I have several documents that look very similar, and when I do a
> > suggestion query, it comes back with all the same results. I'm using Solr
> > 5.2.1.
> >
> > This is my suggestion pipeline:
> >
> > <requestHandler name="/suggest" class="solr.SearchHandler">
> > <lst name="defaults">
> > <!-- Browse specific stuff -->
> > <str name="echoParams">all</str>
> >   <str name="wt">json</str>
> >   <str name="indent">true</str>
> >
> > <!-- Everything below should be identical to "ac" handler above -->
> > <str name="defType">edismax</str>
> > <str name="rows">10</str>
> > <str name="fl">id, score</str>
> > <!--<str name="qf">textsuggest^30 extrasearch^30.0 textng^50.0
> > phonetic^10</str>-->
> > <!--<str name="qf">content^50 title^50 extrasearch^30.0 textng^1.0
> > textng2^200.0</str>-->
> > <str name="qf">content^50 title^50 extrasearch^30.0</str>
> > <str name="pf">textnge^50.0</str>
> > <!--<str name="bf">product(log(sum(popularity,1)),100)^20</str>-->
> > <!-- Define relative importance between types. May be overridden per
> > request by e.g. &personboost=120 -->
> > <str name="boost">product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)</str>
> > <double name="typeboost">1.0</double>
> >
> > <str name="type1query">content_type:"application/pdf"</str>
> > <double name="type1boost">0.9</double>
> > <str name="type2query">content_type:"application/msword"</str>
> > <double name="type2boost">0.5</double>
> > <str name="type3query">content_type:"NA"</str>
> > <double name="type3boost">0.0</double>
> > <str name="type4query">content_type:"NA"</str>
> > <double name="type4boost">0.0</double>
> >   <str name="hl">on</str>
> >   <str name="hl.fl">id, textng, textng2, language_s</str>
> >   <str name="hl.highlightMultiTerm">true</str>
> >   <str name="hl.preserveMulti">true</str>
> >   <str name="hl.encoder">html</str>
> >   <!--<str name="f.content.hl.fragsize">80</str>-->
> >   <str name="hl.fragsize">50</str>
> > <str name="debugQuery">false</str>
> > </lst>
> > </requestHandler>
> >
> > This is my query:
> > http://localhost:8983/edm/chinese2/suggest?q=do our best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20 textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true
> >
> >
> > This is the suggestion result:
> >
> >  "highlighting":{
> >     "responsibility001":{
> >       "id":["responsibility001"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility002":{
> >       "id":["responsibility002"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility003":{
> >       "id":["responsibility003"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility004":{
> >       "id":["responsibility004"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility005":{
> >       "id":["responsibility005"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility006":{
> >       "id":["responsibility006"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility007":{
> >       "id":["responsibility007"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility008":{
> >       "id":["responsibility008"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility009":{
> >       "id":["responsibility009"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >     "responsibility010":{
> >       "id":["responsibility010"],
> >       "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. &lt;br&gt; "],
> >
> >
> > Regards,
> > Edwin
> >
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---
>
