Hi Arcadius,

Thank you for your reply.

So does this mean that de-duplication has to be done at indexing time, and not at query time?

Yes, I'm currently building my suggestions on top of "search", as I faced some issues with the suggestion components in Solr 5.1.0. Will the suggestion components solve this issue of duplicate suggestions?

There might also be cases where about 1/2 to 3/4 of the content of my indexed documents is the same, with only the remaining 1/4 to 1/2 being different. So this will probably lead to cases where the indexed documents are different, but a search returns the parts of the documents that are the same.

Regards,
Edwin

On 23 August 2015 at 21:44, Arcadius Ahouansou <arcad...@menelic.com> wrote:

> Hi Edwin.
>
> What you are doing here is "search" as Solr has separate components for
> doing suggestions.
>
> About dedup,
>
> - have a look at the manual
> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>
> - or simply do your dedup upfront before ingesting into Solr by assigning
> the same "id" to all doc with same "textng" (may require a different index
> if you want to keep the existing data with duplicate for other purpose)
>
> - Or you could use result grouping/fieldCollapsing to group/dedup your
> result
>
> Hope this helps
>
> Arcadius.
>
>
> On 21 August 2015 at 06:41, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I would like to check, is there any way to remove duplicate suggestions in
> > Solr?
> > I have several documents that look very similar, and when I do a
> > suggestion query, it came back with all same results.
> > I'm using Solr 5.2.1
> >
> > This is my suggestion pipeline:
> >
> > <requestHandler name="/suggest" class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <!-- Browse specific stuff -->
> >     <str name="echoParams">all</str>
> >     <str name="wt">json</str>
> >     <str name="indent">true</str>
> >
> >     <!-- Everything below should be identical to "ac" handler above -->
> >     <str name="defType">edismax</str>
> >     <str name="rows">10</str>
> >     <str name="fl">id, score</str>
> >     <!--<str name="qf">textsuggest^30 extrasearch^30.0 textng^50.0 phonetic^10</str>-->
> >     <!--<str name="qf">content^50 title^50 extrasearch^30.0 textng^1.0 textng2^200.0</str>-->
> >     <str name="qf">content^50 title^50 extrasearch^30.0</str>
> >     <str name="pf">textnge^50.0</str>
> >     <!--<str name="bf">product(log(sum(popularity,1)),100)^20</str>-->
> >     <!-- Define relative importance between types. May be overridden per request by e.g. &personboost=120 -->
> >     <str name="boost">product(map(query($type1query),0,0,1,$type1boost),map(query($type2query),0,0,1,$type2boost),map(query($type3query),0,0,1,$type3boost),map(query($type4query),0,0,1,$type4boost),$typeboost)</str>
> >     <double name="typeboost">1.0</double>
> >
> >     <str name="type1query">content_type:"application/pdf"</str>
> >     <double name="type1boost">0.9</double>
> >     <str name="type2query">content_type:"application/msword"</str>
> >     <double name="type2boost">0.5</double>
> >     <str name="type3query">content_type:"NA"</str>
> >     <double name="type3boost">0.0</double>
> >     <str name="type4query">content_type:"NA"</str>
> >     <double name="type4boost">0.0</double>
> >     <str name="hl">on</str>
> >     <str name="hl.fl">id, textng, textng2, language_s</str>
> >     <str name="hl.highlightMultiTerm">true</str>
> >     <str name="hl.preserveMulti">true</str>
> >     <str name="hl.encoder">html</str>
> >     <!--<str name="f.content.hl.fragsize">80</str>-->
> >     <str name="hl.fragsize">50</str>
> >     <str name="debugQuery">false</str>
> >   </lst>
> > </requestHandler>
> >
> > This is my query:
> > http://localhost:8983/edm/chinese2/suggest?q=do our best&defType=edismax&qf=content^5 textng^5&pf=textnge^50&pf2=content^20 textnge^50&pf3=content^40%20textnge^50&ps2=2&ps3=2&stats.calcdistinct=true
> >
> > This is the suggestion result:
> >
> > "highlighting":{
> >   "responsibility001":{
> >     "id":["responsibility001"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility002":{
> >     "id":["responsibility002"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility003":{
> >     "id":["responsibility003"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility004":{
> >     "id":["responsibility004"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility005":{
> >     "id":["responsibility005"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility006":{
> >     "id":["responsibility006"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility007":{
> >     "id":["responsibility007"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility008":{
> >     "id":["responsibility008"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility009":{
> >     "id":["responsibility009"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >   "responsibility010":{
> >     "id":["responsibility010"],
> >     "textng":["We will strive to <em>do</em> <em>our</em> <em>best</em>. <br> "],
> >
> >
> > Regards,
> > Edwin
> >
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---
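
P.S. The "dedup upfront" option quoted above (assigning the same "id" to all docs with the same "textng") could be sketched roughly as below. This is only an illustration under some assumptions: the helper names `dedup_id` and `assign_dedup_ids`, the `source_id` field, the whitespace/case normalization, and the third sample document are all made up for the example and do not come from the thread or from Solr. The idea is that documents sharing an "id" would overwrite each other at indexing time, provided "id" is the uniqueKey of the index.

```python
import hashlib


def dedup_id(text: str) -> str:
    """Derive a deterministic id from the text: identical text gives identical ids."""
    # Assumption: collapsing whitespace and lowercasing is an acceptable
    # notion of "same text" for this data.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def assign_dedup_ids(docs, text_field="textng"):
    """Return copies of docs whose "id" is derived from text_field.

    The original id is kept under the hypothetical "source_id" field.
    """
    prepared = []
    for doc in docs:
        new_doc = dict(doc)
        new_doc["source_id"] = doc["id"]
        new_doc["id"] = dedup_id(doc[text_field])
        prepared.append(new_doc)
    return prepared


docs = [
    {"id": "responsibility001", "textng": "We will strive to do our best."},
    {"id": "responsibility002", "textng": "We will strive to do our best."},
    # Invented document, only here to show that different text gets a different id.
    {"id": "example003", "textng": "Some other sentence."},
]

prepared = assign_dedup_ids(docs)
unique_ids = {d["id"] for d in prepared}
print(len(prepared), "documents,", len(unique_ids), "distinct ids")
```

Since later documents with the same "id" would replace earlier ones, a separate index may be needed if the duplicates have to be kept for other purposes, as noted in the quoted reply.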