Thank you for your advice, Alexandre. I will try out the de-duplication approach from the link you gave.

Regards,
Edwin

On 1 September 2015 at 10:34, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> Re-read the question. You want to de-dupe on the full text-content.
>
> I would actually try to use the dedupe chain as per the link I gave
> but put results into a separate string field. Then, you group on that
> field. You cannot actually group on the long text field, that would
> kill any performance. So a signature is your proxy.
>
> Regards,
> Alex
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> > Hi Alexandre,
> >
> > Will treating it as String affect the search or other functions like
> > highlighting?
> >
> > Yes, the content must be in my index, unless I do a copyField to do
> > de-duplication on that field. Will that help?
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> >> Can't you just treat it as String?
> >>
> >> Also, do you actually want those documents in your index in the first
> >> place? If not, have you looked at De-duplication:
> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication
> >>
> >> Regards,
> >> Alex.
> >> ----
> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >> > Thanks Jan.
> >> >
> >> > But I read that the field that is being collapsed on must be a
> >> > single-valued String, Int or Float. As I'm required to get the distinct
> >> > results from the "content" field that was indexed from a rich text
> >> > document, I got the following error:
> >> >
> >> > "error":{
> >> > "msg":"java.io.IOException: 64 bit numeric collapse fields are not supported",
> >> > "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit
> >> > numeric collapse fields are not supported\r\n\tat
> >> >
> >> >
> >> > Is it possible to collapse on fields which have a long integer of data,
> >> > like content from a rich text document?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 31 August 2015 at 18:59, Jan Høydahl <jan....@cominvent.com> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> Check out the CollapsingQParser
> >> >> (https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results).
> >> >> As long as you have a field that will be the same for all duplicates,
> >> >> you can “collapse” on that field. If you do not have a “group id”, you
> >> >> can create one using e.g. an MD5 signature of the identical body text
> >> >> (https://cwiki.apache.org/confluence/display/solr/De-Duplication).
> >> >>
> >> >> --
> >> >> Jan Høydahl, search solution architect
> >> >> Cominvent AS - www.cominvent.com
> >> >>
> >> >> > On 31 Aug 2015 at 12:03, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > I'm using Solr 5.2.1, and I would like to find out, what is the best
> >> >> > way to get Solr to return only distinct results?
> >> >> >
> >> >> > Currently, I've indexed several exactly similar documents into Solr,
> >> >> > with just different id and title, but the content is exactly the same.
> >> >> > When I do a search, Solr will return all these documents several times
> >> >> > in the list.
> >> >> >
> >> >> > What is the most suitable way to get Solr to return only one of the
> >> >> > documents during the search?
> >> >> >
> >> >> > I understand that there is result grouping and faceting, but I'm not
> >> >> > sure if that is the best way.
> >> >> >
> >> >> > Regards,
> >> >> > Edwin
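
For anyone reading this thread later, here is a rough sketch of the idea Jan and Alexandre describe: hash the long content into a short signature, store it in a separate string field, and collapse on that field instead of on the text itself. This is plain Python run on the client side, not Solr's dedupe update chain. The field name `content_signature`, the sample documents and the query string are all made up for illustration. The `fq` value uses the `{!collapse field=...}` local-params syntax from the Collapse and Expand Results page linked above.

```python
import hashlib


def content_signature(text: str) -> str:
    """Return an MD5 hex digest of the text, a short proxy for the full content."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def collapse_by_signature(docs):
    """Keep the first document seen for each distinct content signature."""
    first_seen = {}
    for doc in docs:
        sig = content_signature(doc["content"])
        first_seen.setdefault(sig, doc)  # later duplicates are ignored
    return list(first_seen.values())


# Hypothetical documents: same content, different id and title.
docs = [
    {"id": "1", "title": "Report A", "content": "Same body text."},
    {"id": "2", "title": "Report B", "content": "Same body text."},
    {"id": "3", "title": "Report C", "content": "Different body text."},
]

distinct = collapse_by_signature(docs)
print([d["id"] for d in distinct])  # prints ['1', '3']

# If the signature is stored in a single-valued string field
# (here the assumed name "content_signature"), the collapse filter
# would be passed as a filter query along these lines:
collapse_params = {
    "q": "content:report",  # placeholder query
    "fq": "{!collapse field=content_signature}",
}
```

The signature is a fixed 32-character string, so it fits the single-valued String requirement that the long "content" field ran into, while "content" itself can stay analyzed for search and highlighting.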