Re-read the question. You want to de-dupe on the full text content.

I would actually try the dedupe chain from the link I gave, but have it
put its results into a separate string field. Then you group (or
collapse) on that field. You cannot realistically group on the long text
field itself; that would kill any performance. So a signature is your proxy.
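To make the signature-as-proxy idea concrete, here is a minimal Python sketch. It assumes a hypothetical single-valued string field called "signature" and that the text lives in "content"; both names are illustrative. It only shows the principle on the client side: hash the full text into a short, fixed-length key, then keep one document per key. In Solr itself the hashing would be done server side by the dedupe chain (SignatureUpdateProcessorFactory with signatureClass solr.processor.MD5Signature, signatureField pointing at the string field, and overwriteDupes=false so every document stays in the index), and the collapsing by fq={!collapse field=signature}. The digests below are not guaranteed to match what Solr's MD5Signature stores.

```python
import hashlib
from typing import Dict, Iterable, List

# Hypothetical single-valued string field that holds the signature.
SIGNATURE_FIELD = "signature"


def content_signature(text: str) -> str:
    """Return a 32-character MD5 hex digest standing in for the full text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def add_signature(doc: Dict[str, str]) -> Dict[str, str]:
    """Copy the document and attach the signature of its "content" field."""
    signed = dict(doc)
    signed[SIGNATURE_FIELD] = content_signature(doc["content"])
    return signed


def collapse_by_signature(docs: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first document seen for each signature, preserving order."""
    seen = set()
    heads = []
    for doc in docs:
        sig = doc[SIGNATURE_FIELD]
        if sig not in seen:
            seen.add(sig)
            heads.append(doc)
    return heads


# Request parameters that would ask Solr to collapse on the signature field.
COLLAPSE_PARAMS = {"q": "content:solr", "fq": "{!collapse field=%s}" % SIGNATURE_FIELD}

if __name__ == "__main__":
    docs = [
        {"id": "1", "title": "Report A", "content": "Same body text."},
        {"id": "2", "title": "Report B", "content": "Same body text."},
        {"id": "3", "title": "Other", "content": "Different body text."},
    ]
    signed = [add_signature(d) for d in docs]
    print([d["id"] for d in collapse_by_signature(signed)])  # ['1', '3']
```

Keeping the first document per signature only mirrors collapse's default behaviour if the input is already in score order, since collapse picks the highest-scoring document in each group as the group head.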

Regards,
   Alex
----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 31 August 2015 at 22:26, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> Hi Alexandre,
>
> Will treating it as String affect the search or other functions like
> highlighting?
>
> Yes, the content must be in my index, unless I do a copyField to do
> de-duplication on that field. Will that help?
>
> Regards,
> Edwin
>
>
> On 1 September 2015 at 10:04, Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Can't you just treat it as String?
>>
>> Also, do you actually want those documents in your index in the first
>> place? If not, have you looked at De-duplication:
>> https://cwiki.apache.org/confluence/display/solr/De-Duplication
>>
>> Regards,
>>    Alex.
>>
>>
>> On 31 August 2015 at 22:00, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
>> wrote:
>> > Thanks Jan.
>> >
>> > But I read that the field being collapsed on must be a single-valued
>> > String, Int or Float. I need distinct results based on the "content"
>> > field, which was indexed from a rich text document, and when I collapsed
>> > on it I got the following error:
>> >
>> >   "error":{
>> >     "msg":"java.io.IOException: 64 bit numeric collapse fields are not supported",
>> >     "trace":"java.lang.RuntimeException: java.io.IOException: 64 bit numeric collapse fields are not supported\r\n\tat
>> >
>> >
>> > Is it possible to collapse on a field that holds a long string of data,
>> > like the content from a rich text document?
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 31 August 2015 at 18:59, Jan Høydahl <jan....@cominvent.com> wrote:
>> >
>> >> Hi
>> >>
>> >> Check out the CollapsingQParser (
>> >> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results).
>> >> As long as you have a field that will be the same for all duplicates, you
>> >> can “collapse” on that field. If you do not have a “group id”, you can
>> >> create one using e.g. an MD5 signature of the identical body text (
>> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication).
>> >>
>> >> --
>> >> Jan Høydahl, search solution architect
>> >> Cominvent AS - www.cominvent.com
>> >>
>> >> > On 31 Aug 2015, at 12:03, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I'm using Solr 5.2.1, and I would like to find out: what is the best
>> >> > way to get Solr to return only distinct results?
>> >> >
>> >> > Currently, I've indexed several documents into Solr that differ only
>> >> > in id and title; the content is exactly the same. When I do a search,
>> >> > Solr returns all of these documents in the result list.
>> >> >
>> >> > What is the most suitable way to get Solr to return only one of these
>> >> > documents during the search?
>> >> > I understand that there are result grouping and faceting, but I'm not
>> >> > sure if that is the best way.
>> >> >
>> >> > Regards,
>> >> > Edwin
>> >>
>> >>
>>
