Re: Get distinct results in Solr

2015-09-01 Thread Zheng Lin Edwin Yeo
Yes, here it is. These are in solrconfig.xml: dedupe true signature false content solr.processor.Lookup3Signature Regards, Edwin On 1 September 2015 at 22:26, Upayavira wrote: > Can you repeat the config you have for the dedup update chain? > > Thx > > On Tue, Sep 1, 2
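The values above match the parameters of the example chain on the Solr Reference Guide's De-Duplication page, which is linked further down in this thread. The sketch below shows what such a chain typically looks like. The element layout is assumed from that page, because the archive stripped the markup from the actual config. One thing worth checking in any custom chain: it must end with solr.RunUpdateProcessorFactory, otherwise documents pass through the chain but are never written to the index.

```xml
<!-- solrconfig.xml: sketch of a de-duplication chain, layout assumed from the
     Reference Guide's De-Duplication example -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- the hash goes into a dedicated field, not into the uniqueKey (id) -->
    <str name="signatureField">signature</str>
    <!-- false: only compute and store the signature; true: replace any indexed doc with the same signature -->
    <bool name="overwriteDupes">false</bool>
    <!-- field(s) whose values are hashed -->
    <str name="fields">content</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <!-- this processor is what actually writes the update to the index -->
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```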

2015-09-01 Thread Upayavira
Can you repeat the config you have for the dedup update chain? Thx On Tue, Sep 1, 2015, at 02:57 PM, Zheng Lin Edwin Yeo wrote: > Hi Upayavira, > > Yes, I tried with a completely new index. I found that once I added the > line below to my /update handler in solrconfig.xml, the indexing doesn't >

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Upayavira, Yes, I tried with a completely new index. I found that once I add the line below to my /update handler in solrconfig.xml, indexing no longer works: dedupe. Besides that, I also cannot delete anything from the index when this line is added. Regards, Edwin On 1 S
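The stripped line is presumably the update.chain default described on the De-Duplication page; that is an assumption, since the markup was lost in the archive. A sketch of the /update handler with it in place follows. With this default set, every add and delete sent to /update is routed through the dedupe chain, so a chain that drops updates would affect both. If documents actually arrive through a different handler (for example /update/extract for rich documents), that handler would need the same default.

```xml
<!-- solrconfig.xml: route every request sent to /update through the "dedupe" chain -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
```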

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Alexandre, Yes, indexing worked fine until I added the following line to my /update handler in solrconfig.xml: dedupe Regards, Edwin On 1 September 2015 at 20:25, Alexandre Rafalovitch wrote: > Do you mean that normally you do get stuff indexed but when you make >

2015-09-01 Thread Upayavira
Have you tried with a completely clean index? Are you deduping, or just calculating the signature? Is it possible dedup is preventing your documents from indexing (because it thinks they are dups)? On Tue, Sep 1, 2015, at 09:46 AM, Zheng Lin Edwin Yeo wrote: > Hi Upayavira, > > I've tried to chan

2015-09-01 Thread Alexandre Rafalovitch
Do you mean that normally you do get stuff indexed but when you make any of these changes the indexing stops working and you get empty index? If so, you probably misconfigured something and should be getting error messages. If, on the other hand, you see no changes, check that you are actually usi

2015-09-01 Thread Zheng Lin Edwin Yeo
Hi Upayavira, I've tried changing id to signature, but nothing is indexed into Solr either. Is that what you mean? Besides that, I've also included a copyField to copy the content field into the signature field. In both versions (with and without the copyField), nothing is indexed into Solr. Rega

2015-09-01 Thread Upayavira
You are attempting to write your signature to your ID field. That's not a good idea. You are generating your signature from the content field, which seems okay. Change your id to be your 'signature' field instead of id, and something different will happen :-) Upayavira On Tue, Sep 1, 2015, at 04:
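A sketch of the separate field in schema.xml, following the field definition on the De-Duplication page; the name signature matches the field used in the later config in this thread. A single-valued string field leaves id as the uniqueKey and can also be grouped or collapsed on. The processor fills this field itself, so a copyField into it should not be needed.

```xml
<!-- schema.xml: dedicated field that receives the hash; id stays the uniqueKey -->
<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />
```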

2015-08-31 Thread Zheng Lin Edwin Yeo
I tried to follow the de-duplication guide, but after I configured it in solrconfig.xml and schema.xml, nothing is indexed into Solr, and there is no error message. I'm using SimplePostTool to index rich-text documents. Below are my configurations: In solrconfig.xml dedupe true

2015-08-31 Thread Zheng Lin Edwin Yeo
Thank you for your advice, Alexandre. Will try out the de-duplication from the link you gave. Regards, Edwin On 1 September 2015 at 10:34, Alexandre Rafalovitch wrote: > Re-read the question. You want to de-dupe on the full text-content. > > I would actually try to use the dedupe chain as per

2015-08-31 Thread Alexandre Rafalovitch
Re-read the question. You want to de-dupe on the full text content. I would actually try to use the dedupe chain as per the link I gave, but put the results into a separate string field. Then you group on that field. You cannot actually group on the long text field; that would kill performance. So
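A sketch of the grouping step, assuming the hash ends up in a string field named signature. The handler name /distinct-group is hypothetical; the same group parameters can also be passed on an ordinary /select request.

```xml
<!-- solrconfig.xml: hypothetical handler that returns one document per signature value -->
<requestHandler name="/distinct-group" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="group">true</str>
    <str name="group.field">signature</str>
    <!-- one document per group, i.e. per distinct content hash -->
    <str name="group.limit">1</str>
    <!-- return a flat result list instead of the nested grouped format -->
    <str name="group.main">true</str>
  </lst>
</requestHandler>
```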

2015-08-31 Thread Zheng Lin Edwin Yeo
Hi Alexandre, Will treating it as String affect the search or other functions like highlighting? Yes, the content must be in my index, unless I use a copyField and do the de-duplication on that field. Will that help? Regards, Edwin On 1 September 2015 at 10:04, Alexandre Rafalovitch wrote: > Can'

2015-08-31 Thread Alexandre Rafalovitch
Can't you just treat it as String? Also, do you actually want those documents in your index in the first place? If not, have you looked at De-duplication: https://cwiki.apache.org/confluence/display/solr/De-Duplication Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a ne

2015-08-31 Thread Zheng Lin Edwin Yeo
Thanks Jan. But I read that the field being collapsed on must be a single-valued String, Int or Float. As I need distinct results on the "content" field, which was indexed from a rich-text document, I got the following error: "error":{ "msg":"java.io.IOException: 64 bit

2015-08-31 Thread Jan Høydahl
Hi, check out the CollapsingQParser (https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results). As long as you have a field that will be the same for all duplicates, you can “collapse” on that field. If you do not have a “group id”, you can create one using e.g. an MD5 signatur
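A sketch of the collapse approach, assuming a single-valued string field named signature that holds the hash (for example filled by a dedupe chain, with solr.processor.MD5Signature as the signatureClass if an MD5 hash is preferred). The handler name /distinct-collapse is hypothetical. By default the collapse filter keeps the highest-scoring document in each group.

```xml
<!-- solrconfig.xml: hypothetical handler that collapses duplicates on the signature field -->
<requestHandler name="/distinct-collapse" class="solr.SearchHandler">
  <lst name="appends">
    <!-- keep one document per signature value -->
    <str name="fq">{!collapse field=signature}</str>
  </lst>
</requestHandler>
```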