On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
I have a Solr core with some 20 fields in it (all are stored and indexed). For 
an environment, the number of documents is around 0.29 million. When I run the 
full import through DIH, indexing completes successfully, but the index 
occupies around 5 GB of disk space. Is there a way I can go and check 
which documents are consuming the most space? Put another way, can I 
sort the index based on size?

I am not aware of any way to do that.  There might be one that I don't know about, but if there were, it seems like I would have come across it before.

It is not very likely that the large index size is due to a single document or a handful of documents.  It is more likely that most documents are relatively large.  I could be wrong about that, though.

If you have 290000 documents (which is how I interpreted 0.29 million) and the total index size is about 5 GB, then the average size per document in the index is about 18 kilobytes.  In my view, that is pretty large.  In my experience, most documents are typically only 1-2 kilobytes.
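The per-document estimate above is just total index size divided by document count. A minimal sketch of that arithmetic follows; the function name is hypothetical, and since "5 GB" could mean binary or decimal gigabytes, both readings are shown. Either way the average lands at roughly 17-18 KB per document.

```python
# Hypothetical helper: average on-disk index bytes per document.
def avg_bytes_per_doc(index_bytes: int, num_docs: int) -> float:
    if num_docs <= 0:
        raise ValueError("num_docs must be positive")
    return index_bytes / num_docs


# Figures from the thread: about 5 GB of index, about 290,000 documents.
# "5 GB" is ambiguous, so both the binary and decimal readings are computed.
for label, size in (("5 GiB (binary)", 5 * 1024**3), ("5 GB (decimal)", 5 * 10**9)):
    avg = avg_bytes_per_doc(size, 290_000)
    print(f"{label}: about {avg:,.0f} bytes per document")
```

This only gives an average; it says nothing about how the size is distributed across documents.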

Can we get your Solr version, a copy of your schema, and exactly what Solr returns in search results for a typically-sized document?  You'll need to use a paste website or a file-sharing website; if you try to attach these things to a message, the mailing list will most likely eat them, and we'll never see them.  If you need to redact the information in search results, please do it in a way that still lets us see the exact size of the text: don't just remove information, replace it with information of the same length.
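One way to redact while keeping the size visible is to overwrite letters and digits with a filler character and leave whitespace and punctuation alone. The sketch below is a hypothetical helper, not part of Solr. It assumes "same length" should mean the same UTF-8 byte count, since that is what matters for size: each alphanumeric character is replaced by as many filler characters as it occupies bytes, so a non-ASCII letter such as "ü" becomes two fillers.

```python
def redact_same_length(text: str, filler: str = "x") -> str:
    """Hypothetical helper: mask letters and digits, keep UTF-8 byte length.

    Whitespace and punctuation are kept so the shape of the value stays
    readable. Each alphanumeric character is replaced by one filler per
    UTF-8 byte it occupies, so the byte count is unchanged (the character
    count can grow when the input contains non-ASCII letters).
    """
    if len(filler.encode("utf-8")) != 1:
        raise ValueError("filler must be a single one-byte character")
    out = []
    for ch in text:
        if ch.isalnum():
            out.append(filler * len(ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


original = "Customer: Müller, account 12345"
redacted = redact_same_length(original)
print(redacted)
```

Applied to each stored field value before pasting, this keeps the redacted sample the same size as the original.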

Thanks,
Shawn
