One more tidbit: are you sure you really need all 20 fields, and that each one needs to be both indexed and stored?
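If some of them are only ever returned in results (never searched), or only ever searched (never displayed), turning off the unneeded half shrinks the index. A minimal sketch of what that looks like in schema.xml (the field names here are made up for illustration):

    <!-- searched but never returned in results: index it, don't store it -->
    <field name="body_text" type="text_general" indexed="true" stored="false"/>

    <!-- returned in results but never searched: store it, don't index it -->
    <field name="display_title" type="string" indexed="false" stored="true"/>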
See this blog post, for example:
https://www.garysieling.com/blog/tuning-solr-lucene-disk-usage

On Mon, Nov 19, 2018 at 1:45 PM Walter Underwood <wun...@wunderwood.org> wrote:
>
> Worst case is 3X. That happens when there are no merges until the commit.
>
> With tlogs, the worst case is more than that. I’ve seen humongous tlogs with a batch load and no hard commit until the end. If you do that several times, then you have a few old humongous tlogs. Bleah.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Nov 19, 2018, at 7:40 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> >
> > Also, a full import, assuming the documents were already indexed, will just double your index size until a merge/optimize is run, since you are only marking the old document as deleted (not reclaiming any space) and then adding a completely new document on top of it.
> >
> > On Mon, Nov 19, 2018 at 10:36 AM Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
> >>> I have a Solr core with some 20 fields in it (all are stored and indexed). For one environment, the number of documents is around 0.29 million. When I run the full import through DIH, indexing completes successfully, but it occupies around 5 GB of disk space. Is there a way to check which document is consuming the most disk space? Put another way, can I sort the index by document size?
> >>
> >> I am not aware of any way to do that. There might be one that I don't know about, but if there were a way, it seems like I would have come across it before.
> >>
> >> It is not very likely that the large index size is due to a single document or a handful of documents. It is more likely that most documents are relatively large. I could be wrong about that, though.
> >>
> >> If you have 290,000 documents (which is how I interpreted 0.29 million) and the total index size is about 5 GB, then the average size per document in the index is about 18 kilobytes. That is, in my view, pretty large. Typically, most documents are 1-2 kilobytes.
> >>
> >> Can we get your Solr version, a copy of your schema, and exactly what Solr returns in search results for a typically sized document? You'll need to use a paste website or a file-sharing website ... if you try to attach these things to a message, the mailing list will most likely eat them, and we'll never see them. If you need to redact the information in the search results, please do it in a way that we can still see the exact size of the text -- don't just remove information, replace it with information that's the same length.
> >>
> >> Thanks,
> >> Shawn
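P.S. On Walter's tlog point: the usual way to keep transaction logs from growing without bound during a batch load is a periodic hard commit with openSearcher=false. A minimal sketch for solrconfig.xml (the thresholds are just illustrative starting points, not recommendations for your load):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- hard commit after 25,000 docs or 60 seconds, whichever comes first -->
        <maxDocs>25000</maxDocs>
        <maxTime>60000</maxTime>
        <!-- flush segments and roll the tlog without opening a new searcher -->
        <openSearcher>false</openSearcher>
      </autoCommit>
    </updateHandler>

Each hard commit closes the current tlog and starts a new one, so you never end up with a single humongous log covering the whole load.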