Just as a sanity check: is this getting replicated many times, or scaled up
further? It sounds like about $3.50/mo of disk space on AWS, and it should
all fit in RAM on any decent-sized server (i.e. any server that looks like
half or a quarter of a decent laptop).

As a question, it's interesting, but it doesn't yet sound like a problem
worth sweating.

On Mon, Nov 19, 2018, 3:29 PM Edward Ribeiro <edward.ribe...@gmail.com> wrote:

> One more tidbit: are you really sure you need all 20 fields to be indexed
> and stored? Do you really need all those 20 fields?
>
> See this blog post, for example:
> https://www.garysieling.com/blog/tuning-solr-lucene-disk-usage
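
If some of those fields turn out to be search-only (or display-only), flipping
the stored/indexed flags is usually the quickest win. A rough sketch of doing
that through the Schema API, assuming a managed schema is in use; the core name
"mycore", field name "big_text_field", and type "text_general" are placeholders,
not values from this thread:

# Sketch (placeholder names): switch a field to indexed-but-not-stored
# via the Schema API. Requires a managed schema.
import requests

SCHEMA_URL = "http://localhost:8983/solr/mycore/schema"

payload = {
    "replace-field": {
        "name": "big_text_field",
        "type": "text_general",
        "indexed": True,   # still searchable
        "stored": False,   # no longer kept verbatim in the stored fields
    }
}

resp = requests.post(SCHEMA_URL, json=payload)
resp.raise_for_status()
print(resp.json())

A change like this only affects documents indexed afterwards, so a full reindex
is needed before the existing index actually shrinks.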
>
> On Mon, Nov 19, 2018 at 1:45 PM Walter Underwood <wun...@wunderwood.org>
> wrote:
> >
> > Worst case is 3X. That happens when there are no merges until the commit.
> >
> > With tlogs, worst case is more than that. I’ve seen humongous tlogs with
> > a batch load and no hard commit until the end. If you do that several
> > times, then you have a few old humongous tlogs. Bleah.
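
One way to keep tlogs from growing without bound during a batch load is to let
Solr issue periodic hard commits itself. A hedged sketch using the Config API;
the core name "mycore" and the 60-second interval are placeholders, not values
from the thread:

# Sketch: enable periodic hard commits so the transaction log rolls over
# during a long batch load. "mycore" and 60000 ms are placeholder values.
import requests

CONFIG_URL = "http://localhost:8983/solr/mycore/config"

payload = {
    "set-property": {
        "updateHandler.autoCommit.maxTime": 60000,       # hard commit every 60s
        "updateHandler.autoCommit.openSearcher": False,  # don't open a new searcher per commit
    }
}

resp = requests.post(CONFIG_URL, json=payload)
resp.raise_for_status()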
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> > > On Nov 19, 2018, at 7:40 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> > >
> > > Also a full import, assuming the documents were already indexed, will
> > > just double your index size until a merge/optimize is run, since you are
> > > just marking the old document as deleted (not taking back any space) and
> > > then adding a completely new document on top of it.
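
If disk space after a full reimport is the concern, the deleted (replaced)
documents can be purged by forcing a merge once the load is done. A minimal
sketch, with "mycore" as a placeholder core name; optimizing a large index is
expensive, so it is something to run sparingly and off-peak:

# Sketch: force a merge after a full reimport so segments full of deleted
# documents are rewritten and their disk space is reclaimed.
# "mycore" is a placeholder core name.
import requests

resp = requests.get(
    "http://localhost:8983/solr/mycore/update",
    params={"optimize": "true", "waitSearcher": "true"},
)
resp.raise_for_status()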
> > >
> > > On Mon, Nov 19, 2018 at 10:36 AM Shawn Heisey <apa...@elyograg.org> wrote:
> > >
> > >> On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
> > >>> I have a Solr core with some 20 fields in it (all are stored and
> > >>> indexed). For one environment, the number of documents is around 0.29
> > >>> million. When I run the full import through DIH, indexing completes
> > >>> successfully, but it occupies around 5 GB of disk space. Is there a
> > >>> way to check which document is consuming the most space? Put another
> > >>> way, can I sort the index based on size?
> > >>
> > >> I am not aware of any way to do that.  There might be one that I don't
> > >> know about, but if there were, I think I would have come across it
> > >> before.
> > >>
> > >> It is not very likely that the large index size is due to a single
> > >> document or a handful of documents.  It is more likely that most
> > >> documents are relatively large.  I could be wrong about that, though.
> > >>
> > >> If you have 290000 documents (which is how I interpreted 0.29 million)
> > >> and the total index size is about 5 GB, then the average size per
> > >> document in the index is about 18 kilobytes.  This is, in my view,
> > >> pretty large.  Typically I think that most documents are 1-2 kilobytes.
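
For what it's worth, the arithmetic behind that 18 KB figure checks out (a
quick back-of-the-envelope check, not from the thread):

# Back-of-the-envelope check of the average on-disk size per document.
index_size_bytes = 5 * 1024**3   # ~5 GB index
num_docs = 290_000               # 0.29 million documents

avg_kb = index_size_bytes / num_docs / 1024
print(f"~{avg_kb:.1f} KB per document")  # prints ~18.1 KB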
> > >>
> > >> Can we get your Solr version, a copy of your schema, and exactly what
> > >> Solr returns in search results for a typically sized document?  You'll
> > >> need to use a paste website or a file-sharing website ... if you try to
> > >> attach these things to a message, the mailing list will most likely eat
> > >> them, and we'll never see them.  If you need to redact the information
> > >> in search results ... please do it in a way that we can still see the
> > >> exact size of the text -- don't just remove information, replace it
> > >> with information that's the same length.
> > >>
> > >> Thanks,
> > >> Shawn
> > >>
> > >>
>
