It really depends on what you consider too large, and why the size is an issue: most replication runs at roughly 100MB/second give or take, so replicating a 300GB index only takes an hour or two.

What I do for this purpose is store my text in a separate index altogether, and call on that core for highlighting. So for my use case, the primary index with no stored text is around 300GB and replicates as needed, while the full-text indexes with stored text total around 500GB and replicate non-stop. All searching goes against the primary index, and for highlighting I call on the full-text indexes, which have a stupid-simple schema. This has worked pretty well for me so far.
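Roughly, the flow looks something like the SolrJ sketch below - the core names, field names, and URLs are just placeholders for illustration, not my actual setup:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;

  public class TwoCoreSearch {
      public static void main(String[] args) throws Exception {
          // Placeholder core URLs - adjust to your own deployment.
          HttpSolrClient primary =
              new HttpSolrClient.Builder("http://localhost:8983/solr/primary").build();
          HttpSolrClient fulltext =
              new HttpSolrClient.Builder("http://localhost:8983/solr/fulltext").build();

          // 1) Search the lean primary core (no stored text); fetch only ids.
          SolrQuery search = new SolrQuery("title:solr");
          search.setFields("id");
          search.setRows(10);
          QueryResponse results = primary.query(search);

          List<String> ids = new ArrayList<>();
          for (SolrDocument doc : results.getResults()) {
              ids.add((String) doc.getFieldValue("id"));
          }

          // 2) Ask the stored-text core for highlights on just those ids.
          SolrQuery hl = new SolrQuery("text:solr");
          hl.addFilterQuery("{!terms f=id}" + String.join(",", ids));
          hl.setHighlight(true);
          hl.addHighlightField("text");
          QueryResponse hlResponse = fulltext.query(hl);

          // Map of doc id -> field -> highlight snippets.
          Map<String, Map<String, List<String>>> highlighting = hlResponse.getHighlighting();
          System.out.println(highlighting);

          primary.close();
          fulltext.close();
      }
  }

The second core only ever needs an id field and the stored text field, so its schema stays trivial and the big segments never have to travel with the primary index's replication.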
On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla <roman.ch...@gmail.com> wrote:
> Hello,
>
> We have a use case of a very large index (master-slave; for unrelated
> reasons the search cannot work in cloud mode) - one of the fields is a
> very large text, stored mostly for highlighting. To cut down the index
> size (for purposes of replication/scaling) I thought I could try to save
> it in a database - and not in the index.
>
> Lucene has codecs - one of the methods is for 'stored fields', so that
> seems like a natural path for me.
>
> However, I'd expect somebody else has had a similar problem before. I
> googled and couldn't find any solutions. Using the codecs seems like a
> really good fit for this particular problem - am I missing something? Is
> there a better way to cut down on index size? (besides SolrCloud/sharding,
> compression)
>
> Thank you,
>
> Roman