Haven't really looked much into that. Here is a snippet from today's gc log, if you wouldn't mind shedding some light on it.
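For reference, I believe we're on ParNew/CMS with collector and logging flags roughly like the ones below (going from memory rather than our exact startup script, so treat this as approximate; the log path is a placeholder):

  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=8
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps
  -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC
  -Xloggc:<path-to-gc-log>

The snippet: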
2017-08-03T11:46:16.265-0400: 3200938.383: [GC (Allocation Failure) 2017-08-03T11:46:16.265-0400: 3200938.383: [ParNew
Desired survivor size 1966060336 bytes, new threshold 8 (max 8)
- age   1:  128529184 bytes,  128529184 total
- age   2:   43075632 bytes,  171604816 total
- age   3:   64402592 bytes,  236007408 total
- age   4:   35621704 bytes,  271629112 total
- age   5:   44285584 bytes,  315914696 total
- age   6:   45372512 bytes,  361287208 total
- age   7:   41975368 bytes,  403262576 total
- age   8:   72959688 bytes,  476222264 total
: 9133992K->577219K(10666688K), 0.2730329 secs] 23200886K->14693007K(49066688K), 0.2732690 secs] [Times: user=2.01 sys=0.01, real=0.28 secs]
Heap after GC invocations=12835 (full 109):
 par new generation   total 10666688K, used 577219K [0x00007f8023000000, 0x00007f8330400000, 0x00007f8330400000)
  eden space 8533376K,   0% used [0x00007f8023000000, 0x00007f8023000000, 0x00007f822bd60000)
  from space 2133312K,  27% used [0x00007f82ae0b0000, 0x00007f82d1460d98, 0x00007f8330400000)
  to   space 2133312K,   0% used [0x00007f822bd60000, 0x00007f822bd60000, 0x00007f82ae0b0000)
 concurrent mark-sweep generation total 38400000K, used 14115788K [0x00007f8330400000, 0x00007f8c58000000, 0x00007f8c58000000)
 Metaspace       used 36698K, capacity 37169K, committed 37512K, reserved 38912K
}

On Thu, Aug 3, 2017 at 11:58 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> How long are your GC pauses? Those affect all queries, so they make the
> 99th percentile slow with queries that should be fast.
>
> The G1 collector has helped our 99th percentile.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On Aug 3, 2017, at 8:48 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> >
> > Thanks, that's what I kind of expected. Still debating whether the space
> > increase is worth it; right now I'm at .7% of searches taking longer than 10
> > seconds, and 6% taking longer than 1, so when I see things like this in the
> > morning it bugs me a bit:
> >
> > 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts
> > of Equity of the United States")
> > 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause")
> > 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of
> > justice")
> >
> > which could all be annihilated with CG's, at the expense, according to HT,
> > of a 40% increase in index size.
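For what it's worth, the CG field type I have in mind is roughly the following; the type name and the words file are placeholders rather than our actual schema:

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- index side: emit the single terms plus the common-word bigrams -->
      <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- query side: phrases of common words are searched as the bigrams only -->
      <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>

As I understand it, the query-side filter turns a phrase like "to be or not to be" into bigram tokens, which only exist in documents indexed with the same chain; that's what prompted the question quoted below.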
> >
> > On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> bq: will that search still return results from the earlier documents
> >> as well as the new ones
> >>
> >> In a word, "no". By definition the analysis chain applied at index
> >> time puts tokens in the index and that's all you have to search
> >> against for the doc unless and until you re-index the document.
> >>
> >> You really have two choices here:
> >> 1> live with the differing results until you get done re-indexing
> >> 2> index to an offline collection and then use, say, collection
> >> aliasing to make the switch atomically.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
> >> <hastings.recurs...@gmail.com> wrote:
> >>> Hey all, I have yet to run an experiment to test this but was wondering if
> >>> anyone knows the answer ahead of time.
> >>> If I have an index built with documents before implementing the commongrams
> >>> filter, then enable it, and start adding documents that have the
> >>> filter/tokenizer applied, will a search that fits the criteria, for example
> >>> "to be or not to be", still return results from the earlier documents as
> >>> well as the new ones? The idea is that a full re-index is going to be
> >>> difficult, so I would rather do it over time by replacing large numbers of
> >>> documents incrementally. Thanks,
> >>> Dave
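If we do go the offline-collection route, my understanding is that the atomic switch Erick mentions is just a Collections API call (SolrCloud only; the host and collection names below are made up for illustration):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mainindex&collections=mainindex_cg'

Clients keep querying the alias name, and once the CG-enabled collection is fully built, running CREATEALIAS again re-points the alias at it, so the cutover is a single request with no client-side change.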