Haven't really looked much into that. Here is a snippet from today's gc log, if you wouldn't mind shedding some light on it.
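For reference, I believe we're on ParNew/CMS with collector and logging flags roughly like the ones below (going from memory rather than our exact startup script, so treat this as approximate; the log path is a placeholder):

  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
  -XX:SurvivorRatio=4 -XX:MaxTenuringThreshold=8
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps
  -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC
  -Xloggc:<path-to-gc-log>

The snippet: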
2017-08-03T11:46:16.265-0400: 3200938.383: [GC (Allocation Failure) 2017-08-03T11:46:16.265-0400: 3200938.383: [ParNew
Desired survivor size 1966060336 bytes, new threshold 8 (max 8)
- age   1:  128529184 bytes,  128529184 total
- age   2:   43075632 bytes,  171604816 total
- age   3:   64402592 bytes,  236007408 total
- age   4:   35621704 bytes,  271629112 total
- age   5:   44285584 bytes,  315914696 total
- age   6:   45372512 bytes,  361287208 total
- age   7:   41975368 bytes,  403262576 total
- age   8:   72959688 bytes,  476222264 total
: 9133992K->577219K(10666688K), 0.2730329 secs] 23200886K->14693007K(49066688K), 0.2732690 secs] [Times: user=2.01 sys=0.01, real=0.28 secs]
Heap after GC invocations=12835 (full 109):
 par new generation   total 10666688K, used 577219K [0x00007f8023000000, 0x00007f8330400000, 0x00007f8330400000)
  eden space 8533376K,   0% used [0x00007f8023000000, 0x00007f8023000000, 0x00007f822bd60000)
  from space 2133312K,  27% used [0x00007f82ae0b0000, 0x00007f82d1460d98, 0x00007f8330400000)
  to   space 2133312K,   0% used [0x00007f822bd60000, 0x00007f822bd60000, 0x00007f82ae0b0000)
 concurrent mark-sweep generation total 38400000K, used 14115788K [0x00007f8330400000, 0x00007f8c58000000, 0x00007f8c58000000)
 Metaspace       used 36698K, capacity 37169K, committed 37512K, reserved 38912K
}

On Thu, Aug 3, 2017 at 11:58 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> How long are your GC pauses? Those affect all queries, so they make the
> 99th percentile slow with queries that should be fast.
>
> The G1 collector has helped our 99th percentile.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On Aug 3, 2017, at 8:48 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> >
> > Thanks, that's what I kind of expected. Still debating whether the space
> > increase is worth it; right now I'm at .7% of searches taking longer than 10
> > seconds, and 6% taking longer than 1, so when I see things like this in the
> > morning it bugs me a bit:
> >
> > 2017-08-02 11:50:48 : 58979/1000 secs : ("Rules of Practice for the Courts
> > of Equity of the United States")
> > 2017-08-02 02:16:36 : 54749/1000 secs : ("The American Cause")
> > 2017-08-02 19:27:58 : 54561/1000 secs : ("register of the department of
> > justice")
> >
> > which could all be annihilated with CG's, at the expense, according to HT,
> > of a 40% increase in index size.
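For what it's worth, the CG field type I have in mind is roughly the following; the type name and the words file are placeholders rather than our actual schema:

  <fieldType name="text_cg" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- index side: emit the single terms plus the common-word bigrams -->
      <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- query side: phrases of common words are searched as the bigrams only -->
      <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
    </analyzer>
  </fieldType>

As I understand it, the query-side filter turns a phrase like "to be or not to be" into bigram tokens, which only exist in documents indexed with the same chain; that's what prompted the question quoted below.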
> >
> > On Thu, Aug 3, 2017 at 11:21 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> bq: will that search still return results from the earlier documents
> >> as well as the new ones
> >>
> >> In a word, "no". By definition the analysis chain applied at index
> >> time puts tokens in the index and that's all you have to search
> >> against for the doc unless and until you re-index the document.
> >>
> >> You really have two choices here:
> >> 1> live with the differing results until you get done re-indexing
> >> 2> index to an offline collection and then use, say, collection
> >> aliasing to make the switch atomically.
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Aug 3, 2017 at 8:07 AM, David Hastings
> >> <hastings.recurs...@gmail.com> wrote:
> >>> Hey all, I have yet to run an experiment to test this but was wondering if
> >>> anyone knows the answer ahead of time.
> >>> If I have an index built with documents before implementing the commongrams
> >>> filter, then enable it, and start adding documents that have the
> >>> filter/tokenizer applied, will a search that fits the criteria, for example
> >>> "to be or not to be", still return results from the earlier documents as
> >>> well as the new ones? The idea is that a full re-index is going to be
> >>> difficult, so I would rather do it over time by replacing large numbers of
> >>> documents incrementally. Thanks,
> >>> Dave
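If we do go the offline-collection route, my understanding is that the atomic switch Erick mentions is just a Collections API call (SolrCloud only; the host and collection names below are made up for illustration):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mainindex&collections=mainindex_cg'

Clients keep querying the alias name, and once the CG-enabled collection is fully built, running CREATEALIAS again re-points the alias at it, so the cutover is a single request with no client-side change.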