mikemccand commented on pull request #518:
URL: https://github.com/apache/lucene/pull/518#issuecomment-999027826
OK, thank you @uschindler and @rmuir for helping me debug the tricky setup!
I ran this `perf.py` using `luceneutil`:
```
import sys
sys.path.insert(0, '/l/util/src/python')
import competition

if __name__ == '__main__':
  sourceData = competition.sourceData()
  comp = competition.Competition()

  checkout = 'trunk'
  checkoutNewMMap = 'trunk-new-mmap'

  # One index, built once and shared by both competitors
  index = comp.newIndex(checkout, sourceData, numThreads=12,
                        addDVFields=True, verbose=True,
                        grouping=False, useCMS=True,
                        javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp',
                        analyzer = 'StandardAnalyzerNoStopWords',
                        facets = (('taxonomy:Date', 'Date'),
                                  ('taxonomy:Month', 'Month'),
                                  ('taxonomy:DayOfYear', 'DayOfYear'),
                                  ('taxonomy:RandomLabel.taxonomy', 'RandomLabel'),
                                  ('sortedset:Month', 'Month'),
                                  ('sortedset:DayOfYear', 'DayOfYear'),
                                  ('sortedset:RandomLabel.sortedset', 'RandomLabel')))

  # Baseline checkout
  comp.competitor('base', checkout, index=index,
                  javacCommand='/opt/jdk-18-ea-28/bin/javac',
                  javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')

  # Checkout with the new mmap implementation
  comp.competitor('new-mmap', checkoutNewMMap, index=index,
                  javacCommand='/opt/jdk-18-ea-28/bin/javac',
                  javaCommand='/opt/jdk-18-ea-28/bin/java --add-modules jdk.incubator.foreign -Xmx32g -Xms32g -server -XX:+UseParallelGC -Djava.io.tmpdir=/l/tmp')

  comp.benchmark('new-mmap')
```
I set my `JAVA_HOME` to JDK 17 (`17.0.1+12-LTS-39`) and `RUNTIME_JAVA_HOME`
to JDK 18-ea-b28 (`18-ea+28-1975`). I used git commit
`119c7c29ae697a52c91116f2414f973509830267` from Lucene `main`, and then
@uschindler's branch behind this PR.
Here are the results after 20 JVM iterations:
```
Task                        QPS base   StdDev QPS new-mmap   StdDev                Pct diff  p-value
BrowseMonthSSDVFacets           8.07  (12.6%)         7.18  (13.4%)  -11.0%  ( -32% -  17%)    0.008
BrowseMonthTaxoFacets           4.67   (5.7%)         4.33   (2.6%)   -7.2%  ( -14% -   1%)    0.000
BrowseRandomLabelSSDVFacets     5.34   (6.6%)         5.08   (6.4%)   -4.9%  ( -16% -   8%)    0.017
IntNRQ                         49.91   (7.0%)        48.07   (2.3%)   -3.7%  ( -12% -   6%)    0.026
PKLookup                      126.62   (4.6%)       122.06   (3.4%)   -3.6%  ( -11% -   4%)    0.005
BrowseDayOfYearSSDVFacets       7.46  (12.8%)         7.28  (16.8%)   -2.5%  ( -28% -  31%)    0.598
Respell                        25.49   (1.1%)        24.97   (1.2%)   -2.1%  (  -4% -   0%)    0.000
Fuzzy1                         40.18   (1.5%)        39.52   (1.4%)   -1.7%  (  -4% -   1%)    0.000
Fuzzy2                         31.18   (1.8%)        30.67   (1.5%)   -1.6%  (  -4% -   1%)    0.002
HighSloppyPhrase               19.11   (5.7%)        18.99   (5.2%)   -0.6%  ( -10% -  10%)    0.710
Wildcard                       59.01   (6.8%)        58.89   (6.9%)   -0.2%  ( -13% -  14%)    0.926
LowSloppyPhrase                14.92   (3.7%)        14.92   (3.4%)    0.0%  (  -6% -   7%)    0.978
MedSloppyPhrase               117.00   (3.7%)       117.28   (3.2%)    0.2%  (  -6% -   7%)    0.829
MedTermDayTaxoFacets           22.39   (3.3%)        22.51   (4.2%)    0.5%  (  -6% -   8%)    0.649
Prefix3                        62.59   (5.3%)        62.99   (5.8%)    0.6%  (  -9% -  12%)    0.713
BrowseRandomLabelTaxoFacets     3.93   (3.9%)         3.95   (6.3%)    0.7%  (  -9% -  11%)    0.669
LowTerm                       678.95   (3.2%)       684.44   (4.4%)    0.8%  (  -6% -   8%)    0.505
OrHighMed                      61.65   (2.9%)        62.22   (2.1%)    0.9%  (  -3% -   6%)    0.252
AndHighHighDayTaxoFacets        5.64   (4.5%)         5.70   (4.1%)    1.0%  (  -7% -  10%)    0.450
OrHighHigh                     16.45   (3.1%)        16.63   (2.3%)    1.1%  (  -4% -   6%)    0.220
MedPhrase                     157.72   (2.1%)       159.52   (2.5%)    1.1%  (  -3% -   5%)    0.117
HighPhrase                    110.71   (3.9%)       112.10   (2.7%)    1.3%  (  -5% -   8%)    0.237
OrHighLow                     270.14   (3.2%)       274.07   (3.0%)    1.5%  (  -4% -   7%)    0.135
HighTermTitleBDVSort            7.37   (3.7%)         7.49   (3.2%)    1.5%  (  -5% -   8%)    0.170
AndHighHigh                    44.95   (5.4%)        45.63   (4.6%)    1.5%  (  -7% -  12%)    0.336
HighSpanNear                    7.27   (6.4%)         7.39   (5.2%)    1.6%  (  -9% -  14%)    0.390
BrowseDayOfYearTaxoFacets       4.37   (7.5%)         4.45   (9.8%)    1.8%  ( -14% -  20%)    0.512
AndHighMedDayTaxoFacets        63.88   (2.6%)        65.05   (1.3%)    1.8%  (  -2% -   5%)    0.005
BrowseDateTaxoFacets            4.37   (7.6%)         4.45  (10.0%)    1.8%  ( -14% -  20%)    0.513
TermDTSort                    379.61   (2.6%)       386.94   (2.2%)    1.9%  (  -2% -   6%)    0.011
OrHighMedDayTaxoFacets          5.48   (3.4%)         5.59   (4.5%)    2.0%  (  -5% -  10%)    0.113
MedSpanNear                     3.79   (2.3%)         3.86   (3.7%)    2.0%  (  -3% -   8%)    0.042
HighTermDayOfYearSort        1151.05   (4.4%)      1174.57   (6.2%)    2.0%  (  -8% -  13%)    0.227
AndHighMed                     56.38   (5.3%)        57.64   (5.9%)    2.2%  (  -8% -  14%)    0.208
HighTerm                      976.99   (6.7%)      1002.21   (6.8%)    2.6%  ( -10% -  17%)    0.225
LowIntervalsOrdered            12.43   (4.8%)        12.77   (5.2%)    2.8%  (  -6% -  13%)    0.079
LowSpanNear                     9.60   (2.4%)         9.87   (1.4%)    2.8%  (   0% -   6%)    0.000
OrHighNotMed                  598.12   (4.1%)       614.79   (4.2%)    2.8%  (  -5% -  11%)    0.034
HighTermMonthSort              42.77  (14.2%)        44.03  (19.5%)    3.0%  ( -26% -  42%)    0.584
MedIntervalsOrdered            29.73   (4.0%)        30.68   (4.5%)    3.2%  (  -5% -  12%)    0.017
OrNotHighHigh                 555.82   (3.9%)       573.67   (4.3%)    3.2%  (  -4% -  11%)    0.013
HighIntervalsOrdered            4.36   (6.5%)         4.50   (5.9%)    3.3%  (  -8% -  16%)    0.094
OrHighNotLow                  699.58   (5.0%)       723.40   (5.0%)    3.4%  (  -6% -  14%)    0.031
OrNotHighMed                  511.29   (3.9%)       529.02   (3.6%)    3.5%  (  -3% -  11%)    0.004
OrNotHighLow                  419.51   (3.9%)       434.62   (2.6%)    3.6%  (  -2% -  10%)    0.000
LowPhrase                     241.42   (3.2%)       250.97   (2.1%)    4.0%  (  -1% -   9%)    0.000
OrHighNotHigh                 562.96   (3.9%)       585.87   (3.9%)    4.1%  (  -3% -  12%)    0.001
AndHighLow                    293.83   (5.5%)       306.09   (1.8%)    4.2%  (  -2% -  12%)    0.001
MedTerm                      1022.47   (6.6%)      1066.29   (4.4%)    4.3%  (  -6% -  16%)    0.015
```
Some SSDV and Taxo facet tasks (`BrowseMonthSSDVFacets`,
`BrowseMonthTaxoFacets`, `BrowseRandomLabelSSDVFacets`) maybe got a bit
slower, and lots of queries got a bit faster.
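To make that reading of the table a little less eyeball-driven, here is a minimal sketch that parses result rows and separates tasks that moved with a small p-value from the rest. The row regex, the helper names (`parse_rows`, `split_significant`) and the `0.05` cutoff are my assumptions for illustration, not part of `luceneutil`; the sample rows are copied from the table above.

```python
import re

# Assumed row layout (one row per line):
#   task, base QPS, (stddev%), new QPS, (stddev%), pct diff, (range), p-value
ROW = re.compile(
    r"^\s*(?P<task>\S+)\s+(?P<base>[\d.]+)\s+\(\s*[\d.]+%\)"
    r"\s+(?P<new>[\d.]+)\s+\(\s*[\d.]+%\)"
    r"\s+(?P<pct>-?[\d.]+)%\s+\(.*?\)\s+(?P<p>[\d.]+)\s*$"
)

def parse_rows(text):
    """Return (task, pct_diff, p_value) for every line that looks like a result row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m["task"], float(m["pct"]), float(m["p"])))
    return rows

def split_significant(rows, alpha=0.05):
    """Split rows with p < alpha into (slower, faster); alpha is an assumed cutoff."""
    slower = [r for r in rows if r[2] < alpha and r[1] < 0]
    faster = [r for r in rows if r[2] < alpha and r[1] > 0]
    return slower, faster

# A few rows copied from the results above.
SAMPLE = """\
BrowseMonthSSDVFacets 8.07 (12.6%) 7.18 (13.4%) -11.0% ( -32% - 17%) 0.008
BrowseDayOfYearSSDVFacets 7.46 (12.8%) 7.28 (16.8%) -2.5% ( -28% - 31%) 0.598
LowPhrase 241.42 (3.2%) 250.97 (2.1%) 4.0% ( -1% - 9%) 0.000
HighTermMonthSort 42.77 (14.2%) 44.03 (19.5%) 3.0% ( -26% - 42%) 0.584
"""

slower, faster = split_significant(parse_rows(SAMPLE))
for label, group in (("slower", slower), ("faster", faster)):
    for task, pct, p in group:
        print(f"{label}: {task} {pct:+.1f}% (p={p:.3f})")
```

With this many tasks, a handful of rows will cross any fixed cutoff by chance, so this is only a first filter, not a verdict.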
These were the merged CPU profile results for the new `mmap` impl:
```
PROFILE SUMMARY from 894683 events (total: 894683)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
4.27%         38211         org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
4.15%         37164         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.56%         31835         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
2.93%         26214         org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
2.87%         25641         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
2.47%         22090         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.43%         21784         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
2.17%         19392         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
2.10%         18801         org.apache.lucene.search.ConjunctionDISI#doNext()
2.10%         18781         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.97%         17597         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
1.93%         17238         jdk.internal.foreign.AbstractMemorySegmentImpl#checkBoundsSmall()
1.85%         16576         jdk.internal.misc.ScopedMemoryAccess#getByteInternal()
1.81%         16231         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
1.74%         15561         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.73%         15498         org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl#readByte()
1.53%         13721         jdk.internal.misc.ScopedMemoryAccess#getIntUnalignedInternal()
1.49%         13317         jdk.internal.foreign.AbstractMemorySegmentImpl#isSet()
1.38%         12362         org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
1.34%         12016         org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
1.16%         10395         org.apache.lucene.search.TermScorer#score()
1.16%         10338         jdk.internal.foreign.AbstractMemorySegmentImpl#checkBounds()
1.10%         9856          org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
1.01%         9014          org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.96%         8580          jdk.internal.foreign.SharedScope#checkValidState()
0.93%         8349          org.apache.lucene.index.SingletonSortedSetDocValues#getValueCount()
0.90%         8020          org.apache.lucene.search.ScoreCachingWrappingScorer#score()
0.86%         7654          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#advance()
0.82%         7361          org.apache.lucene.queries.spans.SpanScorer#setFreqCurrentDoc()
0.82%         7328          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
```
versus baseline CPU JFR profiler results:
```
PROFILE SUMMARY from 894453 events (total: 894453)
tests.profile.mode=cpu
tests.profile.count=30
tests.profile.stacksize=1
tests.profile.linenumbers=false
PERCENT       CPU SAMPLES   STACK
5.93%         53070         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
4.26%         38078         org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
3.84%         34318         org.apache.lucene.facet.sortedset.SortedSetDocValuesFacetCounts#countOneSegment()
3.65%         32685         jdk.internal.misc.Unsafe#convEndian()
2.86%         25554         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
2.74%         24483         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#nextPosition()
2.64%         23617         org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts#countAll()
2.18%         19515         org.apache.lucene.search.ConjunctionDISI#doNext()
2.17%         19373         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
2.12%         18958         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$3#longValue()
1.93%         17298         org.apache.lucene.util.packed.DirectReader$DirectPackedReader20#get()
1.93%         17258         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#advance()
1.82%         16284         org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$EverythingEnum#skipPositions()
1.75%         15647         org.apache.lucene.search.TermScorer#score()
1.71%         15292         org.apache.lucene.codecs.lucene90.ForUtil#expand8()
1.67%         14979         org.apache.lucene.queries.intervals.OrderedIntervalsSource$OrderedIntervalIterator#nextInterval()
1.65%         14744         org.apache.lucene.store.ByteBufferGuard#ensureValid()
1.57%         14061         org.apache.lucene.queries.spans.NearSpansOrdered#stretchToOrder()
1.15%         10247         org.apache.lucene.queries.spans.TermSpans#nextStartPosition()
1.14%         10222         java.util.Objects#checkIndex()
1.12%         9990          java.nio.Buffer#scope()
1.06%         9459          org.apache.lucene.store.ByteBufferGuard#getByte()
0.98%         8724          org.apache.lucene.queries.intervals.IntervalFilter#nextInterval()
0.91%         8179          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
0.88%         7906          org.apache.lucene.search.ScoreCachingWrappingScorer#score()
0.88%         7867          org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
0.87%         7823          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$DenseNumericDocValues#advance()
0.87%         7789          org.apache.lucene.store.ByteBufferGuard#getInt()
0.84%         7518          org.apache.lucene.facet.taxonomy.IntTaxonomyFacets#increment()
0.74%         6639          org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3#longValue()
```
It's curious how costly `SingletonSortedNumericDocValues#nextDoc` is. I
think these facet fields are dense.
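
For comparing the two profiles frame by frame (for example, to confirm that `SingletonSortedNumericDocValues#nextDoc` costs about the same in both), a rough sketch along these lines could help. The helper names (`parse_profile`, `diff_profiles`) are hypothetical, it assumes each entry sits on one line as `PERCENT SAMPLES FRAME`, and it treats a frame missing from one top-30 list as 0%, which is only an approximation since the frame may simply have fallen below the cutoff. The sample lines are copied from the two profiles above.

```python
import re

# Assumed entry layout: "<percent>% <samples> <frame>" on a single line.
LINE = re.compile(r"^\s*(?P<pct>[\d.]+)%\s+(?P<samples>\d+)\s+(?P<frame>\S+)\s*$")

def parse_profile(text):
    """Map frame -> percent of CPU samples for every line that looks like an entry."""
    return {m["frame"]: float(m["pct"])
            for m in map(LINE.match, text.splitlines()) if m}

def diff_profiles(base, new):
    """Return (frame, new% - base%) pairs, largest absolute shift first.

    Frames absent from one profile count as 0.0 there (an approximation).
    """
    frames = set(base) | set(new)
    deltas = {f: round(new.get(f, 0.0) - base.get(f, 0.0), 2) for f in frames}
    return sorted(deltas.items(), key=lambda kv: abs(kv[1]), reverse=True)

# A few entries copied from the baseline and new-mmap profiles.
BASE = """\
5.93% 53070 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
4.26% 38078 org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
3.65% 32685 jdk.internal.misc.Unsafe#convEndian()
"""
NEW = """\
4.27% 38211 org.apache.lucene.index.SingletonSortedNumericDocValues#nextDoc()
3.56% 31835 org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$20#ordValue()
1.93% 17238 jdk.internal.foreign.AbstractMemorySegmentImpl#checkBoundsSmall()
"""

for frame, delta in diff_profiles(parse_profile(BASE), parse_profile(NEW)):
    print(f"{delta:+.2f}%  {frame}")
```

Since both profiles were captured with `tests.profile.stacksize=1`, the deltas only describe self time at the top frame; they say nothing about which callers the time moved between.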