romseygeek commented on PR #12357:
URL: https://github.com/apache/lucene/pull/12357#issuecomment-1582409156
I've only implemented this on `readByte()` so far, as that seems to be the
method that is affected most. Random reads of short, int and long values
mostly happen during binary search, which is a much less adversarial case
than step-by-step backwards reading.
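
To illustrate why step-by-step backwards reading can be adversarial for a buffered reader, here is a toy model. It is a sketch under assumptions, not code from this PR or from Lucene: `ToyBufferedReader`, its `back_aware` flag, and the buffer sizes are all hypothetical. It assumes a buffer that, on a miss, refills starting at the requested position, and compares that with a variant that places the requested position at the end of the refilled window when the miss is behind the current buffer.

```python
class ToyBufferedReader:
    """Toy buffered random-access reader (hypothetical, not Lucene code)."""

    def __init__(self, data: bytes, buffer_size: int = 8, back_aware: bool = False):
        self.data = data
        self.buffer_size = buffer_size
        self.back_aware = back_aware  # assumption: optional backward-friendly paging
        self.buf_start = 0
        self.buf = b""
        self.refills = 0  # number of simulated reads from the underlying file

    def read_byte(self, pos: int) -> int:
        in_buffer = self.buf_start <= pos < self.buf_start + len(self.buf)
        if not in_buffer:
            if self.back_aware and pos < self.buf_start:
                # Backward miss: make pos the LAST byte of the new window,
                # so the following backward reads hit the buffer.
                start = max(0, pos - self.buffer_size + 1)
            else:
                # Naive (or forward) miss: the window starts at pos.
                start = pos
            self.buf_start = start
            self.buf = self.data[start:start + self.buffer_size]
            self.refills += 1
        return self.buf[pos - self.buf_start]


def count_refills_backwards(back_aware: bool, length: int = 64, buffer_size: int = 8) -> int:
    """Read every byte from the last position down to 0 and count refills."""
    data = bytes(range(length))
    reader = ToyBufferedReader(data, buffer_size, back_aware)
    for pos in range(length - 1, -1, -1):
        assert reader.read_byte(pos) == data[pos]
    return reader.refills


if __name__ == "__main__":
    print("naive refills:     ", count_refills_backwards(False))
    print("back-aware refills:", count_refills_backwards(True))
```

In this toy model the naive reader refills on every single backward step, while the backward-aware variant refills roughly once per buffer. A binary search jumps around instead of walking one byte at a time, so it does not trigger this worst case in the same way.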
I ran a wikimedium10k benchmark using `NIOFSDirectory` for both baseline and
competitor, and got the following results:
```
Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                   p-value
BrowseDayOfYearTaxoFacets         2514.67   (5.2%)                  2202.60   (7.0%)    -12.4%  ( -23% - 0%)     0.000
BrowseDayOfYearSSDVFacets         4397.09  (10.1%)                  3875.61  (13.3%)    -11.9%  ( -31% - 12%)    0.001
BrowseDateTaxoFacets              2974.26   (6.5%)                  2641.17   (6.0%)    -11.2%  ( -22% - 1%)     0.000
BrowseRandomLabelSSDVFacets       1189.01   (4.1%)                  1067.62   (6.2%)    -10.2%  ( -19% - 0%)     0.000
BrowseRandomLabelTaxoFacets       1601.99   (5.7%)                  1438.85   (6.2%)    -10.2%  ( -20% - 1%)     0.000
BrowseMonthTaxoFacets             2524.45   (8.1%)                  2273.63   (7.1%)     -9.9%  ( -23% - 5%)     0.000
BrowseDateSSDVFacets              1754.54  (10.7%)                  1616.39   (8.4%)     -7.9%  ( -24% - 12%)    0.009
IntNRQ                            1604.64   (7.7%)                  1485.48   (8.9%)     -7.4%  ( -22% - 9%)     0.005
BrowseMonthSSDVFacets             4473.87  (11.2%)                  4179.28  (10.4%)     -6.6%  ( -25% - 16%)    0.053
Prefix3                            568.51   (3.9%)                   982.22  (14.0%)     72.8%  ( 52% - 94%)     0.000
MedSpanNear                        262.90   (2.4%)                   490.78  (10.3%)     86.7%  ( 72% - 101%)    0.000
Wildcard                           496.50   (3.3%)                  1031.86  (17.1%)    107.8%  ( 84% - 132%)    0.000
HighTermMonthSort                  583.14   (4.4%)                  1263.09  (19.9%)    116.6%  ( 88% - 147%)    0.000
MedIntervalsOrdered                353.37   (2.3%)                   767.89  (11.9%)    117.3%  ( 100% - 134%)   0.000
LowSpanNear                        254.88   (3.1%)                   624.77  (16.6%)    145.1%  ( 121% - 170%)   0.000
PKLookup                            10.23   (1.7%)                    25.13   (8.6%)    145.6%  ( 133% - 158%)   0.000
OrHighHigh                         307.39   (3.1%)                   762.93  (18.4%)    148.2%  ( 122% - 175%)   0.000
OrHighMed                          384.51   (3.3%)                   958.60  (19.5%)    149.3%  ( 122% - 178%)   0.000
HighPhrase                         304.14   (2.9%)                   765.04  (14.6%)    151.5%  ( 130% - 174%)   0.000
HighSpanNear                       317.72   (3.1%)                   804.30  (18.0%)    153.1%  ( 127% - 179%)   0.000
AndHighHigh                        398.97   (3.4%)                  1012.15  (21.7%)    153.7%  ( 124% - 185%)   0.000
HighTermDayOfYearSort              582.77   (2.7%)                  1495.18  (23.5%)    156.6%  ( 126% - 187%)   0.000
HighTerm                           798.23   (2.8%)                  2094.10  (22.5%)    162.3%  ( 133% - 193%)   0.000
MedSloppyPhrase                    414.51   (2.7%)                  1089.03  (28.6%)    162.7%  ( 127% - 199%)   0.000
Respell                             68.25   (2.2%)                   187.81  (14.7%)    175.2%  ( 154% - 196%)   0.000
Fuzzy2                              14.66   (1.7%)                    41.38  (13.1%)    182.3%  ( 164% - 200%)   0.000
HighSloppyPhrase                   331.71   (2.1%)                   946.59  (22.2%)    185.4%  ( 157% - 214%)   0.000
LowIntervalsOrdered                823.97   (2.7%)                  2387.93  (19.5%)    189.8%  ( 163% - 217%)   0.000
MedTerm                            770.00   (4.3%)                  2234.31  (30.4%)    190.2%  ( 149% - 234%)   0.000
LowPhrase                          358.61   (3.0%)                  1080.46  (26.5%)    201.3%  ( 166% - 238%)   0.000
OrHighLow                          365.95   (2.8%)                  1106.13  (23.3%)    202.3%  ( 171% - 235%)   0.000
AndHighMed                         352.26   (2.5%)                  1135.10  (26.1%)    222.2%  ( 189% - 257%)   0.000
HighIntervalsOrdered               295.50   (2.7%)                   971.90  (32.3%)    228.9%  ( 188% - 271%)   0.000
MedPhrase                          317.77   (2.7%)                  1083.18  (29.9%)    240.9%  ( 202% - 281%)   0.000
LowTerm                            932.40   (2.8%)                  3290.65  (24.1%)    252.9%  ( 219% - 287%)   0.000
Fuzzy1                              37.84   (1.6%)                   137.03  (17.0%)    262.1%  ( 239% - 285%)   0.000
LowSloppyPhrase                    313.05   (3.1%)                  1253.27  (30.8%)    300.3%  ( 258% - 344%)   0.000
AndHighLow                         404.61   (2.6%)                  1947.42  (38.5%)    381.3%  ( 331% - 433%)   0.000
```
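
For readers checking the table, the `Pct diff` values appear consistent with the relative change in mean QPS between the baseline and the modified version. The sketch below reproduces three rows under that assumption. The helper name `pct_diff` is hypothetical and is not part of the benchmark tooling.

```python
def pct_diff(baseline_qps: float, candidate_qps: float) -> float:
    """Relative change of candidate vs. baseline, in percent."""
    return 100.0 * (candidate_qps - baseline_qps) / baseline_qps


if __name__ == "__main__":
    # (task, QPS baseline, QPS my_modified_version) copied from the table above
    rows = [
        ("BrowseDayOfYearTaxoFacets", 2514.67, 2202.60),
        ("Prefix3", 568.51, 982.22),
        ("AndHighLow", 404.61, 1947.42),
    ]
    for task, base, cand in rows:
        print(f"{task:<27} {pct_diff(base, cand):+7.1f}%")
```

Rounded to one decimal place, these match the reported -12.4%, 72.8% and 381.3%.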
I'm not sure what's happening to slow down the facets implementation, so I
will dig further into that, but it's a clear win for terms-based queries.