Hi!

Using the provided test, I profiled the preceding() and following()
calls on the underlying Java break iterators under each combination
of options.
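
For anyone who wants to collect similar numbers, per-call timing of
this kind can be gathered with a small delegating wrapper around the
base iterator. The following is only a minimal sketch, not the
instrumentation used for the figures in this mail: the class name
TimingBreakIterator and its accessors are hypothetical, and it only
separates following() from preceding(), not the calling method.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

/** Hypothetical helper: counts and times following()/preceding() on a delegate. */
public class TimingBreakIterator extends BreakIterator {
    private final BreakIterator delegate;
    private long followingCalls;
    private long precedingCalls;
    private long followingNanos;
    private long precedingNanos;

    public TimingBreakIterator(BreakIterator delegate) {
        this.delegate = delegate;
    }

    @Override
    public int following(int offset) {
        long start = System.nanoTime();
        try {
            return delegate.following(offset);
        } finally {
            followingNanos += System.nanoTime() - start;
            followingCalls++;
        }
    }

    @Override
    public int preceding(int offset) {
        long start = System.nanoTime();
        try {
            return delegate.preceding(offset);
        } finally {
            precedingNanos += System.nanoTime() - start;
            precedingCalls++;
        }
    }

    // Everything else is passed through untimed.
    @Override public int first() { return delegate.first(); }
    @Override public int last() { return delegate.last(); }
    @Override public int next(int n) { return delegate.next(n); }
    @Override public int next() { return delegate.next(); }
    @Override public int previous() { return delegate.previous(); }
    @Override public int current() { return delegate.current(); }
    @Override public boolean isBoundary(int offset) { return delegate.isBoundary(offset); }
    @Override public CharacterIterator getText() { return delegate.getText(); }
    @Override public void setText(CharacterIterator newText) { delegate.setText(newText); }

    public long getFollowingCalls() { return followingCalls; }
    public long getPrecedingCalls() { return precedingCalls; }

    /** Formats the collected totals, one line per timed method. */
    public String report() {
        return String.format(Locale.ROOT,
            "%d calls of baseIter.following() took %.6f seconds in total%n"
                + "%d calls of baseIter.preceding() took %.6f seconds in total",
            followingCalls, followingNanos / 1e9,
            precedingCalls, precedingNanos / 1e9);
    }
}
```

Wrapping the iterator that is handed to LengthGoalBreakIterator in
such a class and printing report() after the query would give totals
per base method; attributing them to the calling method needs
separate counters at each call site.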

=== default highlighter arguments ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 1130 calls of
baseIter.preceding() took 1.039629 seconds in total
- from LengthGoalBreakIterator.following(): 1140 calls of
baseIter.following() took 0.340679 seconds in total
- from LengthGoalBreakIterator.preceding(): 1150 calls of
baseIter.preceding() took 0.099344 seconds in total
- from LengthGoalBreakIterator.preceding(): 1100 calls of
baseIter.following() took 0.015156 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 1200 calls of
baseIter.preceding() took 0.001006 seconds in total
- from LengthGoalBreakIterator.following(): 1700 calls of
baseIter.following() took 0.006278 seconds in total
- from LengthGoalBreakIterator.preceding(): 1710 calls of
baseIter.preceding() took 0.016320 seconds in total
- from LengthGoalBreakIterator.preceding(): 1090 calls of
baseIter.following() took 0.000527 seconds in total

=== hl.fragsizeIsMinimum=true&hl.fragAlignRatio=0 ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 860 calls of
baseIter.following() took 0.012593 seconds in total
- from LengthGoalBreakIterator.preceding(): 870 calls of
baseIter.preceding() took 0.022208 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 1360 calls of
baseIter.following() took 0.004789 seconds in total
- from LengthGoalBreakIterator.preceding(): 1370 calls of
baseIter.preceding() took 0.015983 seconds in total

=== hl.fragsizeIsMinimum=true ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 980 calls of
baseIter.following() took 0.010253 seconds in total
- from LengthGoalBreakIterator.preceding(): 980 calls of
baseIter.preceding() took 0.341997 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 1670 calls of
baseIter.following() took 0.005150 seconds in total
- from LengthGoalBreakIterator.preceding(): 1680 calls of
baseIter.preceding() took 0.013657 seconds in total

=== hl.fragAlignRatio=0 ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 1070 calls of
baseIter.preceding() took 1.312056 seconds in total
- from LengthGoalBreakIterator.following(): 1080 calls of
baseIter.following() took 0.678575 seconds in total
- from LengthGoalBreakIterator.preceding(): 1080 calls of
baseIter.preceding() took 0.020507 seconds in total
- from LengthGoalBreakIterator.preceding(): 1080 calls of
baseIter.following() took 0.006977 seconds in total

Calling the test query with WORD base iterator:
- from LengthGoalBreakIterator.following(): 880 calls of
baseIter.preceding() took 0.000706 seconds in total
- from LengthGoalBreakIterator.following(): 1370 calls of
baseIter.following() took 0.004110 seconds in total
- from LengthGoalBreakIterator.preceding(): 1380 calls of
baseIter.preceding() took 0.014752 seconds in total
- from LengthGoalBreakIterator.preceding(): 1380 calls of
baseIter.following() took 0.000106 seconds in total

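There is definitely a big difference between SENTENCE and WORD: with
the default arguments, the baseIter.preceding() calls made from
following() took about 1.04 seconds in total with SENTENCE versus
about 0.001 seconds with WORD. I'm not sure how we can improve the
logic on our side while keeping the features as they are. The number
of calls is roughly the same whether the performance is good or bad,
so the cost seems to depend on the text the iterator is traversing
rather than on how often we call it.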
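
One way to check whether the cost really depends on the text would be
to time the JDK iterators directly, outside the highlighter, on
different inputs. The sketch below is only an assumption about how
such a check could look: the class BreakIteratorTextProbe, its helper
methods, and both sample texts are made up for illustration, and no
particular outcome is implied.

```java
import java.text.BreakIterator;
import java.util.Locale;

/** Hypothetical probe: times preceding() on the same iterator over different texts. */
public class BreakIteratorTextProbe {

    /** Calls preceding() at every step-th offset; returns {calls, total nanoseconds}. */
    static long[] timePreceding(BreakIterator it, String text, int step) {
        it.setText(text);
        long calls = 0;
        long nanos = 0;
        for (int offset = step; offset < text.length(); offset += step) {
            long start = System.nanoTime();
            it.preceding(offset);
            nanos += System.nanoTime() - start;
            calls++;
        }
        return new long[] {calls, nanos};
    }

    /** Builds a synthetic text by repeating one unit (Java 8 compatible). */
    static String repeat(String unit, int times) {
        StringBuilder sb = new StringBuilder(unit.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(unit);
        }
        return sb.toString();
    }

    static void print(String label, long[] result) {
        System.out.printf(Locale.ROOT, "%s: %d calls of preceding() took %.6f seconds in total%n",
            label, result[0], result[1] / 1e9);
    }

    public static void main(String[] args) {
        // Made-up inputs: many short sentences vs. one long run with no sentence end.
        String shortSentences = repeat("A short sentence. ", 2000);
        String noTerminators = repeat("words without any sentence end ", 1200);

        print("SENTENCE, short sentences",
            timePreceding(BreakIterator.getSentenceInstance(Locale.ROOT), shortSentences, 37));
        print("SENTENCE, no terminators",
            timePreceding(BreakIterator.getSentenceInstance(Locale.ROOT), noTerminators, 37));
        print("WORD, short sentences",
            timePreceding(BreakIterator.getWordInstance(Locale.ROOT), shortSentences, 37));
        print("WORD, no terminators",
            timePreceding(BreakIterator.getWordInstance(Locale.ROOT), noTerminators, 37));
    }
}
```

Feeding it the field value from the test document instead of the
synthetic strings would show whether the slowdown is reproducible
with the plain JDK iterator alone.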
