Here's my PR, which includes some edits to the ref guide docs where I tried
to clarify these settings a little too.
https://github.com/apache/lucene-solr/pull/1651
~ David
On Sat, Jul 4, 2020 at 8:44 AM Nándor Mátravölgyi
wrote:
> I guess that's fair. Let's have hl.fragsizeIsMinimum=true as default.
I guess that's fair. Let's have hl.fragsizeIsMinimum=true as default.
On 7/4/20, David Smiley wrote:
> I doubt that WORD mode is impacted much by hl.fragsizeIsMinimum in terms of
> quality of the highlight since there are vastly more breaks to pick from.
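> I think that setting is more useful in SENTENCE mode if you can stand the
> perf hit. If you agree, then why not just let this one default to "true"?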
I doubt that WORD mode is impacted much by hl.fragsizeIsMinimum in terms of
quality of the highlight since there are vastly more breaks to pick from.
I think that setting is more useful in SENTENCE mode if you can stand the
perf hit. If you agree, then why not just let this one default to "true"?
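For anyone who wants to try the setting per request rather than wait for a changed default, here is a minimal sketch that builds a select URL with the highlighting parameters named in this thread. The host, collection name, query string, and the helper `highlight_query_url` are assumptions for illustration only; the field name is the one from the original report.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url, query, field,
                        bs_type="SENTENCE", fragsize_is_minimum=True):
    """Build a Solr select URL that sets the highlighting parameters explicitly.

    base_url and query are placeholders (assumptions); the hl.* parameter
    names are the ones discussed in this thread.
    """
    params = {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,
        "hl.bs.type": bs_type,
        # Set explicitly so the result does not depend on the version's default.
        "hl.fragsizeIsMinimum": "true" if fragsize_is_minimum else "false",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and collection.
url = highlight_query_url(
    "http://localhost:8983/solr/mycollection",
    "content:solr",
    "content_txt_sk_highlight",
)
print(url)
```

Passing the parameter on every request makes a comparison between versions independent of whichever default a given release ships with.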
Since the issue seems to be affecting the highlighter differently
based on which mode it is using, having different defaults for the
modes could be explored.
WORD may have the new defaults as it has little effect on performance
and it creates nicer highlights.
SENTENCE should have the defaults tha
I think we should flip the default of hl.fragsizeIsMinimum to be 'true',
thus have the behavior close to what preceded 8.5.
(a) it was very recently (<= 8.4) the previous behavior and so may require
less tuning for users in 8.6 henceforth
(b) it's significantly faster for long text -- seems to be 2
Hi!
With the provided test I've profiled the preceding() and following()
calls on the base Java iterators in the different options.
=== default highlighter arguments ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 1130 calls of
baseIter.precedin
Hi David,
Sorry for my late answer. I created simple test scenarios on GitHub:
https://github.com/hlavki/solr-unified-highlighter-test
There are 2 documents, both fairly large.
Test method:
https://github.com/hlavki/solr-unified-highlighter-test/blob/master/src/test/java/com/example/Highlight
Hi!
I've not been able to delve into this issue deeply, but it could be
useful to know that "fragsizeIsMinimum" and "fragAlignRatio" are new
parameters which have behavior-changing default values.
Leaving those with their default values makes the comparison between
8.4 and 8.5 like apples to oran
try setting hl.fragsizeIsMinimum=true
I did some benchmarking and found that this helps quite a bit
BTW I used the highlights.alg benchmark file, with some changes to make it
more reflective of your scenario -- offsets in postings, and used "enwiki"
(english wikipedia) docs which are larger than
Fine, I'll try to write a simple test, thanks.
On Tuesday, 26 May 2020 17:44:52 CEST David Smiley wrote:
> Please create an issue. I haven't reproduced it yet but it seems unlikely
> to be user-error.
>
> ~ David
>
>
> On Mon, May 25, 2020 at 9:28 AM Michal Hlavac wrote:
>
> > Hi,
> >
> > I have
Please create an issue. I haven't reproduced it yet but it seems unlikely
to be user-error.
~ David
On Mon, May 25, 2020 at 9:28 AM Michal Hlavac wrote:
> Hi,
>
> I have field:
> stored="true" indexed="false" storeOffsetsWithPositions="true"/>
>
> and configuration:
> true
> unified
> true
>
Yes, I have no problems in 8.4.1, only in 8.5.1.
Also yes, those are multi-page PDF files.
m.
On Monday, 25 May 2020 19:11:31 CEST David Smiley wrote:
> Wow that's terrible!
> So this problem is for SENTENCE in particular, and it's a regression in
> 8.5? I'll see if I can reproduce this with the Lucene benchmark module.
Wow that's terrible!
So this problem is for SENTENCE in particular, and it's a regression in
8.5? I'll see if I can reproduce this with the Lucene benchmark module.
I figure you have some meaty text, like "page" size or longer?
~ David
On Mon, May 25, 2020 at 10:38 AM Michal Hlavac wrote:
>
I did the same test on Solr 8.4.1, and the response times are the same for both
hl.bs.type=SENTENCE and hl.bs.type=WORD.
m.
On Monday, 25 May 2020 15:28:24 CEST Michal Hlavac wrote:
Hi,
I have field:
and configuration:
true
unified
true
content_txt_sk_highlight
2
true
A query with hl.bs.type=SENTENCE takes around 1000-1300 ms, which is
really slow.
The same query with hl.bs.type=WORD takes 8-45 ms.
Is this normal behaviour, or should I create an issue?
thanks, m.
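
A comparison like the one above can be repeated in a controlled way with a small timing harness. This is only a sketch: `time_requests` and `compare_bs_types` are hypothetical helpers, the caller supplies the function that actually performs the HTTP request, and the numbers are client-side wall-clock times, so they include network overhead on top of Solr's own processing time.

```python
import time
from statistics import median


def time_requests(fetch, url, runs=5):
    """Return the median wall-clock time in ms of fetch(url) over several runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)


def compare_bs_types(fetch, url_for, runs=5):
    """Time the same query once per break-iterator type.

    url_for is a caller-supplied function mapping an hl.bs.type value
    ("SENTENCE" or "WORD") to the full request URL.
    """
    return {bs: time_requests(fetch, url_for(bs), runs)
            for bs in ("SENTENCE", "WORD")}
```

For a real run, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`, with `url_for` building the same query each time and changing only hl.bs.type. Using the median over a few runs reduces the effect of cache warm-up on the first request.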