> > Regarding the existing bug, I think there might be an additional issue
> > here because it happens only when the id field contains an underscore
> > (I didn't check for other special characters).
> > Currently I have no other choice but to use enableLazyFieldLoading=false.
> > I hope it doesn't have a significant performance impact.
>
> -Original Message-
> From: David Smiley
> Sent: Thursday, 18 February 2021 01:03
> To: solr-user
> Subject: Re: Atomic Update (nested), Unified Highlighter and Lazy Field
> Loading => Invalid Index
>
> I think the issue is this existing bug, but
>> termVectors="true" termOffsets="true" termPositions="true"
>> required="false" multiValued="true" />
I inserted one document with a nested child, e.g.
{id:"abc_1", utterances:{id:"abc_1-1", text_en:"Solr is great"}}
To reproduce:
Do a search with surround and unified highlighter:
hl.fl=text_en&hl.method=unified&hl=on&q=%7B!surround%7Dtext_en%3A4W("s
Hello Ronen,
Can you please file a JIRA issue? Some quick searches did not turn
anything up. It would be super helpful to me if you could list a series of
steps with Solr out-of-the-box in 8.8 including what data to index and
query. Solr already includes the "tech products" sample data; maybe t
Hi All,
I discovered a strange behaviour with this combination.
Not only does the atomic update fail, but the child documents are also not
properly indexed, and you can't use highlighting on their text fields.
Currently there is no workaround other than reindexing.
Checked on 8.3.0, 8.6.1 and 8.8.0.
1. Configure n
Hi David,
Thanks for filing this issue. The classic non-weightMatcher mode works well
for us right now. Yes, we are using the POSTINGS mode for most of the
fields, although explicitly specifying it gives an error since not all
fields are indexed with offsets. So I guess the highlighter is picking
https://issues.apache.org/jira/browse/SOLR-10321 -- near the end my opinion
is we should just omit the field if there is no highlight, which would
address your need to do this work-around. Glob or no glob. PR welcome!
It's satisfying seeing that the Unified Highlighter is so much faster
On another note, since response time is in question: ever since Solr 6 I
have been using a custom highlighter that just overrides the encodeSnippets()
method of the UnifiedSolrHighlighter class, because Solr sends back a blank
array (ZERO_LEN_STR_ARRAY) in the response payload for fields that do not match.
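The same effect can be sketched outside Solr as a post-processing step: drop any field whose snippet array is empty before the response is serialized. This is a minimal sketch over a plain `Map<String, String[]>`, assuming that is how per-field snippets are held; `EmptyHighlightFilter` is a hypothetical name, and the sketch does not reproduce the real signature of `UnifiedSolrHighlighter.encodeSnippets()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Solr): drops fields that produced no snippets,
// so the response omits them instead of carrying an empty array.
final class EmptyHighlightFilter {
    private EmptyHighlightFilter() {}

    static Map<String, String[]> dropEmpty(Map<String, String[]> snippetsByField) {
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : snippetsByField.entrySet()) {
            String[] snippets = e.getValue();
            // Keep a field only if its snippet array is non-null and non-empty.
            if (snippets != null && snippets.length > 0) {
                result.put(e.getKey(), snippets);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String[]> raw = new LinkedHashMap<>();
        raw.put("content_en", new String[] {"Solr is <em>great</em>"});
        raw.put("content_es", new String[0]); // no match: zero-length array
        System.out.println(dropEmpty(raw).keySet()); // prints [content_en]
    }
}
```

Insertion order is preserved (`LinkedHashMap`), so the remaining fields come back in the order the highlighter produced them.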
Here
Hi David,
Thanks so much for your reply.
hl.weightMatches was indeed the culprit. After setting it to false, I am
now getting the same sub-second response as Solr 6. I am using Solr 8.6.1.
Here are the tests I carried out:
hl.requireFieldMatch=true&hl.weightMatches=true (2458 ms)
hl.requi
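A comparison like this can be scripted so that each run differs only in these two flags. This minimal sketch just enumerates the four `hl.requireFieldMatch` / `hl.weightMatches` combinations as query-string fragments to append to an otherwise identical `hl.method=unified` request. The class name is hypothetical, and the sketch implies nothing about the timings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: every combination of the two flags discussed in the thread,
// so the same query can be timed once per combination.
final class HighlightParamMatrix {
    private HighlightParamMatrix() {}

    static List<String> combinations() {
        List<String> out = new ArrayList<>();
        boolean[] values = {true, false};
        for (boolean requireFieldMatch : values) {
            for (boolean weightMatches : values) {
                out.add("hl.requireFieldMatch=" + requireFieldMatch
                        + "&hl.weightMatches=" + weightMatches);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        combinations().forEach(System.out::println);
    }
}
```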
hl.weightMatches=true is now the default. Try setting it to false.
Does that help performance much? It's documented on the highlighting page
of the ref guide:
https://lucene.apache.org/solr/guide/8_7/highlighting.html#the-unified-highlighter
You might want to try toggling hl.requireFieldMatch=true (default
Hi,
While upgrading from Solr 6 to Solr 8, the Unified Highlighter began to have
performance issues, going from approximately 100 ms to more than 4 seconds
with 76 fields in the hl.q and hl.fl parameters. So I played with
different options and found that the hl.q parameter needs to have any one
field
Here's my PR, which includes some edits to the ref guide docs where I tried
to clarify these settings a little too.
https://github.com/apache/lucene-solr/pull/1651
~ David
On Sat, Jul 4, 2020 at 8:44 AM Nándor Mátravölgyi
wrote:
I guess that's fair. Let's have hl.fragsizeIsMinimum=true as default.
On 7/4/20, David Smiley wrote:
I doubt that WORD mode is impacted much by hl.fragsizeIsMinimum in terms of
quality of the highlight since there are vastly more breaks to pick from.
I think that setting is more useful in SENTENCE mode if you can stand the
perf hit. If you agree, then why not just let this one default to "true"?
Since the issue seems to be affecting the highlighter differently
based on which mode it is using, having different defaults for the
modes could be explored.
WORD may have the new defaults as it has little effect on performance
and it creates nicer highlights.
SENTENCE should have the defaults tha
I think we should flip the default of hl.fragsizeIsMinimum to 'true',
thus restoring behavior close to what preceded 8.5, because:
(a) it was very recently (<= 8.4) the previous behavior and so may require
less tuning for users in 8.6 henceforth
(b) it's significantly faster for long text -- seems to be 2
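To make the difference concrete, here is a simplified illustration using `java.text.BreakIterator` sentence boundaries: treating fragsize as a minimum keeps adding sentences until the passage is at least that long, while treating it as a goal picks the boundary nearest to it, which can yield a shorter passage. This is a sketch under stated assumptions, not Lucene's `LengthGoalBreakIterator`: it ignores `hl.fragAlignRatio`, assumes `start` is already a sentence boundary, and only decides where a passage ends.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Simplified illustration (hypothetical, not Lucene code) of two ways to pick
// where a passage ends, given sentence boundaries from java.text.BreakIterator.
final class FragsizeSketch {
    private FragsizeSketch() {}

    private static BreakIterator sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        return it;
    }

    // "fragsize is a minimum": end at the first sentence boundary at or after
    // start + fragsize, so the passage is never shorter than fragsize (unless the text ends).
    static int endAsMinimum(String text, int start, int fragsize) {
        if (fragsize <= 0) throw new IllegalArgumentException("fragsize must be > 0");
        int target = start + fragsize;
        if (target >= text.length()) return text.length();
        // following(x) returns the first boundary strictly after x, i.e. >= target here.
        return sentences(text).following(target - 1);
    }

    // "fragsize is a goal": end at whichever sentence boundary is closest to
    // start + fragsize, which may produce a passage shorter than fragsize.
    static int endClosestToGoal(String text, int start, int fragsize) {
        if (fragsize <= 0) throw new IllegalArgumentException("fragsize must be > 0");
        int target = start + fragsize;
        if (target >= text.length()) return text.length();
        BreakIterator it = sentences(text);
        if (it.isBoundary(target)) return target;
        int after = it.following(target);
        int before = it.preceding(target);
        if (before <= start) return after; // never return an empty passage
        return (target - before <= after - target) ? before : after;
    }

    public static void main(String[] args) {
        String text = "One. Two two. Three three three. Four.";
        System.out.println(endAsMinimum(text, 0, 8));     // a boundary at or past offset 8
        System.out.println(endClosestToGoal(text, 0, 8)); // the boundary nearest offset 8
    }
}
```

With long text and SENTENCE breaks, the "goal" variant has to look on both sides of the target, which is one intuition for why the two modes can differ in cost as well as in passage length.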
Hi!
With the provided test I've profiled the preceding() and following()
calls on the base Java iterators in the different options.
=== default highlighter arguments ===
Calling the test query with SENTENCE base iterator:
- from LengthGoalBreakIterator.following(): 1130 calls of
baseIter.precedin
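Call counts like these can be collected without a profiler by wrapping the base iterator. The sketch below is a hypothetical delegating `java.text.BreakIterator` that counts `preceding()` and `following()` calls; how it would be handed to Lucene's `LengthGoalBreakIterator` as the base iterator is not shown and is left as an assumption.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

// Hypothetical delegating BreakIterator that counts preceding()/following() calls.
final class CountingBreakIterator extends BreakIterator {
    private final BreakIterator delegate;
    int precedingCalls;
    int followingCalls;

    CountingBreakIterator(BreakIterator delegate) {
        this.delegate = delegate;
    }

    @Override public int preceding(int offset) { precedingCalls++; return delegate.preceding(offset); }
    @Override public int following(int offset) { followingCalls++; return delegate.following(offset); }

    // Delegate isBoundary directly so its default implementation does not inflate the counts.
    @Override public boolean isBoundary(int offset) { return delegate.isBoundary(offset); }

    // Remaining BreakIterator methods simply delegate.
    @Override public int first() { return delegate.first(); }
    @Override public int last() { return delegate.last(); }
    @Override public int next(int n) { return delegate.next(n); }
    @Override public int next() { return delegate.next(); }
    @Override public int previous() { return delegate.previous(); }
    @Override public int current() { return delegate.current(); }
    @Override public CharacterIterator getText() { return delegate.getText(); }
    @Override public void setText(CharacterIterator newText) { delegate.setText(newText); }
    @Override public void setText(String newText) { delegate.setText(newText); }

    public static void main(String[] args) {
        CountingBreakIterator it =
                new CountingBreakIterator(BreakIterator.getSentenceInstance(Locale.ROOT));
        it.setText("One. Two. Three.");
        it.following(0);
        it.preceding(10);
        System.out.println(it.followingCalls + " following, " + it.precedingCalls + " preceding");
    }
}
```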
David –
It’s fine to take this conversation back to the mailing list. Thank you very
much again for your suggestions.
I think you are correct. It doesn’t appear necessary to set termOffsets, and
it appears that the unified highlighter is using the TERM_VECTORS offset
source if I don’t
Hi David,
Sorry for my late answer. I created simple test scenarios on GitHub:
https://github.com/hlavki/solr-unified-highlighter-test
There are 2 documents, both fairly large.
Test method:
https://github.com/hlavki/solr-unified-highlighter-test/blob/master/src/test/java/com/example
Hi!
I've not been able to delve into this issue deeply, but it could be
useful to know that "fragsizeIsMinimum" and "fragAlignRatio" are new
parameters which have behavior changing default values.
Leaving those with their default values makes the comparison between
8.4 and 8.5 like apples to oran
Try setting hl.fragsizeIsMinimum=true.
I did some benchmarking and found that this helps quite a bit
BTW I used the highlights.alg benchmark file, with some changes to make it
more reflective of your scenario -- offsets in postings, and used "enwiki"
(english wikipedia) docs which are larger than
Fine, I'll try to write a simple test, thanks.
On Tuesday, 26 May 2020 17:44:52 CEST, David Smiley wrote:
Please create an issue. I haven't reproduced it yet but it seems unlikely
to be user-error.
~ David
On Mon, May 25, 2020 at 9:28 AM Michal Hlavac wrote:
> Hi,
>
> I have field:
> stored="true" indexed="false" storeOffsetsWithPositions="true"/>
>
> and configuration:
> true
> unified
> true
>
Yes, I have no problems in 8.4.1, only in 8.5.1.
Also yes, those are multi-page PDF files.
m.
On Monday, 25 May 2020 19:11:31 CEST, David Smiley wrote:
Wow that's terrible!
So this problem is for SENTENCE in particular, and it's a regression in
8.5? I'll see if I can reproduce this with the Lucene benchmark module.
I figure you have some meaty text, like "page" size or longer?
~ David
On Mon, May 25, 2020 at 10:38 AM Michal Hlavac wrote:
>
I did the same test on Solr 8.4.1, and response times are the same for both
hl.bs.type=SENTENCE and hl.bs.type=WORD.
m.
On Monday, 25 May 2020 15:28:24 CEST, Michal Hlavac wrote:
Hi,
I have field:
and configuration:
true
unified
true
content_txt_sk_highlight
2
true
A query with hl.bs.type=SENTENCE takes around 1000 to 1300 ms, which is
really slow.
The same query with hl.bs.type=WORD takes from 8 to 45 ms.
Is this normal behaviour, or should I create an issue?
Thanks, m.
> -Original Message-
> From: Jörn Franke [mailto:jornfra...@gmail.com]
> Sent: Sunday, May 24, 2020 1:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> hl.fragsize=0
>
> https://lucene.apache.org/solr/guide/8_
g the field data coming from meta-tags and not strip the html
>> tags)
>>
>> Then I could use solr.HTMLStripCharFilterFactory for analysis.
>>
>> Thank You,
>>
>> Serkan,
>>
>> -Original Message-
>> From: David Smiley [mailto:dsmi...@apache.org]
> Sent: Sunday, May 24, 2020 5:26 PM
> To: solr-user
> Subject: Re: highlighting a whole html document using Unified highlighter
>
> Instead of stripping the HTML for the stored value, leave it be and remove
> it during the analysis stage with solr.HT
e=0
> parameter, it is displayed as original html document?
>
> Or
>
> Is it possible to give a whole html document as a parameter to the Unified
> highlighter so that output is also a highlighted html document?
>
> Or
>
> Do you have a better idea to highlight
Thanks,
Serkan
-Original Message-
From: Jörn Franke [mailto:jornfra...@gmail.com]
Sent: Sunday
eywords that are used to
> find and access the document.
>
> Unified highlighter is fast, accurate and supports different languages but
> only highlights passages with given parameters.
>
> How can I highlight a whole html document using Unified highlig
Hi,
I use Solr to search over a million HTML documents. When a document is
searched and displayed, I want to highlight the keywords that were used to
find and access the document.
Unified highlighter is fast, accurate and supports different languages but
only highlights passages with given
e=WHOLE as well, then a simpler PassageFormatter
> >> could basically ignore the passage starts & ends and merely mark up the
> >> original content in entirety, which is a null-concatenated sequence of all
> >> the values for this field for a document.
>>
>> ~ David
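The "null concatenated" idea can be sketched as follows, assuming the match offsets are already known: join the field's values with a U+0000 separator, wrap each match in the pre/post tags, then split on the separator so every value comes back, in its original order. `WholeValueMarkup` is hypothetical and is not Lucene's `PassageFormatter` API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not Lucene's PassageFormatter API): marks up matches across the
// whole content of a multi-valued field, then splits it back into values in their
// original order.
final class WholeValueMarkup {
    private WholeValueMarkup() {}

    static final String SEP = "\u0000"; // values are joined with a null character

    static String join(List<String> values) {
        return String.join(SEP, values);
    }

    // matches: sorted, non-overlapping [start, end) offsets into join(values);
    // assumes no match spans a separator.
    static List<String> markUp(List<String> values, int[][] matches, String pre, String post) {
        String whole = join(values);
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] m : matches) {
            sb.append(whole, pos, m[0]).append(pre).append(whole, m[0], m[1]).append(post);
            pos = m[1];
        }
        sb.append(whole, pos, whole.length());
        // Limit -1 keeps trailing empty values, so the value count is preserved.
        return new ArrayList<>(Arrays.asList(sb.toString().split(Pattern.quote(SEP), -1)));
    }

    public static void main(String[] args) {
        List<String> values = List.of("red fox", "blue fox");
        // "fox" sits at [4, 7) in the first value and [13, 16) in the joined string.
        System.out.println(markUp(values, new int[][] {{4, 7}, {13, 16}}, "<em>", "</em>"));
    }
}
```

Because every value is returned, unmatched values included, the output keeps the field's original order, which is the property `hl.preserveMulti` is about.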
>>
>>
>> On Fri, Mar 29, 2019 at 2:02 PM Walter Underwood
>> wrote:
>>
>>> We are testing 6.6.1.
>>>
>>> wunder
>
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/ (my blog)
> >
> > > On Mar 29, 2019, at 11:02 AM, Walter Underwood
> > wrote:
> > >
> > > In testing, hl.preserveMulti=true works with the uni
Hi David,
Thanks for the response! I use Unified Highlighter combined with
maxAnalyzedChars to accomplish my needs.
I'll file an issue and PR for it!
Kind Regards,
Furkan KAMACI
On Fri, May 22, 2020 at 11:25 PM David Smiley wrote:
> Feel free to file an issue; I know it's not
blog)
>
> > On Mar 29, 2019, at 11:02 AM, Walter Underwood
> wrote:
> >
> > In testing, hl.preserveMulti=true works with the unified highlighter.
> But the documentation says that the parameter is only implemented in the
> original highlighter.
> >
6:47 AM Furkan KAMACI wrote:
> Hi All,
>
> I want to switch to the Unified Highlighter for performance reasons on my
> Solr 7.6. I was using these fields:
>
> solrQuery.addHighlightField("content_*")
> .set("f.content_en.hl.alternateField", "content")
Hi Roland,
I was not able to reproduce this. I modified the tech_products sample config
to change the name field to use a new field type that had a trivial
edgengram config. Then I composed this query, based a little on some of
your parameters, and it did find highlights:
http://localhost:8983/solr
> https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations
> )
> > for using the unified highlighter.
> >
> > ...
> > * "set storeOffsetsWithPositions to true"
> > * "set termVectors to true but no oth
l highlighter or the term
> vector highlighter, but when I try to use the unified highlighter, I get no
> results returned. My Google searches so far have not revealed anybody
> having this same problem (perhaps user error on my part), hence why I’m
> asking a question to the Solr mail
I am running Solr 8.4 and am attempting to use its highlighting feature. It
appears to work well when I use the original highlighter or the term vector
highlighter, but when I try to use the unified highlighter, I get no results
returned. My Google searches so far have not revealed anybody
Hi All,
I use Solr 8.4.1 and am implementing suggester functionality. As part of the
suggestions I would like to show product info, so I had to implement this
functionality with normal query parsers instead of the suggester component. I
applied an edgengram filter without stemming to speed up the analysis of
Hi, I'm trying to understand what's going on with
the combination of:
* Solr 8.1.1
* edismax parser
* qf with multiple fields specified (each of which has type
text_en_splitting, some of which are multiValued)
* unified highlight method
* query with two terms
* results where the two terms match
On 22 Jul 2019, at 11:32 am, Richard Walker wrote:
I'm trying out the advice in the user guide
(
https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations
)
for using the unified highlighter.
I saw the note:
"This is definitely the fastest option for highlighting
wildcard queries on
Hi All,
I want to switch to the Unified Highlighter for performance reasons on my
Solr 7.6. I was using these fields:
solrQuery.addHighlightField("content_*")
.set("f.content_en.hl.alternateField", "content")
.set("f.content_es.hl.alternateField", ...
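The per-field parameters in the snippet above follow the `f.<field>.hl.alternateField` naming pattern. A minimal sketch of generating them from a field-to-alternate mapping is below. The helper name is hypothetical, and each resulting entry could then be passed to `solrQuery.set(name, value)`. Whether the unified highlighter honours `hl.alternateField` in a given Solr version is not established here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a field -> alternate-field mapping into per-field
// override parameters of the form f.<field>.hl.alternateField.
final class AlternateFieldParams {
    private AlternateFieldParams() {}

    static Map<String, String> perField(Map<String, String> alternateByField) {
        Map<String, String> params = new LinkedHashMap<>();
        alternateByField.forEach((field, alt) ->
                params.put("f." + field + ".hl.alternateField", alt));
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> alternates = new LinkedHashMap<>();
        alternates.put("content_en", "content"); // the one complete pair shown in the snippet above
        perField(alternates).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```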
We are testing 6.6.1.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Mar 29, 2019, at 11:02 AM, Walter Underwood wrote:
In testing, hl.preserveMulti=true works with the unified highlighter. But the
documentation says that the parameter is only implemented in the original
highlighter.
Is the documentation wrong? Can we trust this to keep working with unified?
wunder
Walter Underwood
wun...@wunderwood.org
http
It looks like hl.preserveMulti is only implemented in the Original highlighter.
Has anyone looked at doing this for the Unified highlighter?
We need to preserve order in the highlights for a multi-valued field.
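As a client-side fallback, snippets can be reordered by the position of the value they came from instead of by score. This sketch assumes each snippet can be attributed to a value index, which a Solr highlighting response does not necessarily provide. `PreserveOrderSketch` and its `Snippet` class are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical client-side reordering: sort snippets by the index of the field value
// they came from rather than by score.
final class PreserveOrderSketch {
    private PreserveOrderSketch() {}

    static final class Snippet {
        final int valueIndex; // position of the source value within the multi-valued field
        final double score;   // highlighter score (ignored for ordering here)
        final String text;

        Snippet(int valueIndex, double score, String text) {
            this.valueIndex = valueIndex;
            this.score = score;
            this.text = text;
        }
    }

    // List.sort is stable, so snippets from the same value keep their incoming order.
    static List<String> inValueOrder(List<Snippet> snippets) {
        List<Snippet> copy = new ArrayList<>(snippets);
        copy.sort(Comparator.comparingInt((Snippet s) -> s.valueIndex));
        List<String> texts = new ArrayList<>();
        for (Snippet s : copy) {
            texts.add(s.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Snippet> byScore = List.of(
                new Snippet(2, 0.9, "third value"),
                new Snippet(0, 0.5, "first value"),
                new Snippet(1, 0.7, "second value"));
        System.out.println(inValueOrder(byScore)); // [first value, second value, third value]
    }
}
```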
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my
Hi Solr community,
I would like some help with a strange behavior that I observe with the
unified highlighter.
Here is the configuration of my highlighter:
on
unified
false
<span class="em">
</span>
content_fr content_en exactContent
true
CHARACTER
html
200
51200
Hi Shawn,
Thank you for your reply.
> that sounds like a bug in the argument parser that needs to be fixed.
I have created a JIRA about this.
https://issues.apache.org/jira/browse/SOLR-11334
Thanks,
Yasufumi
On 2017/09/06 9:48 PM, Shawn Heisey wrote:
On 9/4/2017 9:49 PM, Yasufumi Mizoguchi wrote:
> I understood what you are saying. However, at least, I think it
> strange that UnifiedSolrHighlighter
> returns the same error when choosing ", " as the field delimiter in
> hl.fl (e.g. hl.fl=name,%20manu).
> This is because UnifiedSolrHighlighter de
Hi, Shawn,
(Sorry, I sent this to your private email address...)
Thanks for your reply.
On 9/3/2017 10:31 PM, Yasufumi Mizoguchi wrote:
Hi,
I am testing UnifiedHighlighter(hl.method=unified) with Solr 6.6 and
found that the highlighter returns the following error when the hl.fl
parameter contains undefined fields.
The error occurs even if the hl.fl parameter uses ", " (comma + space) as
a field delimiter (e.g. hl.fl=name, manu).
Is this a bug? I think tha
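Until the parsing question is settled (see SOLR-11334 above), one hedge is to normalise `hl.fl` on the client: split on commas and/or whitespace, so `name, manu` and `name,manu` parse the same way, and drop fields the schema does not define. `HlFlSanitizer` is hypothetical; it assumes the client already knows the schema's field names, and it does not handle glob patterns such as `content_*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical client-side normaliser for the hl.fl parameter.
final class HlFlSanitizer {
    private HlFlSanitizer() {}

    // Split on any run of commas and/or whitespace, ignoring empty tokens.
    static List<String> parse(String hlFl) {
        List<String> fields = new ArrayList<>();
        for (String token : hlFl.split("[,\\s]+")) {
            if (!token.isEmpty()) {
                fields.add(token);
            }
        }
        return fields;
    }

    // Keep only fields the schema defines, re-joined with a plain comma.
    static String keepKnown(String hlFl, Set<String> schemaFields) {
        List<String> kept = new ArrayList<>();
        for (String field : parse(hlFl)) {
            if (schemaFields.contains(field)) {
                kept.add(field);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        System.out.println(parse("name, manu"));
        System.out.println(keepKnown("name, manu, no_such_field", Set.of("name", "manu")));
    }
}
```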
The escaping does appear excessive. Please file a bug to the Lucene
project in Apache JIRA.
On Fri, May 26, 2017 at 11:26 AM Michael Joyner wrote:
> Isn't the unified HTML escaper rather extreme in its escaping?
>
> It makes it hard to deal with for simple post-processing.
>
> The origin
Hi,
I'm not so sure about the escaping, but to control how much text is
returned as context around the highlighted frag, you can set the following
in solrconfig.xml.
200
This will limit the fragments to consider for highlight to around 200
characters, and it will not return the whole chunk of da
Isn't the unified HTML escaper rather extreme in its escaping?
It makes it hard to deal with for simple post-processing.
The original HTML escaper seems to do minimal escaping, not every
non-alphabetical character it can find.
Also, is there a way to control how much text is returned
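For comparison, here is a sketch of a minimal escaper that touches only the five characters that are significant in HTML text and attribute values (`&`, `<`, `>`, `"`, `'`). `MinimalHtmlEscaper` is a hypothetical name. It is not the original highlighter's encoder, only an illustration of "minimal" escaping that leaves characters such as `/`, `=` and accented letters untouched.

```java
// Hypothetical minimal HTML escaper, for comparison with more aggressive encoders.
final class MinimalHtmlEscaper {
    private MinimalHtmlEscaper() {}

    static String escape(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c); // everything else passes through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<b>Solr & Lucene</b>")); // &lt;b&gt;Solr &amp; Lucene&lt;/b&gt;
    }
}
```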
Hi list,
Given the text:
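"Kontraktsproget vil være dansk og arbejdssproget kan være dansk, svensk,
norsk og engelsk"
(English: "The contract language will be Danish, and the working language
can be Danish, Swedish, Norwegian and English.")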
and the query:
{!complexphrase df=content_da}("sve* no*")
the unified highlighter (hl.method=unified) does not return any highlights.
For reference, the original
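To confirm that the sample text should produce a highlight for `"sve* no*"`, a rough sanity check is to look for two adjacent tokens with those prefixes. This sketch splits on non-letters and lowercases with `Locale.ROOT`, which is an assumption: it ignores whatever analyzer `content_da` actually uses, and it is not how the complexphrase parser evaluates the query. `PrefixPhraseCheck` is a hypothetical name.

```java
import java.util.Locale;

// Rough sanity check (hypothetical helper): is there a token starting with p1 immediately
// followed by a token starting with p2? Tokens are runs of letters, lowercased.
final class PrefixPhraseCheck {
    private PrefixPhraseCheck() {}

    static int findAdjacent(String text, String p1, String p2) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\P{L}+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].startsWith(p1) && tokens[i + 1].startsWith(p2)) {
                return i; // index of the first token of the pair
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // \u00e6 is the Danish letter in "vaere", escaped to avoid source-encoding issues.
        String text = "Kontraktsproget vil v\u00e6re dansk og arbejdssproget kan v\u00e6re dansk, svensk,"
                + " norsk og engelsk";
        System.out.println(findAdjacent(text, "sve", "no")); // 9 ("svensk" followed by "norsk")
    }
}
```

On the sample text it finds `svensk norsk`, the adjacent pair the phrase query should match, so an empty highlight points at the highlighter rather than at the data.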