[ https://issues.apache.org/jira/browse/LUCENE-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995503#comment-16995503 ]
Lucene/Solr QA commented on LUCENE-9091:
----------------------------------------

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 3m 15s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 40s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 12s{color} | {color:green} highlighter in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m 14s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}102m 14s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | LUCENE-9091 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12988720/LUCENE-9091.patch |
| Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-LUCENE-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 3ba0054 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| Default Java | LTS |
| Test Results | https://builds.apache.org/job/PreCommit-LUCENE-Build/244/testReport/ |
| modules | C: lucene lucene/highlighter solr/core U: . |
| Console output | https://builds.apache.org/job/PreCommit-LUCENE-Build/244/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.

> UnifiedHighlighter HTML escaping should only escape essentials
> --------------------------------------------------------------
>
>                 Key: LUCENE-9091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9091
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Nándor Mátravölgyi
>            Priority: Minor
>         Attachments: LUCENE-9091.patch
>
>
> The unified highlighter does not use the
> *org.apache.lucene.search.highlight.SimpleHTMLEncoder* through
> *org.apache.solr.highlight.HtmlEncoder*. Instead, the HTML escaping is
> re-implemented and embedded in
> *org.apache.lucene.search.uhighlight.DefaultPassageFormatter*.
> The HTML escaping done by the unified highlighter escapes characters that do
> not need it. This inflates the result payload by 50% or more with no benefit.
> Here is a highlight snippet using the original highlighter:
> {noformat}
> A <em>filter</em> that stems words using a Snowball-generated stemmer.
> Available stemmers & x are listed in org.tartarus.snowball.ext. Note:
> This <em>filter</em> is aware of the KeywordAttribute.
> {noformat}
> Here is the same highlight snippet using the unified highlighter:
> {noformat}
> A <em>filter</em> that stems words using a Snowball-generated stemmer. Available stemmers & x are listed in org.tartarus.snowball.ext. Note: This <em>filter</em> is aware of the KeywordAttribute.
> {noformat}
> Maybe I'm missing the reason why this is done the way it is. If this behaviour
> is desired for some use case, it should be a separate encoder, and the HTML
> encoder should only escape the necessary characters.
> Affects all versions of Lucene-Solr since the addition of the
> UnifiedHighlighter. Here are the lines where the escaping is implemented
> differently:
> * [Escaping by the unified highlighter|https://github.com/apache/lucene-solr/blob/2387bb9d60ae44eeeb4fbcb2f2877f46be5303a0/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/DefaultPassageFormatter.java#L132]
> * [Escaping by the other highlighters|https://github.com/apache/lucene-solr/blob/2387bb9d60ae44eeeb4fbcb2f2877f46be5303a0/lucene/highlighter/src/java/org/apache/lucene/search/highlight/SimpleHTMLEncoder.java#L69]
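
For illustration only, the kind of encoder the issue asks for (one that escapes only the necessary characters) might look roughly like the sketch below. The class name MinimalHtmlEscaper is hypothetical: it is not part of Lucene or Solr and is not taken from the attached patch. Treating the five characters &, <, >, " and ' as "the essentials" is likewise an assumption made for this sketch.

```java
// Hypothetical helper, not a Lucene/Solr class and not the code from LUCENE-9091.patch.
final class MinimalHtmlEscaper {
    private MinimalHtmlEscaper() {}

    /**
     * Escapes only the characters assumed here to be essential for safe use
     * in HTML text and quoted attribute values; everything else, including
     * spaces, punctuation and '/', is copied through unchanged.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            switch (ch) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#x27;");
                    break;
                default:
                    out.append(ch);
            }
        }
        return out.toString();
    }
}
```

With this sketch, a plain sentence such as the stemmer description quoted in the issue would grow only where one of those five characters occurs, so ordinary prose stays almost the same size after escaping.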