Re: Solr 7.7 UpdateRequestProcessor broken

Andreas Hubold Fri, 15 Feb 2019 01:10:58 -0800

Hi,

thank you, Jan.

I've created https://issues.apache.org/jira/browse/SOLR-13255. Maybe you want to add your patch to that ticket. I have not had time to test it yet.

So I guess all SolrJ usages now have to handle CharSequence for string fields? Well, this really sounds like a major breaking change for custom code.


Thanks,
Andreas

Jan Høydahl wrote on 15.02.19 at 09:14:

Hi

This is a subtle change that is not detected by our langid unit tests, as I 
think it only happens when the document is transferred with SolrJ and the Javabin codec.
It was introduced in https://issues.apache.org/jira/browse/SOLR-12992

Please create a new JIRA issue for langid so we can try to fix it in 7.7.1

Any other code that assumes string values in a SolrInputDocument are of 
type String would be affected as well.

I have a patch ready that you could test:

Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java      (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
+++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java      (date 1550217809000)
@@ -60,12 +60,12 @@
            Collection<Object> fieldValues = doc.getFieldValues(fieldName);
            if (fieldValues != null) {
              for (Object content : fieldValues) {
-              if (content instanceof String) {
-                String stringContent = (String) content;
+              if (content instanceof CharSequence) {
+                CharSequence stringContent = (CharSequence) content;
                  if (stringContent.length() > maxFieldValueChars) {
-                  detector.append(stringContent.substring(0, maxFieldValueChars));
+                  detector.append(stringContent.subSequence(0, maxFieldValueChars).toString());
                  } else {
-                  detector.append(stringContent);
+                  detector.append(stringContent.toString());
                  }
                  detector.append(" ");
                } else {
Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java        (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
+++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java        (date 1550217691000)
@@ -413,10 +413,10 @@
          Collection<Object> fieldValues = doc.getFieldValues(fieldName);
          if (fieldValues != null) {
            for (Object content : fieldValues) {
-            if (content instanceof String) {
-              String stringContent = (String) content;
+            if (content instanceof CharSequence) {
+              CharSequence stringContent = (CharSequence) content;
                if (stringContent.length() > maxFieldValueChars) {
-                sb.append(stringContent.substring(0, maxFieldValueChars));
+                sb.append(stringContent.subSequence(0, maxFieldValueChars));
                } else {
                  sb.append(stringContent);
                }
@@ -449,8 +449,8 @@
          Collection<Object> contents = doc.getFieldValues(field);
          if (contents != null) {
            for (Object content : contents) {
-            if (content instanceof String) {
-              docSize += Math.min(((String) content).length(), maxFieldValueChars);
+            if (content instanceof CharSequence) {
+              docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
              }
            }
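
For readers who want to try the pattern outside of Solr, here is a minimal standalone sketch of the truncate-and-append logic the patch switches to. It is not the Solr code itself: the class name TruncatingConcat and the method concat are made up for illustration, and a StringBuilder stands in for any non-String CharSequence value.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper, not part of Solr: mirrors the shape of the patched loop.
final class TruncatingConcat {

    // Appends every text-like value, cut to at most maxChars characters,
    // followed by a single space. Non-text values are skipped in this sketch.
    static String concat(Collection<Object> values, int maxChars) {
        StringBuilder sb = new StringBuilder();
        for (Object value : values) {
            if (value instanceof CharSequence) {
                CharSequence text = (CharSequence) value;
                if (text.length() > maxChars) {
                    // subSequence is the CharSequence counterpart of String.substring
                    sb.append(text.subSequence(0, maxChars));
                } else {
                    sb.append(text);
                }
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A String, a non-String CharSequence, and a non-text value.
        Collection<Object> values =
            Arrays.<Object>asList("hello world", new StringBuilder("solr"), 42);
        System.out.println("[" + concat(values, 5) + "]");
    }
}
```

Because the check is on CharSequence, both the String and the StringBuilder are included, while the Integer is ignored.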


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 14 Feb 2019 at 16:02, Andreas Hubold <[email protected]> wrote:

Hi,

while trying to update from Solr 7.6 to 7.7 I ran into some unexpected 
incompatibilities with UpdateRequestProcessors.

The SolrInputDocument passed to UpdateRequestProcessor#processAdd does not return Strings 
for string fields anymore but instances of 
org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some related JIRA issues 
(SOLR-12983?) but nothing under the "Upgrade Notes" section.

I can adapt our UpdateRequestProcessor implementations but at least the 
org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor is 
broken now as well and needs to be fixed in Solr. It expects String values and 
logs messages such as the following now:

2019-02-14 13:14:47.537 WARN  (qtp802600647-19) [   x:studio] 
o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized not 
a String value, not including in detection

I wonder what kind of plugins are affected by the change. Does this only affect 
UpdateRequestProcessors or more plugins? Do I need to handle these 
ByteArrayUtf8CharSequence instances in SolrJ clients now as well?
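
The warning above can be reproduced in isolation. The following is only a sketch under assumptions: Utf8Bytes is a hypothetical stand-in for a byte-backed value such as ByteArrayUtf8CharSequence (it is not the Solr class), and oldCheck/newCheck are illustrative names for a String-only test and a CharSequence-based test.

```java
import java.nio.charset.StandardCharsets;

final class InstanceofPitfall {

    // Hypothetical stand-in for a byte-backed text value; NOT the Solr class.
    static final class Utf8Bytes implements CharSequence {
        private final String decoded;

        Utf8Bytes(byte[] utf8) {
            this.decoded = new String(utf8, StandardCharsets.UTF_8);
        }

        @Override public int length() { return decoded.length(); }
        @Override public char charAt(int index) { return decoded.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return decoded.subSequence(start, end);
        }
        @Override public String toString() { return decoded; }
    }

    // Check that only accepts java.lang.String values.
    static boolean oldCheck(Object value) {
        return value instanceof String;
    }

    // Check that accepts any CharSequence, including String.
    static boolean newCheck(Object value) {
        return value instanceof CharSequence;
    }

    public static void main(String[] args) {
        Object value = new Utf8Bytes("name".getBytes(StandardCharsets.UTF_8));
        System.out.println("String check accepts value:       " + oldCheck(value));
        System.out.println("CharSequence check accepts value: " + newCheck(value));
        System.out.println("toString() yields a String:       " + value.toString());
    }
}
```

Custom code that needs a real String can call toString() once the CharSequence check has passed.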

Cheers,
Andreas
