Hi,

This is a subtle change that is not detected by our langid unit tests, as I think it only happens when the document is transferred with SolrJ and the Javabin codec. It was introduced in https://issues.apache.org/jira/browse/SOLR-12992
Please create a new JIRA issue for langid so we can try to fix it in 7.7.1.

Other SolrInputDocument users that assume the String type for string values in a SolrInputDocument would also be vulnerable.

I have a patch ready that you could test:

Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
+++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LangDetectLanguageIdentifierUpdateProcessor.java (date 1550217809000)
@@ -60,12 +60,12 @@
           Collection<Object> fieldValues = doc.getFieldValues(fieldName);
           if (fieldValues != null) {
             for (Object content : fieldValues) {
-              if (content instanceof String) {
-                String stringContent = (String) content;
+              if (content instanceof CharSequence) {
+                CharSequence stringContent = (CharSequence) content;
                 if (stringContent.length() > maxFieldValueChars) {
-                  detector.append(stringContent.substring(0, maxFieldValueChars));
+                  detector.append(stringContent.subSequence(0, maxFieldValueChars).toString());
                 } else {
-                  detector.append(stringContent);
+                  detector.append(stringContent.toString());
                 }
                 detector.append(" ");
               } else {
Index: solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java (revision 8c831daf4eb41153c25ddb152501ab5bae3ea3d5)
+++ solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java (date 1550217691000)
@@ -413,10 +413,10 @@
         Collection<Object> fieldValues = doc.getFieldValues(fieldName);
         if (fieldValues != null) {
           for (Object content : fieldValues) {
-            if (content instanceof String) {
-              String stringContent = (String) content;
+            if (content instanceof CharSequence) {
+              CharSequence stringContent = (CharSequence) content;
               if (stringContent.length() > maxFieldValueChars) {
-                sb.append(stringContent.substring(0, maxFieldValueChars));
+                sb.append(stringContent.subSequence(0, maxFieldValueChars));
               } else {
                 sb.append(stringContent);
               }
@@ -449,8 +449,8 @@
       Collection<Object> contents = doc.getFieldValues(field);
       if (contents != null) {
         for (Object content : contents) {
-          if (content instanceof String) {
-            docSize += Math.min(((String) content).length(), maxFieldValueChars);
+          if (content instanceof CharSequence) {
+            docSize += Math.min(((CharSequence) content).length(), maxFieldValueChars);
           }
         }

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 14 Feb 2019, at 16:02, Andreas Hubold <andreas.hub...@coremedia.com> wrote:
>
> Hi,
>
> while trying to update from Solr 7.6 to 7.7 I ran into some unexpected
> incompatibilities with UpdateRequestProcessors.
>
> The SolrInputDocument passed to UpdateRequestProcessor#processAdd does not
> return Strings for string fields anymore but instances of
> org.apache.solr.common.util.ByteArrayUtf8CharSequence. I found some related
> JIRA issues (SOLR-12983?) but nothing under the "Upgrade Notes" section.
>
> I can adapt our UpdateRequestProcessor implementations but at least the
> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
> is broken now as well and needs to be fixed in Solr. It expects String values
> and logs messages such as the following now:
>
> 2019-02-14 13:14:47.537 WARN (qtp802600647-19) [ x:studio] o.a.s.u.p.LangDetectLanguageIdentifierUpdateProcessor Field name_tokenized not a String value, not including in detection
>
> I wonder what kind of plugins are affected by the change. Does this only
> affect UpdateRequestProcessors or more plugins? Do I need to handle these
> ByteArrayUtf8CharSequence instances in SolrJ clients now as well?
>
> Cheers,
> Andreas
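
For your own UpdateRequestProcessor implementations, the same idea as in the patch should work: test for CharSequence instead of String and only call toString() when you really need a String. Below is a minimal, Solr-free sketch of that pattern. The class and method names (CharSequenceFieldValues, textOf, concatText) are made up for illustration and are not part of Solr, and a StringBuilder stands in for ByteArrayUtf8CharSequence so the example runs without Solr on the classpath.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper, not part of Solr: treats any CharSequence field value
// as text instead of requiring java.lang.String.
public class CharSequenceFieldValues {

    /** Returns at most maxChars characters of value as a String, or null if value is not a CharSequence. */
    public static String textOf(Object value, int maxChars) {
        if (!(value instanceof CharSequence)) {
            return null; // e.g. numbers or dates: not textual content
        }
        CharSequence cs = (CharSequence) value;
        if (cs.length() > maxChars) {
            // CharSequence has no substring(); subSequence() is the equivalent
            return cs.subSequence(0, maxChars).toString();
        }
        return cs.toString();
    }

    /** Concatenates all textual values, each truncated to maxChars and followed by a space. */
    public static String concatText(Collection<Object> values, int maxChars) {
        StringBuilder sb = new StringBuilder();
        if (values == null) {
            return "";
        }
        for (Object value : values) {
            String text = textOf(value, maxChars);
            if (text != null) {
                sb.append(text).append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a real processor these would come from doc.getFieldValues(fieldName).
        Collection<Object> values = Arrays.asList(
                "plain String",
                new StringBuilder("non-String CharSequence"), // stand-in for ByteArrayUtf8CharSequence
                42);
        System.out.println(concatText(values, 10));
    }
}
```

Checking against the CharSequence interface rather than the concrete ByteArrayUtf8CharSequence class keeps such code working both for documents that still carry plain Strings and for those that arrive via Javabin.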