intr3p1d opened a new issue, #12352:
URL: https://github.com/apache/pinot/issues/12352
# Cause

`clp-ffi-java` [internally uses](https://github.com/y-scope/clp-ffi-java/blob/c4a74dbdeb09bd4e7e3d119826dddbe5005ccf53/src/main/java/com/yscope/clp/compressorfrontend/EncodedMessage.java#L30-L36) `StandardCharsets.ISO_8859_1` in `EncodedMessage.getLogTypeAsString()`, and also in `getDictionaryVarsAsStrings()`.

# Effect

https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-plugins/pinot-input-format/pinot-clp-log/src/main/java/org/apache/pinot/plugin/inputformat/clplog/CLPLogRecordExtractor.java#L151-L154

As a result, non-ASCII characters are garbled in `column_logtype`. For example, this message:

`Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: 0 이상이어야 합니다`

is stored as:

`Request processing failed: jakarta.validation.ConstraintViolationException: getAgentsList.from: ì´ìì´ì´ì¼ í©ëë¤`

The text decodes correctly after going through the `CLPDECODE` function. However, queries that operate directly on the individual `_logtype` columns, such as `LIKE` searches, see the garbled strings.

Because the `clp-ffi-java` library makes all `EncodedMessage` member variables public, it would be nice if Pinot's `CLPLogMessageDecoder` could decode them itself, or at least decode them with the same encoding Pinot uses elsewhere internally.
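The mechanism can be reproduced in isolation. Below is a minimal Java sketch, not taken from Pinot or `clp-ffi-java`, that encodes the Korean part of the example message as UTF-8 and decodes those bytes as ISO-8859-1, which is the reported behavior. It then repairs the result. The repair works because ISO-8859-1 maps every byte 0x00 to 0xFF to exactly one char, so re-encoding with ISO-8859-1 recovers the original bytes losslessly. `Latin1MojibakeDemo`, `decodeAsLatin1` and `repairLatin1Mojibake` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Latin1MojibakeDemo {
  // Korean part of the example message: "0 이상이어야 합니다".
  // Unicode escapes keep the source independent of the compiler's file encoding.
  static final String SAMPLE = "0 \uC774\uC0C1\uC774\uC5B4\uC57C \uD569\uB2C8\uB2E4";

  // Reproduces the reported behavior: UTF-8 bytes decoded as ISO-8859-1.
  static String decodeAsLatin1(String text) {
    byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
    return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
  }

  // Hypothetical repair helper. ISO-8859-1 maps each byte 0x00-0xFF to one char
  // (U+0000-U+00FF), so re-encoding with it restores the original bytes,
  // which are then decoded as UTF-8.
  static String repairLatin1Mojibake(String latin1Decoded) {
    byte[] rawBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
    return new String(rawBytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String broken = decodeAsLatin1(SAMPLE);
    String repaired = repairLatin1Mojibake(broken);

    // A substring search on the garbled text misses the Korean word,
    // analogous to a LIKE query on the logtype column.
    System.out.println("broken contains word: " + broken.contains("\uC774\uC0C1")); // prints "broken contains word: false"
    System.out.println("repaired equals original: " + repaired.equals(SAMPLE)); // prints "repaired equals original: true"
  }
}
```

A post-hoc repair like this assumes the original bytes were valid UTF-8. Decoding the raw `EncodedMessage` bytes as UTF-8 at extraction time would avoid producing the garbled strings in the first place.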