[
https://issues.apache.org/jira/browse/SOLR-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16991014#comment-16991014
]
Yonik Seeley commented on SOLR-14013:
-------------------------------------
I worked up a quick-n-dirty patch to disable the charseq optimization stuff to
test my hypothesis on slower indexing speed:
{code}
git diff
diff --git a/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java b/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
index 69da3948fe9..620fffb1303 100644
--- a/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
+++ b/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java
@@ -146,7 +146,7 @@ public class HttpShardHandler extends ShardHandler {
private static final BinaryResponseParser READ_STR_AS_CHARSEQ_PARSER = new BinaryResponseParser() {
@Override
protected JavaBinCodec createCodec() {
- return new JavaBinCodec(null, stringCache).setReadStringAsCharSeq(true);
+ return new JavaBinCodec(null, stringCache).setReadStringAsCharSeq(false);
}
};
diff --git a/solr/core/src/java/org/apache/solr/response/DocsStreamer.java b/solr/core/src/java/org/apache/solr/response/DocsStreamer.java
index 3d1976e143c..056dc08d963 100644
--- a/solr/core/src/java/org/apache/solr/response/DocsStreamer.java
+++ b/solr/core/src/java/org/apache/solr/response/DocsStreamer.java
@@ -148,9 +148,7 @@ public class DocsStreamer implements Iterator<SolrDocument> {
// because that doesn't include extra fields needed by transformers
final Set<String> fieldNamesNeeded = fields.getLuceneFieldNames();
- final SolrDocument out = ResultContext.READASBYTES.get() == null ?
- new SolrDocument() :
- new BinaryResponseWriter.MaskCharSeqSolrDocument();
+ final SolrDocument out = new SolrDocument();
// NOTE: it would be tempting to try and optimize this to loop over fieldNamesNeeded
// when it's smaller then the IndexableField[] in the Document -- but that's actually *less* effecient
diff --git a/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java b/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java
index 7a4abe2c303..53cfbee320f 100644
--- a/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java
+++ b/solr/solrj/src/java/org/apache/solr/common/util/ByteArrayUtf8CharSequence.java
@@ -209,8 +209,11 @@ public class ByteArrayUtf8CharSequence implements Utf8CharSequence {
}
return vals;
}
-
public static Object convertCharSeq(Object o) {
+ return o; // nocommit
+ }
+
+ public static Object _convertCharSeq(Object o) {
if (o == null) return null;
if (o instanceof Utf8CharSequence) return ((Utf8CharSequence) o).toString();
if (o instanceof Collection) return convertCharSeq((Collection) o);
{code}
I also hacked up the unit test I used to find the N^2 issue. It's obviously
not good for benchmarking (being a unit test, etc.), but good enough to detect
anything major.
I tested with a single value per string field (and many fields per doc); it
would be worse for multiple values per field.
Results:
===================== master, single valued string fields
[junit4] 2> INDEX TIME=10293
[junit4] 2> QUERY TIME=891 xml
[junit4] 2> QUERY TIME=415 javabin
[junit4] 2> QUERY TIME=600 json
[junit4] 2> INDEX TIME=10313
[junit4] 2> QUERY TIME=872 xml
[junit4] 2> QUERY TIME=389 javabin
[junit4] 2> QUERY TIME=579 json
[junit4] 2> INDEX TIME=10307
[junit4] 2> QUERY TIME=858 xml
[junit4] 2> QUERY TIME=410 javabin
[junit4] 2> QUERY TIME=570 json
[junit4] 2> INDEX TIME=10318
[junit4] 2> QUERY TIME=915 xml
[junit4] 2> QUERY TIME=382 javabin
[junit4] 2> QUERY TIME=600 json
[junit4] 2> INDEX TIME=10579
[junit4] 2> QUERY TIME=843 xml
[junit4] 2> QUERY TIME=386 javabin
[junit4] 2> QUERY TIME=570 json
===================== patch disabling charseq stuff, single valued string fields
[junit4] 2> INDEX TIME=8547
[junit4] 2> QUERY TIME=881 xml
[junit4] 2> QUERY TIME=396 javabin
[junit4] 2> QUERY TIME=576 json
[junit4] 2> INDEX TIME=9428
[junit4] 2> QUERY TIME=821 xml
[junit4] 2> QUERY TIME=374 javabin
[junit4] 2> QUERY TIME=543 json
[junit4] 2> INDEX TIME=9181
[junit4] 2> QUERY TIME=812 xml
[junit4] 2> QUERY TIME=382 javabin
[junit4] 2> QUERY TIME=533 json
[junit4] 2> INDEX TIME=9455
[junit4] 2> QUERY TIME=863 xml
[junit4] 2> QUERY TIME=395 javabin
[junit4] 2> QUERY TIME=613 json
[junit4] 2> INDEX TIME=9530
[junit4] 2> QUERY TIME=863 xml
[junit4] 2> QUERY TIME=385 javabin
[junit4] 2> QUERY TIME=559 json
So the charseq stuff (or rather, probably the extra work to
auto-convert to string) did cause slower indexing speed: INDEX TIME was
10293 to 10579 on master versus 8547 to 9530 with the patch.
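To make the "extra work" concrete, here is a minimal sketch of what a recursive convert-to-string pass looks like. It is not Solr's code: `Utf8Bytes` is a hypothetical stand-in for a UTF-8 backed CharSequence, and `convert` only mirrors the general shape of the `convertCharSeq` method visible in the diff above. The point is just that every value gets visited, and each byte-backed value is decoded into a freshly allocated String.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class CharSeqConversionSketch {

  /** Hypothetical stand-in for a UTF-8 backed CharSequence; NOT Solr's class. */
  static final class Utf8Bytes implements CharSequence {
    private final byte[] utf8;

    Utf8Bytes(String s) {
      this.utf8 = s.getBytes(StandardCharsets.UTF_8);
    }

    // Decoding allocates a new String on every call.
    @Override public String toString() {
      return new String(utf8, StandardCharsets.UTF_8);
    }

    @Override public int length() { return toString().length(); }
    @Override public char charAt(int i) { return toString().charAt(i); }
    @Override public CharSequence subSequence(int a, int b) {
      return toString().subSequence(a, b);
    }
  }

  /** Walks a value (or a collection of values) and decodes byte-backed strings. */
  static Object convert(Object o) {
    if (o == null) return null;
    if (o instanceof Utf8Bytes) return o.toString();
    if (o instanceof Collection) {
      List<Object> out = new ArrayList<>();
      for (Object item : (Collection<?>) o) {
        out.add(convert(item)); // one extra visit (and maybe a decode) per value
      }
      return out;
    }
    return o; // anything else passes through unchanged
  }

  public static void main(String[] args) {
    List<Object> values = new ArrayList<>();
    values.add(new Utf8Bytes("solr"));
    values.add(42);
    System.out.println(convert(values)); // prints [solr, 42]
  }
}
```

With many fields per document this pass runs once per field value, which would fit the observation that more values per field makes things worse.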
There is enough noise that I don't think one can draw any conclusions about
query speed.
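For anyone who wants a single number out of the runs above, a small sketch that averages the five INDEX TIME values from each configuration. The class and method names are made up for illustration, and the values are in whatever unit the test prints (the output does not say). Five runs from a unit test is still a rough signal, not a benchmark.

```java
import java.util.Arrays;

public class IndexTimeSummary {

  /** Arithmetic mean of the samples; NaN for an empty array. */
  static double mean(long[] samples) {
    return Arrays.stream(samples).average().orElse(Double.NaN);
  }

  /** How much lower the candidate mean is than the baseline mean, in percent. */
  static double percentLower(long[] baseline, long[] candidate) {
    double b = mean(baseline);
    double c = mean(candidate);
    return 100.0 * (b - c) / b;
  }

  public static void main(String[] args) {
    // INDEX TIME values copied from the results above.
    long[] master = {10293, 10313, 10307, 10318, 10579};
    long[] patched = {8547, 9428, 9181, 9455, 9530};

    System.out.printf("master mean  = %.1f%n", mean(master));
    System.out.printf("patched mean = %.1f%n", mean(patched));
    System.out.printf("patched is %.1f%% lower%n", percentLower(master, patched));
  }
}
```

On these numbers the means come out to 10362.0 and 9228.2, so indexing with the patch is roughly 11% lower.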
> javabin performance regressions
> -------------------------------
>
> Key: SOLR-14013
> URL: https://issues.apache.org/jira/browse/SOLR-14013
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.7
> Reporter: Yonik Seeley
> Assignee: Yonik Seeley
> Priority: Major
> Attachments: test.json
>
>
> As noted by [~rrockenbaugh] in SOLR-13963, javabin also recently became
> orders of magnitude slower in certain cases since v7.7. The cases identified
> so far include large numbers of values in a field.