[
https://issues.apache.org/jira/browse/SOLR-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043840#comment-17043840
]
Erick Erickson commented on SOLR-14137:
---------------------------------------
The programs I use to generate docs and run Jmeter are here:
https://github.com/ErickErickson/index_doc_generator
It's a bit of a mess; I was trying several different things. But if people
want to work with it, I can help untangle it.
> Boosting by date (and perhaps others) shows a steady decline 6.6->8.3
> ---------------------------------------------------------------------
>
> Key: SOLR-14137
> URL: https://issues.apache.org/jira/browse/SOLR-14137
> Project: Solr
> Issue Type: Improvement
> Reporter: Erick Erickson
> Priority: Major
> Attachments: Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot
> 2019-12-19 at 3.09.37 PM.png, Screen Shot 2019-12-19 at 3.31.16 PM.png,
> second_run.png
>
>
> Moving a user's list discussion over here.
> The very short form is that from Solr 6.6.1 to Solr 8.3.1, the throughput
> for date boosting in my tests dropped by 40+%.
>
> I’ve been hearing about slowdowns in successive Solr releases with boost
> functions, so I dug into it a bit. The test setup is just a boost-by-date
> with an additional big OR clause of 100 random words so I’d be sure to hit a
> bunch of docs. I figured that if there were few hits, the signal would be
> lost in the noise, but I didn’t look at the actual hit counts.
>
> I saw several Solr JIRAs about this subject, but they were slightly
> different, although quite possibly the same underlying issue. So I tried to
> get this down to a very specific form of a query.
>
> I’ve also seen some cases in the wild where the response time was
> proportional to the number of segments, thus my optimize experiments.
>
> Here are the results; the explanation follows. "O" stands for optimized to
> one segment. I spot-checked pdate against 6.6, 7.1 and 8.3, and the results
> weren’t significantly different performance-wise from tdate. All have
> docValues enabled. I ran these against a multiValued=“false” field. All the
> tests pegged all my CPUs. Jmeter was run on a different machine than Solr.
> Only one Solr was running for any test.
>
> Solr version   queries/min
> 6.6.1          3,400
> 6.6.1 O        4,800
> 7.1            2,800
> 7.1 O          4,200
> 7.7.1          2,400
> 7.7.1 O        3,500
> 8.3.1          2,000
> 8.3.1 O        2,600
>
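To put numbers on the decline, the table above can be reduced to a percentage drop relative to 6.6.1. This is only arithmetic on the reported figures; the dictionary layout and the function name are illustrative and not part of the test harness.

```python
# Throughput figures (queries/min) copied from the table in the issue.
# Keys are (version, optimized); this layout is illustrative only.
THROUGHPUT = {
    ("6.6.1", False): 3400,
    ("6.6.1", True): 4800,
    ("7.1", False): 2800,
    ("7.1", True): 4200,
    ("7.7.1", False): 2400,
    ("7.7.1", True): 3500,
    ("8.3.1", False): 2000,
    ("8.3.1", True): 2600,
}


def drop_vs_baseline(version, optimized, baseline="6.6.1"):
    """Percentage drop in queries/min relative to the baseline version."""
    base = THROUGHPUT[(baseline, optimized)]
    return 100.0 * (base - THROUGHPUT[(version, optimized)]) / base


if __name__ == "__main__":
    for (version, optimized), qpm in THROUGHPUT.items():
        label = version + (" O" if optimized else "")
        drop = drop_vs_baseline(version, optimized)
        print(f"{label:<8} {qpm:>5,} q/min  {drop:5.1f}% below 6.6.1")
```

On these figures, 8.3.1 comes out roughly 41% below 6.6.1 on the unoptimized index and roughly 46% below on the optimized one, which matches the "40+%" in the summary.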
> The tests I’ve been running just index 20M docs into a single core, then
> run the exact same 10,000 queries against them from jmeter with 24 threads.
> Spot checks showed no hits on the queryResultCache.
>
> A query looks like this:
> rows=0&{!boost b=recip(ms(NOW,
> INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR
> anyplace…97 more random words)
>
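For readers less familiar with the boost in that query: Solr's function-query documentation defines recip(x,m,a,b) as a/(m*x+b), and ms(NOW,field) is the document's age in milliseconds. The sketch below evaluates the same formula outside Solr to show the shape of the boost. It is not Solr code; the function names and the year constant are my own.

```python
# Milliseconds in a (Julian) year, about 3.16e10. Used only for illustration.
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """recip(x, m, a, b) = a / (m*x + b), the formula Solr documents."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost from recip(ms(NOW,field),3.16e-11,1,1) for a doc of this age."""
    return recip(age_ms, 3.16e-11, 1, 1)


if __name__ == "__main__":
    for years in (0, 1, 2, 10):
        boost = date_boost(years * MS_PER_YEAR)
        print(f"{years:>2} year(s) old -> boost {boost:.3f}")
```

Because 3.16e-11 is close to 1 / (milliseconds in a year), a brand-new doc gets a boost of 1 and a one-year-old doc gets about 0.5, with the boost continuing to shrink as the doc ages.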
> There is no faceting. No grouping. No sorting.
>
> I fill in INSERT_FIELD_HERE through jmeter magic. I’m running the exact
> same queries for every test.
>
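The placeholder substitution could be reproduced outside Jmeter with something like the sketch below. Everything here is illustrative: the field name date_tdt, the three-word list, and the choice to send the expression as the q parameter are assumptions on my part, not taken from the actual test plan.

```python
from urllib.parse import urlencode


def build_query(field, words):
    """Build the boosted query string for one date field and a list of words."""
    return (
        "{!boost b=recip(ms(NOW," + field + "),3.16e-11,1,1)}"
        "text_txt:(" + " OR ".join(words) + ")"
    )


def build_params(field, words):
    """Return URL-encoded request parameters for one boosted query."""
    # Assumption: the local-params expression is passed as the q parameter.
    return urlencode({"rows": 0, "q": build_query(field, words)})


if __name__ == "__main__":
    # Hypothetical field name; the real tests used 100 words per query.
    words = ["campaigners", "adjourned", "anyplace"]
    print(build_query("date_tdt", words))
    print(build_params("date_tdt", words))
```

Keeping the word list fixed and varying only the field name is what makes the runs comparable across field types.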
> One wild card is that I regenerated the index for each major revision and
> chose random words from the same list of words, as well as random times
> (bounded in the same range, though), so the docs are not completely
> identical. The index was in the native format for that major version even if
> slightly different between versions. I ran the test once, then ran it again
> after optimizing the index.
>
> I haven’t dug any further. If anyone’s interested, I can throw a profiler
> at, say, 8.3 and see what I can see, although I’m not going to have time to
> dive into this any time soon. I’d be glad to run some tests, though. I saved
> the queries and the indexes, so running a test would only take a few
> minutes.
>
> While I concentrated on date fields, the docs have date, int, and long
> fields, both docValues=true and docValues=false, each variant with
> multiValued=true and multiValued=false and both Trie and Point (where
> possible) variants as well as a pretty simple text field.