[jira] [Commented] (SOLR-14137) Boosting by date (and perhaps others) shows a steady decline 6.6->8.3

Erick Erickson (Jira) Mon, 24 Feb 2020 12:34:01 -0800


    [ https://issues.apache.org/jira/browse/SOLR-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043840#comment-17043840 ]


Erick Erickson commented on SOLR-14137:
---------------------------------------

The programs I use to generate docs and run JMeter are here:
[https://github.com/ErickErickson/index_doc_generator]. It's a bit of a mess; I
was trying several different things. But if people want to work with it, I can
help untangle it.

> Boosting by date (and perhaps others) shows a steady decline 6.6->8.3
> ---------------------------------------------------------------------
>
>                 Key: SOLR-14137
>                 URL: https://issues.apache.org/jira/browse/SOLR-14137
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Priority: Major
>         Attachments: Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-19 at 3.09.37 PM.png, Screen Shot 2019-12-19 at 3.31.16 PM.png, second_run.png
>
>
> Moving a user's list discussion over here.
> The very short form is that from Solr 6.6.1 to Solr 8.3.1, the throughput
> for date boosting in my tests dropped by 40+%.
> I’ve been hearing about slowdowns in successive Solr releases with boost
> functions, so I dug into it a bit. The test setup is just a boost-by-date
> query with an additional big OR clause of 100 random words, so I’d be sure to
> hit a bunch of docs. I figured that if there were few hits, the signal would
> be lost in the noise, but I didn’t look at the actual hit counts.
> I saw several Solr JIRAs about this subject, but each was slightly
> different, although quite possibly caused by the same underlying issue. So I
> tried to narrow this down to a very specific form of query.
> I’ve also seen some cases in the wild where the response time was
> proportional to the number of segments, hence my optimize experiments.
> Here are the results; the explanation is below. O stands for optimized to
> one segment. I spot-checked pdate against tdate on 6.6, 7.1 and 8.3, and
> their performance wasn’t significantly different. All fields have docValues
> enabled. I ran these against a multiValued="false" field. All the tests
> pegged all my CPUs. JMeter runs on a different machine from Solr, and only
> one Solr instance was running for any test.
> Solr version   queries/min   queries/min (O)
> 6.6.1          3,400         4,800
> 7.1            2,800         4,200
> 7.7.1          2,400         3,500
> 8.3.1          2,000         2,600
> The tests I’ve been running just index 20M docs into a single core, then run
> the exact same 10,000 queries against them from JMeter with 24 threads. Spot
> checks showed no hits on the queryResultCache.
> A query looks like this:
> rows=0&{!boost b=recip(ms(NOW, INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR anyplace…97 more random words)
> There is no faceting. No grouping. No sorting.
> I fill in INSERT_FIELD_HERE through JMeter magic. I’m running the exact same
> queries for every test.
> One wildcard is that I regenerated the index for each major revision, each
> time choosing random words from the same word list and random times (bounded
> in the same range, though), so the docs are not completely identical. Each
> index was in the native format for its major version, even though that format
> differs slightly between versions. I ran the test once, then ran it again
> after optimizing the index.
> I haven’t dug any further. If anyone’s interested, I can throw a profiler at,
> say, 8.3 and see what I can see, although I’m not going to have time to dive
> into this any time soon. I’d be glad to run some tests, though. I saved the
> queries and the indexes, so running a test would only take a few minutes.
> While I concentrated on date fields, the docs have date, int, and long
> fields, both docValues=true and docValues=false, each variant with
> multiValued=true and multiValued=false, and both Trie and Point (where
> possible) variants, as well as a pretty simple text field.

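As a quick arithmetic check on the throughput table in the quoted description, the sketch below recomputes the relative changes. The figures are copied from that table; the helper names (`RESULTS`, `pct_change`, `summarize`) are illustrative only, and the queries/second column simply divides queries/min by 60.

```python
# Throughput (queries/min) copied from the table in the SOLR-14137 description.
# Each value is (as indexed, optimized to one segment).
RESULTS = {
    "6.6.1": (3400, 4800),
    "7.1": (2800, 4200),
    "7.7.1": (2400, 3500),
    "8.3.1": (2000, 2600),
}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means a drop."""
    return (new - old) / old * 100.0


def summarize(baseline: str = "6.6.1") -> dict:
    """Per-version qps and percentage changes, rounded to one decimal place."""
    base_plain, base_opt = RESULTS[baseline]
    summary = {}
    for version, (plain, opt) in RESULTS.items():
        summary[version] = {
            "qps": round(plain / 60.0, 1),  # queries/min -> queries/sec
            "vs_baseline": round(pct_change(base_plain, plain), 1),
            "vs_baseline_opt": round(pct_change(base_opt, opt), 1),
            "optimize_gain": round(pct_change(plain, opt), 1),
        }
    return summary


if __name__ == "__main__":
    for version, row in summarize().items():
        print(version, row)
```

Between 6.6.1 and 8.3.1 this gives a drop of about 41% for the as-indexed runs and about 46% for the optimized runs, consistent with the "40+%" figure. The gain from optimizing is 30% on 8.3.1 versus about 41% on 6.6.1.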

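The boost in the quoted query is a Solr function query. Assuming Solr's documented definitions, `recip(x,m,a,b)` computes `a/(m*x + b)` and `ms(NOW, field)` is the document's age in milliseconds. With m=3.16e-11 (roughly one over the number of milliseconds in a year) and a=b=1, the multiplier is therefore about 1/(age in years + 1), and `{!boost}` multiplies the text query's score by it. The sketch below only evaluates that arithmetic in Python; it does not talk to Solr, and the function names are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
MS_PER_YEAR = 365.25 * MS_PER_DAY  # ~3.156e10 ms, so 1/MS_PER_YEAR ~ 3.17e-11


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms: float) -> float:
    """recip(ms(NOW, field), 3.16e-11, 1, 1) for a document age_ms old."""
    return recip(age_ms, 3.16e-11, 1.0, 1.0)


def boosted_score(text_score: float, age_ms: float) -> float:
    """{!boost b=...} multiplies the wrapped query's score by b."""
    return text_score * date_boost(age_ms)


if __name__ == "__main__":
    ages = [("now", 0), ("1 day", MS_PER_DAY),
            ("1 year", MS_PER_YEAR), ("10 years", 10 * MS_PER_YEAR)]
    for label, age in ages:
        print(f"{label:>8}: boost = {date_boost(age):.4f}")
```

Running it prints boosts of 1.0000, 0.9973, 0.5007 and 0.0911, so a matching document's score is roughly halved once it is a year old.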

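For anyone wanting to reproduce the load, the sketch below shows one way to generate queries of the quoted shape. It is not taken from the linked index_doc_generator repository or the JMeter test plan; the field name `tdate_dv`, the stand-in word list, the seed handling, and sending the boost query as the `q` parameter are all assumptions.

```python
import random
from urllib.parse import urlencode

# Template mirroring the quoted query; {field} stands in for INSERT_FIELD_HERE.
# Doubled braces are literal braces in str.format().
BOOST_TEMPLATE = (
    "{{!boost b=recip(ms(NOW, {field}),3.16e-11,1,1)}}text_txt:({words})"
)


def build_params(field: str, vocabulary: list[str],
                 n_words: int = 100, seed: int = 0) -> dict:
    """Build request parameters for one boosted benchmark query.

    Assumption: the boost query is sent as the q parameter. A fixed seed
    makes the word selection reproducible, so every run sees identical queries.
    """
    rng = random.Random(seed)
    # Draw distinct words and join them into one big OR clause.
    words = rng.sample(vocabulary, n_words)
    q = BOOST_TEMPLATE.format(field=field, words=" OR ".join(words))
    return {"rows": 0, "q": q}


def build_query_string(field: str, vocabulary: list[str],
                       n_words: int = 100, seed: int = 0) -> str:
    """URL-encode the parameters for use in an HTTP GET request."""
    return urlencode(build_params(field, vocabulary, n_words, seed))


if __name__ == "__main__":
    vocab = [f"word{i}" for i in range(1000)]  # hypothetical word list
    print(build_query_string("tdate_dv", vocab))  # hypothetical field name
```

Generating a fixed set once (for example seeds 0 through 9,999) and reusing it for every field and version keeps the "exact same queries for every test" property the description relies on.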
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

