[ https://issues.apache.org/jira/browse/SOLR-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044930#comment-17044930 ]
David Smiley commented on SOLR-14137:
-------------------------------------

Too bad we can't do a "git bisect" perf test... or can we? Even just a dumb constant threshold, reusing the same index data (thus no re-indexing), would be super helpful for pinpointing the commits that made things worse.
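A rough sketch of what such a "git bisect run" driver could look like, in Python. The build and benchmark commands below are placeholders (nothing like this exists in the repo today); the benchmark is assumed to replay the saved queries against the same pre-built index (so no re-indexing) and print the achieved queries/min on stdout:

    #!/usr/bin/env python3
    # Hypothetical bisect driver: exit 0 = good (throughput at or above a
    # dumb constant threshold), exit 1 = bad, exit 125 = skip this commit.
    import subprocess, sys

    THRESHOLD_QPM = 3000.0              # the "dumb constant threshold"
    BUILD_CMD = ["ant", "server"]       # placeholder; ant vs. gradle varies by era
    BENCH_CMD = ["./replay-queries.sh"] # placeholder; prints queries/min on stdout

    if subprocess.run(BUILD_CMD).returncode != 0:
        sys.exit(125)                   # unbuildable commit: tell bisect to skip it
    bench = subprocess.run(BENCH_CMD, capture_output=True, text=True)
    sys.exit(0 if float(bench.stdout.strip()) >= THRESHOLD_QPM else 1)

Kicked off with something like: git bisect start <bad-sha> <good-sha> && git bisect run ./perf-check.py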
> Boosting by date (and perhaps others) shows a steady decline 6.6->8.3
> ---------------------------------------------------------------------
>
>                 Key: SOLR-14137
>                 URL: https://issues.apache.org/jira/browse/SOLR-14137
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erick Erickson
>            Priority: Major
>         Attachments: Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 2019-12-19 at 3.09.37 PM.png, Screen Shot 2019-12-19 at 3.31.16 PM.png, hoss_experiement1.tgz, second_run.png
>
> Moving a user's list discussion over here...
>
> http://mail-archives.apache.org/mod_mbox/lucene-dev/201912.mbox/%3ccb89b979-babb-4f76-875b-e222b46eb...@gmail.com%3e
>
> The very short form is that from Solr 6.6.1 to Solr 8.3.1, the throughput for date boosting in my tests dropped by more than 40%.
>
> I've been hearing about slowdowns in successive Solr releases with boost functions, so I dug into it a bit. The test setup is just a boost-by-date with an additional big OR clause of 100 random words, so I'd be sure to hit a bunch of docs. I figured that if there were few hits, the signal would be lost in the noise, but I didn't look at the actual hit counts.
>
> I saw several Solr JIRAs on this subject, but they were slightly different, although quite possibly the same underlying issue. So I tried to get this down to a very specific form of query.
>
> I've also seen cases in the wild where the response time was proportional to the number of segments, hence my optimize experiments.
>
> Here are the results; explanation below. "O" means optimized to one segment. I spot-checked pdate against 6.6, 7.1, and 8.3, and it wasn't significantly different performance-wise from tdate. All fields have docValues enabled. I ran these against a multiValued="false" field. All the tests pegged all my CPUs. JMeter runs on a different machine than Solr, and only one Solr instance was running for any test.
>
> Solr version    queries/min
> 6.6.1           3,400
> 6.6.1 O         4,800
> 7.1             2,800
> 7.1 O           4,200
> 7.7.1           2,400
> 7.7.1 O         3,500
> 8.3.1           2,000
> 8.3.1 O         2,600
>
> The tests just index 20M docs into a single core, then run the exact same 10,000 queries against them from JMeter with 24 threads. Spot checks showed no hits on the queryResultCache.
>
> A query looks like this:
>
> rows=0&{!boost b=recip(ms(NOW,INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR anyplace…97 more random words)
>
> There is no faceting, no grouping, and no sorting. I fill in INSERT_FIELD_HERE through JMeter magic, and I'm running the exact same queries for every test. (A quick note on what that boost actually computes is at the end of this description.)
>
> One wildcard is that I did regenerate the index for each major revision, choosing random words from the same word list and random times (bounded in the same range), so the docs are not completely identical. The index was in the native format for that major version, even if slightly different between versions. I ran the test once, then ran it again after optimizing the index.
>
> I haven't dug any further. If anyone's interested, I can throw a profiler at, say, 8.3 and see what I can see, although I'm not going to have time to dive into this any time soon. I'd be glad to run some tests, though: I saved the queries and the indexes, so running a test would only take a few minutes.
>
> While I concentrated on date fields, the docs have date, int, and long fields, both docValues=true and docValues=false, each variant with multiValued=true and multiValued=false, both Trie and Point variants (where possible), and a pretty simple text field.
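> For reference, recip(x, m, a, b) in Solr computes a / (m*x + b), so with m=3.16e-11 (roughly 1 divided by the number of milliseconds in a year) and a=b=1, the multiplicative boost starts at 1.0 for a brand-new doc and roughly halves for each year of age. A back-of-the-envelope check in Python (just the arithmetic, nothing to do with the benchmark rig itself):
>
>     MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # ~3.156e10
>
>     def recip(x, m=3.16e-11, a=1.0, b=1.0):
>         # Solr's recip() function query: a / (m*x + b)
>         return a / (m * x + b)
>
>     for years in (0, 1, 2, 5):
>         print(years, "yr ->", round(recip(years * MS_PER_YEAR), 3))
>     # 0 yr -> 1.0, 1 yr -> 0.501, 2 yr -> 0.334, 5 yr -> 0.167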