Use stream result like a query (alternative to innerJoin)
Hi all,

I'm looking for a way to query two collections and find the documents that exist in both. I know this can be done with the innerJoin streaming expression, but I want to avoid it because one of the collection streams can return billions of results.

Let's say the two collections are:

deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}, ...]
items = [{id: 1, name: "a"}, {id: 2, name: "b"}, {id: 3, name: "c"}, ...]

"deletedItems" contains few documents compared to "items" (about 1 million vs. 2-3 billion). If I query both with a typical query in our system, deletedItems returns a few thousand results, while items returns tens or hundreds of millions.

To use innerJoin, I have to stream the whole items result to a worker node over the network. Is there a way to avoid this, something like using the "deletedItems" result as a query against the "items" stream?

Thanks in advance for the help.
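One shape the idea in this question could take, as a minimal sketch rather than a confirmed solution: read the small deletedItems result first, then send its ids to items as batched filter queries using Solr's {!terms} query parser, so that only matching items documents leave the shards. The helper names (batched, terms_filters), the batch size, and the sample ids are hypothetical. No request is sent here; the code only builds the filter strings.

```python
from typing import Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` ids."""
    batch: List[str] = []
    for doc_id in ids:
        batch.append(doc_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def terms_filters(ids: Iterable[str], field: str = "id", size: int = 1000) -> List[str]:
    """Build one {!terms} filter query string per batch of ids."""
    return ["{!terms f=%s}%s" % (field, ",".join(batch)) for batch in batched(ids, size)]


# Hypothetical ids, standing in for deletedItemId values read from deletedItems.
deleted_ids = ["1", "2", "3"]
filters = terms_filters(deleted_ids, field="id", size=2)
for fq in filters:
    print(fq)
```

Each string would be passed as an fq parameter on a query against items. The batch size is an assumption and would need tuning against request size limits.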
Hitting Solr throttling for ingestion
Hi Experts,

We are using Solr 8.4 (non-cloud). When ingesting data with multiple processes into one core on a Solr node, we are hitting some throttling: the maximum ingestion rate achieved is about 47K docs per second with 17 posting processes. Each doc is about 250 bytes. CPU utilization is only 20% and I/O about 6%. When we increase the number of posting processes, posting starts failing.

With Solr 6.6 this issue does not happen: increasing the number of posting processes raises CPU and I/O utilization to nearly 100% before posting starts failing.

Below are some relevant configuration values specified in solrconfig.xml:

16 1024 10 10 4096 0.1 4096 ${solr.lock.type:native} true ${solr.ulog.dir:} ${solr.ulog.numVersionBuckets:65536} ${solr.autoCommit.maxTime:12} false ${solr.autoSoftCommit.maxTime:5000}

It seems maxIndexingThreads is no longer supported in Solr 8? Any ideas on how to get past this throttling?

Thanks.
Shushuai