No faceting, but we do use highlighting. We have very long queries, because students are pasting in homework problems. I’ve seen 1000-word queries, but we truncate at 40 words.
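
For anyone who wants to do the same, here is a minimal sketch of that truncation step in Python. The function name and the whitespace-based word split are assumptions for illustration; only the 40-word limit comes from the message above, and this is not our production code.

```python
MAX_QUERY_WORDS = 40  # limit mentioned in the message above


def truncate_query(query: str, max_words: int = MAX_QUERY_WORDS) -> str:
    """Keep only the first max_words whitespace-separated words.

    Hypothetical helper: splitting on whitespace is an assumption,
    a real deployment might count analyzer tokens instead.
    """
    words = query.split()
    return " ".join(words[:max_words])
```

A pasted 1000-word homework problem would reach the search cluster as its first 40 words, while short queries pass through unchanged.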

We do as-you-type results, so we also have ngram fields on the 20 million solved homework questions. This bloats the index severely. About 75% of terms are ngrams.

Median query time is over one second, so a burst of traffic can back up a lot of work. If we hard limit the number of simultaneous requests, the cluster can get slow instead of falling over.

Thousands of connections is a lot better than thousands of threads. Connections are just blocks of data in the client and OS.

wunder
Walter Underwood
[email protected]
http://observer.wunderwood.org/ (my blog)

> On Nov 29, 2017, at 3:41 PM, Toke Eskildsen <[email protected]> wrote:
>
> Walter Underwood <[email protected]> wrote:
>> I knew about SOLR-7433, but I’m really surprised that 200 incoming requests
>> can need 4000 threads.
>>
>> We have four shards.
>
> For that I would have expected at most 800 Threads. Are you perhaps doing
> faceting on multiple fields with facet.threads=5? (kinda grasping at straws
> here)
>
>> Why is there a thread per shard? HTTP can be done async: send1,
>> send2, send3, send4, recv1 recv2, recv3, recv4. I’ve been doing
>> that for over a decade with HTTPClient.
>
> I don't know the reasoning. Should I design it from scratch, I would probably
> still use Threads (wrapped as Futures) as they are easy to work with. Getting
> into thousands of connections in Solr seems like a danger sign to me, whether
> they are done async or not.
>
> - Toke Eskildsen
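
P.S. The hard limit on simultaneous requests described above can be sketched roughly like this in Python with asyncio. This is an illustration only, not Solr code: `MAX_IN_FLIGHT`, `limited_search`, and `run_burst` are hypothetical names, and the sleep stands in for a slow query. Waiting requests are held as cheap pending tasks (roughly, open connections) rather than as threads.

```python
import asyncio

MAX_IN_FLIGHT = 4  # hypothetical hard limit on simultaneous requests


async def limited_search(sem: asyncio.Semaphore, query: str, stats: dict) -> str:
    """Run one search, waiting for a free slot if the limit is reached."""
    async with sem:  # blocks here (without a thread) when all slots are busy
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # stand-in for a slow search request
            return f"results for {query}"
        finally:
            stats["active"] -= 1


async def run_burst(n_requests: int, max_in_flight: int = MAX_IN_FLIGHT):
    """Simulate a burst of traffic and report the peak concurrency seen."""
    sem = asyncio.Semaphore(max_in_flight)
    stats = {"active": 0, "peak": 0}
    tasks = [limited_search(sem, f"q{i}", stats) for i in range(n_requests)]
    results = await asyncio.gather(*tasks)
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_burst(50))
    print(len(results), "requests served, peak in flight:", peak)
```

With the limit in place, a burst makes every request wait longer, but the number of requests actually doing work never exceeds the cap, which is the "slow instead of falling over" behavior.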
