Thanks for the replies. I made the changes so that the external file field
is loaded per:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
      <!-- See: https://github.com/freelawproject/courtlistener/issues/581#issuecomment-252443419 -->
      <lst>
          <str name="q">*</str><str name="sort">score desc</str>
      </lst>
  </arr>
</listener>


Looking at the logs I now have entries like:

8202710 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore
– Loaded external value source external_pagerank :25 missing keys [4026457,
4026464, 4026468, 4026926, 4029539, 4030007, 4030897, 4030898, 4030899,
4031105]
8202722 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore
– QuerySenderListener sending requests to Searcher@6c77ceaf[collection1]
main{StandardDirectoryReader(segments_380g:441995:nrt
_3liq(4.10.4):C3868116/58950:delGen=2145
_3nfp(4.10.4):C29745/12438:delGen=1033 _3pu5(4.10.4):C22649/7807:delGen=575
_3s9r(4.10.4):C30846/3868:delGen=374 _3r4k(4.10.4):C23730/6740:delGen=478
_3ti5(4.10.4):C6980/461:delGen=151 _3spo(4.10.4):C4447/741:delGen=202
_3u5f(4.10.4):C13863/240:delGen=17 _3tzy(4.10.4):C249/25:delGen=14
_3u0j(4.10.4):C229/17:delGen=4 _3u8n(4.10.4):C4440/45:delGen=5
_3u36(4.10.4):C161/12:delGen=9 _3u84(4.10.4):C4599/1685:delGen=8
_3u8v(4.10.4):C597/12:delGen=4 _3u80(4.10.4):C279/40:delGen=5
_3u9t(4.10.4):C357/4:delGen=2 _3u98(4.10.4):C128/10:delGen=5
_3u8h(4.10.4):C214/13:delGen=3 _3u8y(4.10.4):C119/32:delGen=5
_3ua5(4.10.4):C78/6 _3u94(4.10.4):C96/9:delGen=2
_3u93(4.10.4):C101/14:delGen=5 _3u9n(4.10.4):C214/59:delGen=2
_3u9p(4.10.4):C125/30:delGen=1 _3ua6(4.10.4):C5 _3ua7(4.10.4):C10
_3ua8(4.10.4):C1 _3ua9(4.10.4):C1)}
8202795 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore
– [collection1] webapp=null path=null
params={sort=score+desc&event=newSearcher&q=*&distrib=false} hits=3919121
status=0 QTime=73
8202799 [searcherExecutor-6-thread-1] INFO  org.apache.solr.core.SolrCore
– QuerySenderListener done.

And after these entries, the queries are great. It fixed the issue.

BUT...from time to time, I still have entries like:

10921036 [qtp669611164-42] INFO  org.apache.solr.core.SolrCore  – Loaded
external value source external_pagerank :25 missing keys [4026457, 4026464,
4026468, 4026926, 4029539, 4030007, 4030897, 4030898, 4030899, 4031105]
10921037 [qtp669611164-38] INFO  org.apache.solr.core.SolrCore  –
[collection1] webapp=/solr path=/select/ params={foo=bar&bang=boof} hits=12
status=0 QTime=41337
10921038 [qtp669611164-27] INFO  org.apache.solr.core.SolrCore  –
[collection1] webapp=/solr path=/select/ params={foo=bar&bang=boof}
hits=275 status=0 QTime=22363

So, in those cases the listener didn't seem to fire, and the queries are
very slow (you can see the QTimes of 41 and 22 seconds above).

Any ideas why this would be? I didn't set up a listener for firstSearcher.
Could that be the cause? My understanding is that firstSearcher is only
triggered at Solr startup.
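
In case it helps anyone reading later, here is what a firstSearcher listener
mirroring the newSearcher one above would look like. This is an untested
sketch: it assumes the same warming query is appropriate, and I have not
confirmed that it addresses the slow queries shown above.

```xml
<!-- Hypothetical: not part of my current config. Same warming query as the
     newSearcher listener, registered for the firstSearcher event instead. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*</str><str name="sort">score desc</str>
    </lst>
  </arr>
</listener>
```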

(I also created a new ticket to investigate whether the modtime of the
external file field could be used to avoid EFF reloads:
https://issues.apache.org/jira/browse/LUCENE-7488)

Thanks again. This is already a lot of progress.

Mike

On Sun, Oct 9, 2016 at 7:27 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/8/2016 1:18 PM, Mike Lissner wrote:
> > I want to make sure I understand this properly and document this for
> > future people that may find this thread. Here's what I interpret your
> > advice to be:
> > 0. Slacken my auto soft commit interval to something more like a minute.
>
> Yes, I would do this.  I would also increase autoCommit to something
> between one and five minutes, with openSearcher set to false.  There's
> nothing *wrong* with 15 seconds for autoCommit, but I want my server to
> be doing less work during normal operation.
>
> To answer a question you posed in a later message: Yes, it's common for
> users to have a longer interval on autoSoftCommit than autoCommit.
> Remember the mantra in the URL about understanding commits:  Hard
> commits are about durability, soft commits are about visibility.  Hard
> commits when openSearcher is false are almost always *very* fast, so
> it's typically not much of a burden to have them happen more frequently,
> and thus have a better data durability guarantee.  Like I said above, I
> generally use an autoCommit value between one and five minutes.
>
> > I'm a bit confused about the example autowarmcount for the caches,
> > which is 0. Why not set this to something higher? I guess it's a RAM
> > utilization vs. speed tradeoff? A low number like 16 seems like it'd
> > have minimal impact on RAM?
>
> A low autowarmCount is generally chosen for one reason: commit speed.
> If the example configs have it set to zero, I'm sure this was done so
> commits would proceed as fast as possible.  Large values can turn
> opening a new searcher into a process that can take *minutes*.
>
> On my index shards, the autowarmCount on my filterCache is *four*.
> That's it -- execute only four of the most recent filters in the cache
> when a new searcher opens.  That warming *still* sometimes takes as long
> as 20 seconds on the larger shards.  The filters used in queries on my
> indexes are very large and very complex, and can match millions of
> documents.  Pleading with the dev team to decrease query complexity
> doesn't help.
>
> On the idea of reusing the external file data when it doesn't change:  I
> do not know if this is possible.  I have no idea how Solr and Lucene use
> the data found in the external file, so it might be completely necessary
> to re-load it every time.  You can open an issue in Jira to explore the
> idea, but don't be too surprised if it doesn't go anywhere.
>
> Thanks,
> Shawn
>
>
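
For future readers, here is roughly how the commit and cache advice quoted
above maps onto solrconfig.xml. This is a sketch, not a tested config: the
60-second values are just picks within the ranges Shawn mentions, and the
cache class and sizes are placeholders chosen for illustration. Only the
autowarmCount of 4 comes from his message.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: durability. With openSearcher=false it does not
       open a new searcher, so it is normally very fast. -->
  <autoCommit>
    <maxTime>60000</maxTime>  <!-- assumed value: 1 minute -->
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: visibility. Opens a new searcher, which fires the
       newSearcher listener. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>  <!-- assumed value: 1 minute -->
  </autoSoftCommit>
</updateHandler>

<query>
  <!-- size/initialSize are placeholders; autowarmCount=4 matches the
       value Shawn says he uses on his filterCache. -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="4"/>
</query>
```

A higher autowarmCount warms more cache entries on each new searcher, at the
cost of slower commits, so it is worth starting low and measuring.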
