On 10/7/2016 6:19 PM, Mike Lissner wrote:
> Soft commits seem to be exactly the thing for this, but whenever I open a
> new searcher (which soft commits seem to do), the external file is
> reloaded, and all queries are halted until it finishes loading. When I just
> measured, this took about 30 seconds to complete. Most soft commit
> documentation talks about setting up soft commits with <maxTime> of about a
> second.

IMHO any documentation that recommends autoSoftCommit with a maxTime of
one second is bad documentation, and needs to be fixed.  Where have you
seen such a recommendation?  Unless the index is extremely small and has
been thoroughly optimized for NRT (which usually means *no*
autowarming), achieving commit times of less than one second is usually
not possible.  This is the page that usually comes up when people start
talking about commits:

http://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

On the topic of one-second commit latency, that page has this to say:
"Set your soft commit interval to as long as you can stand. Don’t listen
to your product manager who says “we need no more than 1 second
latency”. Really. Push back hard and see if the /user/ is best served or
will even notice. Soft commits and NRT are pretty amazing, but they’re
not free."

The intervals for autoCommit and autoSoftCommit that I like to see
are at LEAST one minute, and preferably longer if you can stand it.
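
As a rough sketch, that looks something like this in solrconfig.xml
(the exact values here are illustrative, not a recommendation for
your setup):

  <autoCommit>
    <maxTime>60000</maxTime>         <!-- hard commit once a minute -->
    <openSearcher>false</openSearcher>
  </autoCommit>

  <autoSoftCommit>
    <maxTime>300000</maxTime>        <!-- soft commit every five minutes -->
  </autoSoftCommit>

With openSearcher set to false, the hard commit only flushes new
segments to disk.  It's the soft commit that opens a new searcher,
which in your case is what triggers the external file reload.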

> Is there anything I can do to make the external file field not get reloaded
> constantly? It only changes about once a month, and I want to use soft
> commits to power the alerts feature.

Anytime you want changes to show up in your index, you need a new
searcher.  When you're using an external file field, part of that info
will come from that external source, and right now Solr/Lucene has no
way of knowing that your external file has not changed, so it must read
the file every time it builds a searcher.  I doubt this feature was
designed to deal well with an extremely large external file like yours. 
The code looks like it reads the file line by line, and although I'm
sure that process has been optimized as far as it can be, it still
takes a lot of time when there are millions of lines.
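
For reference, the external file itself is just key=value pairs, one
per line, where the key matches the keyField on the field type.  A
hypothetical setup (field name and values invented for illustration):

In schema.xml:

  <fieldType name="extFile" class="solr.ExternalFileField"
             keyField="id" defVal="0"/>
  <field name="pagerank" type="extFile" indexed="false" stored="false"/>

In external_pagerank, located in the index data directory:

  doc1=4.5
  doc2=0.7
  doc3=12.0

Every one of those lines has to be parsed again each time a new
searcher causes the file to be loaded.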

If the info changes that infrequently, can you just incorporate it
directly into the index with a standard field, with the info coming in
as a part of your normal indexing process?  I'm sure the performance
would be MUCH better if Solr didn't have to reference the external file.
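
If a full reindex for a once-a-month change is too heavy, atomic
updates are one way to push in just the changed values.  A
hypothetical example, with the caveat that atomic updates generally
require the other fields in your schema to be stored (or have
docValues):

  curl -X POST 'http://localhost:8983/solr/yourcore/update' \
    -H 'Content-Type: application/json' \
    -d '[{"id": "doc1", "pagerank": {"set": 4.5}},
         {"id": "doc2", "pagerank": {"set": 0.7}}]'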

It seems unlikely that Solr would stop serving queries while setting up
a new searcher.  The old searcher should continue to serve requests
until the new searcher is ready.  If this is happening, that definitely
seems like a bug.

Thanks,
Shawn
