This is one case where permanent caches would be useful. Another is
highlighting: highlighting can take a lot of work, and that work is
not cached.

It might be a cleaner architecture to have session-maintaining code in
a separate front-end app, and leave Solr session-free.

On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris <rygu...@gmail.com> wrote:
> If documents are being added to and removed from an index (and commits
> are being issued) while a user is searching, then the experience of
> paging through search results using the obvious solr mechanism
> (&start=100&rows=10) may be disorienting for the user. For one
> example, by the time the user clicks "next page" for the first time, a
> document that they saw on page 1 may have been pushed onto page 2.
> (This may be especially pronounced if docs are being sorted by date.)
>
> I'm wondering what are the best options available for presenting a
> more stable set of search results to users in such cases. The obvious
> candidates to me are:
>
> #1: Cache results in the user session of the web tier. (In particular,
> maybe just cache the uniqueKey of each matching document.)
>
>  Pro: Simple
>  Con: May require capping the # of search results in order to make
> the initial query (which now has the Solr rows param >> web pageSize)
> fast enough. For example, maybe it's only practical to cache the first
> 500 records.
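
A minimal sketch of option #1 (the web-tier session cache). The `search_solr` stub, the 500-result cap, and all names here are illustrative assumptions, not anything from the original mail or a real Solr client API:

```python
# Option #1 sketch: run one big query up front, cache only the uniqueKeys
# in the user's web-tier session, and serve later pages from that cache.

RESULT_CAP = 500   # cap the cached result set to keep the initial query fast
PAGE_SIZE = 10

def search_solr(q, rows):
    """Stub standing in for a real Solr call, e.g.
    /select?q=...&fl=id&rows=500, returning only uniqueKeys."""
    return [f"doc{i}" for i in range(rows)]  # placeholder data

def first_search(session, q):
    # One query at search time; only uniqueKeys are cached, not full docs.
    session["results"] = search_solr(q, RESULT_CAP)

def get_page(session, page):
    # Later pages come from the session, so commits on the Solr side
    # can no longer reshuffle what this user sees while paging.
    start = page * PAGE_SIZE
    return session["results"][start:start + PAGE_SIZE]

session = {}
first_search(session, "solr paging")
page1 = get_page(session, 0)
page2 = get_page(session, 1)
```

The con shows up directly: anything past `RESULT_CAP` is simply not pageable.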
>
> #2: Create some kind of per-user results cache in Solr. (One simple
> implementation idea: You could make your Solr search handler take a
> userid parameter, and cache each user's last search in a special
> per-user results cache. You then also provide an API that says, "give
> me records n through m of userid #1334's last search". For your
> subsequent queries, you consult the latter API rather than redoing
> your search. Because Lucene docids are unstable across commits and
> such, I think this means caching the uniqueKey of each matching
> document. This in turn means looking up the uniqueKey of each matching
> document at search time. It also means you can't use the existing Solr
> caches, but need to make a new one.)
>
>  Pro: Maybe faster than #1?? (Saves on data transfer between Solr and
> web tier, at least during the initial query.)
>  Con: More complicated than #1.
>
> #3: Use filter queries to attempt to make your subsequent queries (for
> page 2, page 3, etc.) return results consistent with your original
> query. (One idea is to give each document a docAddedTimestamp field,
> which would have precision down to the millisecond or something. On
> your initial query, you could note the current time, T. Then for the
> subsequent queries you add a filter query for docAddedTimestamp<=T.
> With a trie date field this filter should be fast. It should keep any
> docs newly added after T from showing up in the user's search results
> as they page through them. However, it won't necessarily protect you
> from docs that were *reindexed* (i.e. a doc re-added with the same
> uniqueKey as an existing doc) or docs that were deleted.)
>
>  Pro: Doesn't require a new cache, and no cap on # of search results
>  Con: Maybe doesn't provide total stability.
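
Option #3 amounts to building the follow-up queries with an extra `fq` pinned to the snapshot time T. A small sketch of the parameter construction, assuming the `docAddedTimestamp` field from the mail and standard Solr range-query syntax:

```python
# Option #3 sketch: every page-2, page-3, ... query carries a filter
# query restricting results to docs indexed at or before snapshot time T.

from datetime import datetime, timezone
from urllib.parse import urlencode

def pinned_query_params(q, snapshot_time, start, rows):
    # Solr date range syntax: field:[* TO <ISO-8601 instant>].
    t = snapshot_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    return urlencode({
        "q": q,
        "fq": f"docAddedTimestamp:[* TO {t}]",
        "start": start,
        "rows": rows,
    })

# T is noted once, on the user's initial query, then reused for every page.
T = datetime(2009, 11, 13, 12, 48, 0, tzinfo=timezone.utc)
params = pinned_query_params("lucene", T, 10, 10)
```

As the Con says, this pins out newly *added* docs but does nothing about deletions or in-place reindexes, since those documents still match the timestamp filter (or vanish from under it).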
>
> Any feedback on these options? Are there other ideas to consider?
>
> Thanks,
> Chris
>



-- 
Lance Norskog
goks...@gmail.com
