This is one case where permanent caches would be interesting. Highlighting is another: highlighting can take a lot of work, and that work is not cached.
It might be a cleaner architecture to have session-maintaining code in a separate front-end app, and leave Solr session-free.

On Fri, Nov 13, 2009 at 12:48 PM, Chris Harris <rygu...@gmail.com> wrote:
> If documents are being added to and removed from an index (and commits
> are being issued) while a user is searching, then the experience of
> paging through search results using the obvious Solr mechanism
> (&start=100&rows=10) may be disorienting for the user. For one
> example, by the time the user clicks "next page" for the first time, a
> document that they saw on page 1 may have been pushed onto page 2.
> (This may be especially pronounced if docs are being sorted by date.)
>
> I'm wondering what the best options are for presenting a
> more stable set of search results to users in such cases. The obvious
> candidates to me are:
>
> #1: Cache results in the user session of the web tier. (In particular,
> maybe just cache the uniqueKey of each matching document.)
>
> Pro: Simple
> Con: May require capping the # of search results in order to make
> the initial query (which now has a Solr rows param >> web pageSize)
> fast enough. For example, maybe it's only practical to cache the first
> 500 records.
>
> #2: Create some kind of per-user results cache in Solr. (One simple
> implementation idea: you could make your Solr search handler take a
> userid parameter, and cache each user's last search in a special
> per-user results cache. You then also provide an API that says, "give
> me records n through m of userid #1334's last search". For your
> subsequent queries, you consult the latter API rather than redoing
> your search. Because Lucene docids are unstable across commits and
> such, I think this means caching the uniqueKey of each matching
> document. This in turn means looking up the uniqueKey of each matching
> document at search time. It also means you can't use the existing Solr
> caches, but need to make a new one.)
>
> Pro: Maybe faster than #1? (Saves on data transfer between Solr and
> web tier, at least during the initial query.)
> Con: More complicated than #1.
>
> #3: Use filter queries to attempt to make your subsequent queries (for
> page 2, page 3, etc.) return results consistent with your original
> query. (One idea is to give each document a docAddedTimestamp field,
> which would have precision down to the millisecond or so. On
> your initial query, you could note the current time, T. Then for the
> subsequent queries you add a filter query for docAddedTimestamp <= T.
> Hopefully with a trie date field this would be fast. This should
> keep any docs newly added after T from showing up in the
> user's search results as they page through them. However, it won't
> necessarily protect you from docs that were *reindexed* (i.e., a doc
> re-added with the same uniqueKey as an existing doc) or docs that were
> deleted.)
>
> Pro: Doesn't require a new cache, and no cap on # of search results
> Con: Maybe doesn't provide total stability.
>
> Any feedback on these options? Are there other ideas to consider?
>
> Thanks,
> Chris

--
Lance Norskog
goks...@gmail.com
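Option #1 from the quoted message could be sketched roughly like this, assuming the web tier freezes the uniqueKeys of the first few hundred hits in the user's session and pages through that list without re-querying. All names here (the search function, session dict, cap) are hypothetical illustrations, not a real Solr client API:

```python
# Sketch of option #1: cache the first CACHE_CAP uniqueKeys in the
# user's web-tier session, then serve stable pages from that frozen
# list instead of re-running the query against a changing index.

CACHE_CAP = 500  # cap so the initial large-rows query stays fast

def run_initial_search(session, query, search_fn):
    """Run the query once and cache up to CACHE_CAP uniqueKeys."""
    hits = search_fn(query, rows=CACHE_CAP)  # one query, rows >> page size
    session["cached_keys"] = [hit["uniqueKey"] for hit in hits]

def get_page(session, page, page_size=10):
    """Serve page N from the frozen key list; no further Solr calls."""
    keys = session["cached_keys"]
    start = page * page_size
    return keys[start:start + page_size]

# Usage with a stand-in search function in place of a real Solr call:
def fake_search(query, rows):
    return [{"uniqueKey": f"doc{i}"} for i in range(min(rows, 37))]

session = {}
run_initial_search(session, "title:solr", fake_search)
print(get_page(session, 0))  # pages stay stable even if the index changes
print(get_page(session, 1))
```

The web tier would then fetch the full documents for one page of keys at a time, so later commits can change field values but not which documents appear on which page.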
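Option #3 amounts to noting a cutoff time T at the first query and repeating it as a filter query on every later page request. A rough sketch of building those request parameters, assuming the docAddedTimestamp field from the proposal (the helper functions themselves are hypothetical; q, fq, start, and rows are standard Solr parameters):

```python
# Sketch of option #3: pin subsequent pages to the documents that
# existed at initial-query time via fq=docAddedTimestamp:[* TO T].
from datetime import datetime, timezone

def initial_cutoff():
    """Note the current time T as an ISO-8601 UTC timestamp with millis."""
    now = datetime.now(timezone.utc)
    return now.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

def page_params(query, cutoff, page, page_size=10):
    """Build query params for page N, filtered to docs added at or before T."""
    return {
        "q": query,
        "fq": f"docAddedTimestamp:[* TO {cutoff}]",  # range filter: <= T
        "start": page * page_size,
        "rows": page_size,
    }

cutoff = "2009-11-13T20:48:00.000Z"  # example fixed T from the first query
print(page_params("title:solr", cutoff, 2))
```

As the quoted message notes, this only guards against newly added documents; reindexed or deleted documents can still shift the result list.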