On 2/7/07 10:04 AM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote:

> I'm interested in improving my existing custom cache warming by being
> selective about what updates rather than rebuilding completely.
> 
> How can I tell what documents were updated/added/deleted from the old
> cache to the new IndexSearcher?

We could add a system-maintained timestamp field. LDAP has that.

Knowing which documents were added or changed is not enough
for this, because a new or changed document might now match
queries that it didn't match before. Add a term to a
document, and it shows up in new queries. Those queries need
to be re-run.

In order to selectively warm, you need to know which terms
changed. Build a set of all the terms in the documents before
they are updated, plus all the terms in the new versions.
Then extract the terms from each query. If a query has any
term that is in the set from the document changes, that query
must be re-run.
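
Here is a rough sketch of that check in Python, just to make the
idea concrete. It is not Solr or Lucene code. It assumes each
document and each cached query has already been reduced to a set
of indexed terms (after analysis). The names changed_terms,
queries_to_rewarm, and the sample data are made up for the example.

```python
def changed_terms(old_docs, new_docs):
    """Union of all terms in the pre-update and post-update documents.

    Each document is assumed to be an iterable of already-analyzed terms.
    """
    terms = set()
    for doc_terms in list(old_docs) + list(new_docs):
        terms.update(doc_terms)
    return terms


def queries_to_rewarm(cached_queries, changed):
    """Return keys of cached queries that share any term with `changed`."""
    return [
        key
        for key, query_terms in cached_queries.items()
        if not changed.isdisjoint(query_terms)
    ]


# Hypothetical data: one document gained a term, one document is new.
old_docs = [{"netflix", "dvd"}]
new_docs = [{"netflix", "dvd", "streaming"}, {"lucene", "cache"}]

# Hypothetical cache: query key -> terms extracted from that query.
cached_queries = {
    "q1": {"streaming"},
    "q2": {"solr"},
    "q3": {"cache", "warming"},
}

stale = queries_to_rewarm(cached_queries, changed_terms(old_docs, new_docs))
```

With this data, q1 and q3 would be re-run and q2 could be carried
over from the old cache as-is.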

We used to do something similar manually for stemmer dictionary
changes. The same would be necessary for changes to protwords.txt.
Search for the old and new forms, and reindex only the matching
documents.

This is very efficient for stemmer changes, but I'm not sure
how well it would work for document changes. If your documents
are a good match to your queries (and I hope they are), a few
changes could match many queries, and then you are back to a
full re-warm.

wunder
-- 
Walter Underwood
Search Guru, Netflix


