On 2/7/07 10:04 AM, "Erik Hatcher" <[EMAIL PROTECTED]> wrote:
> I'm interested in improving my existing custom cache warming by being
> selective about what updates rather than rebuilding completely.
>
> How can I tell what documents were updated/added/deleted from the old
> cache to the new IndexSearcher?

We could add a system-maintained timestamp field. LDAP has that.

Knowing which documents were added or changed doesn't actually work for this, because the new or changed documents might now match queries that they didn't match before. Add a term to a document, and it shows up in new queries. Those queries need to be re-run.

In order to selectively warm, you need to know which terms changed. Build a set of all terms in documents before they are updated and all from the new documents. Then extract the terms from each query. If a query has any term that is in the set from the document changes, that query must be re-run.

We used to do something similar manually for stemmer dictionary changes. The same would be necessary for changes to protwords.txt. Search for the old and new forms, and reindex only the matching documents.

This is very efficient for stemmer changes, but I'm not sure how well it would work for document changes. If your documents are a good match to your queries (and I hope they are), a few changes could match many queries, then you are back to a full re-warm.

wunder
--
Walter Underwood
Search Guru, Netflix
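
The term-set check described above could be sketched roughly as follows. This is a minimal illustration in Python, not Solr or Lucene code: `terms`, `changed_terms`, and `queries_to_rewarm` are hypothetical helpers, the regex tokenizer is only a stand-in for the real analysis chain, and each query is treated as a plain bag of terms.

```python
import re


def terms(text):
    """Toy analyzer (assumption): lowercase, split on non-alphanumerics."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def changed_terms(old_docs, new_docs):
    """Union of all terms from the pre-update and post-update documents."""
    changed = set()
    for text in list(old_docs) + list(new_docs):
        changed |= terms(text)
    return changed


def queries_to_rewarm(cached_queries, changed):
    """Keep every cached query that shares at least one term with the changed set."""
    return [q for q in cached_queries if terms(q) & changed]


# Made-up sample data: one document gains two terms in the update.
old_docs = ["the matrix reloaded"]
new_docs = ["the matrix reloaded keanu reeves"]
cached = ["keanu", "star wars", "matrix trilogy", "casablanca"]

stale = queries_to_rewarm(cached, changed_terms(old_docs, new_docs))
print(stale)
```

If `stale` ends up covering most of the cached queries, a full re-warm is probably simpler, which is the caveat raised above.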