: Yep, that's what came in my search. See how TTL work in hbase/cassandra/
: rocksdb <https://github.com/facebook/rocksdb/wiki/Time-to-Live>. There
: isn't a "delete old docs"query, but old docs are deleted by the storage
: when merging. Looks like this needs to be a lucene-module which can then be
: configured by solr ?
	...
: Just like in hbase,cassandra,rocksdb, when you "select" a row/document that
: has expired, it exists on the storage, but isn't returned by the db,

What you're describing is exactly how segment merges work in Lucene; it's just a question of terminology. In Lucene, "deleting" a document is a *logical* operation: the data still lives in the (existing) segments, but the affected docs are recorded in a list of deletions (and automatically excluded from future searchers that are opened against them) ... once the segments are merged, the deleted documents are "expunged" rather than being copied over to the new segments.

Where this diverges from what you describe is that, as things stand in Lucene, something has to "mark" the documents in the segments as deleted in order for them to later be expunged -- in Solr right now, the code in question does this via an (internal) DBQ (delete-by-query).

The dissatisfaction you expressed with this approach confuses me...

>> I did some search for TTL on solr, and found only a way to do it with a
>> delete-query. But that ~sucks, because you have to do a lot of inserts
>> (and queries).

...nothing about this approach does any "inserts" (or queries -- unless you mean the DBQ itself?), so w/o more elaboration on what exactly you find problematic about this approach, it's hard to make any sense of your objection or request for an alternative.

With all those caveats out of the way...

What you're ultimately requesting -- new code that hooks into segment merging to exclude "expired" documents from being copied into the new merged segments -- should be theoretically possible with a custom MergePolicy, but I don't really see how it would be better than the current approach in typical use cases (ie: I want docs excluded from results after the expiration date is reached, with a min tolerance of X) ...

1) nothing would ensure that docs *ever* get removed during periods when docs aren't being added (thus no new segments, thus no merging)

2) as you described, query clients would be required to specify date range filters on every query to identify the "logically live docs at this moment" on a per-request basis -- something that's far less efficient from a caching standpoint than letting the system do a DBQ on the backend to affect the *global* set of logically live docs at the index level.

Frankly: It seems to me that you've looked at how other non-Lucene based systems X & Y handle TTL type logic and decided that's the best possible solution, and therefore that the solution used by Solr "sucks", w/o taking into account that what's efficient in the underlying Lucene storage implementation might just be different than what's efficient in the underlying storage implementation of X & Y.

If you'd like to tackle implementing TTL as a lower level primitive concept in Lucene, then by all means be my guest -- but personally I don't think you're going to find any real perf improvements in an approach like you describe compared to what we offer today. I look forward to being proved wrong.

-Hoss
http://www.lucidworks.com/
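
For readers who want to see the logical-delete / merge distinction discussed above in concrete terms, here is a minimal toy model. It is a sketch only and does not use Lucene's or Solr's actual API: the names `ToyIndex`, `Segment`, `delete_by_query`, and the `expire_at` field are all hypothetical, invented for illustration. It shows that a delete-by-query hides expired docs from searches immediately, while the data is only physically dropped when a merge happens.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A batch of docs plus a set of positions marked as deleted (toy model)."""
    docs: list                                  # each doc: {"id": ..., "expire_at": ...}
    deleted: set = field(default_factory=set)

    def live_docs(self):
        return [d for i, d in enumerate(self.docs) if i not in self.deleted]


class ToyIndex:
    """Hypothetical stand-in for an index made of segments; not Lucene's API."""

    def __init__(self):
        self.segments = []

    def add_segment(self, docs):
        self.segments.append(Segment(list(docs)))

    def delete_by_query(self, predicate):
        # Logical delete: record matching docs as deleted; the data stays put.
        for seg in self.segments:
            for i, doc in enumerate(seg.docs):
                if predicate(doc):
                    seg.deleted.add(i)

    def search(self):
        # A searcher only sees docs that are not marked deleted.
        return [d["id"] for seg in self.segments for d in seg.live_docs()]

    def stored_count(self):
        # Number of docs still physically present, deleted or not.
        return sum(len(seg.docs) for seg in self.segments)

    def merge(self):
        # Copy only live docs into one new segment; deleted docs are expunged.
        merged = [d for seg in self.segments for d in seg.live_docs()]
        self.segments = [Segment(merged)]


index = ToyIndex()
index.add_segment([{"id": "a", "expire_at": 10}, {"id": "b", "expire_at": 99}])
index.add_segment([{"id": "c", "expire_at": 5}])

now = 20  # assumed "current time" for the expiration check
index.delete_by_query(lambda doc: doc["expire_at"] <= now)
after_delete = (index.search(), index.stored_count())

index.merge()
after_merge = (index.search(), index.stored_count())
```

In this toy, the search results are the same before and after the merge; only the physically stored count changes. If `merge()` is never called (point 1 above), the expired docs simply stay on disk, which is harmless for query results as long as something has marked them deleted.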