On Fri, Dec 16, 2016 at 8:11 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 12/16/2016 11:13 AM, Dorian Hoxha wrote:
> > Yep, that's what came in my search. See how TTL works in hbase/cassandra/
> > rocksdb <https://github.com/facebook/rocksdb/wiki/Time-to-Live>. There
> > isn't a "delete old docs" query, but old docs are deleted by the
> > storage when merging. Looks like this needs to be a lucene-module
> > which can then be configured by solr ?
>
> No. Lucene doesn't know about expiration and doesn't need to know about
> expiration.
>
It needs to know, or else it will be inefficient in my case.

>
> The document expiration happens in Solr. In the background, Solr
> finds/deletes old documents in the Lucene index according to how the
> expiration feature is configured. What happens after that is basic
> Lucene operation. If you index enough new data to trigger a merge (or
> if you do an optimize/forceMerge), then Lucene will get rid of deleted
> documents in the merged segments. The contents of the documents in your
> index (whether that's a timestamp or something else) are completely
> irrelevant for decisions made during Lucene's segment merging.
>
Shawn, I know how it works; I read the blog post. But I don't want it to
work that way. So how can I do it my way? With a custom merge function in
Lucene, or something else?

>
> > Just like in hbase,cassandra,rocksdb, when you "select" a row/document
> > that has expired, it exists on the storage, but isn't returned by the
> > db, because it checks the timestamp and sees that it's expired. Looks
> > like this also needs to be in lucene?
>
> That's pretty much how Lucene (and by extension, Solr) works, except
> it's not related to expiration, it is *deleted* documents that don't
> show up in the results.
>
No, it doesn't. But I want expirations to function that way. Just as you
have "custom update processors", there should be a similar hook for get, so
that in my custom get processor I can check the timestamp and return
NotFound if the document has expired.

>
> Thanks,
> Shawn
>
>
Makes sense?
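
To make the behaviour I'm asking for concrete, here is a minimal sketch in plain Python. It is not Solr or Lucene code, and every name in it (ExpiringStore, put, get, compact) is hypothetical; it only illustrates the idea of hiding expired documents at read time and physically dropping them later, during a merge-like pass.

```python
import time
from typing import Any, Callable, Dict, Optional


class ExpiringStore:
    """Toy in-memory store (hypothetical, not a Solr/Lucene API).

    Expired documents stay in storage but are hidden by get();
    compact() plays the role of a merge that drops them for real.
    """

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable so the behaviour can be tested
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, doc_id: str, doc: Any, ttl_seconds: float) -> None:
        # Store the absolute expiration time next to the document.
        self._docs[doc_id] = {"doc": doc, "expire_at": self._clock() + ttl_seconds}

    def get(self, doc_id: str) -> Optional[Any]:
        # Read-time check: an expired document is reported as not found,
        # even though it is still physically stored.
        entry = self._docs.get(doc_id)
        if entry is None or entry["expire_at"] <= self._clock():
            return None
        return entry["doc"]

    def compact(self) -> int:
        # Merge-like pass: physically remove expired documents.
        # Returns the number of documents dropped.
        now = self._clock()
        expired = [k for k, e in self._docs.items() if e["expire_at"] <= now]
        for key in expired:
            del self._docs[key]
        return len(expired)

    def stored_count(self) -> int:
        # Number of documents physically present, expired or not.
        return len(self._docs)
```

The point of the sketch is that no separate "delete old docs" step is needed: get() never returns an expired document, and the space is reclaimed whenever compaction happens to run.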