Thanks Otis for the response. I'm still not clear on a few things:

1) I thought Solr could only work with one index at a time, and that to have
multiple indexes you needed multiple instances of Solr - isn't that right? How
can we make Solr read from and write to multiple indexes? (I've pasted my
guess at a multi-core setup below the quoted mail - is that what you have in
mind?)
2) What do you mean by "partitioning outside of Solr"? If all the data is
indexed by Solr into one index, how would one partition it outside Solr in a
way that is still searchable by Solr when needed?

Our main problem is scaling with Solr. Our indexes grow so big (10G-20G every
day) that it's hard to optimize them and to search such large indexes. That's
why we are trying to partition them by time. We do need to keep up to 6 months
of data. The only way I can think of to limit the index size is to run
multiple Solr instances, but even that isn't a scalable solution if the
indexes keep growing.

I've also pasted, below the quoted mail, my guess at how the time-based search
and the 6-month purge would look - please correct me if I'm off base.

Thanks,
-vivek

On Wed, Mar 25, 2009 at 6:59 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
>
> Hi,
>
> Yes, you can use Solr for this, but index partitioning should be done outside
> of Solr. That is, your app will need to know where to send each doc based on
> its timestamp, when and where to create a new index (new Solr core), and so on.
> Similarly, deleting data older than N days is done by you, using a
> delete-by-query with a date-based open-ended range query. The Solr setup is
> really done the same as usual, since all the partitioning-related stuff lives
> outside of Solr. Of course, you could come up with a "Solr Proxy" component
> that abstracts some/all of this and pretends to be Solr.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
>> From: vivek sar <vivex...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, March 25, 2009 3:52:11 PM
>> Subject: Partition index by time using Solr
>>
>> Hi,
>>
>> I've used Lucene before, but I'm new to Solr. I've gone through the
>> mailing list, but have been unable to find a clear explanation of how to
>> partition Solr indexes. Here is what we want:
>>
>> 1) Be able to partition indexes by timestamp - basically one partition
>> per day (create a new index directory every day).
>>
>> 2) Be able to search partitions based on timestamp. All our queries
>> are time-based, so instead of looking into all the partitions I want
>> to go directly to the partitions where the data might be.
>>
>> 3) Be able to purge any data older than 6 months without bringing
>> down the application. Since partitions would be marked by timestamp,
>> we would just have to delete the old partitions.
>>
>> This is going to be a distributed system with 2 boxes, each running
>> an instance of Solr. I don't want to replicate data, but each box may
>> have the same timestamp partition with different data. We would be
>> indexing an average of 20 million documents a day (each document =
>> 500 bytes), with an estimated 10G of index per day, evenly distributed
>> across machines (each machine would get roughly 5G of index every day).
>>
>> My questions:
>>
>> 1) Is this all possible using Solr? If not, should I just do this
>> using Lucene, or is there any other out-of-the-box alternative?
>> 2) If it's possible in Solr, how do we do this - configuration, setup, etc.?
>> 3) How would I optimize the partitions - would that even be required
>> when using Solr?
>>
>> Thanks,
>> -vivek
>
>
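
For reference, here is my guess at the multi-core setup I was asking about in
1). This is only a sketch - the core names, paths and port are made up, and
I'm going from the multicore example that ships with Solr, so please correct
me if this isn't what you meant. A solr.xml with one core (index) per day
might look like:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <!-- one core per day; the indexer picks the core from the doc's timestamp -->
      <core name="events-20090324" instanceDir="events-20090324" />
      <core name="events-20090325" instanceDir="events-20090325" />
    </cores>
  </solr>

and, if I'm reading the CoreAdmin docs right, the next day's core could be
created on the fly with something like:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=events-20090326&instanceDir=events-20090326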
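
For searching only the partitions that cover a query's time range (point 2 of
my original mail), I'm assuming we could use the distributed-search "shards"
parameter and list just the relevant per-day cores on each box. Hypothetical
example (host names and core names invented), searching both boxes' March 25
partitions:

  http://box1:8983/solr/events-20090325/select?q=*:*&shards=box1:8983/solr/events-20090325,box2:8983/solr/events-20090325

Our app (or the "Solr Proxy" you mention) would build the shards list from the
query's date range.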
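
And for the purge, I take it the open-ended date-range delete you describe
would be something like the following, posted to a core's update handler
("timestamp" is just a placeholder for whatever our date field ends up being):

  curl 'http://box1:8983/solr/events-20081001/update?commit=true' \
    -H 'Content-Type: text/xml' \
    --data-binary '<delete><query>timestamp:[* TO NOW-6MONTHS]</query></delete>'

Although, since the cores would already be per-day, I'd think simply unloading
and deleting an expired day's core would be cheaper than running a
delete-by-query on it - does that sound reasonable?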