Hi,

  I've used Lucene before, but I'm new to Solr. I've gone through the
mailing list, but was unable to find clear guidance on how to partition
Solr indexes. Here is what we want:

  1) Be able to partition indexes by timestamp - basically partition
per day (create a new index directory every day)

  2) Be able to search partitions based on timestamp. All our queries
are time-based, so instead of looking into all the partitions I want
to go directly to the partitions where the data might be.

  3) Be able to purge any data older than 6 months without bringing
down the application. Since partitions would be marked by timestamp,
we would just have to delete the old partitions.
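
  To make the three requirements concrete, here is a rough sketch of the
bookkeeping I have in mind. It is plain Python and not Solr API: the
"index-YYYYMMDD" naming scheme, the function names, and treating 6 months
as 183 days are all my own assumptions for illustration.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical naming scheme for daily partitions (not a Solr convention).
PARTITION_PREFIX = "index-"


def partition_name(day: date) -> str:
    """Name of the daily partition that holds documents for `day`."""
    return f"{PARTITION_PREFIX}{day:%Y%m%d}"


def partitions_for_range(start: date, end: date) -> List[str]:
    """Partitions a query covering [start, end] (inclusive) has to search."""
    if end < start:
        return []
    days = (end - start).days + 1
    return [partition_name(start + timedelta(days=i)) for i in range(days)]


def partitions_to_purge(existing: List[str], today: date,
                        retention_days: int = 183) -> List[str]:
    """Partitions older than the retention window (6 months assumed ~183 days)."""
    cutoff = partition_name(today - timedelta(days=retention_days))
    # Names sort chronologically because the date part is zero-padded YYYYMMDD.
    return sorted(p for p in existing if p < cutoff)
```

  A query for March 30 through April 1 would then touch only three
partitions, and the purge step reduces to deleting whatever
partitions_to_purge() returns.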


  This is going to be a distributed system with 2 boxes, each running
an instance of Solr. I don't want to replicate data, but each box may
have the same timestamp partition with different data. We would be
indexing an average of 20 million documents per day (each document =
500 bytes), with an estimated index size of 10 GB per day, evenly
distributed across machines (each machine would get roughly 5 GB of
index every day).
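
  For reference, the arithmetic behind those numbers can be sketched as
follows. This assumes the index is about the same size as the raw
documents, that 1 GB = 10^9 bytes, and that 6 months of retention is about
183 days; the function names are only illustrative.

```python
def daily_index_gb(docs_per_day: int, bytes_per_doc: int) -> float:
    """Daily index size in GB, assuming index size ~= raw document size."""
    return docs_per_day * bytes_per_doc / 1e9


def per_machine_gb(total_gb: float, machines: int) -> float:
    """Share of the daily index on each machine, assuming an even split."""
    return total_gb / machines


def retained_gb(daily_gb: float, retention_days: int) -> float:
    """Steady-state size once `retention_days` of partitions are kept."""
    return daily_gb * retention_days


total = daily_index_gb(20_000_000, 500)    # GB per day across the cluster
per_box = per_machine_gb(total, 2)         # GB per day on each box
steady_state = retained_gb(per_box, 183)   # GB held per box at full retention
print(total, per_box, steady_state)
```

  So each box would hold on the order of 900 GB once the 6 months of
partitions have filled up, under the assumptions above.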

  My questions:

  1) Is this all possible using Solr? If not, should I just do this
using Lucene, or is there any other out-of-the-box alternative?
  2) If it's possible in Solr, how do we do this (configuration, setup, etc.)?
  3) How would I optimize the partitions? Would that be required when using Solr?

  Thanks,
  -vivek
