Thanks, Otis, for the response. I'm still not clear on a few things:

1) I thought Solr could work with only one index at a time, and that
having multiple indexes meant running multiple instances of Solr -
isn't that right? How can we make Solr read from and write to
multiple indexes?
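
Is the answer the multi-core route you hinted at - one Solr instance
hosting several cores, each of which is its own index? To check my
understanding, here is a rough SolrJ sketch of what I think that would
look like; the core names and localhost URLs are all made up:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MultiCoreGuess {
    public static void main(String[] args) throws Exception {
        // Each core is its own index, but both live in one Solr instance
        // and share one URL prefix (core names here are invented).
        SolrServer day1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core-20090325");

        // A query sent to one core only sees that core's index...
        SolrQuery q = new SolrQuery("*:*");
        System.out.println(day1.query(q).getResults().getNumFound());

        // ...and several cores can be searched together with the standard
        // distributed-search "shards" parameter.
        q.set("shards", "localhost:8983/solr/core-20090325,localhost:8983/solr/core-20090326");
        System.out.println(day1.query(q).getResults().getNumFound());
    }
}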

2) What do you mean by "partitioning outside of Solr"? If all the
data is indexed by Solr into one index, how would one partition it
outside Solr in a way that is still searchable by Solr when needed?
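
My best guess at the "outside of Solr" part is that our app maps each
document's timestamp to a per-day core and sends it there, something
like the sketch below - the core naming scheme, URLs, and the
"timestamp" field are all placeholders:

import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class TimeRouter {
    // Core-per-day naming is my own invention, just for illustration.
    private static final SimpleDateFormat DAY = new SimpleDateFormat("yyyyMMdd");

    // The routing decision lives in the app, not in Solr: pick the
    // per-day core a document belongs to from its timestamp.
    static SolrServer coreFor(Date timestamp) throws Exception {
        return new CommonsHttpSolrServer("http://localhost:8983/solr/core-" + DAY.format(timestamp));
    }

    public static void main(String[] args) throws Exception {
        Date ts = new Date();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "12345");
        doc.addField("timestamp", ts);   // field name depends on the schema

        SolrServer target = coreFor(ts);
        target.add(doc);
        target.commit();
    }
}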

Our main problem is scaling Solr. Our indexes grow so big (10G-20G
every day) that it's hard to optimize and search them. That's why we
are trying to partition them by time. We need to keep up to 6 months
of data.
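
On the purge side, I'm guessing you mean a delete-by-query with an
open-ended, date-math range, roughly like this - the "timestamp" field
name and core URL are just placeholders for whatever we end up with:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PurgeOldData {
    public static void main(String[] args) throws Exception {
        SolrServer core = new CommonsHttpSolrServer("http://localhost:8983/solr/core-20081001");

        // Open-ended date-math range: everything older than six months.
        core.deleteByQuery("timestamp:[* TO NOW-6MONTHS]");
        core.commit();
    }
}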

The only way I can think of to limit the index size is to run
multiple Solr instances, but even that isn't a scalable solution if
the indexes keep growing.
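
Or is the idea that a single Solr instance can keep adding a new core
per day? If I'm reading the CoreAdmin side of SolrJ right (and I may
not be), something along these lines would create the day's partition
on the fly - the core name and directory are made up:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CreateDailyCore {
    public static void main(String[] args) throws Exception {
        // Core admin requests go to the Solr instance itself, not to a core.
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // The instanceDir would need the usual conf/ (schema.xml,
        // solrconfig.xml) already in place.
        CoreAdminRequest.createCore("core-20090326", "core-20090326", admin);
    }
}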

Thanks,
-vivek


On Wed, Mar 25, 2009 at 6:59 PM, Otis Gospodnetic
<otis_gospodne...@yahoo.com> wrote:
>
> Hi,
>
> Yes, you can use Solr for this, but index partitioning should be done outside
> of Solr.  That is, your app will need to know where to send each doc based on
> its timestamp, when and where to create a new index (a new Solr core), and so
> on.  Similarly, deleting data older than N days is done by you, using a delete
> by query with a date-based, open-ended range query.  The Solr setup itself is
> really the same as usual, since all the partitioning-related logic lives
> outside of Solr.  Of course, you could build a "Solr Proxy" component that
> abstracts some/all of this and pretends to be Solr.
>
>
> Otis --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: vivek sar <vivex...@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Wednesday, March 25, 2009 3:52:11 PM
>> Subject: Partition index by time using Solr
>>
>> Hi,
>>
>>   I've used Lucene before, but am new to Solr. I've gone through the
>> mailing list but was unable to find any clear explanation of how to
>> partition Solr indexes. Here is what we want:
>>
>>   1) Be able to partition indexes by timestamp - basically partition
>> per day (create a new index directory every day)
>>
>>   2) Be able to search partitions based on timestamp. All our queries
>> are time based, so instead of looking into all the partitions I want
>> to go directly to the partitions where the data might be.
>>
>>   3) Be able to purge any data older than 6 months without bringing
>> down the application. Since partitions would be marked by timestamp,
>> we would just have to delete the old partitions.
>>
>>
>>   This is going to be a distributed system with 2 boxes, each running
>> an instance of Solr. I don't want to replicate data, but each box may
>> have the same timestamp partition with different data. We would be
>> indexing an average of 20 million documents (each document = 500 bytes)
>> with an estimated 10G in index size - evenly distributed across
>> machines (each machine would get roughly 5G of index every day).
>>
>>   My questions,
>>
>>   1) Is this all possible using Solr? If not, should I just do this
>> using Lucene, or is there any other out-of-the-box alternative?
>>   2) If it's possible in Solr, how do we do it - configuration, setup, etc.?
>>   3) How would I optimize the partitions - would that even be required
>> when using Solr?
>>
>>   Thanks,
>>   -vivek
>
>
