Thank you very much Shawn for a detailed response. Let me read all the documentation you pointed to and digest it.
Sure, if I do end up using Solr and need to make this change, I would love to also submit it to the Lucene/Solr project.

Regards,
Vinay

________________________________
From: Shawn Heisey <s...@elyograg.org>
To: solr-user@lucene.apache.org
Sent: Thursday, April 25, 2013 11:32 PM
Subject: Re: Question on storage and index/data management in solr

On 4/25/2013 8:39 AM, Vinay Rai wrote:
> 1. Keep each of last 24 hours segments separate.
> 2. Segments generated between last 48 to 24 hours to be merged into one.
> Similarly, for segments created between 72 to 48 hours and so on for last 1
> week.
> 3. Similarly, merge previous 4 week's data into one segment each week.
> 4. Merge all previous months data into one segment each month.
>
> I am not sure if there is a configuration possible in solr application. If
> not, are there APIs which will allow me to do this?

To accomplish this exact scenario, you would probably have to write a custom merge policy class for Lucene. If you do so, I hope you'll strongly consider donating it to the Lucene/Solr project.

Another approach: Use distributed search and put the divisions you are looking at into separate indexes (shards) in their own cores. You can then manually do whatever index merging your situation requires. Constructing the shards parameter for your queries will take some work. Here's a blog post about this method and a video of the Lucene Revolution talk mentioned in the blog post:

http://www.loggly.com/blog/2010/08/our-solr-system/
http://loggly.com/videos/lucene-revolution-2010/

I had the honor of being there for that talk in Boston. They've done some amazing things with Solr.

> Also, I want to understand how solr stores data or does it have a dependency
> on the way data is stored. Since the volumes are high, it would be great if
> the data is compressed and stored (while still searchable). If it is
> possible, I would like to know what kind of compression does solr do?

Solr 4.1 uses compression for stored fields. Solr 4.2 also uses compression for term vectors. From a performance perspective, compression is probably not viable at this time for the indexed data, but if that changes in the future, I'm sure that it will be added.

Here is documentation on the file format used by Solr 4.2:

http://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description

Thanks,
Shawn
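
As a rough illustration of the distributed-search approach described above, the sketch below builds the value of the shards parameter from a list of time-bucketed cores, keeping only the cores whose bucket overlaps the queried time window. It is not taken from the thread: the Shard class, the host names, the core naming scheme, and the helper functions are all hypothetical, and the host:port/solr/corename layout assumes a default Solr URL structure.

```python
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlencode


@dataclass(frozen=True)
class Shard:
    """Hypothetical description of one core holding one time bucket."""
    host: str        # assumed host:port, e.g. "solr1:8983"
    core: str        # assumed naming scheme, e.g. "logs_hour_10"
    start: datetime  # inclusive start of the bucket
    end: datetime    # exclusive end of the bucket


def shards_param(shards, query_start, query_end):
    """Return the comma-separated value for the `shards` parameter,
    listing only cores whose bucket overlaps [query_start, query_end)."""
    selected = [s for s in shards
                if s.start < query_end and s.end > query_start]
    return ",".join(f"{s.host}/solr/{s.core}" for s in selected)


def select_query_string(q, shards, query_start, query_end):
    """Build a URL-encoded query string for a distributed search request."""
    return urlencode({
        "q": q,
        "shards": shards_param(shards, query_start, query_end),
    })
```

With hourly cores for the most recent day and daily, weekly, or monthly cores for older data, a query limited to the last hour would then touch only one or two small cores, while the merging of older buckets stays under manual control as suggested above.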