You could split your index. There are some tools available, e.g. this one
on GitHub, https://github.com/HON-Khresmoi/hash-based-index-splitter, which
does hash-based index splitting. You could probably start with that tool
and modify it for your needs (for example, moving documents that match a
certain timestamp criterion to another Lucene index). After you have
partitioned your index, you should _in principle_ be able to start up
another Solr instance on each of the index parts.
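
If the criterion is simply "older than one month", another option is a
small standalone Lucene program that copies the old documents into a
separate archive index. Below is a rough, uncompiled sketch against the
Lucene 3.x API (the same vintage as the snippet further down in this
thread); the paths, the "timestamp" field name and the cutoff are
assumptions, and only stored fields survive the copy.

[code]
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// source index (the live one) and target index (the archive on cheaper disk)
Directory sourceDir = FSDirectory.open(new File("/path/to/live/index"));
Directory archiveDir = FSDirectory.open(new File("/path/to/archive/index"));

IndexReader reader = IndexReader.open(sourceDir, true); // read-only
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36,
        new StandardAnalyzer(Version.LUCENE_36));
IndexWriter writer = new IndexWriter(archiveDir, config);

// anything older than roughly one month gets copied (cutoff is an assumption)
long cutoffMillis = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;

for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i)) continue;           // skip deleted docs
    Document doc = reader.document(i);           // stored fields only
    String timestamp = doc.get("timestamp");     // assumed stored field name
    if (timestamp != null && Long.parseLong(timestamp) < cutoffMillis) {
        writer.addDocument(doc);                 // re-add into the archive index
    }
}

writer.close();
reader.close();
[/code]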

Another approach is the one you have mentioned: querying Solr and e.g.
storing the output in a CSV file, then loading that CSV into another Solr
instance. This is probably the fastest solution in terms of coding effort,
but it might put a strain on your (production?) Solr instance, so it
should be done during non-business hours. It could also be done
incrementally: as soon as you know some data has become old enough to be
moved, move it in smaller pieces.
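
For illustration only (again, not compiled), the same idea via SolrJ
instead of CSV could look roughly like the following. The URLs, core
names, the "timestamp" field and the page size are assumptions, and again
only stored fields come back:

[code]
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;

// assumed URLs: the production core and an "archive" core on the cheaper disk
SolrServer source = new HttpSolrServer("http://prod-host:8983/solr/biz");
SolrServer archive = new HttpSolrServer("http://archive-host:8983/solr/biz_archive");

// assumed date field; select everything older than one month, page by page
SolrQuery query = new SolrQuery("timestamp:[* TO NOW-1MONTH]");
query.addSortField("id", SolrQuery.ORDER.asc); // assumed unique key, keeps paging stable
int pageSize = 1000;
query.setRows(pageSize);

int start = 0;
while (true) {
    query.setStart(start);
    QueryResponse response = source.query(query);
    SolrDocumentList page = response.getResults();
    if (page.isEmpty()) {
        break;
    }
    for (SolrDocument doc : page) {
        // copy the stored field values into an input document for the archive core
        SolrInputDocument in = new SolrInputDocument();
        for (String field : doc.getFieldNames()) {
            in.addField(field, doc.getFieldValue(field));
        }
        archive.add(in);
    }
    archive.commit();
    start += pageSize;
}

// only after the copy has been verified on the archive side:
// source.deleteByQuery("timestamp:[* TO NOW-1MONTH]");
// source.commit();
[/code]

The commented-out deleteByQuery at the end is deliberate: old documents
should only be removed from the production core once the copied data has
been checked on the archive side.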

Be aware that with both approaches you will only be able to retrieve the
stored fields of each document.

Regards,

Dmitry

On Wed, Oct 17, 2012 at 6:12 AM, Zeng Lames <lezhi.z...@gmail.com> wrote:

> Thanks Kan for your prompt help. It is really a great solution to recover
> those deleted records.
>
> Another question is about the Solr historical data housekeeping problem.
> The scenario is as below:
>
> we have a Solr core that stores biz records, and the volume is large: the
> index files grow to more than 50GB in one month. Due to a disk space
> limitation, we need to delete records older than one month from Solr, but
> on the other hand we need to keep that data on cheaper disk for analysis.
> The problem is how to move the data older than one month onto the cheaper
> disk quickly. One simple but very slow solution is to search out those
> records and add them into the Solr instance on the cheaper disk.
>
> We would like to know whether there is any other solution for this kind of
> problem, e.g. moving the index files directly?
>
> thanks a lot!
>
> On Wed, Oct 17, 2012 at 12:31 AM, Dmitry Kan <dmitry....@gmail.com> wrote:
>
> > Hello,
> >
> > One approach (not a Solr-ish one, but still) would be to use the Lucene
> > API and open an IndexReader on the Solr index in question. You can then
> > do:
> >
> > [code]
> > import java.io.File;
> >
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.store.Directory;
> > import org.apache.lucene.store.FSDirectory;
> >
> > Directory indexDir = FSDirectory.open(new File(pathToDir));
> > IndexReader input = IndexReader.open(indexDir, true); // read-only reader
> >
> > int maxDoc = input.maxDoc();
> > for (int i = 0; i < maxDoc; i++) {
> >     if (input.isDeleted(i)) {
> >         // deleted document found, retrieve all its stored fields
> >         Document document = input.document(i);
> >         // analyze its field values here...
> >     }
> > }
> >
> > input.close();
> > [/code]
> >
> > I haven't compiled this code myself; you'll need to experiment with it.
> >
> > Dmitry
> >
> > On Tue, Oct 16, 2012 at 11:06 AM, Zeng Lames <lezhi.z...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > as we know, when we delete a document from Solr, it adds a .del file to
> > > the related segment index files, and the documents are removed from disk
> > > after an optimize. Now, the question is: before the optimize, can we
> > > retrieve those deleted records? If yes, how?
> > >
> > > thanks a lot!
> > >
> > > Best Wishes
> > > Lames
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Dmitry Kan
> >
>



-- 
Regards,

Dmitry Kan
