Ah, OK, I misunderstood. Here are a couple of off-the-top-of-my-head
ideas.

Make a backup of your index before anything else <G>...

Split up your current index into two parts by segments. That is, copy the
whole directory to another place, and remove some of the segments from each.
When you're done, you'll still have all the segments you used to have, but
some of them will be in one directory and some in another. Of course, all of
the segment files with a common prefix should stay together (e.g. all the
_0.* files in the same dir, not split between the two dirs).
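The split step above might look something like this sketch. The directory
names and segment files here are made up for illustration (the touch lines
just stand in for real segment files):

```shell
# Sketch only: divide a copied index directory by segment prefix.
# index_a/ stands in for a copy of your real index directory; the
# touch lines fake a few segment files purely for illustration.
mkdir -p index_a index_b
touch index_a/_0.cfs index_a/_1.cfs index_a/_2.cfs index_a/_3.cfs
cp index_a/* index_b/

# Keep even-numbered segments in one dir, odd-numbered in the other.
# All files sharing a prefix (_0.*, _1.*, ...) move as a unit.
rm index_a/_1.* index_a/_3.*
rm index_b/_0.* index_b/_2.*

ls index_a index_b
```

In a real index you'd also leave the segments_N and segments.gen files in
both copies; those are exactly what CheckIndex patches up in the next step.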

Now run CheckIndex on them. That'll take a long time, but it _should_ spoof
Solr/Lucene into thinking that there are two complete indexes out there. Now
your idea of having an archival search should work, but with two places to
look, not one. NOTE: whether this plays nice with the over-2B docs or with
deleted documents I can't guarantee... I believe that the deleted docs are
tracked per-segment; if so, this should be fine. This won't work if you've
recently optimized. When you're done you should have two cores out there
(hmmm, these could also be treated as shards?) that you point your Solr at.

You might want to optimize in this case when you're done.
I suspect you could, with a magnetized needle and a steady hand, edit some
of the auxiliary files (segments*), but I would feel more secure letting
CheckIndex do the heavy lifting.
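For reference, CheckIndex is run from the command line against the
lucene-core jar matching your index version. The jar name and index path
below are illustrative, not from your setup:

```shell
# Illustrative invocation; match the jar to your Lucene version and
# point it at each split directory in turn. -fix rewrites the
# segments file, dropping segments whose files are now missing.
# WARNING: -fix modifies the index, so run it on the copies only,
# and only after you have that backup.
java -cp lucene-core-3.6.0.jar org.apache.lucene.index.CheckIndex \
  /path/to/index_a -fix
```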

Here's another possibility:
> Try a delete-by-query from a bit before the date you think things went over
> 2B to now (really hope you have a date!)
>   Perhaps you can walk the underlying index in Lucene somehow and make this
>   work if you don't have a date. Since the underlying Lucene doc IDs are a
>   segment base + local doc ID within the segment, these should be safely
>   under 2B, but I'm reaching here into areas I don't know much about.
> Optimize (and wait; probably a really long time).
> Re-index everything after the date (or whatever cutoff) you used above into
> a new shard.
> Now treat the big index just as you were talking about.
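The delete-by-query step might look like this. The core URL and the date
field name "timestamp" are assumptions here; substitute whatever your
schema actually uses:

```shell
# Hypothetical: assumes a Solr core at localhost:8983/solr and a
# date field named "timestamp" in your schema -- both are guesses,
# as is the cutoff date. Adjust all three before running anything.
curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-Type: text/xml' \
  --data-binary \
  '<delete><query>timestamp:[2012-06-01T00:00:00Z TO *]</query></delete>'
```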

Please understand that the over-2B docs might cause some grief here, but
since the underlying index is segment-based (i.e. the internal Lucene doc
IDs are a base + offset for each segment), this has a decent chance of
working (but anyone who really understands, please chime in. I'm reaching).

Oh, and if it works, please let us know...

Best
Erick

On Wed, Jun 20, 2012 at 6:37 PM, avenka <ave...@gmail.com> wrote:
> Erick, thanks for the advice, but let me make sure you haven't misunderstood
> what I was asking.
>
> I am not trying to split the huge existing index in install1 into shards. I
> am also not trying to make the huge install1 index as one shard of a sharded
> solr setup. I plan to use a sharded setup only for future docs.
>
> I do want to avoid trying to re-index the docs in install1 and think of them
> as a slow "tape archive" index server if I ever need to go and query the
> past documents. So I was wondering if I could somehow use the existing
> segment files to run an isolated (unsharded) solr server that lets me query
> roughly the first 2B docs before the wraparound problem happened. If the
> "negative" internal doc IDs have pervasively corrupted the segment files,
> this would not be possible, but I am not able to imagine an underlying
> lucene design that would cause such a problem. Is my only option to re-index
> the past 2B docs if I want to be able to query them at this point or is
> there any way to use the existing segment files?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/solr-java-lang-NullPointerException-on-select-queries-tp3989974p3990615.html
> Sent from the Solr - User mailing list archive at Nabble.com.
