Re: Merging multicore indexes

Shalin Shekhar Mangar Mon, 05 Oct 2009 03:20:18 -0700

On Sun, Oct 4, 2009 at 8:05 PM, Paul Rosen <p...@performantsoftware.com> wrote:


> Hi,
>
> I've been trying to experiment with merging, but have been running into
> some problems.
>
> First, I'm using ruby and the solr-ruby-0.0.7 gem. It looks like there is
> no support in that gem for merging. Have I overlooked something?
>
>
> Second, I was attempting to just follow the instructions in
> http://wiki.apache.org/solr/MergingSolrIndexes so I could see merging
> work. I just tried putting the sample url in the address bar of my browser,
> but it just sent me to the admin page. (It does the same thing as if I had
> left off all the parameters.) Here is the URL I constructed:
>
>
> http://localhost:8983/solr/merged/admin/?action=mergeindexes&core=merged&indexDir=/Users/my/path/solr_1.4/solr/data/reindexed_marc/index&indexDir=/Users/my/path/solr_1.4/solr/data/reindexed_rdf/index
>
> Why didn't that work? Do I have to POST that instead of using GET?
>

The path on the wiki page was wrong. You need to use the adminPath in the
URL. Look at the adminPath attribute in solr.xml; it is typically
/admin/cores.

So the correct path for you would be:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=merged&indexDir=/Users/my/path/solr_1.4/solr/data/reindexed_marc/index&indexDir=/Users/my/path/solr_1.4/solr/data/reindexed_rdf/index

I've fixed the wiki too.
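
Since the request is a plain HTTP GET, it can also be built from a script
rather than typed into the browser. The following is only a minimal sketch in
Python: the function name merge_indexes_url is made up for illustration, and
the base URL and adminPath are assumptions taken from the example above, so
check the adminPath attribute in your own solr.xml.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, admin_path, core, index_dirs):
    """Build a CoreAdmin mergeindexes URL (hypothetical helper, not a Solr API).

    base_url and admin_path are assumptions; admin_path should match the
    adminPath attribute in solr.xml (typically /admin/cores).
    """
    # A list of pairs lets the indexDir parameter repeat once per source index.
    params = [("action", "mergeindexes"), ("core", core)]
    params += [("indexDir", index_dir) for index_dir in index_dirs]
    # safe="/" keeps the directory paths readable instead of encoding "/" as %2F.
    return f"{base_url}{admin_path}?{urlencode(params, safe='/')}"


url = merge_indexes_url(
    "http://localhost:8983/solr",
    "/admin/cores",
    "merged",
    [
        "/Users/my/path/solr_1.4/solr/data/reindexed_marc/index",
        "/Users/my/path/solr_1.4/solr/data/reindexed_rdf/index",
    ],
)
print(url)
```

The resulting string can be passed to any HTTP client (for example
urllib.request.urlopen) once the target core exists.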


> Alternately, is there a way to specify merging from the admin interface?
>
> Third, I've googled for info about merging and not come up with any
> solutions, but I did see a possible concern:
>
> Is it true that after merging, your index can have duplicate
> documents? If so, then I need to create a step after merging for deleting
> the old copy of everything I merged.
>
>
Yes, it can have duplicate documents. The merge is handled by Lucene, which
does not have the concept of a uniqueKey. I'm not sure how you could remove
the duplicates in a separate step.
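
To illustrate the difference, here is a toy model in Python. It is not Solr
or Lucene code: the function names, the "id" field used as the uniqueKey, and
the sample documents are all made up for illustration. It only contrasts a
merge that concatenates documents with an add that replaces any earlier
document carrying the same uniqueKey.

```python
def lucene_style_merge(*indexes):
    """Toy model of an index merge: documents are concatenated as-is."""
    merged = []
    for index in indexes:
        merged.extend(index)
    return merged


def solr_style_add(*indexes, unique_key="id"):
    """Toy model of adding documents one by one: a later document
    replaces an earlier one that has the same uniqueKey value."""
    by_key = {}
    for index in indexes:
        for doc in index:
            by_key[doc[unique_key]] = doc
    return list(by_key.values())


# Hypothetical sample data: document "2" exists in both source indexes.
marc = [{"id": "1", "src": "marc"}, {"id": "2", "src": "marc"}]
rdf = [{"id": "2", "src": "rdf"}, {"id": "3", "src": "rdf"}]

merged = lucene_style_merge(marc, rdf)
added = solr_style_add(marc, rdf)
print(len(merged), len(added))
```

In this model the merged result keeps both copies of document "2", while the
add-based result keeps only the later one.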


> Given all the above, I'm wondering if it would make more sense to just
> retrieve each document from the old index and add it to the new index and
> forget about merging. I know that would be a slow process, but I'm not sure
> how much slower that would be than doing the merge (how long does that
> take?), then going through the entire index and eliminating duplicates.
>
>
It could be slow. But if in the end you need to merge, can you skip the
intermediate Lucene index completely?

-- 
Regards,
Shalin Shekhar Mangar.
