Hmm, looking at the code for the IndexMerger in Solr (org.apache.solr.update.DirectUpdateHandler2),
I see that IndexWriter.addIndexesNoOptimize(dirs) is used (a union of the indexes), right? And the test class org.apache.solr.client.solrj.MergeIndexesExampleTestBase suggests:

add doc A to index1 with id=AAA,name=core1
add doc B to index2 with id=BBB,name=core2
merge the two indexes into one index, which then contains both docs.

The resulting index will have 2 docs. Great, but in my case I think it should work more like this:

add doc A to index1 with id=X,title=blog entry title,description=blog entry description
add doc B to index2 with id=X,score=1.2
somehow add index2 to index1 so that id=X has score=1.2 when searching in index1.

The resulting index should have 1 doc. So this is not really what I want, right? Sorry for being a smart-ass...

Kindly

//Marcus
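[Editor's note: the ParallelReader "PR trick" Otis mentions in the quoted thread below might look roughly like this at the Lucene level. This is a minimal sketch, not Ning Li's actual patch: it assumes Lucene 2.4-era APIs, made-up directory paths, and, crucially, that index1 and index2 contain the same documents in the same order so that the internal doc ids line up.]

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.ParallelReader;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ParallelMergeSketch {
    public static void main(String[] args) throws Exception {
        // Paths are made up for illustration.
        Directory index1 = FSDirectory.getDirectory("/data/index1"); // id, title, description
        Directory index2 = FSDirectory.getDirectory("/data/index2"); // score
        Directory merged = FSDirectory.getDirectory("/data/merged");

        // ParallelReader presents the two indexes as one logical index:
        // doc N gets the union of the fields of doc N in each index.
        // This only lines up if both indexes were written with the same
        // documents in the same order.
        ParallelReader pr = new ParallelReader();
        pr.add(IndexReader.open(index1));
        pr.add(IndexReader.open(index2));

        // Write the combined view out as a single physical index, which
        // can then be slipped back under Solr.
        IndexWriter writer = new IndexWriter(merged, new StandardAnalyzer(), true);
        writer.addIndexes(new IndexReader[] { pr });
        writer.close();
        pr.close();
    }
}

[The catch is exactly the doc-id alignment discussed below: the second index has to be built in the same document order as the first, which is why Lucene's internal ids come up in the thread.]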
On Sat, Apr 25, 2009 at 5:10 PM, Marcus Herou <marcus.he...@tailsweep.com> wrote:

> Guys!
>
> Thanks for these insights. I think we will head for a Lucene-level merging
> strategy (two or more indexes). When merging, I guess the second index needs
> to have the same doc ids somehow. That is an internal id in Lucene, not that
> easy to get hold of, right?
>
> So are you saying that the Solr ExternalFileField + FunctionQuery stuff
> would not work very well performance-wise, or what do you mean?
>
> I sure like bleeding edge :)
>
> Cheers dudes
>
> //Marcus
>
> On Sat, Apr 25, 2009 at 3:46 PM, Otis Gospodnetic <
> otis_gospodne...@yahoo.com> wrote:
>
>> I should emphasize that the PR trick I mentioned is something you'd do at
>> the Lucene level, outside Solr, and then you'd just slip the modified index
>> back into Solr.
>> Or, if you like the bleeding edge, perhaps you can make use of Ning Li's
>> Solr index merging functionality (patch in JIRA).
>>
>> Otis --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> > From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
>> > To: solr-user@lucene.apache.org
>> > Sent: Saturday, April 25, 2009 9:41:45 AM
>> > Subject: Re: Date faceting - howto improve performance
>> >
>> > Yes, you could simply round the date; no need for a non-date type field.
>> > Yes, you can add a field after the fact by making use of ParallelReader
>> > and merging (I don't recall the details; search the ML for ParallelReader
>> > and Andrzej). I remember he once provided a working recipe.
>> >
>> > Otis --
>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>> >
>> > ----- Original Message ----
>> > > From: Marcus Herou
>> > > To: solr-user@lucene.apache.org
>> > > Sent: Saturday, April 25, 2009 6:54:02 AM
>> > > Subject: Date faceting - howto improve performance
>> > >
>> > > Hi.
>> > >
>> > > One of our faceting use-cases:
>> > > We are creating trend graphs of how many blog posts contain a certain
>> > > term, grouped by day/week/year etc. with the nice DateMathParser
>> > > functions.
>> > >
>> > > The performance degrades really fast and consumes a lot of memory,
>> > > which forces an OOM from time to time. We think it is due to the fact
>> > > that the cardinality of the field publishedDate in our index is huge,
>> > > almost equal to the number of documents in the index.
>> > >
>> > > We need to address that...
>> > >
>> > > Some questions:
>> > >
>> > > 1. Can a date field have other date formats than the default of
>> > > yyyy-MM-dd'T'HH:mm:ssZ?
>> > >
>> > > 2. We are thinking of adding a field to the index which has the format
>> > > yyyy-MM-dd to reduce the cardinality. If that field can't be a date, it
>> > > could perhaps be a string, but the question then is whether faceting
>> > > can still be used?
>> > >
>> > > 3. Since we already have such a huge index, is there a way to add a
>> > > field afterwards and apply it to all documents without actually
>> > > reindexing the whole shebang?
>> > >
>> > > 4. If the field cannot be a string, can we just leave out the
>> > > hour/minute/second information to reduce the cardinality and improve
>> > > performance? Example: 2009-01-01T00:00:00Z
>> > >
>> > > 5. I am afraid that we need to reindex everything to get this to work
>> > > (which negates Q3). We have 8 shards at present; what would be the most
>> > > efficient way to reindex the whole shebang? Dump the entire database to
>> > > disk (sigh), create many XML file splits and use curl in a
>> > > random/hash(numServers) manner on them?
>> > >
>> > > Kindly
>> > >
>> > > //Marcus

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/
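[Editor's note: for the cardinality problem in questions 2 and 4 above, the usual approach is to index a second, day-rounded copy of the date and facet on that. A rough sketch with SolrJ, assuming a 1.3/1.4-era API; the URL, the query, and the field names publishedDate/publishedDay are illustrative, and publishedDay is assumed to be declared as a date field in schema.xml.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class DayFacetSketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Index the full-precision date plus a day-rounded copy.
        // Solr's DateMathParser also applies to explicit dates, so the
        // /DAY suffix in the value rounds it down to midnight; publishedDay
        // then has at most one distinct value per day instead of roughly
        // one per document.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "X");
        doc.addField("publishedDate", "2009-04-25T17:10:00Z");
        doc.addField("publishedDay", "2009-04-25T17:10:00Z/DAY");
        solr.add(doc);
        solr.commit();

        // Facet per day on the low-cardinality field.
        SolrQuery q = new SolrQuery("description:lucene");
        q.setFacet(true);
        q.set("facet.date", "publishedDay");
        q.set("facet.date.start", "NOW/YEAR");
        q.set("facet.date.end", "NOW/DAY+1DAY");
        q.set("facet.date.gap", "+1DAY");
        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFacetDates());
    }
}

[This addresses Q2/Q4 (yes, the rounded field can stay a date, and date faceting still works), but not Q3: populating publishedDay for existing documents still means either reindexing or the ParallelReader-style merge sketched earlier.]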