Really? The last time I looked at AIE, I am pretty sure there were Solr core messages in the logs, so I assumed it used EmbeddedSolr or something. But I may be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff at Lucene level or on top of multiple Solr cores or what?
--
Jan Høydahl - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:

> Minor correction re Attivio - their stuff runs on top of Lucene, not Solr. I
> *think* they are trying to patent this.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
> ----- Original Message ----
>> From: Jan Høydahl / Cominvent <jan....@cominvent.com>
>> To: solr-user@lucene.apache.org
>> Sent: Mon, February 8, 2010 3:33:41 PM
>> Subject: Re: Collating results from multiple indexes
>>
>> Hi,
>>
>> There is no JOIN functionality in Solr. The common solution is either to
>> accept the high volume update churn, or to add client side code to build a
>> "join" layer on top of the two indices. I know that Attivio
>> (www.attivio.com) have built some kind of JOIN functionality on top of Solr
>> in their AIE product, but do not know the details or the actual performance.
>>
>> Why not open a JIRA issue, if there is no such already, to request this as a
>> feature?
>>
>> --
>> Jan Høydahl - search architect
>> Cominvent AS - www.cominvent.com
>>
>> On 25. jan. 2010, at 22.01, Aaron McKee wrote:
>>
>>> Is there any somewhat convenient way to collate/integrate fields from
>>> separate indices during result writing, if the indices use the same unique
>>> keys? Basically, some sort of cross-index JOIN?
>>>
>>> As a bit of background, I have a rather heavyweight dataset of every US
>>> business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours
>>> to fully index on a decent box). Given the size and relatively stability of
>>> the dataset, I generally only update this monthly. However, I have separate
>>> advertising-related datasets that need to be updated either hourly or daily
>>> (e.g. today's coupon, click revenue remaining, etc.). These advertiser
>>> feeds reference the same keyspace that I use in the main index, but are
>>> otherwise significantly lighter weight. Importing and indexing them
>>> discretely only takes a couple minutes. Given that Solr/Lucene doesn't
>>> support field updating, without having to drop and re-add an entire
>>> document, it doesn't seem practical to integrate this data into the main
>>> index (the system would be under a constant state of churn, if we did
>>> document re-inserts, and the performance impact would probably be
>>> debilitating). It may be nice if this data could participate in filtering
>>> (e.g. only show advertisers), but it doesn't need to participate in
>>> scoring/ranking.
>>>
>>> I'm guessing that someone else has had a similar need, at some point? I can
>>> have our front-end query the smaller indices separately, using the keys
>>> returned by the primary index, but would prefer to avoid the extra
>>> sequential roundtrips. I'm hoping to also avoid a coding solution, if only
>>> to avoid the maintenance overhead as we drop in new builds of Solr, but
>>> that's also feasible.
>>>
>>> Thank you for your insight,
>>> Aaron
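
For reference, the client-side "join" layer discussed in the quoted thread could look roughly like the Python sketch below. It assumes both indices share a unique key field (called `id` here, which is an assumption), and `fetch_secondary` is a hypothetical callable standing in for whatever client actually queries the smaller index; it is not a Solr API. The point is only to show the keys from one primary result page being looked up in a single batched query rather than one roundtrip per key.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


def build_key_query(keys: Iterable[str], key_field: str = "id") -> str:
    """Build one Lucene-syntax query that matches any of the given keys."""
    # Quote each key as a phrase, escaping backslashes and double quotes.
    quoted = ['"%s"' % k.replace("\\", "\\\\").replace('"', '\\"') for k in keys]
    return "%s:(%s)" % (key_field, " OR ".join(quoted))


def join_results(
    primary_docs: List[Doc],
    fetch_secondary: Callable[[str], List[Doc]],  # hypothetical search client
    key_field: str = "id",
) -> List[Doc]:
    """Merge secondary-index fields into primary docs that share a key."""
    keys = [str(doc[key_field]) for doc in primary_docs]
    if not keys:
        return []
    # One batched lookup for the whole result page.
    secondary = {
        str(doc[key_field]): doc
        for doc in fetch_secondary(build_key_query(keys, key_field))
    }
    merged = []
    for doc in primary_docs:
        extra = secondary.get(str(doc[key_field]), {})
        # Primary fields win if both indices define the same field name.
        merged.append({**extra, **doc})
    return merged
```

This keeps the extra cost at one additional roundtrip per result page. For very large pages the OR list may run into Solr's `maxBooleanClauses` limit, so the key list would need to be chunked.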