Re: Collating results from multiple indexes

Jan Høydahl / Cominvent Fri, 12 Feb 2010 07:31:19 -0800

Really? The last time I looked at AIE, I am pretty sure there were Solr core
messages in the logs, so I assumed it used EmbeddedSolr or something. But I
may be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff
at Lucene level or on top of multiple Solr cores or what?


--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:

> Minor correction re Attivio - their stuff runs on top of Lucene, not Solr.  I 
> *think* they are trying to patent this.
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 
> ----- Original Message ----
>> From: Jan Høydahl / Cominvent <jan....@cominvent.com>
>> To: solr-user@lucene.apache.org
>> Sent: Mon, February 8, 2010 3:33:41 PM
>> Subject: Re: Collating results from multiple indexes
>> 
>> Hi,
>> 
>> There is no JOIN functionality in Solr. The common solution is either to
>> accept the high volume update churn, or to add client side code to build a
>> "join" layer on top of the two indices. I know that Attivio
>> (www.attivio.com) have built some kind of JOIN functionality on top of Solr
>> in their AIE product, but do not know the details or the actual performance.
>> 
>> Why not open a JIRA issue, if there is no such already, to request this as a 
>> feature?
>> 
>> --
>> Jan Høydahl  - search architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 25. jan. 2010, at 22.01, Aaron McKee wrote:
>> 
>>> 
>>> Is there any somewhat convenient way to collate/integrate fields from
>>> separate indices during result writing, if the indices use the same unique
>>> keys? Basically, some sort of cross-index JOIN?
>>> 
>>> As a bit of background, I have a rather heavyweight dataset of every US
>>> business (~25m records, an on-disk index footprint of ~30g, and 5-10 hours
>>> to fully index on a decent box). Given the size and relative stability of
>>> the dataset, I generally only update this monthly. However, I have separate
>>> advertising-related datasets that need to be updated either hourly or daily
>>> (e.g. today's coupon, click revenue remaining, etc.). These advertiser
>>> feeds reference the same keyspace that I use in the main index, but are
>>> otherwise significantly lighter weight. Importing and indexing them
>>> discretely only takes a couple minutes. Given that Solr/Lucene doesn't
>>> support field updating, without having to drop and re-add an entire
>>> document, it doesn't seem practical to integrate this data into the main
>>> index (the system would be in a constant state of churn, if we did
>>> document re-inserts, and the performance impact would probably be
>>> debilitating). It may be nice if this data could participate in filtering
>>> (e.g. only show advertisers), but it doesn't need to participate in
>>> scoring/ranking.
>>> 
>>> I'm guessing that someone else has had a similar need, at some point?  I
>>> can have our front-end query the smaller indices separately, using the keys
>>> returned by the primary index, but would prefer to avoid the extra
>>> sequential roundtrips. I'm hoping to also avoid a coding solution, if only
>>> to avoid the maintenance overhead as we drop in new builds of Solr, but
>>> that's also feasible.
>>> 
>>> Thank you for your insight,
>>> Aaron
>>> 
>
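
For reference, the client-side "join" layer discussed in this thread could
look roughly like the sketch below. It is only an illustration, not anything
shipped with Solr or AIE: the function name join_by_key, the "id" key field,
and the in-memory ADS dictionary standing in for the small advertiser index
are all assumptions made up for the example. The one real idea is to collect
the keys returned by the primary index and fetch the matching lightweight
records in a single batched lookup, so the cost is one extra round trip
rather than one per document.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


def join_by_key(
    primary_docs: List[Doc],
    fetch_secondary: Callable[[List[str]], Iterable[Doc]],
    key: str = "id",
) -> List[Doc]:
    """Merge fields from a secondary index into primary results sharing `key`."""
    keys = [str(doc[key]) for doc in primary_docs]
    # One batched lookup for all keys instead of one request per document.
    extras = {str(doc[key]): doc for doc in fetch_secondary(keys)}
    merged = []
    for doc in primary_docs:
        combined = dict(doc)  # copy, so the primary result is not mutated
        for field, value in extras.get(str(doc[key]), {}).items():
            if field != key:
                # Primary fields win if both indexes define the same field.
                combined.setdefault(field, value)
        merged.append(combined)  # primary ranking order is preserved
    return merged


# Hypothetical stand-in for the small, frequently rebuilt advertiser index.
ADS = {"b2": {"id": "b2", "coupon": "10% off"}}


def fetch_ads(keys: List[str]) -> List[Doc]:
    # In a real deployment this would be one query against the second index.
    return [ADS[k] for k in keys if k in ADS]


results = join_by_key(
    [{"id": "b1", "name": "Acme Hardware"}, {"id": "b2", "name": "Bob's Diner"}],
    fetch_ads,
)
```

Because the merge happens after the primary query returns, the secondary
fields can be displayed but cannot take part in filtering or ranking on the
primary index; that limitation is inherent to this approach.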
