Re: Collating results from multiple indexes

Jan Høydahl / Cominvent Wed, 17 Feb 2010 16:28:39 -0800

Thanks for your clarification and link, Will.

Back to Aaron's question. There is some ongoing work to try to support updating 
single fields within documents (http://issues.apache.org/jira/browse/SOLR-139 
and http://issues.apache.org/jira/browse/SOLR-828) which could perhaps be part 
of a future solution.


Is it an option for you to write a smart "join" component which can live on top 
of multiple cores, run multiple sub-queries efficiently, and transparently 
return the final result? Forking the shards query code could be a starting 
point. Donating this component back to Solr may free you of the maintenance 
burden, as I'm sure it would be useful to a larger audience.
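
To make the idea concrete, here is a rough sketch of the join step done on the
client side, in Python. It is only an illustration under assumptions, not
existing Solr code: the names join_results, build_key_query and fetch_secondary
are hypothetical, and fetch_secondary stands in for whatever client call you use
to query the smaller core. The point is that the keys from one page of primary
results go into a single batched query, so there is one extra roundtrip per page
rather than one per document.

```python
from typing import Callable, Dict, Iterable, List

Doc = Dict[str, object]


def build_key_query(key_field: str, keys: Iterable[str]) -> str:
    """Build one Lucene-syntax query that matches any of the given keys."""
    def quote(key: str) -> str:
        # Escape backslashes and double quotes so each key stays one phrase.
        return '"%s"' % key.replace("\\", "\\\\").replace('"', '\\"')

    return "%s:(%s)" % (key_field, " OR ".join(quote(k) for k in keys))


def join_results(
    primary_docs: List[Doc],
    fetch_secondary: Callable[[str], List[Doc]],  # hypothetical client call
    key_field: str = "id",
) -> List[Doc]:
    """Merge fields from a secondary index into a page of primary results."""
    keys = [str(doc[key_field]) for doc in primary_docs]
    if not keys:
        return []

    # One batched lookup for the whole page instead of one query per document.
    query = build_key_query(key_field, keys)
    secondary = {str(doc[key_field]): doc for doc in fetch_secondary(query)}

    merged = []
    for doc in primary_docs:
        combined = dict(doc)
        extra = secondary.get(str(doc[key_field]), {})
        for name, value in extra.items():
            # Fields from the primary index win on a name clash.
            combined.setdefault(name, value)
        merged.append(combined)
    return merged
```

A server-side component would do the same merge inside the request handler, so
the front-end would see a single response. With large page sizes the OR list can
get long, so batching the keys may be needed.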

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 17. feb. 2010, at 03.27, Will Johnson wrote:

> Jan Hoydal / Otis,
> 
> 
> 
> First off, thanks for mentioning us.  We do use some utility functions from
> SOLR but our index engine is built on top of Lucene only, there are no Solr
> cores involved.  We do have a JOIN operator that allows us to perform
> relational searches while still acting like a search engine in terms of
> performance, ranking, faceting, etc.  Our CTO wrote a blog article about it
> a month ago that does a pretty good job of explaining how it’s used:
> http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html
> 
> 
> 
> The join functionality and most of our other higher level features use
> separate data structures and don’t really have much to do with Lucene/SOLR
> except where they integrate with the query execution.  If you want to learn
> more feel free to check out www.attivio.com.
> 
> 
> 
> -              w...@attivio.com
> 
> 
> On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent <
> jan....@cominvent.com> wrote:
> 
>> Really? The last time I looked at AIE, I am pretty sure there was Solr core
>> msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may
>> be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff
>> at Lucene level or on top of multiple Solr cores or what?
>> 
>> --
>> Jan Høydahl  - search architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:
>> 
>>> Minor correction re Attivio - their stuff runs on top of Lucene, not
>>> Solr.  I *think* they are trying to patent this.
>>> 
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>> 
>>> 
>>> 
>>> ----- Original Message ----
>>>> From: Jan Høydahl / Cominvent <jan....@cominvent.com>
>>>> To: solr-user@lucene.apache.org
>>>> Sent: Mon, February 8, 2010 3:33:41 PM
>>>> Subject: Re: Collating results from multiple indexes
>>>> 
>>>> Hi,
>>>> 
>>>> There is no JOIN functionality in Solr. The common solution is either to
>>>> accept the high volume update churn, or to add client side code to build
>>>> a "join" layer on top of the two indices. I know that Attivio
>>>> (www.attivio.com) have built some kind of JOIN functionality on top of
>>>> Solr in their AIE product, but do not know the details or the actual
>>>> performance.
>>>> 
>>>> Why not open a JIRA issue, if there is no such already, to request this
>>>> as a feature?
>>>> 
>>>> --
>>>> Jan Høydahl  - search architect
>>>> Cominvent AS - www.cominvent.com
>>>> 
>>>> On 25. jan. 2010, at 22.01, Aaron McKee wrote:
>>>> 
>>>>> 
>>>>> Is there any somewhat convenient way to collate/integrate fields from
>>>>> separate indices during result writing, if the indices use the same
>>>>> unique keys? Basically, some sort of cross-index JOIN?
>>>>> 
>>>>> As a bit of background, I have a rather heavyweight dataset of every US
>>>>> business (~25m records, an on-disk index footprint of ~30g, and 5-10
>>>>> hours to fully index on a decent box). Given the size and relative
>>>>> stability of the dataset, I generally only update this monthly. However,
>>>>> I have separate advertising-related datasets that need to be updated
>>>>> either hourly or daily (e.g. today's coupon, click revenue remaining,
>>>>> etc.). These advertiser feeds reference the same keyspace that I use in
>>>>> the main index, but are otherwise significantly lighter weight.
>>>>> Importing and indexing them discretely only takes a couple minutes.
>>>>> Given that Solr/Lucene doesn't support field updating, without having to
>>>>> drop and re-add an entire document, it doesn't seem practical to
>>>>> integrate this data into the main index (the system would be under a
>>>>> constant state of churn, if we did document re-inserts, and the
>>>>> performance impact would probably be debilitating). It may be nice if
>>>>> this data could participate in filtering (e.g. only show advertisers),
>>>>> but it doesn't need to participate in scoring/ranking.
>>>>> 
>>>>> I'm guessing that someone else has had a similar need, at some point?
>>>>> I can have our front-end query the smaller indices separately, using the
>>>>> keys returned by the primary index, but would prefer to avoid the extra
>>>>> sequential roundtrips. I'm hoping to also avoid a coding solution, if
>>>>> only to avoid the maintenance overhead as we drop in new builds of Solr,
>>>>> but that's also feasible.
>>>>> 
>>>>> Thank you for your insight,
>>>>> Aaron
>>>>> 
>>> 
>> 
>>
