Re: Solr 4.0 - Join performance

Smiley, David W. Tue, 14 Aug 2012 13:15:58 -0700

Stepping back a bit, the reason you are using multiple cores with a join is 
that Solr doesn't have a multi-valued numeric range type.  The spatial work 
I'm doing in Lucene-spatial does, and it's 2-dimensional for an x & y whereas 
your case calls for one dimension.  It's taking a bit of time, but when it's 
finished you should be able to use it for your use case by ignoring the 'y'.  
Eventually I'd like to develop a Solr field type for a numeric/time range 
that does this more natively, but that's a ways off.
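
To make the idea concrete, here is a minimal sketch in plain Python (not Solr 
or Lucene-spatial API) of what a multi-valued range per document would buy 
you: each item carries all of its grant windows, so the availability filter 
is a per-item check with no join.  The names `window_contains`, 
`available_items` and the "windows" key are hypothetical, chosen only for 
illustration.

```python
from datetime import datetime, timezone

def window_contains(window, instant):
    """True if start <= instant < end, mirroring
    AvailabilityStartTime:[* TO NOW] AND -AvailabilityEndTime:[* TO NOW]."""
    start, end = window
    return start <= instant < end

def available_items(items, instant):
    """Return the Ids of items having at least one window containing `instant`."""
    return [item["Id"] for item in items
            if any(window_contains(w, instant) for w in item["windows"])]

def utc(year, month, day):
    """Small helper to build timezone-aware UTC datetimes."""
    return datetime(year, month, day, tzinfo=timezone.utc)

# Hypothetical data: two items, each with a multi-valued list of windows.
items = [
    {"Id": "item1", "Name": "XXX",
     "windows": [(utc(2012, 1, 1), utc(2012, 6, 1)),
                 (utc(2012, 8, 1), utc(2012, 9, 1))]},
    {"Id": "item2", "Name": "XXX",
     "windows": [(utc(2012, 1, 1), utc(2012, 2, 1))]},
]

print(available_items(items, utc(2012, 8, 14)))
```

Only the containment logic is shown here; how such windows would actually be 
indexed and queried depends on the field type once it exists.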


Cheers,
  ~ David Smiley

On Aug 2, 2012, at 10:45 AM, Eric Khoury wrote:

> Hello all,
> 
> 
> 
> I’m testing out the new join feature and hitting some performance
> issues, as described in Erick’s article 
> (http://architects.dzone.com/articles/solr-experimenting-join).
> 
> Basically, I’m using 2 objects in solr (this is a simplified
> view):
> 
> 
> 
> Item
> - Id
> - Name
> 
> Grant
> - ItemId
> - AvailabilityStartTime
> - AvailabilityEndTime
> 
> 
> 
> Each item can have multiple grants attached to it.
> 
> 
> 
> The query I'm using is the following, to find items by
> name, filtered by grants availability window:
> 
> 
> 
> solr/select?fq=Name:XXX&q={!join from=ItemId to=Id} AvailabilityStartTime:[* TO NOW] AND -AvailabilityEndTime:[* TO NOW]
> 
> 
> 
> With a hundred thousand items, this query can take multiple seconds
> to perform, due to the large number of ItemIds returned from the join query.
> 
> Has anyone come up with a better way to use joins for these types of queries? 
> Are there improvements planned in 4.0 RTM in this area?
> 
> 
> 
> Btw, I’ve explored simply adding Start-End times to items, but
> the flat data model makes it hard to maintain start-end pairs.
> 
> 
> 
> Thanks for the help!
> 
> Eric.
> 
> 
> 
>
