Re: Searching Across Multiple Cores

Jonathan Rochkind Thu, 14 Oct 2010 10:58:04 -0700

The point/use-case of sharding/distributed search is performance, not segregating different data in different places. Distributed search assumes the same schema in each shard -- do you have that?

I don't think distributed search is meant to support the kind of "joining" you describe; that's not really what Solr does.

But if you actually do have the same schema across your shards, and have distributed search set up properly, then you don't need to do any special "joining": the shards end up forming one 'logical' index, which is the point of it. I don't think you can do what you describe. Solr doesn't do "joins" like an RDBMS; Solr works on a single set of documents, not multiple "tables" or "collections".

If you describe your data and the kind of queries you want to run, someone might be able to figure out a way to "de-normalize" the data to support what you want to do. That won't really have anything to do with shards/distributed search: you add distributed search for performance or giant-index-size purposes, but it doesn't change your schema design or queries.
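A minimal in-memory sketch of the de-normalization idea, assuming the field names Steve uses in his messages quoted below (resourceId, title, tag1, userId, folder, grade): each favourite is flattened into one document that carries a copy of its resource's metadata, so a single query against a single index can filter on both userId and tag1. The helper name `denormalize` and the sample data are hypothetical, and the final list comprehension only stands in for two filter queries (`fq=userId:893489` and `fq=tag1:contentTagX`) against that one index.

```python
from typing import Dict, List


def denormalize(resources: Dict[str, dict], favourites: List[dict]) -> List[dict]:
    """Flatten each favourite into one document that also carries its resource's metadata.

    Hypothetical helper: field names follow the thread; nothing here is a Solr API.
    """
    docs = []
    for fav in favourites:
        resource = resources.get(fav["resourceId"])
        if resource is None:
            continue  # favourite points at a resource that is not in the main data set
        doc = dict(resource)  # copy title/tag fields from the main data set
        doc.update(fav)       # add the per-user fields: userId, folder, grade
        docs.append(doc)
    return docs


resources = {
    "r1": {"resourceId": "r1", "title": "Cells", "tag1": "contentTagX"},
    "r2": {"resourceId": "r2", "title": "Atoms", "tag1": "contentTagY"},
}
favourites = [
    {"resourceId": "r1", "userId": "893489", "folder": "bio", "grade": "5"},
    {"resourceId": "r2", "userId": "893489", "folder": "chem", "grade": "6"},
    {"resourceId": "r9", "userId": "893489", "folder": "misc", "grade": "1"},
]
docs = denormalize(resources, favourites)

# Stand-in for fq=userId:893489 and fq=tag1:contentTagX on the single flattened index
hits = [d["resourceId"] for d in docs
        if d["userId"] == "893489" and d.get("tag1") == "contentTagX"]
print(hits)  # ['r1']
```

The trade-off is the one Steve raises in his first message: metadata is duplicated once per favourite, so the index grows with the number of users rather than the number of resources.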


Lohrenz, Steven wrote:

Ken,

OK, I understand how distributed search works, but I don't understand how to build my query so that the results from the two shards include only values that exist in both result sets. In essence, I'm doing a join across the two shards on the resourceId.

So Core0 has:
resourceId (unique key)
title
tag1
tag2
tag3


And Core1 has:
resourceId + folder + userId + grade (concatenated - this is the uniqueId)
resourceId
folder
userId
grade
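Core1's unique key is described above as a plain concatenation of resourceId, folder, userId and grade. A small sketch of building that key, under the assumption that a separator character is acceptable in the key: without one, different field values can produce the same key. The function name `favourite_key` and the `|` separator are hypothetical choices, not something from the thread.

```python
from typing import Tuple


def favourite_key(resource_id: str, folder: str, user_id: str, grade: str,
                  sep: str = "|") -> str:
    """Build Core1's composite unique key (hypothetical helper; separator is an assumption)."""
    parts: Tuple[str, ...] = (resource_id, folder, user_id, grade)
    if any(sep in part for part in parts):
        # A value containing the separator would make the key ambiguous again
        raise ValueError(f"field values must not contain {sep!r}")
    return sep.join(parts)


# Plain concatenation collides: resourceId "12" + folder "3" vs resourceId "1" + folder "23"
print("".join(("12", "3", "893489", "5")) == "".join(("1", "23", "893489", "5")))  # True
print(favourite_key("12", "3", "893489", "5"))  # 12|3|893489|5
print(favourite_key("1", "23", "893489", "5"))  # 1|23|893489|5
```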

For example, I would want to find all the content with userId = 893489 and tag1 = 'contentTagX'. My idea is to search Core1 for all the items with userId = 893489, which would return that user's results along with their resourceIds. Then I would search Core0 for documents where tag1 = 'contentTagX' and resourceId is one of those returned from Core1.
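Steve's two-step approach can be sketched as building two query strings: one for Core1 that returns only the resourceId field for the user, and one for Core0 that restricts the tag search to the returned ids with a filter query (`fq`). This is a client-side sketch under assumptions: the helper names are hypothetical, only the standard Solr request parameters `q`, `fq`, `fl` and `rows` are used, and no HTTP request is made. A long id list turns into one large boolean query, which Solr limits through the `maxBooleanClauses` setting (1024 by default).

```python
from typing import List, Optional
from urllib.parse import urlencode


def core1_params(user_id: str, rows: int = 1000) -> str:
    """Step 1 (hypothetical helper): fetch only the resourceIds this user has favourited."""
    return urlencode({"q": f"userId:{user_id}", "fl": "resourceId", "rows": rows})


def core0_params(tag: str, resource_ids: List[str]) -> Optional[str]:
    """Step 2 (hypothetical helper): search Core0 for the tag, restricted to step 1's ids."""
    if not resource_ids:
        return None  # the user has no favourites, so there is nothing to search
    # Quote each id so that most query-syntax characters inside it are taken literally
    id_clause = " OR ".join(f'"{rid}"' for rid in resource_ids)
    return urlencode({"q": f"tag1:{tag}", "fq": f"resourceId:({id_clause})"})


print(core1_params("893489"))
print(core0_params("contentTagX", ["r1", "r2"]))
```

Keeping the id restriction in `fq` rather than `q` means it filters the results without affecting relevance scoring of the tag match.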

I can probably do this in a search handler (say, a Core3 with a mashup of the
two schemas that just redirects to the other shards), but is there an easier
way to do it?

Or am I missing something?

Thanks for your help,
Steve


-----Original Message-----
From: Ken Stanley [mailto:doh...@gmail.com]
Sent: 14 October 2010 18:19
To: solr-user@lucene.apache.org
Subject: Re: Searching Across Multiple Cores

Steve,

Using shards is actually quite simple; it's just a matter of setting up your
shards (via multiple cores, or multiple instances of SOLR) and then passing
the shards parameter in the query string. The shards parameter is a
comma-separated list of the servers/cores you wish to use together.

So, let's try this using a fictitious example. You have two cores: one
called "main" for your main data set of metadata, and one called "favorites"
for your user favorites metadata. You set up each schema accordingly, and
you've indexed your data. When you want to query both sets of data, you
build your query appropriately and then use the following URL (the host is
assumed to be localhost for simplicity):

http://localhost/solr/main/select?q=id:[*+TO+*]&shards=localhost/solr/main,localhost/solr/favorites&rows=100&start=0
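For reference, the same request can be assembled in code. This sketch only builds the URL shown above, with the query string percent-encoded (which Solr decodes as usual); the helper name `sharded_select_url` is hypothetical. Building the request is the easy part: as discussed at the top of this thread, distributed search expects every shard to share the same schema.

```python
from typing import List
from urllib.parse import urlencode


def sharded_select_url(host: str, core: str, shards: List[str], query: str,
                       rows: int = 100, start: int = 0) -> str:
    """Build a /select URL that fans out to the given shards (hypothetical helper)."""
    params = {
        "q": query,
        "shards": ",".join(shards),  # comma-separated host/path entries, no "http://"
        "rows": rows,
        "start": start,
    }
    return f"http://{host}/solr/{core}/select?{urlencode(params)}"


url = sharded_select_url("localhost", "main",
                         ["localhost/solr/main", "localhost/solr/favorites"],
                         "id:[* TO *]")
print(url)
```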

I am personally investigating using this technique to tie together two cores
that utilize different schemas; one schema will contain news articles,
blogs, and similar types of data, while another schema will contain
company-specific information, such as addresses, etc. If you're still having
trouble after trying this, let me know and I'd be more than happy to share
any findings that I come across.

I hope that this helps to clear things up for you. :)

- Ken


On Thu, Oct 14, 2010 at 4:25 AM, Lohrenz, Steven
<steven.lohr...@hmhpub.com> wrote:

Ken,

I have been through that page many times. What would I use distributed
search for: the first scenario or the second?

The question is: can I merge a set of results from the two cores/shards and
only return results that exist in both (determined by the resourceId, which
exists on both)?
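What Steve asks for here amounts to an intersection on resourceId, performed after both cores have answered. A minimal client-side sketch, with a hypothetical helper and sample documents: it keeps the Core0 documents whose resourceId also appears in Core1's results. It is only correct when both result sets are complete; with paging (`rows`/`start`), matches outside the fetched pages are silently lost, so in practice it only suits small result sets.

```python
from typing import Iterable, List


def intersect_by_resource_id(core0_docs: Iterable[dict],
                             core1_docs: Iterable[dict]) -> List[dict]:
    """Keep Core0 docs whose resourceId also appears in Core1's results (hypothetical helper).

    Core0's order (e.g. relevance) is preserved; Core1 only acts as a filter.
    """
    wanted = {doc["resourceId"] for doc in core1_docs}
    return [doc for doc in core0_docs if doc["resourceId"] in wanted]


core0 = [{"resourceId": "r1", "title": "Cells"},
         {"resourceId": "r2", "title": "Atoms"},
         {"resourceId": "r3", "title": "Waves"}]
core1 = [{"resourceId": "r3", "userId": "893489"},
         {"resourceId": "r1", "userId": "893489"}]
print([d["resourceId"] for d in intersect_by_resource_id(core0, core1)])  # ['r1', 'r3']
```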

Cheers,
Steve

-----Original Message-----
From: Ken Stanley [mailto:doh...@gmail.com]
Sent: 13 October 2010 20:08
To: solr-user@lucene.apache.org
Subject: Re: Searching Across Multiple Cores

On Wed, Oct 13, 2010 at 2:11 PM, Lohrenz, Steven
<steven.lohr...@hmhpub.com> wrote:

Hi,

I am trying to figure out how I can accomplish the following:

I have a fairly static and large set of resources I need to have indexed
and searchable. Solr seems to be a perfect fit for that. In addition, I
need to give my users the ability to add resources from the main data set
to a 'Favourites' folder (which can include a few more tags added by them).
The Favourites need to be searchable in the same manner as the main data
set, across all the same fields.

My first thought was to have two separate schemas:
- the first for the main data set and its metadata
- the second for the Favourites folder, with all of the metadata from the
main set copied over plus the additional fields.

Then I thought that would probably waste quite a bit of space (the number
of users is much larger than the number of main resources).

So then I thought I could have the main data set with its metadata. Then
there would be a second one for the Favourites folder with the unique id
from the first and the additional fields it needs (userId, grade, folder,
tag).

In addition, I would create another schema/core with all the fields from
the other two and have a request handler defined on it that searches across
the other 2 cores and returns the results through this core.

This third core would have searches run against it where the results
would be expected to belong to a single user only. For example, a user
searches their Favourites folder for all the items with Foo. The result is
only those items the user has added to their Favourites with Foo somewhere
in their main data set metadata.

Could this be made to work? What would the consequences be? Any alternative
suggestions?

Thanks,
Steve

Steve,

From your description, it really sounds like you could reap the benefits of
using Distributed Search in SOLR:

http://wiki.apache.org/solr/DistributedSearch

I hope that this helps.

- Ken
