Re: abt Multicore

Shalin Shekhar Mangar Mon, 17 Nov 2008 21:21:51 -0800

Some high level thoughts:

On Mon, Nov 17, 2008 at 11:10 PM, Nguyen, Joe <[EMAIL PROTECTED]> wrote:

> "Are all the documents in the same search space?  That is, for a given
> query, could any of the 10MM docs be returned?
>
> If so, I don't think you need to worry about multicore.  You may however
> need to put part of the index on various machines:
> http://wiki.apache.org/solr/DistributedSearch "
>
> I am also trying to decide between multicore and distributed
> search. My concerns are as follows:
>
> Does that mean having a single big schema with lot of fields?

Yes, and that's the use case behind multi-valued fields. Denormalizing and
avoiding joins helps to scale.
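
As a rough illustration of what denormalizing means here, the sketch below
folds child rows into one document per parent row, with the child values
going into a multi-valued field. The table layout (orders and items) and the
field names are hypothetical, not taken from the poster's Oracle schema, and
prefixing the id with the table name is just one assumed way of keeping keys
unique when several tables feed one index.

```python
from collections import defaultdict


def denormalize(orders, items):
    """Fold child rows into one flat document per parent row.

    `orders` and `items` are hypothetical tables given as lists of dicts.
    """
    # Group the child values by their parent key.
    items_by_order = defaultdict(list)
    for item in items:
        items_by_order[item["order_id"]].append(item["name"])

    # One document per order; `item_name` plays the role of a multi-valued field.
    return [
        {
            "id": f"order-{order['id']}",  # table-prefixed key (an assumption)
            "customer": order["customer"],
            "item_name": items_by_order[order["id"]],
        }
        for order in orders
    ]
```

Each resulting dict corresponds to one document to be indexed, so a query
never needs to join orders against items at search time.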

> Distributed Search requires that each document must have a unique key.
> In this case, the unique key cannot be a primary key of a table.
>
> I wonder how Solr performs in this case (distributed search vs.
> multicore)
> 1.  Distributed Search
>    a.  All documents are in a single index.  Indexing a single document
> would lock the index and affect query performance?

Indexing does not lock out searchers. Solr is designed to be queried while
indexing is in progress. However, depending on your machine's performance and
your configuration, you may see slow queries during commits/auto-warming.

Also, in distributed search, you have different Solr instances handling
disjoint sets of data. Indexing on one instance does not affect the rest.
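
A minimal sketch of what such a request looks like from the client side,
assuming the "shards" request parameter described on the DistributedSearch
wiki page. The host names, ports, and the helper function itself are
made up for illustration.

```python
from urllib.parse import urlencode


def distributed_query_url(coordinator, shards, query, rows=10):
    """Build a select URL that asks one instance to fan out to all shards.

    `coordinator` and each entry of `shards` are host:port/path strings
    without a scheme, e.g. "host1:8983/solr" (hypothetical hosts).
    """
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of the instances holding disjoint parts of the index.
        "shards": ",".join(shards),
    }
    return f"http://{coordinator}/select?{urlencode(params)}"
```

The instance that receives the request queries every listed shard and merges
the results, so the caller still sees a single result list.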

>    b.  If multi machines are used, Solr will need to query each machine
> and merge the result.  This also could impact performance.

Yes, but in most scenarios where distributed search is required, it is just
not possible to use a single box for the whole index. If you set out to
write a similar kind of querying for multi-cores, it will be difficult to
optimize it as well as Solr's implementation.
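
To give a feel for the merge step you would have to write yourself, here is a
simplified sketch that combines per-core result lists into one top-k list by
score. It is not Solr's implementation, and it assumes scores from different
indexes are directly comparable, which is not guaranteed in practice. The
function name and the (score, doc_id) tuple layout are hypothetical.

```python
import heapq
from itertools import islice


def merge_top_k(per_core_hits, k):
    """Merge result lists into the k best hits overall.

    `per_core_hits` is a list of lists of (score, doc_id) tuples, each list
    already sorted by descending score, as a search response would be.
    """
    # heapq.merge expects ascending inputs, so order by negated score.
    merged = heapq.merge(*per_core_hits, key=lambda hit: -hit[0])
    return list(islice(merged, k))
```

Even this small version leaves out paging, sorting on other fields, facets,
and fetching stored fields for the winners, all of which a hand-rolled
cross-core query would also have to handle.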

>
>    C.  Support MoreLikeThis query given a document id.

MoreLikeThis is not implemented for distributed environments (yet).

>
> 2.  Multicore
>    a.  Each table will be associated with a single core.  Indexing a
> single document would lock only a specific core index.  Thus, querying
> documents on other cores won't be impacted.

With multi-core, all cores are on a single box, so you may see slow queries
on other cores too (again, it depends on your box's strength).

>
>    B.  Querying documents across multicore must be handled by the
> caller.

That is not a use case for which Lucene/Solr were designed. Joins are
discouraged most of the time.

>
>    C.  Can't support MoreLikeThis query since document id from one core
> has no meaning on other cores.

MoreLikeThis makes no sense in this case because the document structure
(schema) is totally different.

>
>
> -----Original Message-----
> From: Ryan McKinley [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 17, 2008 6:09 Joe
> To: solr-user@lucene.apache.org
> Subject: Re: abt Multicore
>
> Are all the documents in the same search space?  That is, for a given
> query, could any of the 10MM docs be returned?
>
> If so, I don't think you need to worry about multicore.  You may however
> need to put part of the index on various machines:
> http://wiki.apache.org/solr/DistributedSearch
>
> ryan
>
>
> On Nov 17, 2008, at 3:47 AM, Raghunandan Rao wrote:
>
> > Hi,
> >
> > I have an app running on weblogic and oracle. Oracle DB is quite huge;
> > say some 10 million records. I need to integrate Solr for this and
> > I am planning to use multicore. How can the multicore feature be used
> > at its best?
> >
> >
> >
> > -Raghu
> >
>
>

-- 
Regards,
Shalin Shekhar Mangar.
