Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

Otis Gospodnetic Sat, 29 Dec 2007 07:21:31 -0800

Hi Geert-Jan,

Have you considered storing this data in an external data store rather than in
the Lucene index?  In other words, use the Lucene index only to index the
content you need to search.  Then, when you search this index, just pull out a
single stored field, the unique ID, for each of the top N hits, and use those
IDs to pull the actual content for display purposes from the external store.
This external store could be an RDBMS, an ODBMS, a BDB, etc.  I've worked with
very large indices where we successfully used BDBs for this purpose.
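
A minimal sketch of that two-step lookup, with plain dictionaries standing in for both the search index and the external store. The names `search_index`, `external_store`, `search_ids` and `fetch_for_display`, and the sample data, are made up for illustration; a real setup would query Solr for the ID field only and read the records from a BDB or database.

```python
from typing import Dict, List

# Stand-in for the search index: term -> ranked document IDs.
# Only the unique ID is "stored"; nothing else comes back from a search.
search_index: Dict[str, List[str]] = {
    "shoe": ["p3", "p1", "p2"],
}

# Stand-in for the external store (BDB, RDBMS, ...): ID -> full display record.
external_store: Dict[str, dict] = {
    "p1": {"name": "Trail shoe", "variants": 1200},
    "p2": {"name": "Road shoe", "variants": 300},
    "p3": {"name": "Court shoe", "variants": 2000},
}


def search_ids(term: str, top_n: int) -> List[str]:
    """Step 1: search the index and return only the IDs of the top N hits."""
    return search_index.get(term, [])[:top_n]


def fetch_for_display(ids: List[str]) -> List[dict]:
    """Step 2: pull the display content for those IDs from the external store."""
    return [external_store[doc_id] for doc_id in ids if doc_id in external_store]


hits = search_ids("shoe", top_n=2)
records = fetch_for_display(hits)
for record in records:
    print(record["name"], record["variants"])
```

The point of the split is that the per-hit cost in the index no longer depends on how many display fields a product has; that cost moves to a keyed lookup in the external store.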

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Geert-Jan Brits <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, December 27, 2007 11:44:13 AM
Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

Yeah, that makes sense.
So, all in all, could scanning all the fields and loading the 10 fields add
up to cost about the same as, or even more than, performing the initial query?
(Just making sure.)

I am wondering if the following change to the schema would help in this
case:

current setup:
It's possible to have up to 2000 product-variants.
each product-variant has:
- 1 price field (stored / indexed)
- 1 multivalued field which contains product-variant characteristics
(stored / not indexed).

This adds up to the 4000 fields described. Moreover, there are some fields on
the product level, but these would contribute just a tiny bit to the overall
scanning / loading costs (about 50 stored and indexed fields in total).

possible new setup (only the changes) :
- index but not store the price-field.
- store the price as just another one of the product-variant characteristics
in the multivalued product-variant field.

As a result, this would bring the maximum number of stored fields down to
about 2050 from 4050, thereby about halving scanning / loading costs
while leaving the current querying costs intact.
Indexing costs would increase a bit.
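
The field counts above can be checked with a few lines of arithmetic. This is only a sketch of the counting: the variable names are illustrative, and it counts stored fields, not bytes. Since the price value still ends up stored inside the multivalued field, the number of bytes to scan may shrink by less than the field count does.

```python
variants = 2000        # maximum product-variants per product
product_level = 50     # approximate number of product-level stored fields

# Current setup: each variant stores a price field and a characteristics field.
current = variants * 2 + product_level

# Proposed setup: each variant stores only the characteristics field
# (the price is indexed but not stored as a field of its own).
proposed = variants * 1 + product_level

ratio = proposed / current
print(current, proposed, round(ratio, 2))
```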

Would you expect the same performance gain?

Thanks,
Geert-Jan

2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
>
> On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > after inspecting solrconfig.xml I see that I already have enabled lazy
> > field loading by:
> > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > enabled by default)
> >
> > Since any query returns about 10 fields (which differ from query to
> > query), would this mean that only these 10 of about 2000-4000 fields are
> > retrieved / loaded?
>
> Yes, but that's not the whole story.
> Lucene stores all of the fields back-to-back with no index (there is
> no random access to particular stored fields)... so all of the fields
> must be at least scanned.
>
> -Yonik
>
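
A toy model of the point Yonik makes in the quoted reply: when fields are written back-to-back with no per-field index, finding a few wanted fields still means stepping over every field in the document. The length-prefixed layout and the `pack_fields` / `load_fields` helpers below are assumptions made for illustration, not Lucene's actual stored-fields file format.

```python
import struct


def pack_fields(fields):
    """Serialize (name, value) pairs back-to-back as length-prefixed byte strings."""
    out = bytearray()
    for name, value in fields:
        for part in (name.encode(), value.encode()):
            out += struct.pack(">I", len(part)) + part
    return bytes(out)


def load_fields(blob, wanted):
    """Walk the whole blob; decode values only for the wanted field names."""
    pos, scanned, loaded = 0, 0, {}
    while pos < len(blob):
        (name_len,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        name = blob[pos:pos + name_len].decode()
        pos += name_len
        (value_len,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if name in wanted:
            loaded[name] = blob[pos:pos + value_len].decode()
        pos += value_len  # step past the value whether or not it was decoded
        scanned += 1
    return loaded, scanned


fields = [(f"price_{i}", str(i)) for i in range(4000)]
blob = pack_fields(fields)
loaded, scanned = load_fields(blob, {"price_7", "price_3999"})
print(len(loaded), "loaded,", scanned, "scanned")
```

In this model, lazy loading saves the decoding work for the unwanted values, but the scan count still equals the total number of fields in the document.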
