Hi Otis,
I don't really see how this would minimize my number of fields.
At the moment I have 1 price field (stored / indexed) and 1 multivalued field
(stored) per product-variant. I have about 2000 product variants.
I could indeed replace each multivalued field by a single-valued field with an
id pointing to an external store, where I get the needed fields. However this
would not change the number of fields in my index (correct?) and thus
wouldn't matter for the big scanning-time I'm seeing. Moreover, it wouldn't
matter for the query-time either I guess.
Thanks,
Geert-Jan
2007/12/29, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Hi Geert-Jan,
>
> Have you considered storing this data in an external data store and not in
> the Lucene index? In other words, use the Lucene index only to index the
> content you need to search. Then, when you search this index, just pull out
> the single stored field, the unique ID, for each of the top N hits, and use
> those IDs to pull the actual content for display purposes from the external
> store. This external store could be an RDBMS, an ODBMS, a BDB, etc. I've
> worked with very large indices where we successfully used BDBs for this
> purpose.
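The search-then-fetch pattern described above can be sketched roughly as follows. This is only an illustration in plain Python: the list and dict stand in for a Lucene index and an external store such as a BDB, and every name in it (search_index, external_store, search, fetch_for_display) is hypothetical rather than part of Lucene or Solr.

```python
# Toy stand-ins only: a real setup would use a Lucene/Solr index and an
# external store such as a BDB or an RDBMS. All names here are hypothetical.

# The "index" holds only what is needed for searching, plus one stored ID.
search_index = [
    {"id": "p1", "terms": {"red", "shoe"}},
    {"id": "p2", "terms": {"blue", "shoe"}},
    {"id": "p3", "terms": {"red", "hat"}},
]

# The external store holds the bulky display content, keyed by the same ID.
external_store = {
    "p1": {"title": "Red shoe", "variants": ["size 40", "size 41"]},
    "p2": {"title": "Blue shoe", "variants": ["size 42"]},
    "p3": {"title": "Red hat", "variants": ["one size"]},
}


def search(query_terms, top_n):
    """Return the IDs of the top_n hits, ranked by number of matching terms."""
    scored = []
    for doc in search_index:
        score = len(doc["terms"] & query_terms)
        if score > 0:
            scored.append((score, doc["id"]))
    # Highest score first; ties broken by ID so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]


def fetch_for_display(doc_ids):
    """Pull the display content for the given IDs from the external store."""
    return [external_store[doc_id] for doc_id in doc_ids]


hits = search({"red", "shoe"}, top_n=2)
results = fetch_for_display(hits)
print(hits)
print([r["title"] for r in results])
```

Only the IDs of the top N hits ever leave the index; everything needed for display comes from the store in a second step.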
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message -----
> From: Geert-Jan Brits <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, December 27, 2007 11:44:13 AM
> Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process
> (solrserver)
>
> Yeah, that makes sense.
> So, all in all, could scanning all the fields and loading the 10 fields add
> up to cost about the same as, or even more than, performing the initial query?
> (Just making sure.)
>
> I am wondering if the following change to the schema would help in this
> case:
>
> current setup:
> It's possible to have up to 2000 product-variants.
> each product-variant has:
> - 1 price field (stored / indexed)
> - 1 multivalued field which contains product-variant characteristics
> (stored / not indexed).
>
> This adds up to the 4000 fields described. Moreover, there are some fields on
> the product level, but these would contribute just a tiny bit to the overall
> scanning / loading costs (about 50 stored and indexed fields in total).
>
> possible new setup (only the changes) :
> - index but not store the price-field.
> - store the price as just another one of the product-variant characteristics
> in the multivalued product-variant field.
>
> As a result, this would bring the maximum number of stored fields down to
> about 2050 from 4050, thereby roughly halving scanning / loading costs
> while leaving the current querying costs intact.
> Indexing costs would increase a bit.
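The field counts above can be checked with a small tally. This is just the arithmetic from the message, under the assumption (not confirmed in the thread) that scanning cost grows in proportion to the number of stored fields. The constant and function names are made up for the example.

```python
# Figures taken from the message; the names are hypothetical.
MAX_VARIANTS = 2000        # upper bound on product variants per document
PRODUCT_LEVEL_FIELDS = 50  # stored and indexed product-level fields


def stored_fields(stored_per_variant):
    """Maximum number of stored fields in one document."""
    return MAX_VARIANTS * stored_per_variant + PRODUCT_LEVEL_FIELDS


current = stored_fields(2)   # price + characteristics both stored
proposed = stored_fields(1)  # only characteristics stored, price folded in
ratio = proposed / current

print(current, proposed, round(ratio, 2))  # prints: 4050 2050 0.51
```

The tally counts fields only. Each remaining multivalued field would carry one extra value (the price), so the number of bytes scanned would drop by somewhat less than half.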
>
> Would you expect the same performance gain?
>
> Thanks,
> Geert-Jan
>
> 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> >
> > On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > > > after inspecting solrconfig.xml I see that I already have enabled lazy
> > > > field loading by:
> > > > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > > > enabled by default)
> > >
> > > > Since any query returns about 10 fields (which differ from query to
> > > > query), would this mean that only these 10 of about 2000-4000 fields are
> > > > retrieved / loaded?
> >
> > Yes, but that's not the whole story.
> > Lucene stores all of the fields back-to-back with no index (there is
> > no random access to particular stored fields)... so all of the fields
> > must be at least scanned.
> >
> > -Yonik
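The point about stored fields lying back-to-back with no index can be illustrated with a toy layout. This is a sketch only and not Lucene's actual on-disk format: it assumes each field is written as a 4-byte id, a 4-byte length, and the value bytes, and the function names are hypothetical. A lazy reader can skip the values it does not want, but it still has to visit every header to find out where the next field starts.

```python
import io
import struct


def write_document(fields):
    """Serialize (field_id, value) pairs back-to-back with no offset table."""
    buf = io.BytesIO()
    for field_id, value in fields:
        data = value.encode("utf-8")
        buf.write(struct.pack(">II", field_id, len(data)))  # header: id, length
        buf.write(data)
    return buf.getvalue()


def read_fields(blob, wanted_ids):
    """Load only the wanted fields; return (loaded, headers_scanned)."""
    stream = io.BytesIO(blob)
    loaded = {}
    scanned = 0
    while True:
        header = stream.read(8)
        if len(header) < 8:
            break  # end of document
        field_id, length = struct.unpack(">II", header)
        scanned += 1
        if field_id in wanted_ids:
            loaded[field_id] = stream.read(length).decode("utf-8")
        else:
            stream.seek(length, io.SEEK_CUR)  # skip the value without decoding it
    return loaded, scanned


doc = write_document([(i, f"value-{i}") for i in range(4050)])
wanted = set(range(449, 4050, 400))  # 10 field ids, the last being 4049
loaded, scanned = read_fields(doc, wanted)
print(len(loaded), scanned)  # prints: 10 4050
```

Even though only 10 values are decoded, all 4050 headers are walked, which is why the total number of stored fields matters for load time.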
> >
>
>
>
>