Hi Otis,

After some thought (I must have been sleeping or something) it seems that it is indeed possible to remove the 2000 product-variant fields from the index and store them in an external store. I doubted this option before because I mistakenly thought that I would still need the 2000 stored fields in place to hold the product-variant keys for accessing the database. However, I do have a way of identifying the product-variants client-side once Solr returns the products.
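A minimal sketch of what such an external lookup could look like, assuming a relational store with one row per product-variant and a primary key on the variant id. The table layout, the `product:variant` key scheme, and the helper names (`make_store`, `fetch_variants`, `variant_key`) are all hypothetical, and SQLite stands in for whatever store is actually chosen:

```python
import sqlite3


def variant_key(product_id, variant_no):
    """Build a variant id client-side (hypothetical 'product:variant' scheme)."""
    return f"{product_id}:{variant_no}"


def make_store(rows):
    """Create an in-memory table with one row per product-variant (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant ("
        " variant_id TEXT PRIMARY KEY,"
        " price REAL NOT NULL,"
        " characteristics TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO variant VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def fetch_variants(conn, variant_ids):
    """Fetch a small batch of rows by primary key in a single query."""
    if not variant_ids:
        return {}
    placeholders = ",".join("?" for _ in variant_ids)
    cur = conn.execute(
        "SELECT variant_id, price, characteristics FROM variant "
        f"WHERE variant_id IN ({placeholders})",
        list(variant_ids),
    )
    return {vid: (price, chars) for vid, price, chars in cur}
```

The point of the sketch is only that each request touches a handful of rows through the primary-key index, so the per-request work depends on the batch size rather than on the total number of rows. It says nothing about how a particular store behaves at the scale discussed here.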
This does mean, however, that the external datastore must have 1 row per product-variant. With an upper range of about 200,000 products and up to 2000 product-variants per product, this would give a maximum of 400,000,000 product-variant records in the external datastore. I really don't have a clue about possible performance given these numbers, but it sounds rather large to me, although it may sound like peanuts to you ;-). The query would be to return 10 rows based on 10 product-variant ids. Any rough guesstimates on whether this sounds doable? I guess I'm just going to find out.

Thanks for helping me think outside the box!
Geert-Jan

2008/1/2, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Maybe I'm not following your situation 100%, but it sounded like pulling
> the values of purely stored fields is the slow part. *Perhaps* using a
> non-Lucene data store just for the saved fields would be faster.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Geert-Jan Brits <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, December 31, 2007 8:49:43 AM
> Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
>
> Hi Otis,
>
> I don't really see how this would minimize my number of fields.
> At the moment I have 1 price field (stored / indexed) and 1 multivalued field
> (stored) per product-variant. I have about 2000 product variants.
>
> I could indeed replace each multivalued field by a single-valued field with an
> id pointing to an external store, where I get the needed fields. However this
> would not change the number of fields in my index (correct?) and thus
> wouldn't matter for the big scanning-time I'm seeing. Moreover, it wouldn't
> matter for the query-time either I guess.
>
> Thanks,
> Geert-Jan
>
> 2007/12/29, Otis Gospodnetic <[EMAIL PROTECTED]>:
> >
> > Hi Geert-Jan,
> >
> > Have you considered storing this data in an external data store and not
> > Lucene index? In other words, use the Lucene index only to index the
> > content you need to search. Then, when you search this index, just pull out
> > the single stored fields, the unique ID for each of top N hits, and use
> > those IDs to pull the actual content for display purposes from the external
> > store. This external store could be an RDBMS, an ODBMS, a BDB, etc. I've
> > worked with very large indices where we successfully used BDBs for this
> > purpose.
> >
> > Otis
> >
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > From: Geert-Jan Brits <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, December 27, 2007 11:44:13 AM
> > Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
> >
> > yeah, that makes sense.
> > so, all in all, could scanning all the fields and loading the 10 fields add
> > up to cost about the same or even more as performing the initial query?
> > (Just making sure)
> >
> > I am wondering if the following change to the schema would help in this
> > case:
> >
> > current setup:
> > It's possible to have up to 2000 product-variants.
> > each product-variant has:
> > - 1 price field (stored / indexed)
> > - 1 multivalued field which contains product-variant characteristics
> >   (stored / not indexed).
> >
> > This adds up to the 4000 fields described. Moreover there are some fields on
> > the product level but these would contribute just a tiny bit to the overall
> > scanning / loading costs (about 50 -stored and indexed- fields in total)
> >
> > possible new setup (only the changes):
> > - index but not store the price-field.
> > - store the price as just another one of the product-variant characteristics
> >   in the multivalued product-variant field.
> >
> > as a result this would bring back the maximum number of stored fields to
> > about 2050 from 4050 and thereby about halving scanning / loading costs
> > while leaving the current querying-costs intact.
> > Indexing costs would increase a bit.
> >
> > Would you expect the same performance gain?
> >
> > Thanks,
> > Geert-Jan
> >
> > 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> > >
> > > On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > > > after inspecting solrconfig.xml I see that I already have enabled lazy field
> > > > loading by:
> > > > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > > > enabled by default)
> > > >
> > > > Since any query returns about 10 fields (which differ from query to query),
> > > > would this mean that only these 10 of about 2000-4000 fields are retrieved /
> > > > loaded?
> > >
> > > Yes, but that's not the whole story.
> > > Lucene stores all of the fields back-to-back with no index (there is
> > > no random access to particular stored fields)... so all of the fields
> > > must be at least scanned.
> > >
> > > -Yonik
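
Yonik's point about back-to-back stored fields can be illustrated with a toy model. This is not Lucene's actual file format: the record layout (a 2-byte field number followed by a 4-byte length and the value bytes) and the names `encode_doc` and `load_fields` are made up for illustration. It only shows why, without a per-field offset table, fetching a few fields still means stepping over every field header in the document, even when the unwanted values themselves are skipped rather than decoded.

```python
import struct

HEADER = ">HI"  # assumed layout: 2-byte field number, 4-byte value length
HEADER_SIZE = struct.calcsize(HEADER)


def encode_doc(fields):
    """Write (field_no, value) pairs back-to-back with no offset table."""
    out = bytearray()
    for field_no, value in fields:
        data = value.encode("utf-8")
        out += struct.pack(HEADER, field_no, len(data))
        out += data
    return bytes(out)


def load_fields(blob, wanted):
    """Return the wanted values plus the number of field headers scanned."""
    found = {}
    scanned = 0
    pos = 0
    while pos < len(blob):
        field_no, length = struct.unpack_from(HEADER, blob, pos)
        pos += HEADER_SIZE
        scanned += 1
        if field_no in wanted:
            # Only wanted values are decoded (the "lazy" part).
            found[field_no] = blob[pos:pos + length].decode("utf-8")
        pos += length  # every value is stepped over, wanted or not
    return found, scanned
```

With a document of 4000 such fields, asking for 2 of them still scans all 4000 headers in this model, which is why reducing the number of stored fields per document (or moving them to an external store) reduces the scanning cost.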