Re: big perf-difference between solr-server vs. SolrJ req.process(solrserver)

2007-12-31 Thread Geert-Jan Brits
Hi Otis,

I don't really see how this would reduce my number of fields.
At the moment I have one price field (stored / indexed) and one multivalued
field (stored) per product variant, and about 2000 product variants.

I could indeed replace each multivalued field with a single-valued field
holding an id that points to an external store, from which I'd fetch the
needed fields. However, this wouldn't change the number of fields in my index
(correct?), and thus wouldn't matter for the long scanning time I'm seeing.
Moreover, I guess it wouldn't matter for the query time either.
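(For reference, my understanding of your suggestion, sketched in Python with a plain dict standing in for the external BDB/RDBMS; all names here are made up for illustration:)

```python
# Sketch of the suggested pattern: the index answers the search and
# returns only ids; display data lives in an external key-value store.
# A plain dict stands in for the BDB/RDBMS; all names are illustrative.

external_store = {
    "variant-1": {"price": 100.0, "color": "red"},
    "variant-2": {"price": 120.0, "color": "blue"},
}

def search_index(query):
    # Stand-in for the Solr query; real code would return the single
    # stored id field of the top N hits.
    return ["variant-1", "variant-2"]

def fetch_for_display(query):
    # Resolve ids against the external store for display purposes.
    return [external_store[doc_id] for doc_id in search_index(query)]

print(fetch_for_display("shoes"))
```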

Thanks,
Geert-Jan





2007/12/29, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Hi Geert-Jan,
>
> Have you considered storing this data in an external data store and not
> Lucene index?  In other words, use the Lucene index only to index the
> content you need to search.  Then, when you search this index, just pull
> out a single stored field, the unique ID, for each of the top N hits, and
> use those IDs to pull the actual content for display purposes from the
> external store.  This external store could be an RDBMS, an ODBMS, a BDB,
> etc.  I've
> worked with very large indices where we successfully used BDBs for this
> purpose.
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> - Original Message 
> From: Geert-Jan Brits <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, December 27, 2007 11:44:13 AM
> Subject: Re: big perf-difference between solr-server vs. SolrJ req.process
> (solrserver)
>
> yeah, that makes sense.
> So, all in all, could scanning all the fields and loading the 10 fields add
> up to about the same cost as, or even more than, performing the initial
> query? (Just making sure.)
>
> I am wondering if the following change to the schema would help in this
> case:
>
> current setup:
> It's possible to have up to 2000 product variants.
> Each product variant has:
> - 1 price field (stored / indexed)
> - 1 multivalued field which contains product-variant characteristics
> (stored / not indexed).
>
> This adds up to the 4000 fields described. Moreover, there are some fields
> on the product level, but these would contribute just a tiny bit to the
> overall scanning / loading costs (about 50 stored-and-indexed fields in
> total).
>
> possible new setup (only the changes):
> - index, but do not store, the price field.
> - store the price as just another one of the product-variant
> characteristics in the multivalued product-variant field.
>
> As a result this would bring the maximum number of stored fields back from
> about 4050 to about 2050, thereby roughly halving scanning / loading costs
> while leaving the current querying costs intact.
> Indexing costs would increase a bit.
>
> Would you expect the corresponding performance gain?
>
> Thanks,
> Geert-Jan
>
> 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> >
> > On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > > after inspecting solrconfig.xml I see that I already have enabled lazy
> > > field loading by:
> > > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > > enabled by default)
> > >
> > > Since any query returns about 10 fields (which differ from query to
> > > query), would this mean that only these 10 of about 2000-4000 fields
> > > are retrieved / loaded?
> >
> > Yes, but that's not the whole story.
> > Lucene stores all of the fields back-to-back with no index (there is
> > no random access to particular stored fields)... so all of the fields
> > must be at least scanned.
> >
> > -Yonik
> >
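A rough sketch of the "possible new setup" quoted above, in schema.xml terms (field names and types are illustrative, not from the original mails):

```xml
<!-- price: indexed for querying, no longer stored -->
<dynamicField name="price_*"   type="sfloat" indexed="true"  stored="false"/>
<!-- variant characteristics (now including price): stored only -->
<dynamicField name="variant_*" type="string" indexed="false" stored="true"
              multiValued="true"/>
```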


Re: big perf-difference between solr-server vs. SolrJ req.process(solrserver)

2007-12-31 Thread Britske

I imagine, then, that this scanning cost is proportional to the number of
stored fields, correct?

I tested this by generating a second index with 1/10th of the
product variants (and thus 1/10th of the stored fields). However, I really
don't see the drop I expected in post-processing time (which includes lazy
loading the needed fields and scanning all the stored fields).
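To make the assumption concrete, here is a toy simulation of stored fields packed back-to-back with no per-field index, so reaching a late field means scanning past every earlier one. (Pure illustration, not Lucene's actual file format.)

```python
import struct

def write_doc(fields):
    """Pack stored fields back-to-back: each as <len><name>\\x00<value>."""
    buf = bytearray()
    for name, value in fields:
        payload = name.encode() + b"\x00" + value.encode()
        buf += struct.pack(">I", len(payload)) + payload
    return bytes(buf)

def read_field(doc, wanted):
    """Sequentially skip records until the wanted field is found;
    returns (value, fields_scanned)."""
    pos = scanned = 0
    while pos < len(doc):
        (length,) = struct.unpack_from(">I", doc, pos)
        pos += 4
        payload = doc[pos:pos + length]
        pos += length
        scanned += 1
        name, _, value = payload.partition(b"\x00")
        if name.decode() == wanted:
            return value.decode(), scanned
    return None, scanned

doc = write_doc([(f"field{i}", f"v{i}") for i in range(2000)])
value, scanned = read_field(doc, "field1999")
print(value, scanned)  # reaching the last field costs a scan over all 2000
```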

Moreover, I realized that I'm using an XSL transform in the post-processing
phase. I think this would contribute to the high cost I'm seeing as well.
Can this XSL transform, in general, be considered small in relation to the
above-mentioned costs?

Thanks, 
Geert-Jan 


Yonik Seeley wrote:
> 
> On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
>> after inspecting solrconfig.xml I see that I already have enabled lazy
>> field loading by:
>> <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
>> enabled by default)
>>
>> Since any query returns about 10 fields (which differ from query to
>> query), would this mean that only these 10 of about 2000-4000 fields
>> are retrieved / loaded?
> 
> Yes, but that's not the whole story.
> Lucene stores all of the fields back-to-back with no index (there is
> no random access to particular stored fields)... so all of the fields
> must be at least scanned.
> 
> -Yonik
> 
> 

-- 
View this message in context: 
http://www.nabble.com/big-perf-difference-between-solr-server-vs.--SOlrJ-req.process%28solrserver%29-tp14513964p14557779.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: big perf-difference between solr-server vs. SolrJ req.process(solrserver)

2007-12-31 Thread Yonik Seeley
On Dec 31, 2007 8:58 AM, Britske <[EMAIL PROTECTED]> wrote:
> Moreover, I realized that I'm using an XSL transform in the post-processing
> phase. I think this would contribute to the high cost I'm seeing as well.
> Can this XSL transform, in general, be considered small in relation to the
> above-mentioned costs?

The XSLT transform could possibly be expensive.  The easiest thing is
to try it without the transform and see.
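For example, assuming the stock XSLT response writer (the URL, core path, and stylesheet name here are illustrative):

```
# with the transform applied server-side:
http://localhost:8983/solr/select?q=...&wt=xslt&tr=example.xsl

# the same query with the plain XML writer:
http://localhost:8983/solr/select?q=...&wt=xml
```

Comparing the response times of these two requests should isolate the cost of the transform.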

-Yonik


Re: Indexing multiple selects

2007-12-31 Thread Yonik Seeley
On Dec 30, 2007 11:43 PM, Gavin <[EMAIL PROTECTED]> wrote:
> Hi,
> In the web application we are developing, a user can add their proficiency
> in a given language, such as:
>
> English: Reading, Good; Writing, Average; Speaking, Good
> French:  Reading, Good; Writing, Average; Speaking, Good
>
> The user can add as many languages as he likes. The languages are part of
> the resume the user creates. I would like to store the resumes and search
> them. Please explain to me how I can add more than one language, along with
> the reading, writing, and speaking abilities for each.

There are many ways... the best way sort of depends on how you want to
search or facet.
You could have a single "language" field, with tokens like
English_Reading=Good as the values.
You could have an English field with Reading=Good as the value.
You could have an English_Reading field with Good as the value.

So start from your use cases (how you want to query the data, what you want
to get from Solr) and try to write down the Solr queries that would
accomplish those cases based on the different ways of storing/indexing the
data.
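Sketched as Solr queries, one per option above (purely illustrative; a literal '=' inside a term may need escaping, or a different separator could be used):

```
q=language:English_Reading=Good     <- single combined "language" field
q=English:Reading=Good              <- one field per language
q=English_Reading:Good              <- one field per language+skill
```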

-Yonik