Exact match
Hi,

I am sending a request to Solr for an exact match. Example:

(title:("Web 2.0" OR "Social Networking") OR description:("Web 2.0" OR "Social Networking"))

But in the results I am getting stories matching just "Social", "Web", etc. Please let me know what's going wrong.

Thanks,
Sunil
Re: Exact match
Look at what Solr returns when adding &debugQuery=true for the parsed query, and also consider how your fields are analyzed (their associated type, etc).

Erik
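For reference, debugQuery is just an extra request parameter; a request like the one above might look like this (host, port, and field name are illustrative):

    http://localhost:8983/solr/select?q=title:%22Web+2.0%22&debugQuery=true

The response then carries a debug section whose parsedquery entry shows what the analyzers actually turned the query into.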
RE: Exact match
Both the fields are "text" type. How will "&debugQuery=true" help? I am not familiar with the output.

Thanks,
Sunil
Re: Exact match
On Jul 28, 2008, at 5:31 AM, Sunil wrote:
> Both the fields are "text" type.

The definition of the field type is important - perhaps it is stripping "2.0"? You can find out by using Solr's analysis.jsp (see the Solr admin area in your installation).

> How will "&debugQuery=true" help? I am not familiar with the output.

It provides, among other things, a parsed query and a toString of the query - both are useful in troubleshooting queries that don't do what you expect. Couple that output with the analysis.jsp information and you should have the reason.

An exact match on an analyzed field is not generally possible - it'll be a phrase match, but not necessarily one that only matches strings fed in exactly as the values you're querying on.

Erik
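If a truly exact match is required, the usual pattern is to index an unanalyzed copy of the field. A minimal schema.xml sketch (field names are illustrative; "string" is the stock unanalyzed type in the example schema):

    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="title_exact" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

A query such as title_exact:"Web 2.0" then matches only documents whose title is exactly that string, at the cost of losing tokenized matching on that copy.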
nested data structure definition
Hi,

Can we define a nested data structure in schema.xml for searching? Is it possible or not?

Thanks & Regards,
Ranjeet Jha
Re: nested data structure definition
Hi Ranjeet,

Solr supports multi-valued fields and you can always denormalize your data. Can you give more details on the problem you are trying to solve?

--
Regards,
Shalin Shekhar Mangar.
Re: nested data structure definition
Hi,

In our case there is a Category object under a Catalog object, and I do not want to define the data structure for the Category. I want to give a reference to the Category under the Catalog - how can I do this?

Regards,
Ranjeet
Re: nested data structure definition
Hi,

In Solr there is no hierarchy of objects. De-normalize everything into one schema using multi-valued fields where applicable. Decide on what the document should be. What do you want to return as individual results - are they catalogs or categories? You can get more help if you give an example of what you are trying to achieve.

--
Regards,
Shalin Shekhar Mangar.
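As a sketch of that denormalization (field names are illustrative, not from the thread): if the document is a catalog, its categories can be flattened into one multi-valued field in schema.xml:

    <field name="catalog_id" type="string" indexed="true" stored="true"/>
    <field name="catalog_name" type="text" indexed="true" stored="true"/>
    <field name="category" type="text" indexed="true" stored="true" multiValued="true"/>

Each category name is then sent as a separate <field name="category"> value when the catalog document is indexed, and a query like category:books matches the whole catalog.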
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Shalin - yes, the allfields field exists in my schema.xml file. It is a field that has all of the text from all of the fields concatenated together into one field.

My spellCheckIndexDir is created and has 2 segment files, but I think the index is empty. When I initiate the first spellcheck.build=true request, the results load immediately - I would expect some time delay while it builds the index.

Any other ideas?

Andrew

> From: Shalin Shekhar Mangar
> Is the allfields in your spell checker configuration in your schema.xml? Can you see the spellcheckIndexDir created inside Solr's data directory?
>
> > From: Andrew Nagy
> > Exactly - however the spellcheck component is not working for my setup. The spelling suggestions never show in the response. I think I have the solrconfig setup incorrectly. Also my solr/data/spell index that is created is empty. Something is not configured correctly, any ideas?
> >
> > > From: Geoffrey Young
> > > right. the spellcheck component does not issue a separate query *after* running the spellcheck, it merely offers suggestions in parallel with your existing query. the results are more like "below are the results for $query. did you mean $suggestions?"
Re: Unsure about omitNorms, termVectors...
On Jul 24, 2008, at 9:48 AM, Fuad Efendi wrote:
> Hi, it's unclear... found in schema.xml:
>
> omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
>
> termVectors: [false] set to true to store the term vector for a given field. When using MoreLikeThis, fields used for similarity should be stored for best performance.
>
> Questions:
>
> omitNorms: do I need it for full-text fields even if I don't need index-time boosting? I don't want to boost text where a keyword is repeated several times. Is my understanding correct?

I'm not sure what you are asking. Do you mean you don't want term frequency factored in, or that you don't want length normalization and document/field boosting factored in?

> termVectors: do I need it for MoreLikeThis only?

They can help speed up MLT, but are not required. If they are not available, then MLT has to re-analyze the field.

> What are the memory requirements for Lucene cache warm-up if I use term vectors and norms?

I don't believe term vectors are cached anywhere, other than via the OS. I'd have to go dig around for norms info, or maybe someone else can chime in.

-Grant
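For context, both flags live on field declarations in schema.xml; a sketch (field name illustrative):

    <field name="body" type="text" indexed="true" stored="true"
           omitNorms="false" termVectors="true"/>

omitNorms="true" drops the per-field norm byte (no length normalization or index-time boosts for that field); termVectors="true" stores per-document term vectors so MoreLikeThis can skip re-analyzing the stored text.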
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Can you show us the query you are issuing? Make sure you add spellcheck=true to the query as a parameter to turn on spell checking.

--
Regards,
Shalin Shekhar Mangar.
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
> From: Shalin Shekhar Mangar
> Can you show us the query you are issuing? Make sure you add spellcheck=true to the query as a parameter to turn on spell checking.

http://localhost:8080/solr/select?q=*:*&spellcheck=true&spellcheck.q=scandanava&spellcheck.build=true

Shows this:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">73</int>
      </lst>
      ...
    </response>

Andrew
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Hi Andrew,

Your configuration which you specified in the earlier thread looks fine. Your query is also ok. The complete lack of spell check results in the response you pasted suggests that the SpellCheckComponent is not added to the SearchHandler's list of components.

Can you check your solrconfig.xml again? I'm sorry but it doesn't seem like a problem with the spell checker itself. Also check if there are any exceptions in the Solr log/console.

--
Regards,
Shalin Shekhar Mangar.
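For reference, wiring the component into the handler usually looks something like this in a Solr 1.3-era solrconfig.xml (a sketch, not Andrew's actual file; the <str> value must match the searchComponent name):

    <searchComponent name="spellcheck"
                     class="org.apache.solr.handler.component.SpellCheckComponent">
      <!-- spellchecker configuration elided -->
    </searchComponent>

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

If the <arr name="last-components"> entry is missing, the handler never invokes the component and the response simply omits the spellcheck section.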
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
I was just reviewing the solr logs and I noticed the following:

Jul 28, 2008 11:52:01 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.SpellCheckComponent'

It looks like the SpellCheckComponent is not getting loaded. What could cause this? I'm running the July 25 nightly build.

Here is a list of the libs from my /tmp/jetty/webapp/WEB-INF/lib dir:

-rw-r--r-- 1 root root  84199 Jul 25 08:14 apache-solr-common-nightly.jar
-rw-r--r-- 1 root root 889903 Jul 25 08:14 apache-solr-nightly.jar
-rw-r--r-- 1 root root  46725 May 10  2007 commons-codec-1.3.jar
-rw-r--r-- 1 root root  22017 Jan  6  2008 commons-csv-1.0-SNAPSHOT-r609327.jar
-rw-r--r-- 1 root root  53082 Mar  1  2007 commons-fileupload-1.2.jar
-rw-r--r-- 1 root root 305001 Sep 11  2007 commons-httpclient-3.1.jar
-rw-r--r-- 1 root root  83613 Jun 15  2007 commons-io-1.3.1.jar
-rw-r--r-- 1 root root  38015 Jun 14  2007 commons-logging-1.0.4.jar
-rw-r--r-- 1 root root 249154 Sep 21  2007 junit-4.3.jar
-rw-r--r-- 1 root root 115101 Jun 19 13:46 lucene-analyzers-2.4-dev.jar
-rw-r--r-- 1 root root 730352 Jun 19 13:46 lucene-core-2.4-dev.jar
-rw-r--r-- 1 root root  87390 Jun 19 13:46 lucene-highlighter-2.4-dev.jar
-rw-r--r-- 1 root root  32693 Jun 19 13:46 lucene-queries-2.4-dev.jar
-rw-r--r-- 1 root root  91029 Jun 19 13:46 lucene-snowball-2.4-dev.jar
-rw-r--r-- 1 root root  18422 Jun 19 13:46 lucene-spellchecker-2.4-dev.jar
-rw-r--r-- 1 root root 179348 Jun 14  2007 stax-1.2.0-dev.jar
-rw-r--r-- 1 root root  25863 Jun 14  2007 stax-api-1.0.jar
-rw-r--r-- 1 root root 128475 Jun 14  2007 stax-utils.jar

Could I be missing a jar?

Thanks
Andrew
RE: solr synonyms behaviour
Hi,

I was faced with the same issues regarding multiword synonyms. Let's say a synonyms list like:

club, bar, night cabaret

Now if we have a document containing "club", with the default synonyms filter behaviour with expand=true, we will end up in the Lucene index with a document containing "club|bar|night cabaret". So if the user searches for "night", the query will look for "night" in the index and will match our document, since it was "enriched" at index time and really does contain the token "night".

The only valid solution I found was to create a field type used exclusively for synonym search, where the same synonym filter runs with expand=false at both index time and query time:

@IndexTime: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
@QueryTime: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>

And with a customised synonyms file that looks like:

SYN_ID_1, club, bar, night cabaret

So for our document containing "club", the synonym filter at index time with expand=false replaces every matching token/expression in the document with SYN_ID_1. At query time, when a user searches for "night", it will not be matched, even by "normal" search, because "night" does not stand alone in the synonyms definition: every document containing "club" or "bar" has been "enriched" with SYN_ID_1 and NOT with "club|bar|night cabaret", so the final indexed document contains no isolated tokens from the synonym expression that could be matched later without notice. To match our document containing "club", the user HAS TO type the entire expression "night cabaret", not only part of it.

Of course, as I said before, this field is used exclusively for synonym matching, so it requires another field for normal full-text stemmed search to add normal results. This approach also gives us the opportunity to set up boosting separately for full-text stemmed search vs. synonym search, e.g.:

title_stem:"club"^100 OR title_syns:"club"^10

I hope that was clear. In any case, this approach should fix your problem, since we did not want synonym matching when the user types only part of a synonymic expression.

Regards,
Laurent

-----Original Message-----
From: swarag
Sent: Friday, July 25, 2008, 11:48 PM
To: solr-user@lucene.apache.org
Subject: Re: solr synonyms behaviour

> > Yonik Seeley wrote:
> > > Can you give a specific example? Use debugQuery=true to see what the resulting query is. You can also use the admin analysis page to see the output of the index and query analyzers.
>
> So it sounds like using the '=>' operator for synonyms that may or may not contain multiple words causes problems. So I changed my synonyms.txt to the following:
>
> club,bar,night cabaret
>
> In schema.xml, my "text" field type now applies the SynonymFilterFactory (ignoreCase="true", expand="true") in the index-time analyzer only; the query-time analyzer chain (stop words, word delimiter, lowercase, stemming) has no synonym filter.
>
> As you can see, 'night cabaret' is my only multi-word synonym term. Searches for 'bar' and 'club' now behave as expected. However, if I search for JUST 'night' or JUST 'cabaret', it looks like it is still using the synonyms 'bar' and 'club', which is not what is desired. I only want 'bar' and 'club' to be returned if a search for the complete 'night cabaret' is submitted.
>
> Since query-time synonyms are turned "off", the resulting parsedquery_toString is simply "name:night", "name:cabaret", etc.

We are still having problems. Searches for single words that are part of a multi-word synonym seem to be affected by the synonyms, when they should not. Anyone else experience this? If not, would you mind explaining your config and the format of your synonyms.txt file?
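A minimal sketch of the synonyms-only field type Laurent describes (type and file names are illustrative, and the tokenizer choice is an assumption - the thread does not show the full definition):

    <fieldType name="textSyns" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms-ids.txt"
                ignoreCase="true" expand="false"/>
      </analyzer>
    </fieldType>

With expand="false" and a synonyms-ids.txt whose first entry per line is an opaque token (SYN_ID_1, club, bar, night cabaret), both the indexed text and the query are rewritten to SYN_ID_1 whenever a full synonym expression appears, so a partial expression like "night" alone can no longer match.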
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
No, SpellCheckComponent was in the nightly long before July 25. There must be a stack trace after that error message. Can you post that?

--
Regards,
Shalin Shekhar Mangar.
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Hmm ... sorry, that was the output of a java program that uses Solr, which I ran when I noticed the error. That error doesn't happen when I start Solr. Sorry for the confusion.

I just changed my schema to have a dedicated field for spelling called "spelling", and I created a new field type for the spellcheck component called "textSpell". Here is the segment of my solrconfig.xml:

    <searchComponent name="spellcheck"
                     class="org.apache.solr.handler.component.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="field">spelling</str>
        <float name="accuracy">0.7</float>
        <str name="spellcheckIndexDir">./spellchecker</str>
      </lst>
      <str name="queryAnalyzerFieldType">textSpell</str>
    </searchComponent>

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

I will need to reindex my documents again - I will check to see if that has any effect on my problem.

Andrew
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Well, I will include the stack trace for the aforementioned error:

Jul 28, 2008 12:20:17 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.SpellCheckComponent'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:227)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:232)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
        at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:565)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:371)
        at org.solrmarc.marc.MarcImporter.<init>(MarcImporter.java:95)
        at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:559)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.component.SpellCheckComponent
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:580)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:242)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:211)
        ... 7 more

Line 95 of MarcImporter.java (the Solr import program I am using) is the instantiation of SolrCore. So maybe somehow the SpellCheckComponent is not getting loaded?

This is the error output thrown by instantiating SolrCore:

org.apache.solr.common.SolrException: Unknown Search Component: spellcheck
        at org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:597)
        at org.apache.solr.handler.component.SearchHandler.inform(SearchHandler.java:107)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:264)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:398)
        at org.solrmarc.marc.MarcImporter.<init>(MarcImporter.java:95)
        at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:559)

Andrew
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Well, that means the nightly solr jar you are using is older than you think it is. Try running Solr normally, without the program, and see if you can get it working.
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
> From: Shalin Shekhar Mangar
> Well, that means the nightly solr jar you are using is older than you think it is. Try running Solr normally, without the program, and see if you can get it working.

Well, my import program has an older copy of the solr libs, so we can ignore that problem. However, my problem still stands when I run Solr normally from my July 25 snapshot. There are no errors - and no output to the solr logs when I post a query.

Have you or anyone else been able to successfully add the SpellCheckComponent to the default select SearchHandler?

Thanks
Andrew
Unsynchronized FIFOCache - 9x times performance boost on 8-CPU system
Please see discussion at http://issues.apache.org/jira/browse/SOLR-665

Very simple:

map = new LinkedHashMap(initialSize, 0.75f, true)  - LRU cache (and we need a synchronized get())
map = new LinkedHashMap(initialSize, 0.75f, false) - FIFO (and we do not need a synchronized get())

--
Thanks,
Fuad Efendi
http://www.linkedin.com/in/liferay
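To illustrate the difference (a plain-Java sketch, not the SOLR-665 patch itself): with accessOrder=true, LinkedHashMap.get() relinks the accessed entry to the tail, so even reads mutate shared state and must be synchronized; with accessOrder=false, iteration order is insertion order and get() is a pure read.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // FIFO cache sketch: evicts the oldest *inserted* entry once full.
    // Because accessOrder=false, get() never restructures the map, which
    // is what allows the unsynchronized-read claim in SOLR-665 (writes
    // still need synchronization, and fully concurrent use needs care).
    class FifoCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        FifoCache(int maxSize) {
            super(maxSize, 0.75f, false); // accessOrder=false -> FIFO order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize; // drop the oldest insertion on overflow
        }
    }

Passing true as the third constructor argument instead gives LRU order, where get() moves the entry and therefore needs a lock.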
RE: nested data structure definition
If you want to think of Solr in database terms, it has only one table. The fields in this table have very flexible type definitions. There can be many optional fields. They also can have various indexes which, used together, can search text in useful ways. If you want to model multiple tables, you have to denormalize them into one. The optional fields feature can be useful here.

Lance
Multiple Update servers
Hi, we are currently evaluating Solr and have been browsing the archives for one particular issue but can't seem to find the answer, so please forgive me if I'm asking a repetitive question.

We like the idea of having multiple slave servers serving up queries and a master performing updates. However, the issue for us is that there is no redundancy for the master. So a couple of questions:

1. Can there be multiple masters (or update servers) sharing the same index files, performing updates at the same time (i.e. hosting the index on a SAN)?
2. Is there a recommended architecture utilizing a SAN? (For example, 2 slaves and 2 masters sharing a SAN.)

We currently don't have that many records - probably about a million and growing. We are mainly concerned about redundancy first, then performance.

Thanks
-Rakesh
big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Hi all,

For some queries I need to return a lot of rows at once (say 100). When performing these queries I notice a big difference between qTime (which is mostly in the 15-30 ms range due to caching) and the total time taken to return the response (measured through SolrJ's elapsedTime), which is between 500-1600 ms.

For queries which return fewer rows the difference becomes smaller.

I presume (after reading some threads in the past) that this is because Solr constructs and streams the response (which includes retrieving the stored fields), which is not counted in qTime.

Documents have a lot of stored fields (more than 10,000), but for any given query a maximum of about 20 are returned (through the fl field) or used (as part of filtering, faceting, sorting).

I would have thought that enabling enableLazyFieldLoading in this situation would mean a lot, since so many stored fields can be skipped, but I notice no real difference when measuring total elapsed time (or qTime for that matter).

Am I missing something here? What criteria would need to be met for a field not to be loaded, for instance? Should I see a big performance boost in this situation?

Thanks,
Britske
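For reference, a sketch of the kind of SolrJ measurement being described (URL and field names are illustrative; CommonsHttpSolrServer is the SolrJ client of that era):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimingCheck {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(100);            // many rows -> many stored-field reads
            q.setFields("id", "name"); // fl: return only these stored fields

            QueryResponse rsp = server.query(q);
            // QTime is server-side search time; elapsed also covers building
            // and streaming the response (stored-field retrieval) + transport.
            System.out.println("QTime=" + rsp.getQTime()
                    + " ms, elapsed=" + rsp.getElapsedTime() + " ms");
        }
    }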
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
That high of a difference is due to the part of the index containing these particular stored fields not being in OS cache. What's the size on disk of your index compared to your physical RAM?

-Yonik
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Size on disk is 1.84 GB (of which 1.3 GB sits in FDT files, if that matters). Physical RAM is 2 GB with -Xmx800M set for Solr.
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
That's a bit too tight to have *all* of the index cached... your best bet is to go to 4GB+, or figure out a way not to have to retrieve so many stored fields.

-Yonik
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Another possibility is to partition the stored fields into a frequently-accessed set and a full set. If the frequently-accessed set is significantly smaller (in terms of # of bytes), then the documents will be tightly packed on disk and the OS caching will be much more effective given the same amount of RAM. The situation you are experiencing is one-seek-per-doc, which is performance death.

-Mike
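A schema.xml sketch of that partitioning idea (names illustrative, not from the thread): keep the handful of fields every query returns as stored, so they pack tightly in the .fdt file, and leave the long tail searchable but unstored, fetching full records from the primary data store (or a second index) when needed:

    <!-- hot set: small, returned on every query -->
    <field name="id"   type="string" indexed="true" stored="true"/>
    <field name="name" type="text"   indexed="true" stored="true"/>

    <!-- cold set: searchable here, but values are retrieved elsewhere -->
    <dynamicField name="attr_*" type="text" indexed="true" stored="false"/>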
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
I'm on a development box currently and production servers will be bigger, but at the same time the index will be too.

Each query requests at most 20 stored fields. Why doesn't lazy field loading help in this situation? I don't need to retrieve all stored fields, and I thought I wasn't doing so (by limiting the fields returned using the fl param) -- but if I read your comment correctly, apparently I am retrieving them all, I'm just not displaying them all?

Also, if I understand correctly, for optimal performance I need at least enough RAM to hold the entire index in OS cache, plus the amount of RAM that Solr / Lucene consumes directly through the JVM (which among other things includes the Lucene field cache plus all of Solr's caches on top of that)?

I've never read the requirement of having the entire index in OS cache before. Is this because in normal situations (with fewer stored fields) it doesn't matter much? I'm just surprised to hear of this for the first time, since it will likely have a big impact on my design.

Luckily most of the normal queries return 10 documents each, which results in a discrepancy between total elapsed time and qTime of about 15-30 ms. Doesn't this seem strange? To me it would seem logical that the discrepancy would be at least 1/10th of that for fetching 100 documents.

hmm, hope you can shine some light on this,

Thanks a lot,
Britske

Yonik Seeley wrote:
> That's a bit too tight to have *all* of the index cached... your best
> bet is to go to 4GB+, or figure out a way not to have to retrieve so
> many stored fields.
>
> -Yonik
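The qTime-vs-elapsed gap Britske describes can be reproduced from SolrJ along these lines (a minimal sketch against the 1.3-era SolrJ API; the URL and field list are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingCheck {
  public static void main(String[] args) throws Exception {
    // URL is a placeholder; point it at your own instance
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);     // the expensive case: stored fields for 100 docs
    q.setFields("id");  // restrict returned fields via fl

    QueryResponse rsp = server.query(q);
    // qTime covers the search itself; elapsed time also includes
    // retrieving stored fields and streaming the response, which is
    // where the discrepancy shows up
    System.out.println("qTime=" + rsp.getQTime() + "ms, elapsed="
        + rsp.getElapsedTime() + "ms");
  }
}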
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
On Mon, Jul 28, 2008 at 4:53 PM, Britske <[EMAIL PROTECTED]> wrote:
> Each query requests at most 20 stored fields. Why doesn't lazy field
> loading help in this situation?

It's the disk seek that kills you... loading 1 byte or 1000 bytes per document would be about the same speed.

> Also, if I understand correctly, for optimal performance I need at
> least enough RAM to hold the entire index in OS cache, plus the amount
> of RAM that Solr / Lucene consumes directly through the JVM?

The normal usage is to just retrieve the stored fields for the top 10 (or a window of 10 or 20) documents. Under this scenario, the slowdown from not having all of the stored fields cached is usually acceptable. Faster disks (seek time) can also help.

> Luckily most of the normal queries return 10 documents each, which
> results in a discrepancy between total elapsed time and qTime of about
> 15-30 ms. Doesn't this seem strange? To me it would seem logical that
> the discrepancy would be at least 1/10th of that for fetching 100
> documents.

Yes, in general 1/10th the cost is what one would expect on average. But some of the docs you are trying to retrieve *will* be in cache, so it's hard to control this test. You could try forcing the index out of memory by "cat"ing some other big files multiple times and then re-trying, or do a reboot to be sure.

-Yonik
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
What version of Solr/Lucene are you using?

On Jul 28, 2008, at 4:53 PM, Britske wrote:
> I'm on a development box currently and production servers will be
> bigger, but at the same time the index will be too.
> [...]

--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
Thanks for clearing that up for me. I'm going to investigate some more...

Yonik Seeley wrote:
> It's the disk seek that kills you... loading 1 byte or 1000 bytes per
> document would be about the same speed.
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
I'm using the solr-nightly of 2008-04-05.

Grant Ingersoll-6 wrote:
> What version of Solr/Lucene are you using?
RE: Tokenizing and searching named character entity references
Hi Frances,

HTMLStripWhitespaceTokenizerFactory wraps a WhitespaceTokenizer around an HTMLStripReader. You could extend HTMLStripReader to not decode named character entities, e.g. by overriding HTMLStripReader.read() so that it calls an alternative readEntity(), which instead of converting entity references to characters would just leave the entity references as-is, something like:

public class MyHTMLStripReader extends HTMLStripReader {

  // override read() to call myReadEntity(), but no other changes
  public int read() throws IOException {
    ...
    switch (ch) {
      case '&':
        saveState();
        ch = myReadEntity();   // change this line to call the new method
        if (ch >= 0) return ch;
        if (ch == MISMATCH) {
          restoreState();
          return '&';
        }
        break;
      ...
    }
  }

  private int myReadEntity() throws IOException {
    int ch = next();
    if (ch == '#') return readNumericEntity();
    return MISMATCH;   // always a mismatch, except for numeric entities
  }
}

Then you could create a new factory, something like:

public class MyHTMLStripWhitespaceTokenizerFactory extends BaseTokenizerFactory {
  public TokenStream create(Reader input) {
    return new WhitespaceTokenizer(new MyHTMLStripReader(input));
  }
}

Steve

On 07/24/2008 at 9:53 AM, F Knudson wrote:
> Greetings:
>
> I am working with many different data sources - some sources employ
> "entity references"; others do not. My goal is to make the searching
> across sources as consistent as possible.
>
> Example text -
>
> Source1: weakening Hδ absorption
> Source1: zero-field gap ω
>
> Source2: weakening H delta absorption
> Source2: zero-field gap omega
>
> Using the tokenizer solr.HTMLStripWhitespaceTokenizerFactory for
> Source1, the entity is replaced with the "named character entity" -
> this works great.
>
> But I want the search tokens to be identical for each source. I need
> to capture δ as a token.
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0"/>
>   </analyzer>
> </fieldType>
>
> Is this possible with the SOLR-supplied tokenizers? I experimented
> with different combinations and orders and was not successful.
>
> Is this possible using synonyms? I also experimented with this route
> but again was not successful.
>
> Do I need to create a custom tokenizer?
>
> Thanks
> Frances
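Wiring the new factory in would then just be a matter of pointing the field type at it instead of the stock one (the package name here is an assumption; use whatever you compile the classes under):

  <tokenizer class="com.example.MyHTMLStripWhitespaceTokenizerFactory"/>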
Re: Expansion stemming
: "Expansion stemming ? Takes a root word and 'expands' it to all of its : various forms ? can be used either at insertion time or at query : time." : : How do I specify that I want the expansion stemming instead of the porter : stemming? there isn't anexpclit expansion stemming filter included with Solr. As far as i know the only way to accomplish expansion stemming is with a dictionary of word mappings -- which could be achieved using the SynonymFilterFactory ... i've added a note aboutthis to the wiki. -Hoss
Re: morphology and queryPhrase
: When I'm looking for words taking care of the distance between them,
: I'm using the Lucene syntax "A B"~distance... unfortunately if A leads
: to A1 and A2 forms I have to split this into the syntax
: +("A1 B"~dist "A2 B"~dist) - this grows with the number of normal forms
: of each term.
:
: Can I search within a distance using something like (+(A1 A2) +(B))~dist?
: I heard that dismax can handle distance between words ignoring quotes -
: could you advise on this?

Internally there are types of Lucene queries that can manage structure like what you are describing: SpanNearQuery being the most flexible, MultiPhraseQuery being less flexible but (in theory) faster. Neither of these is directly usable from the query parser -- but you could write your own query parser (or custom request handler) that builds them up.

-Hoss
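For illustration, a minimal sketch of building that structure directly with the span API (the field name "text" and the terms are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanExample {
  // matches (a1 OR a2) within `dist` positions of b, in any order --
  // i.e. the (+(A1 A2) +(B))~dist structure from the question
  public static SpanNearQuery nearQuery(int dist) {
    SpanQuery a = new SpanOrQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("text", "a1")),
        new SpanTermQuery(new Term("text", "a2"))});
    SpanQuery b = new SpanTermQuery(new Term("text", "b"));
    return new SpanNearQuery(new SpanQuery[] {a, b}, dist, false);
  }
}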
Re: Best way to return ExternalFileField in the results
: I've been trying to return a field of type ExternalFileField in the
: search result. Upon examining the XMLWriter class, it seems like Solr
: can't do this out of the box. Therefore, I've tried to hack Solr to
: enable this behaviour. The goal is to call
: ExternalFileField.getValueSource(SchemaField field, QParser parser) in
: the XMLWriter.writeDoc(String name, Document document, ...) method.
: There are two issues with doing this:

Some of what you're specifically asking about could probably be achieved by modifying the XMLWriter constructor to hang on to the SolrCore associated with the request.

In general though, I wonder if stepping back a bit and modifying your request handler to use a SolrDocumentList where you've already flattened the ExternalFileField into each SolrDocument would be an easier approach -- then you wouldn't need to modify the ResponseWriter at all.

-Hoss
Re: Unsure about omitNorms, termVectors...
: > omitNorms: do I need it for full-text fields even if I don't need
: > index-time boosting? I don't want to boost text where a keyword is
: > repeated several times. Is my understanding correct?

If you omitNorms="true" then you not only lose index-time doc/field boosting, but you also lose length norms -- it won't matter how long a field is; if a term occurs once in a 5-term field value it will score the same as if it appears once in a 5000-term field value.

If you don't want docs to score higher when the word is repeated, omitNorms won't help you -- you'll need a custom Similarity where you override the tf() method.

: > What are the memory requirements for Lucene cache warming if I use
: > term vectors and norms?
:
: I don't believe Term Vectors are cached anywhere, other than via the
: OS. I'd have to go dig around for norms info, or maybe someone else can
: chime in.

Norms are one byte per doc per field.

-Hoss
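A minimal sketch of such a tf() override (the class and package names are made up; it would be registered in schema.xml via the <similarity> element):

import org.apache.lucene.search.DefaultSimilarity;

// Flatten term frequency so repeating a word in a field doesn't raise
// its score. Registered in schema.xml with:
//   <similarity class="com.example.FlatTfSimilarity"/>
public class FlatTfSimilarity extends DefaultSimilarity {
  public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
  }
}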
Re: Best way to return ExternalFileField in the results
> In general though, I wonder if stepping back a bit and modifying your
> request handler to use a SolrDocumentList where you've already
> flattened the ExternalFileField into each SolrDocument would be an
> easier approach -- then you wouldn't need to modify the ResponseWriter
> at all.

Consider using a search component at the end of the chain that adds fields to your document... this way things work for any writer (json, xml, whatever).

We really should add an example to do this... but in the meantime, a good example (though a bit complex) is with the local lucene: http://sourceforge.net/projects/locallucene/ -- this adds a calculated distance to each document before it gets passed to the writer.
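A rough skeleton of such a component (a sketch against the 1.3 component API; the class name and the value lookup are hypothetical placeholders):

import java.io.IOException;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;

public class AddExternalFieldComponent extends SearchComponent {

  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to set up before the query runs
  }

  public void process(ResponseBuilder rb) throws IOException {
    DocList docs = rb.getResults().docList;
    NamedList external = new NamedList();
    DocIterator it = docs.iterator();
    while (it.hasNext()) {
      int docid = it.nextDoc();
      // lookupExternalValue() is a hypothetical helper that would read
      // the ExternalFileField value for this internal docid
      external.add(Integer.toString(docid), lookupExternalValue(rb, docid));
    }
    // attach the computed values to the response
    rb.rsp.add("external", external);
  }

  private Object lookupExternalValue(ResponseBuilder rb, int docid) {
    return null; // placeholder
  }

  // SolrInfoMBean boilerplate
  public String getDescription() { return "adds external field values"; }
  public String getSource() { return null; }
  public String getSourceId() { return null; }
  public String getVersion() { return null; }
}

It would be declared with a <searchComponent> element in solrconfig.xml and appended to a request handler through its "last-components" list.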
RE: Tokenizing and searching named character entity references
: You could extend HTMLStripReader to not decode named character
: entities, e.g. by overriding HTMLStripReader.read() so that it calls an
: alternative readEntity(), which instead of converting entity references
: to characters would just leave the entity references as-is, something
: like:

Alternately: use SynonymFilterFactory to map any entity "names" to the real Unicode character, so your "Source2" style docs get "omega" replaced with the same character the HTMLStrip*TokenizerFactories generate when they encounter the HTML entities. Generating the list of synonyms from the comment at the end of HTMLStripReader.java should be easy.

: > Source1: weakening Hδ absorption
: > Source1: zero-field gap ω
: >
: > Source2: weakening H delta absorption
: > Source2: zero-field gap omega

-Hoss
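A few illustrative entries for such a synonyms file (the exact list would be generated from the entity table in HTMLStripReader.java):

  # map entity names to the characters the HTMLStrip tokenizers emit
  delta => δ
  omega => ω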
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
On 28-Jul-08, at 1:53 PM, Britske wrote:

> Each query requests at most 20 stored fields. Why doesn't lazy field
> loading help in this situation?

It does help, but not enough. With lots of data per document and not a lot of memory, it becomes probabilistically likely that each doc resides in a separate uncached disk block, thus requiring a disk seek (~10ms), which then dominates the total time regardless of the number of bytes read.

> I don't need to retrieve all stored fields, and I thought I wasn't
> doing so (by limiting the fields returned using the fl param) -- but if
> I read your comment correctly, apparently I am retrieving them all,
> I'm just not displaying them all?

No, they are not read. It is important to understand the performance characteristics of disks in random access vs. serial reading in this case.

> Also, if I understand correctly, for optimal performance I need at
> least enough RAM to hold the entire index in OS cache, plus the amount
> of RAM that Solr / Lucene consumes directly through the JVM (which
> among other things includes the Lucene field cache plus all of Solr's
> caches on top of that)?

Not necessarily all, no. The type of data you store and the request characteristics affect the size of the "hot spot" of the index, the specific blocks that need to be in memory to achieve good performance. If you are retrieving the stored fields for 100 docs per query, the doc data should probably all be in cache. One way to mitigate this is to partition the fields like I suggested in the other reply.

-Mike
javax.xml.stream.XMLStreamException while indexing
I've recently encountered a strange error while batch indexing around 500 average-sized documents:

HTTP Status 500 - null javax.xml.stream.XMLStreamException
  at com.bea.xml.stream.MXParser.fillBuf(MXParser.java:3700)
  at com.bea.xml.stream.MXParser.more(MXParser.java:3715)
  at com.bea.xml.stream.MXParser.nextImpl(MXParser.java:1756)
  at com.bea.xml.stream.MXParser.next(MXParser.java:1333)
  at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:323)
  at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:197)
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:125)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1038)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
  at org.hyperic.hq.product.servlet.filter.JMXFilter.doFilter(JMXFilter.java:324)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
  at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
  at java.lang.Thread.run(Thread.java:595)

Most other reports of this exception refer to an XML parse error on a particular line / column; however, this is not the case in this situation. It doesn't seem to be a problem with the data either, since it fails on different sets of documents on every occasion (i.e. I can't find specific input data to reproduce this problem). Increasing / decreasing the number of documents still results in the same error.

The system I'm using consists of Solr 1.3 dev (compiled from SVN on 2008-07-21), Tomcat 5.5.23, and Sun Java SDK 1.5.0-11-1 running on Ubuntu Server 7.10 with all current updates applied.

Has anybody else experienced a similar problem to this? Would upgrading either Tomcat / Java help in this instance?

Thanks in advance for any help.

regards,
Pieter
RE: solr synonyms behaviour
Hi Laurent

Laurent Gilles wrote:
>
> Hi,
>
> I was faced with the same issues regarding multiword synonyms.
> Let's say a synonyms list like:
>
> club, bar, night cabaret
>
> Now if we have a document containing "club", with the default synonyms
> filter behaviour with expand=true, we will end up in the Lucene index
> with a document containing "club|bar|night cabaret". So if the user
> searches for "night", the query will look for "night" in the index and
> will match our document, since it was "enriched" at index time and
> really does contain the token "night".
>
> The only valid solution I've found was to create a field type
> exclusively used for synonym search where:
>
> @IndexTime
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="false"/>
> @QueryTime
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="false"/>
>
> And with a customised synonyms file that looks like:
>
> SYN_ID_1, club, bar, night cabaret
>
> So for our document containing "club", the synonym filter at index
> time with expand=false will replace every matching token/expression in
> the document with SYN_ID_1.
>
> And at query time, when a user searches for "night", since "night" is
> not alone in the synonyms definition, it will not be matched, even by
> a "normal" search, because every document containing "club" or "bar"
> will have been "enriched" with "SYN_ID_1" and NOT with "club|bar|night
> cabaret". So the final indexed document will not contain isolated
> tokens from the synonym expression that risk being matched later
> without notice.
>
> In order to match our document containing "club", the user HAS TO type
> the entire expression "night cabaret", and not only part of the
> expression.
>
> Of course, as I said before, this field is exclusively used for
> synonym matching, so it requires another field for normal full-text
> stemmed search to add normal results. This approach gives us the
> opportunity to set up boosting separately on full-text stemmed search
> vs. synonym search, let's say:
>
> "title_stem":"club"^100 OR "title_syns":"club"^10
>
> I hope to have been clear, even if I don't believe so... In any case,
> this approach fixed your problem, since we didn't want synonym
> matching when the user only types part of a synonymic expression.
>
> Regards,
> Laurent

This has seemed to solve our problem. Thank you very much for your help. Once we have our environment set up and all of our data indexed, it may even provide an extra 'bonus' to be able to add different weights/boosts for the different fields.

Now, not to be too greedy, but I am wondering if there is a way to utilize this technique for "explicit synonym matching" (i.e. synonym mappings that use the '=>' operator). For example, we may have a couple of mappings like the following:

night club=>club, bar
swim club=>club, team

As you can see, both night clubs and swim clubs are clubs, but are not necessarily equivalent to the term "club". It would be nice to be able to search for "night club" and only see results for "clubs" and "bars", but not "teams", which otherwise would show up in the results if we used equivalent synonyms.

Just wondering if you have been able to do this as well. Again, thank you for your help!
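A sketch of the two-field setup Laurent describes (the field and type names are illustrative, not from his mail):

  <field name="title_stem" type="text"     indexed="true" stored="false"/>
  <field name="title_syns" type="text_syn" indexed="true" stored="false"/>
  <copyField source="title" dest="title_stem"/>
  <copyField source="title" dest="title_syns"/>

Here text_syn would be a type whose analyzer applies the SynonymFilterFactory with expand="false" at both index and query time, while text is the normal stemmed type; queries then boost the two fields separately, e.g. title_stem:club^100 OR title_syns:club^10.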
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
That sounds interesting. Let me explain my situation, which may be a variant of what you are proposing.

My documents contain more than 10,000 fields, but these fields are divided like this:

1. about 20 general-purpose fields, of which more than one can be selected in a query.
2. about 10,000 fields, of which each query, based on some criteria, selects exactly one.

Obviously 2 is killing me here, but given the above, perhaps it would be possible to make 10,000 vertical slices / indices, and based on the field to be selected (from point 2) choose the slice/index to search in. The 10,000 indices would run on the same box, and the 20 general-purpose fields would have to be copied to all slices (which means some increase in overall index size, but manageable). This would give me far more reasonably sized and compact documents, which means documents are far more likely to sit in the same cached slot and be accessed in the same disk seek.

Does this make sense?

Am I correct that this has nothing to do with distributed search, since that really is all about horizontal splitting / sharding of the index, and what I'm suggesting is splitting vertically? Is there some other part of Solr that I can use for this, or would it be all home-grown?

Thanks,
Britske

Mike Klaas wrote:
> Another possibility is to partition the stored fields into a
> frequently-accessed set and a full set. If the frequently-accessed set
> is significantly smaller (in terms of bytes), then the documents will
> be tightly packed on disk and the OS caching will be much more
> effective given the same amount of RAM.
>
> The situation you are experiencing is one seek per document, which is
> performance death.
>
> -Mike
Re: nested data structure definition
In my site, I have a document which may have multiple comments. For each comment, I would like to know several pieces of information, like: text, author, and date.

-Matt

Shalin Shekhar Mangar wrote:
>
> Hi Ranjeet,
>
> Solr supports multi-valued fields and you can always denormalize your
> data. Can you give more details on the problem you are trying to
> solve?
>
> On Mon, Jul 28, 2008 at 3:20 PM, Ranjeet <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Can we define a nested data structure in schema.xml for searching? Is
>> it possible or not?
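One common way to denormalize a parent/child structure like this is parallel multi-valued fields, relying on the values staying position-aligned across fields (a sketch; the field names are illustrative):

  <field name="comment_text"   type="text"   indexed="true" stored="true" multiValued="true"/>
  <field name="comment_author" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="comment_date"   type="date"   indexed="true" stored="true" multiValued="true"/>

The application then zips the lists back together on retrieval: the Nth author and the Nth date belong to the Nth comment.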