Exact match
Hi,

I am sending a request to Solr for an exact match. Example:

(title:("Web 2.0" OR "Social Networking") OR description:("Web 2.0" OR "Social Networking"))

But in the results I am getting stories matching just "Social", "Web", etc. Please let me know what's going wrong.

Thanks,
Sunil
Re: Exact match
Look at what Solr returns when adding &debugQuery=true for the parsed query, and also consider how your fields are analyzed (their associated type, etc).

Erik
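For reference, debugQuery is just an extra request parameter; a request like the one above might look like this (host, port, and field name are illustrative):

    http://localhost:8983/solr/select?q=title:%22Web+2.0%22&debugQuery=true

The response then carries a debug section whose parsedquery entry shows what the analyzers actually turned the query into.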
RE: Exact match
Both the fields are "text" type. How will "&debugQuery=true" help? I am not familiar with the output.

Thanks,
Sunil
Re: Exact match
On Jul 28, 2008, at 5:31 AM, Sunil wrote:
> Both the fields are "text" type.

The definition of the field type is important - perhaps it is stripping "2.0"? You can find out by using Solr's analysis.jsp (see the Solr admin area in your installation).

> How will "&debugQuery=true" help? I am not familiar with the output.

It provides, among other things, a parsed query and a toString of the query - both are useful in troubleshooting queries that don't do what you expect. Couple that output with the analysis.jsp information and you should have the reason.

An exact match on an analyzed field is not generally possible - it'll be a phrase match, but not necessarily one that only matches strings fed in exactly as the values you're querying on.

Erik
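If a truly exact match is required, the usual pattern is to index an unanalyzed copy of the field. A minimal schema.xml sketch (field names are illustrative; "string" is the stock unanalyzed type in the example schema):

    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="title_exact" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

A query such as title_exact:"Web 2.0" then matches only documents whose title is exactly that string, at the cost of losing tokenized matching on that copy.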
nested data structure definition
Hi,

Can we define a nested data structure in schema.xml for searching? Is it possible or not?

Thanks & Regards,
Ranjeet Jha
Re: nested data structure definition
Hi Ranjeet,

Solr supports multi-valued fields and you can always denormalize your data. Can you give more details on the problem you are trying to solve?

--
Regards,
Shalin Shekhar Mangar.
Re: nested data structure definition
Hi,

In our case there is a Category object under a Catalog object, and I do not want to define the data structure for the Category. I want to give a reference to the Category under the Catalog - how can I do this?

Regards,
Ranjeet
Re: nested data structure definition
Hi,

In Solr there is no hierarchy of objects. De-normalize everything into one schema using multi-valued fields where applicable. Decide on what the document should be. What do you want to return as individual results - are they catalogs or categories? You can get more help if you give an example of what you are trying to achieve.

--
Regards,
Shalin Shekhar Mangar.
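As a sketch of that denormalization (field names are illustrative, not from the thread): if the document is a catalog, its categories can be flattened into one multi-valued field in schema.xml:

    <field name="catalog_id" type="string" indexed="true" stored="true"/>
    <field name="catalog_name" type="text" indexed="true" stored="true"/>
    <field name="category" type="text" indexed="true" stored="true" multiValued="true"/>

Each category name is then sent as a separate <field name="category"> value when the catalog document is indexed, and a query like category:books matches the whole catalog.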
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Shalin - yes, the allfields field exists in my schema.xml file. It is a field that has all of the text from all of the fields concatenated together into one field.

My spellCheckIndexDir is created and has 2 segment files, but I think the index is empty. When I initiate the first spellcheck.build=true request, the results load immediately - I would expect some time delay while it builds the index.

Any other ideas?

Andrew

> From: Shalin Shekhar Mangar
> Is the allfields in your spell checker configuration in your schema.xml? Can you see the spellcheckIndexDir created inside Solr's data directory?
>
> > From: Andrew Nagy
> > Exactly - however the spellcheck component is not working for my setup. The spelling suggestions never show in the response. I think I have the solrconfig setup incorrectly. Also my solr/data/spell index that is created is empty. Something is not configured correctly, any ideas?
> >
> > > From: Geoffrey Young
> > > right. the spellcheck component does not issue a separate query *after* running the spellcheck, it merely offers suggestions in parallel with your existing query. the results are more like "below are the results for $query. did you mean $suggestions?"
Re: Unsure about omitNorms, termVectors...
On Jul 24, 2008, at 9:48 AM, Fuad Efendi wrote:
> Hi, it's unclear... found in schema.xml:
>
> omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
>
> termVectors: [false] set to true to store the term vector for a given field. When using MoreLikeThis, fields used for similarity should be stored for best performance.
>
> Questions:
>
> omitNorms: do I need it for full-text fields even if I don't need index-time boosting? I don't want to boost text where a keyword is repeated several times. Is my understanding correct?

I'm not sure what you are asking. Do you mean you don't want term frequency factored in, or that you don't want length normalization and document/field boosting factored in?

> termVectors: do I need it for MoreLikeThis only?

They can help speed up MLT, but are not required. If they are not available, then MLT has to re-analyze the field.

> What are the memory requirements for Lucene cache warm-up if I use term vectors and norms?

I don't believe term vectors are cached anywhere, other than via the OS. I'd have to go dig around for norms info, or maybe someone else can chime in.

-Grant
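For context, both flags live on field declarations in schema.xml; a sketch (field name illustrative):

    <field name="body" type="text" indexed="true" stored="true"
           omitNorms="false" termVectors="true"/>

omitNorms="true" drops the per-field norm byte (no length normalization or index-time boosts for that field); termVectors="true" stores per-document term vectors so MoreLikeThis can skip re-analyzing the stored text.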
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Can you show us the query you are issuing? Make sure you add spellcheck=true to the query as a parameter to turn on spell checking.

--
Regards,
Shalin Shekhar Mangar.
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
> From: Shalin Shekhar Mangar
> Can you show us the query you are issuing? Make sure you add spellcheck=true to the query as a parameter to turn on spell checking.

http://localhost:8080/solr/select?q=*:*&spellcheck=true&spellcheck.q=scandanava&spellcheck.build=true

Shows this:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">73</int>
      </lst>
      ...
    </response>

Andrew
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Hi Andrew,

Your configuration which you specified in the earlier thread looks fine. Your query is also ok. The complete lack of spell check results in the response you pasted suggests that the SpellCheckComponent is not added to the SearchHandler's list of components.

Can you check your solrconfig.xml again? I'm sorry but it doesn't seem like a problem with the spell checker itself. Also check if there are any exceptions in the Solr log/console.

--
Regards,
Shalin Shekhar Mangar.
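For reference, wiring the component into the handler usually looks something like this in a Solr 1.3-era solrconfig.xml (a sketch, not Andrew's actual file; the <str> value must match the searchComponent name):

    <searchComponent name="spellcheck"
                     class="org.apache.solr.handler.component.SpellCheckComponent">
      <!-- spellchecker configuration elided -->
    </searchComponent>

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

If the <arr name="last-components"> entry is missing, the handler never invokes the component and the response simply omits the spellcheck section.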
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
I was just reviewing the solr logs and I noticed the following:

Jul 28, 2008 11:52:01 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.SpellCheckComponent'

It looks like the SpellCheckComponent is not getting loaded. What could cause this? I'm running the July 25 nightly build.

Here is a list of the libs from my /tmp/jetty/webapp/WEB-INF/lib dir:

-rw-r--r-- 1 root root  84199 Jul 25 08:14 apache-solr-common-nightly.jar
-rw-r--r-- 1 root root 889903 Jul 25 08:14 apache-solr-nightly.jar
-rw-r--r-- 1 root root  46725 May 10  2007 commons-codec-1.3.jar
-rw-r--r-- 1 root root  22017 Jan  6  2008 commons-csv-1.0-SNAPSHOT-r609327.jar
-rw-r--r-- 1 root root  53082 Mar  1  2007 commons-fileupload-1.2.jar
-rw-r--r-- 1 root root 305001 Sep 11  2007 commons-httpclient-3.1.jar
-rw-r--r-- 1 root root  83613 Jun 15  2007 commons-io-1.3.1.jar
-rw-r--r-- 1 root root  38015 Jun 14  2007 commons-logging-1.0.4.jar
-rw-r--r-- 1 root root 249154 Sep 21  2007 junit-4.3.jar
-rw-r--r-- 1 root root 115101 Jun 19 13:46 lucene-analyzers-2.4-dev.jar
-rw-r--r-- 1 root root 730352 Jun 19 13:46 lucene-core-2.4-dev.jar
-rw-r--r-- 1 root root  87390 Jun 19 13:46 lucene-highlighter-2.4-dev.jar
-rw-r--r-- 1 root root  32693 Jun 19 13:46 lucene-queries-2.4-dev.jar
-rw-r--r-- 1 root root  91029 Jun 19 13:46 lucene-snowball-2.4-dev.jar
-rw-r--r-- 1 root root  18422 Jun 19 13:46 lucene-spellchecker-2.4-dev.jar
-rw-r--r-- 1 root root 179348 Jun 14  2007 stax-1.2.0-dev.jar
-rw-r--r-- 1 root root  25863 Jun 14  2007 stax-api-1.0.jar
-rw-r--r-- 1 root root 128475 Jun 14  2007 stax-utils.jar

Could I be missing a jar?

Thanks
Andrew
RE: solr synonyms behaviour
Hi,

I was faced with the same issues regarding multiword synonyms. Let's say a synonyms list like:

club, bar, night cabaret

Now if we have a document containing "club", with the default synonyms filter behaviour with expand=true, we will end up in the Lucene index with a document containing "club|bar|night cabaret". So if the user searches for "night", the query will look for "night" in the index and will match our document, since it was "enriched" at index time and really does contain the token "night".

The only valid solution I found was to create a field type used exclusively for synonym search, where the same synonym filter runs with expand=false at both index time and query time:

@IndexTime: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
@QueryTime: <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>

And with a customised synonyms file that looks like:

SYN_ID_1, club, bar, night cabaret

So for our document containing "club", the synonym filter at index time with expand=false replaces every matching token/expression in the document with SYN_ID_1. At query time, when a user searches for "night", it will not be matched, even by "normal" search, because "night" does not stand alone in the synonyms definition: every document containing "club" or "bar" has been "enriched" with SYN_ID_1 and NOT with "club|bar|night cabaret", so the final indexed document contains no isolated tokens from the synonym expression that could be matched later without notice. To match our document containing "club", the user HAS TO type the entire expression "night cabaret", not only part of it.

Of course, as I said before, this field is used exclusively for synonym matching, so it requires another field for normal full-text stemmed search to add normal results. This approach also gives us the opportunity to set up boosting separately for full-text stemmed search vs. synonym search, e.g.:

title_stem:"club"^100 OR title_syns:"club"^10

I hope that was clear. In any case, this approach should fix your problem, since we did not want synonym matching when the user types only part of a synonymic expression.

Regards,
Laurent

-----Original Message-----
From: swarag
Sent: Friday, July 25, 2008, 11:48 PM
To: solr-user@lucene.apache.org
Subject: Re: solr synonyms behaviour

> > Yonik Seeley wrote:
> > > Can you give a specific example? Use debugQuery=true to see what the resulting query is. You can also use the admin analysis page to see the output of the index and query analyzers.
>
> So it sounds like using the '=>' operator for synonyms that may or may not contain multiple words causes problems. So I changed my synonyms.txt to the following:
>
> club,bar,night cabaret
>
> In schema.xml, my "text" field type now applies the SynonymFilterFactory (ignoreCase="true", expand="true") in the index-time analyzer only; the query-time analyzer chain (stop words, word delimiter, lowercase, stemming) has no synonym filter.
>
> As you can see, 'night cabaret' is my only multi-word synonym term. Searches for 'bar' and 'club' now behave as expected. However, if I search for JUST 'night' or JUST 'cabaret', it looks like it is still using the synonyms 'bar' and 'club', which is not what is desired. I only want 'bar' and 'club' to be returned if a search for the complete 'night cabaret' is submitted.
>
> Since query-time synonyms are turned "off", the resulting parsedquery_toString is simply "name:night", "name:cabaret", etc.

We are still having problems. Searches for single words that are part of a multi-word synonym seem to be affected by the synonyms, when they should not. Anyone else experience this? If not, would you mind explaining your config and the format of your synonyms.txt file?
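A minimal sketch of the synonyms-only field type Laurent describes (type and file names are illustrative, and the tokenizer choice is an assumption - the thread does not show the full definition):

    <fieldType name="textSyns" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms-ids.txt"
                ignoreCase="true" expand="false"/>
      </analyzer>
    </fieldType>

With expand="false" and a synonyms-ids.txt whose first entry per line is an opaque token (SYN_ID_1, club, bar, night cabaret), both the indexed text and the query are rewritten to SYN_ID_1 whenever a full synonym expression appears, so a partial expression like "night" alone can no longer match.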
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
No, SpellCheckComponent was in the nightly long before July 25. There must be a stack trace after that error message. Can you post that?

--
Regards,
Shalin Shekhar Mangar.
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Hmm ... sorry, that was the output of a java program that uses Solr, which I ran when I noticed the error. That error doesn't happen when I start Solr. Sorry for the confusion.

I just changed my schema to have a dedicated field for spelling called "spelling", and I created a new field type for the spellcheck component called "textSpell". Here is the segment of my solrconfig.xml:

    <searchComponent name="spellcheck"
                     class="org.apache.solr.handler.component.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="field">spelling</str>
        <float name="accuracy">0.7</float>
        <str name="spellcheckIndexDir">./spellchecker</str>
      </lst>
      <str name="queryAnalyzerFieldType">textSpell</str>
    </searchComponent>

    <requestHandler name="standard" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

I will need to reindex my documents again - I will check to see if that has any effect on my problem.

Andrew
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Well, I will include the stack trace for the aforementioned error:

Jul 28, 2008 12:20:17 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.SpellCheckComponent'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:227)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:232)
        at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
        at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:565)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:371)
        at org.solrmarc.marc.MarcImporter.<init>(MarcImporter.java:95)
        at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:559)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.component.SpellCheckComponent
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:580)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:242)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:211)
        ... 7 more

Line 95 of MarcImporter.java (the Solr import program I am using) is the instantiation of SolrCore. So maybe somehow the SpellCheckComponent is not getting loaded?

This is the error output thrown by instantiating SolrCore:

org.apache.solr.common.SolrException: Unknown Search Component: spellcheck
        at org.apache.solr.core.SolrCore.getSearchComponent(SolrCore.java:597)
        at org.apache.solr.handler.component.SearchHandler.inform(SearchHandler.java:107)
        at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:264)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:398)
        at org.solrmarc.marc.MarcImporter.<init>(MarcImporter.java:95)
        at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:559)

Andrew
Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
Well, that means the nightly solr jar you are using is older than you think it is. Try running Solr normally, without the program, and see if you can get it working.
RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)
> From: Shalin Shekhar Mangar
> Well, that means the nightly solr jar you are using is older than you think it is. Try running Solr normally, without the program, and see if you can get it working.

Well, my import program has an older copy of the solr libs, so we can ignore that problem. However, my problem still stands when I run Solr normally from my July 25 snapshot. There are no errors - and no output to the solr logs when I post a query.

Have you or anyone else been able to successfully add the SpellCheckComponent to the default select SearchHandler?

Thanks
Andrew
Unsynchronized FIFOCache - 9x times performance boost on 8-CPU system
Please see discussion at http://issues.apache.org/jira/browse/SOLR-665

Very simple:

map = new LinkedHashMap(initialSize, 0.75f, true)  - LRU cache (and we need a synchronized get())
map = new LinkedHashMap(initialSize, 0.75f, false) - FIFO (and we do not need a synchronized get())

--
Thanks,
Fuad Efendi
http://www.linkedin.com/in/liferay
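To illustrate the difference (a plain-Java sketch, not the SOLR-665 patch itself): with accessOrder=true, LinkedHashMap.get() relinks the accessed entry to the tail, so even reads mutate shared state and must be synchronized; with accessOrder=false, iteration order is insertion order and get() is a pure read.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // FIFO cache sketch: evicts the oldest *inserted* entry once full.
    // Because accessOrder=false, get() never restructures the map, which
    // is what allows the unsynchronized-read claim in SOLR-665 (writes
    // still need synchronization, and fully concurrent use needs care).
    class FifoCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        FifoCache(int maxSize) {
            super(maxSize, 0.75f, false); // accessOrder=false -> FIFO order
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize; // drop the oldest insertion on overflow
        }
    }

Passing true as the third constructor argument instead gives LRU order, where get() moves the entry and therefore needs a lock.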
RE: nested data structure definition
If you want to think of Solr in database terms, it has only one table. The fields in this table have very flexible type definitions. There can be many optional fields. They also can have various indexes which, used together, can search text in useful ways. If you want to model multiple tables, you have to denormalize them into one. The optional fields feature can be useful here.

Lance
Multiple Update servers
Hi, we are currently evaluating Solr and have been browsing the archives for one particular issue but can't seem to find the answer, so please forgive me if I'm asking a repetitive question.

We like the idea of having multiple slave servers serving up queries and a master performing updates. However, the issue for us is that there is no redundancy for the master. So a couple of questions:

1. Can there be multiple masters (or update servers) sharing the same index files, performing updates at the same time (i.e. hosting the index on a SAN)?
2. Is there a recommended architecture utilizing a SAN? (For example, 2 slaves and 2 masters sharing a SAN.)

We currently don't have that many records - probably about a million and growing. We are mainly concerned about redundancy first, then performance.

Thanks
-Rakesh
big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Hi all,

For some queries I need to return a lot of rows at once (say 100). When performing these queries I notice a big difference between qTime (which is mostly in the 15-30 ms range due to caching) and the total time taken to return the response (measured through SolrJ's elapsedTime), which is between 500-1600 ms.

For queries which return fewer rows the difference becomes smaller.

I presume (after reading some threads in the past) that this is because Solr constructs and streams the response (which includes retrieving the stored fields), which is not counted in qTime.

Documents have a lot of stored fields (more than 10,000), but for any given query a maximum of about 20 are returned (through the fl field) or used (as part of filtering, faceting, sorting).

I would have thought that enabling enableLazyFieldLoading in this situation would mean a lot, since so many stored fields can be skipped, but I notice no real difference when measuring total elapsed time (or qTime for that matter).

Am I missing something here? What criteria would need to be met for a field not to be loaded, for instance? Should I see a big performance boost in this situation?

Thanks,
Britske
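For reference, a sketch of the kind of SolrJ measurement being described (URL and field names are illustrative; CommonsHttpSolrServer is the SolrJ client of that era):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class TimingCheck {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrQuery q = new SolrQuery("*:*");
            q.setRows(100);            // many rows -> many stored-field reads
            q.setFields("id", "name"); // fl: return only these stored fields

            QueryResponse rsp = server.query(q);
            // QTime is server-side search time; elapsed also covers building
            // and streaming the response (stored-field retrieval) + transport.
            System.out.println("QTime=" + rsp.getQTime()
                    + " ms, elapsed=" + rsp.getElapsedTime() + " ms");
        }
    }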
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
That high of a difference is due to the part of the index containing these particular stored fields not being in OS cache. What's the size on disk of your index compared to your physical RAM?

-Yonik
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Size on disk is 1.84 GB (of which 1.3 GB sits in FDT files, if that matters). Physical RAM is 2 GB with -Xmx800M set for Solr.
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
That's a bit too tight to have *all* of the index cached... your best bet is to go to 4GB+, or figure out a way not to have to retrieve so many stored fields.

-Yonik
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
Another possibility is to partition the stored fields into a frequently-accessed set and a full set. If the frequently-accessed set is significantly smaller (in terms of # of bytes), then the documents will be tightly packed on disk and the OS caching will be much more effective given the same amount of RAM. The situation you are experiencing is one-seek-per-doc, which is performance death.

-Mike
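A schema.xml sketch of that partitioning idea (names illustrative, not from the thread): keep the handful of fields every query returns as stored, so they pack tightly in the .fdt file, and leave the long tail searchable but unstored, fetching full records from the primary data store (or a second index) when needed:

    <!-- hot set: small, returned on every query -->
    <field name="id"   type="string" indexed="true" stored="true"/>
    <field name="name" type="text"   indexed="true" stored="true"/>

    <!-- cold set: searchable here, but values are retrieved elsewhere -->
    <dynamicField name="attr_*" type="text" indexed="true" stored="false"/>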
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true
I'm on a development box currently and production servers will be bigger, but at the same time the index will be too.

Each query requests at most 20 stored fields. Why doesn't lazy field loading help in this situation? I don't need to retrieve all stored fields, and I thought I wasn't doing so (by limiting the fields returned using the fl param) -- but if I read your comment correctly, apparently I am retrieving them all, I'm just not displaying them all?

Also, if I understand correctly, for optimal performance I need at least enough RAM to hold the entire index in OS cache, plus the amount of RAM that Solr / Lucene consumes directly through the JVM (which among other things includes the Lucene field cache plus all of Solr's caches on top of that)?

I've never read the requirement of having the entire index in OS cache before. Is this because in normal situations (with fewer stored fields) it doesn't matter much? I'm just surprised to hear of this for the first time, since it will likely have a big impact on my design.

Luckily most of the normal queries return 10 documents each, which results in a discrepancy between total elapsed time and qTime of about 15-30 ms. Doesn't this seem strange? To me it would seem logical that the discrepancy would be at least 1/10th of that for fetching 100 documents.

hmm, hope you can shine some light on this,

Thanks a lot,
Britske

Yonik Seeley wrote:
> That's a bit too tight to have *all* of the index cached... your best
> bet is to go to 4GB+, or figure out a way not to have to retrieve so
> many stored fields.
>
> -Yonik
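The qTime-vs-elapsed gap Britske describes can be reproduced from SolrJ along these lines (a minimal sketch against the 1.3-era SolrJ API; the URL and field list are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingCheck {
  public static void main(String[] args) throws Exception {
    // URL is a placeholder; point it at your own instance
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);     // the expensive case: stored fields for 100 docs
    q.setFields("id");  // restrict returned fields via fl

    QueryResponse rsp = server.query(q);
    // qTime covers the search itself; elapsed time also includes
    // retrieving stored fields and streaming the response, which is
    // where the discrepancy shows up
    System.out.println("qTime=" + rsp.getQTime() + "ms, elapsed="
        + rsp.getElapsedTime() + "ms");
  }
}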
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
On Mon, Jul 28, 2008 at 4:53 PM, Britske <[EMAIL PROTECTED]> wrote:
> Each query requests at most 20 stored fields. Why doesn't lazy field
> loading help in this situation?

It's the disk seek that kills you... loading 1 byte or 1000 bytes per document would be about the same speed.

> Also, if I understand correctly, for optimal performance I need at
> least enough RAM to hold the entire index in OS cache, plus the amount
> of RAM that Solr / Lucene consumes directly through the JVM?

The normal usage is to just retrieve the stored fields for the top 10 (or a window of 10 or 20) documents. Under this scenario, the slowdown from not having all of the stored fields cached is usually acceptable. Faster disks (seek time) can also help.

> Luckily most of the normal queries return 10 documents each, which
> results in a discrepancy between total elapsed time and qTime of about
> 15-30 ms. Doesn't this seem strange? To me it would seem logical that
> the discrepancy would be at least 1/10th of that for fetching 100
> documents.

Yes, in general 1/10th the cost is what one would expect on average. But some of the docs you are trying to retrieve *will* be in cache, so it's hard to control this test. You could try forcing the index out of memory by "cat"ing some other big files multiple times and then re-trying, or do a reboot to be sure.

-Yonik
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
What version of Solr/Lucene are you using?

On Jul 28, 2008, at 4:53 PM, Britske wrote:
> I'm on a development box currently and production servers will be
> bigger, but at the same time the index will be too.
> [...]

--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
Thanks for clearing that up for me. I'm going to investigate some more...

Yonik Seeley wrote:
> It's the disk seek that kills you... loading 1 byte or 1000 bytes per
> document would be about the same speed.
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
I'm using the solr-nightly of 2008-04-05.

Grant Ingersoll-6 wrote:
> What version of Solr/Lucene are you using?
RE: Tokenizing and searching named character entity references
Hi Frances,

HTMLStripWhitespaceTokenizerFactory wraps a WhitespaceTokenizer around an HTMLStripReader. You could extend HTMLStripReader to not decode named character entities, e.g. by overriding HTMLStripReader.read() so that it calls an alternative readEntity(), which instead of converting entity references to characters would just leave the entity references as-is, something like:

public class MyHTMLStripReader extends HTMLStripReader {

  // override read() to call myReadEntity(), but no other changes
  public int read() throws IOException {
    ...
    switch (ch) {
      case '&':
        saveState();
        ch = myReadEntity();   // change this line to call the new method
        if (ch >= 0) return ch;
        if (ch == MISMATCH) {
          restoreState();
          return '&';
        }
        break;
      ...
    }
  }

  private int myReadEntity() throws IOException {
    int ch = next();
    if (ch == '#') return readNumericEntity();
    return MISMATCH;   // always a mismatch, except for numeric entities
  }
}

Then you could create a new factory, something like:

public class MyHTMLStripWhitespaceTokenizerFactory extends BaseTokenizerFactory {
  public TokenStream create(Reader input) {
    return new WhitespaceTokenizer(new MyHTMLStripReader(input));
  }
}

Steve

On 07/24/2008 at 9:53 AM, F Knudson wrote:
> Greetings:
>
> I am working with many different data sources - some sources employ
> "entity references"; others do not. My goal is to make the searching
> across sources as consistent as possible.
>
> Example text -
>
> Source1: weakening Hδ absorption
> Source1: zero-field gap ω
>
> Source2: weakening H delta absorption
> Source2: zero-field gap omega
>
> Using the tokenizer solr.HTMLStripWhitespaceTokenizerFactory for
> Source1, the entity is replaced with the "named character entity" -
> this works great.
>
> But I want the search tokens to be identical for each source. I need
> to capture δ as a token.
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>             ignoreCase="true" expand="true"/>
>     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>             generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>             catenateAll="0"/>
>   </analyzer>
> </fieldType>
>
> Is this possible with the SOLR-supplied tokenizers? I experimented
> with different combinations and orders and was not successful.
>
> Is this possible using synonyms? I also experimented with this route
> but again was not successful.
>
> Do I need to create a custom tokenizer?
>
> Thanks
> Frances
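Wiring the new factory in would then just be a matter of pointing the field type at it instead of the stock one (the package name here is an assumption; use whatever you compile the classes under):

  <tokenizer class="com.example.MyHTMLStripWhitespaceTokenizerFactory"/>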
Re: Expansion stemming
: "Expansion stemming ? Takes a root word and 'expands' it to all of its : various forms ? can be used either at insertion time or at query : time." : : How do I specify that I want the expansion stemming instead of the porter : stemming? there isn't anexpclit expansion stemming filter included with Solr. As far as i know the only way to accomplish expansion stemming is with a dictionary of word mappings -- which could be achieved using the SynonymFilterFactory ... i've added a note aboutthis to the wiki. -Hoss
Re: morphology and queryPhrase
: When I'm looking for words taking care of the distance between them,
: I'm using the Lucene syntax "A B"~distance... unfortunately if A leads
: to A1 and A2 forms I have to split this into the syntax
: +("A1 B"~dist "A2 B"~dist) - this grows with the number of normal forms
: of each term.
:
: Can I search within a distance using something like (+(A1 A2) +(B))~dist?
: I heard that dismax can handle distance between words ignoring quotes -
: could you advise on this?

Internally there are types of Lucene queries that can manage structure like what you are describing: SpanNearQuery being the most flexible, MultiPhraseQuery being less flexible but (in theory) faster. Neither of these is directly usable from the query parser -- but you could write your own query parser (or custom request handler) that builds them up.

-Hoss
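For illustration, a minimal sketch of building that structure directly with the span API (the field name "text" and the terms are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanOrQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class SpanExample {
  // matches (a1 OR a2) within `dist` positions of b, in any order --
  // i.e. the (+(A1 A2) +(B))~dist structure from the question
  public static SpanNearQuery nearQuery(int dist) {
    SpanQuery a = new SpanOrQuery(new SpanQuery[] {
        new SpanTermQuery(new Term("text", "a1")),
        new SpanTermQuery(new Term("text", "a2"))});
    SpanQuery b = new SpanTermQuery(new Term("text", "b"));
    return new SpanNearQuery(new SpanQuery[] {a, b}, dist, false);
  }
}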
Re: Best way to return ExternalFileField in the results
: I've been trying to return a field of type ExternalFileField in the
: search result. Upon examining the XMLWriter class, it seems like Solr
: can't do this out of the box. Therefore, I've tried to hack Solr to
: enable this behaviour. The goal is to call
: ExternalFileField.getValueSource(SchemaField field, QParser parser) in
: the XMLWriter.writeDoc(String name, Document document, ...) method.
: There are two issues with doing this:

Some of what you're specifically asking about could probably be achieved by modifying the XMLWriter constructor to hang on to the SolrCore associated with the request.

In general though, I wonder if stepping back a bit and modifying your request handler to use a SolrDocumentList where you've already flattened the ExternalFileField into each SolrDocument would be an easier approach -- then you wouldn't need to modify the ResponseWriter at all.

-Hoss
Re: Unsure about omitNorms, termVectors...
: > omitNorms: do I need it for full-text fields even if I don't need
: > index-time boosting? I don't want to boost text where a keyword is
: > repeated several times. Is my understanding correct?

If you omitNorms="true" then you not only lose index-time doc/field boosting, but you also lose length norms -- it won't matter how long a field is; if a term occurs once in a 5-term field value it will score the same as if it appears once in a 5000-term field value.

If you don't want docs to score higher when the word is repeated, omitNorms won't help you -- you'll need a custom Similarity where you override the tf() method.

: > What are the memory requirements for Lucene cache warming if I use
: > term vectors and norms?
:
: I don't believe Term Vectors are cached anywhere, other than via the
: OS. I'd have to go dig around for norms info, or maybe someone else can
: chime in.

Norms are one byte per doc per field.

-Hoss
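A minimal sketch of such a tf() override (the class and package names are made up; it would be registered in schema.xml via the <similarity> element):

import org.apache.lucene.search.DefaultSimilarity;

// Flatten term frequency so repeating a word in a field doesn't raise
// its score. Registered in schema.xml with:
//   <similarity class="com.example.FlatTfSimilarity"/>
public class FlatTfSimilarity extends DefaultSimilarity {
  public float tf(float freq) {
    return freq > 0 ? 1.0f : 0.0f;
  }
}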
Re: Best way to return ExternalFileField in the results
> In general though, I wonder if stepping back a bit and modifying your
> request handler to use a SolrDocumentList where you've already
> flattened the ExternalFileField into each SolrDocument would be an
> easier approach -- then you wouldn't need to modify the ResponseWriter
> at all.

Consider using a search component at the end of the chain that adds fields to your document... this way things work for any writer (json, xml, whatever).

We really should add an example to do this... but in the meantime, a good example (though a bit complex) is with the local lucene: http://sourceforge.net/projects/locallucene/ -- this adds a calculated distance to each document before it gets passed to the writer.
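A rough skeleton of such a component (a sketch against the 1.3 component API; the class name and the value lookup are hypothetical placeholders):

import java.io.IOException;

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;

public class AddExternalFieldComponent extends SearchComponent {

  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to set up before the query runs
  }

  public void process(ResponseBuilder rb) throws IOException {
    DocList docs = rb.getResults().docList;
    NamedList external = new NamedList();
    DocIterator it = docs.iterator();
    while (it.hasNext()) {
      int docid = it.nextDoc();
      // lookupExternalValue() is a hypothetical helper that would read
      // the ExternalFileField value for this internal docid
      external.add(Integer.toString(docid), lookupExternalValue(rb, docid));
    }
    // attach the computed values to the response
    rb.rsp.add("external", external);
  }

  private Object lookupExternalValue(ResponseBuilder rb, int docid) {
    return null; // placeholder
  }

  // SolrInfoMBean boilerplate
  public String getDescription() { return "adds external field values"; }
  public String getSource() { return null; }
  public String getSourceId() { return null; }
  public String getVersion() { return null; }
}

It would be declared with a <searchComponent> element in solrconfig.xml and appended to a request handler through its "last-components" list.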
RE: Tokenizing and searching named character entity references
: You could extend HTMLStripReader to not decode named character
: entities, e.g. by overriding HTMLStripReader.read() so that it calls an
: alternative readEntity(), which instead of converting entity references
: to characters would just leave the entity references as-is, something
: like:

Alternately: use SynonymFilterFactory to map any entity "names" to the real Unicode character, so your "Source2" style docs get "omega" replaced with the same character the HTMLStrip*TokenizerFactories generate when they encounter the HTML entities. Generating the list of synonyms from the comment at the end of HTMLStripReader.java should be easy.

: > Source1: weakening Hδ absorption
: > Source1: zero-field gap ω
: >
: > Source2: weakening H delta absorption
: > Source2: zero-field gap omega

-Hoss
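A few illustrative entries for such a synonyms file (the exact list would be generated from the entity table in HTMLStripReader.java):

  # map entity names to the characters the HTMLStrip tokenizers emit
  delta => δ
  omega => ω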
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
On 28-Jul-08, at 1:53 PM, Britske wrote:

> Each query requests at most 20 stored fields. Why doesn't lazy field
> loading help in this situation?

It does help, but not enough. With lots of data per document and not a lot of memory, it becomes probabilistically likely that each doc resides in a separate uncached disk block, thus requiring a disk seek (~10ms), which then dominates the total time regardless of the number of bytes read.

> I don't need to retrieve all stored fields, and I thought I wasn't
> doing so (by limiting the fields returned using the fl param) -- but if
> I read your comment correctly, apparently I am retrieving them all,
> I'm just not displaying them all?

No, they are not read. It is important to understand the performance characteristics of disks in random access vs. serial reading in this case.

> Also, if I understand correctly, for optimal performance I need at
> least enough RAM to hold the entire index in OS cache, plus the amount
> of RAM that Solr / Lucene consumes directly through the JVM (which
> among other things includes the Lucene field cache plus all of Solr's
> caches on top of that)?

Not necessarily all, no. The type of data you store and the request characteristics affect the size of the "hot spot" of the index, the specific blocks that need to be in memory to achieve good performance. If you are retrieving the stored fields for 100 docs per query, the doc data should probably all be in cache. One way to mitigate this is to partition the fields like I suggested in the other reply.

-Mike
javax.xml.stream.XMLStreamException while indexing
I've recently encountered a strange error while batch indexing around 500 average-sized documents:

HTTP Status 500 - null javax.xml.stream.XMLStreamException
  at com.bea.xml.stream.MXParser.fillBuf(MXParser.java:3700)
  at com.bea.xml.stream.MXParser.more(MXParser.java:3715)
  at com.bea.xml.stream.MXParser.nextImpl(MXParser.java:1756)
  at com.bea.xml.stream.MXParser.next(MXParser.java:1333)
  at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:323)
  at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:197)
  at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:125)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1038)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
  at org.hyperic.hq.product.servlet.filter.JMXFilter.doFilter(JMXFilter.java:324)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
  at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:685)
  at java.lang.Thread.run(Thread.java:595)

Most other reports of this exception refer to an XML parse error on a particular line / column; however, this is not the case in this situation. It doesn't seem to be a problem with the data either, since it fails on different sets of documents on every occasion (i.e. I can't find specific input data to reproduce this problem). Increasing / decreasing the number of documents still results in the same error.

The system I'm using consists of Solr 1.3 dev (compiled from SVN on 2008-07-21), Tomcat 5.5.23, and Sun Java SDK 1.5.0-11-1 running on Ubuntu Server 7.10 with all current updates applied.

Has anybody else experienced a similar problem to this? Would upgrading either Tomcat / Java help in this instance?

Thanks in advance for any help.

regards,
Pieter
RE: solr synonyms behaviour
Hi Laurent

Laurent Gilles wrote:
>
> Hi,
>
> I was faced with the same issues regarding multiword synonyms.
> Let's say a synonyms list like:
>
> club, bar, night cabaret
>
> Now if we have a document containing "club", with the default synonyms
> filter behaviour with expand=true, we will end up in the Lucene index
> with a document containing "club|bar|night cabaret". So if the user
> searches for "night", the query will look for "night" in the index and
> will match our document, since it was "enriched" at index time and
> really does contain the token "night".
>
> The only valid solution I've found was to create a field type
> exclusively used for synonym search where:
>
> @IndexTime
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="false"/>
> @QueryTime
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>         ignoreCase="true" expand="false"/>
>
> And with a customised synonyms file that looks like:
>
> SYN_ID_1, club, bar, night cabaret
>
> So for our document containing "club", the synonym filter at index
> time with expand=false will replace every matching token/expression in
> the document with SYN_ID_1.
>
> And at query time, when a user searches for "night", since "night" is
> not alone in the synonyms definition, it will not be matched, even by
> a "normal" search, because every document containing "club" or "bar"
> will have been "enriched" with "SYN_ID_1" and NOT with "club|bar|night
> cabaret". So the final indexed document will not contain isolated
> tokens from the synonym expression that risk being matched later
> without notice.
>
> In order to match our document containing "club", the user HAS TO type
> the entire expression "night cabaret", and not only part of the
> expression.
>
> Of course, as I said before, this field is exclusively used for
> synonym matching, so it requires another field for normal full-text
> stemmed search to add normal results. This approach gives us the
> opportunity to set up boosting separately on full-text stemmed search
> vs. synonym search, let's say:
>
> "title_stem":"club"^100 OR "title_syns":"club"^10
>
> I hope to have been clear, even if I don't believe so... In any case,
> this approach fixed your problem, since we didn't want synonym
> matching when the user only types part of a synonymic expression.
>
> Regards,
> Laurent

This has seemed to solve our problem. Thank you very much for your help. Once we have our environment set up and all of our data indexed, it may even provide an extra 'bonus' to be able to add different weights/boosts for the different fields.

Now, not to be too greedy, but I am wondering if there is a way to utilize this technique for "explicit synonym matching" (i.e. synonym mappings that use the '=>' operator). For example, we may have a couple of mappings like the following:

night club=>club, bar
swim club=>club, team

As you can see, both night clubs and swim clubs are clubs, but are not necessarily equivalent to the term "club". It would be nice to be able to search for "night club" and only see results for "clubs" and "bars", but not "teams", which otherwise would show up in the results if we used equivalent synonyms.

Just wondering if you have been able to do this as well. Again, thank you for your help!
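A sketch of the two-field setup Laurent describes (the field and type names are illustrative, not from his mail):

  <field name="title_stem" type="text"     indexed="true" stored="false"/>
  <field name="title_syns" type="text_syn" indexed="true" stored="false"/>
  <copyField source="title" dest="title_stem"/>
  <copyField source="title" dest="title_syns"/>

Here text_syn would be a type whose analyzer applies the SynonymFilterFactory with expand="false" at both index and query time, while text is the normal stemmed type; queries then boost the two fields separately, e.g. title_stem:club^100 OR title_syns:club^10.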
Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading=true
That sounds interesting. Let me explain my situation, which may be a variant of what you are proposing.

My documents contain more than 10,000 fields, but these fields are divided like this:

1. about 20 general-purpose fields, of which more than one can be selected in a query.
2. about 10,000 fields, of which each query, based on some criteria, selects exactly one.

Obviously 2 is killing me here, but given the above, perhaps it would be possible to make 10,000 vertical slices / indices, and based on the field to be selected (from point 2) choose the slice/index to search in. The 10,000 indices would run on the same box, and the 20 general-purpose fields would have to be copied to all slices (which means some increase in overall index size, but manageable). This would give me far more reasonably sized and compact documents, which means documents are far more likely to sit in the same cached slot and be accessed in the same disk seek.

Does this make sense?

Am I correct that this has nothing to do with distributed search, since that really is all about horizontal splitting / sharding of the index, and what I'm suggesting is splitting vertically? Is there some other part of Solr that I can use for this, or would it be all home-grown?

Thanks,
Britske

Mike Klaas wrote:
> Another possibility is to partition the stored fields into a
> frequently-accessed set and a full set. If the frequently-accessed set
> is significantly smaller (in terms of bytes), then the documents will
> be tightly packed on disk and the OS caching will be much more
> effective given the same amount of RAM.
>
> The situation you are experiencing is one seek per document, which is
> performance death.
>
> -Mike
Re: nested data structure definition
In my site, I have a document which may have multiple comments. For each comment, I would like to know several pieces of information, like: text, author, and date.

-Matt

Shalin Shekhar Mangar wrote:
>
> Hi Ranjeet,
>
> Solr supports multi-valued fields and you can always denormalize your
> data. Can you give more details on the problem you are trying to
> solve?
>
> On Mon, Jul 28, 2008 at 3:20 PM, Ranjeet <[EMAIL PROTECTED]> wrote:
>
>> Hi,
>>
>> Can we define a nested data structure in schema.xml for searching? Is
>> it possible or not?
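One common way to denormalize a parent/child structure like this is parallel multi-valued fields, relying on the values staying position-aligned across fields (a sketch; the field names are illustrative):

  <field name="comment_text"   type="text"   indexed="true" stored="true" multiValued="true"/>
  <field name="comment_author" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="comment_date"   type="date"   indexed="true" stored="true" multiValued="true"/>

The application then zips the lists back together on retrieval: the Nth author and the Nth date belong to the Nth comment.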