Fwd: Performance help for heavy indexing workload

2008-02-12 Thread James Brady
Hi again, More analysis showed that the extraordinarily long query times only appeared when I specify a sort. A concrete example: For a querystring such as: ?indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1&fl=*%2Cscore&qt=standard&wt=standard&explainOther= The QTime is ~500ms. Fo
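Decoding the example query string with Python's standard library makes the individual parameters easier to read. This is only the URL quoted in the message, decoded; as quoted, it contains no `sort` parameter. The helper name `decode_solr_params` is illustrative.

```python
from urllib.parse import parse_qs

# The example query string from the message (leading "?" dropped).
QUERY = ("indent=on&version=2.2&q=apache+user_id%3A39&start=0&rows=1"
         "&fl=*%2Cscore&qt=standard&wt=standard&explainOther=")

def decode_solr_params(query: str) -> dict:
    """Return the first value of each parameter; '+' decodes to a space."""
    parsed = parse_qs(query, keep_blank_values=True)  # keep explainOther=""
    return {name: values[0] for name, values in parsed.items()}

params = decode_solr_params(QUERY)
print(params["q"])   # apache user_id:39
print(params["fl"])  # *,score
```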

Re: SolrJ and Unique Doc ID

2008-02-12 Thread Chris Hostetter
: > Honestly: i can't think of a single use case where client code would care : > about what the uniqueKey field is, unless it already *knew* what the : > uniqueKey field is. : : :-) Abstractions allow one to use different implementations. My : client/display doesn't know about Solr, it just kno

Filter Query

2008-02-12 Thread Evgeniy Strokin
Hello, Let's say I have one query like this: NAME:Smith I need to restrict the result, so I'm doing this: NAME:Smith AND AGE:30 I can also do this using the fq parameter: q=NAME:Smith&fq=AGE:30 The results of the second and third queries should be the same, right? But why should I use fq then? In which ca

Re: Performance help for heavy indexing workload

2008-02-12 Thread Mike Klaas
On 11-Feb-08, at 11:38 PM, James Brady wrote: Hello, I'm looking for some configuration guidance to help improve performance of my application, which tends to do a lot more indexing than searching. At present, it needs to index around two documents / sec - a document being the stripped c

Re: SolrJ and Unique Doc ID

2008-02-12 Thread Erik Hatcher
On Feb 12, 2008, at 3:44 PM, Grant Ingersoll wrote: On Feb 12, 2008, at 2:10 PM, Chris Hostetter wrote: : > Honestly: i can't think of a single use case where client code would care : > about what the uniqueKey field is, unless it already *knew* what the : > uniqueKey field is. : : :-) Ab

RE: Performance help for heavy indexing workload

2008-02-12 Thread Lance Norskog
1) autowarming: it means that if you have a cached query or similar, and do a commit, it then reloads each cached query. This is in solrconfig.xml 2) sorting is a pig. A sort creates an array of N integers where N is the size of the index, not the query. If the sorted field is anything but an integ
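Lance's point about sort cost is easy to put into numbers. A minimal sketch, assuming an integer sort field and counting only the one-int-per-document array he describes (actual memory use will be higher, and his truncated note suggests non-integer fields cost more):

```python
JAVA_INT_BYTES = 4  # a Java int is 32 bits

def sort_array_bytes(max_doc: int) -> int:
    """Bytes for one int per document in the index (not per query hit)."""
    return max_doc * JAVA_INT_BYTES

for docs in (1_000_000, 10_000_000):
    # rows=1 or rows=1000 makes no difference; only the index size matters
    print(f"{docs:>10,} docs -> {sort_array_bytes(docs) / 2**20:.1f} MiB")
```

This is also why the first sorted query after a new searcher opens is the slow one: the array is built for the whole index before any results are returned.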

Using embedded Solr with admin GUI

2008-02-12 Thread Ken Krugler
Hi all, We're moving towards embedding multiple Solr cores, versus using multiple Solr webapps, as a way of simplifying our build/deploy and also getting more control over the startup/update process. But I'd hate to lose that handy GUI for inspecting the schema and (most importantly) trying

Re: what is searcher

2008-02-12 Thread Briggs
Searcher is the main search abstraction in Lucene. It defines the methods used for querying an underlying index(es). See: http://lucene.apache.org/java/2_3_0/api/org/apache/lucene/search/Searcher.html On Feb 12, 2008 10:33 PM, Mochamad bahri nurhabbibi <[EMAIL PROTECTED]> wrote: > hello all.. >

what is searcher

2008-02-12 Thread Mochamad bahri nurhabbibi
Hello all. I have been learning SOLR for two days. I have to give a training/presentation about SOLR to the rest of my colleagues at my company. My question is: what is a searcher? This term seems to appear everywhere, but there's no exact definition of it on either Google or the SOLR wiki. Anyone pl

Re: Performance help for heavy indexing workload

2008-02-12 Thread James Brady
Hi - thanks to everyone for their responses. A couple of extra pieces of data which should help me optimise - documents are very rarely updated once in the index, and I can throw away index data older than 7 days. So, based on advice from Mike and Walter, it seems my best option will be t
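James notes he can throw away index data older than seven days; his message is truncated before the plan he settled on. One common way to do this in Solr is a delete-by-query that uses date math. A minimal sketch, assuming a date field named `timestamp` (hypothetical) that is set at index time; the resulting message would be POSTed to Solr's update handler and followed by a commit.

```python
from xml.sax.saxutils import escape

def delete_older_than(field: str, days: int) -> str:
    """Build a Solr XML delete-by-query for documents whose date field
    is more than `days` days old, using Solr's NOW-nDAYS date math."""
    query = f"{field}:[* TO NOW-{days}DAYS]"
    return f"<delete><query>{escape(query)}</query></delete>"

# "timestamp" is a hypothetical field name
print(delete_older_than("timestamp", 7))
# <delete><query>timestamp:[* TO NOW-7DAYS]</query></delete>
```

Running one such delete per day, rather than per document, keeps the extra commits (and searcher reopenings) to a minimum.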

Re: 2D Facet

2008-02-12 Thread evgeniy . strokin
Chris, I'm very interested in implementing generic multidimensional faceting. I'm not an expert in Solr, but I'm very good with Java, so I need a little more direction if you don't mind. I promise to share my code, and if you're OK with it you are welcome to use it. So, let's say I have a p

Re: upgrading to lucene 2.3

2008-02-12 Thread Grant Ingersoll
See: https://issues.apache.org/jira/browse/SOLR-330 https://issues.apache.org/jira/browse/SOLR-342 for various solutions around taking advantage of Lucene's new capabilities. -Grant On Feb 12, 2008, at 1:15 PM, Yonik Seeley wrote: On Feb 12, 2008 1:06 PM, Lance Norskog <[EMAIL PROTECTED]

Re: upgrading to lucene 2.3

2008-02-12 Thread Yonik Seeley
On Feb 12, 2008 1:06 PM, Lance Norskog <[EMAIL PROTECTED]> wrote: > What will this improve? Text analysis may be slower since Solr won't have the changes to use the faster Token APIs. Indexing overall should still be faster. Querying should see little change. -Yonik

RE: upgrading to lucene 2.3

2008-02-12 Thread Lance Norskog
What will this improve? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, February 12, 2008 6:48 AM To: solr-user@lucene.apache.org Subject: Re: upgrading to lucene 2.3 On Feb 12, 2008 9:25 AM, Robert Young <[EMAIL PROTECTED]> wr

Re: SolrJ and Unique Doc ID

2008-02-12 Thread Grant Ingersoll
On Feb 12, 2008, at 2:10 PM, Chris Hostetter wrote: : > Honestly: i can't think of a single use case where client code would care : > about what the uniqueKey field is, unless it already *knew* what the : > uniqueKey field is. : : :-) Abstractions allow one to use different implementations

Re: Strange behavior

2008-02-12 Thread Yonik Seeley
On Feb 12, 2008 9:50 AM, Traut <[EMAIL PROTECTED]> wrote: > Thank you, it works. Does the stemming filter work only with lowercased words? I've never tried it in the order you have it. You could try the analysis admin page and report back what happens... -Yonik > On Feb 12, 2008 4:29 PM, Yonik Seeley <

Re: Setting the schema files

2008-02-12 Thread Ryan McKinley
Aditi Goyal wrote: Hi, I am using SOLR searching in my project. I am actually a little confused about how the schema works. Can you please point me to the documentation where I can define how my query should work? For example, I want "a, and, the etc" not to be searched. Also, it should n

Re: Performance help for heavy indexing workload

2008-02-12 Thread Walter Underwood
On 2/12/08 7:40 AM, "Ken Krugler" <[EMAIL PROTECTED]> wrote: > In general immediate updating of an index with a continuous stream of > new content, and fast search results, work in opposition. The > searcher's various caches are getting continuously flushed to avoid > stale content, which can easi

Re: Performance help for heavy indexing workload

2008-02-12 Thread Walter Underwood
That does seem really slow. Is the index on NFS-mounted storage? wunder On 2/12/08 7:04 AM, "Erick Erickson" <[EMAIL PROTECTED]> wrote: > Well, the *first* sort to the underlying Lucene engine is expensive since > it builds up the terms to sort. I wonder if you're closing and opening the > under

Re: Fwd: Performance help for heavy indexing workload

2008-02-12 Thread Ken Krugler
Hi James, I'm looking for some configuration guidance to help improve performance of my application, which tends to do a lot more indexing than searching. At present, it needs to index around two documents / sec - a document being the stripped content of a webpage. However, performance was

wildcard query question

2008-02-12 Thread Alessandro Senserini
I have indexed a field called courseTitle of 'text' type (as in the schema.xml but without the stemming factory) that contains COBOL: Data Structure Searching with a wildcard query like courseTitle:cobol\:* AND courseTitle:data* AND courseTitle:structure* (the colon character ":" i
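Building such a clause by hand is error-prone, so a small escaping helper can help. A minimal sketch: the character set mirrors the one escaped by Lucene's `QueryParser.escape`, and `escape_query_text`/`prefix_clause` are hypothetical helpers. It escapes the literal part and then appends an unescaped `*` so the wildcard keeps its meaning.

```python
# Characters with special meaning in Lucene's classic query syntax
# (the set escaped by Lucene's QueryParser.escape at the time).
SPECIAL = set('\\+-!():^[]"{}~*?|&')

def escape_query_text(text: str) -> str:
    """Backslash-escape query-syntax characters in literal text."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

def prefix_clause(field: str, literal_prefix: str) -> str:
    """Build field:prefix* with the literal part escaped."""
    return f"{field}:{escape_query_text(literal_prefix)}*"

print(prefix_clause("courseTitle", "cobol:"))  # courseTitle:cobol\:*
```

This only produces the query string; whether `cobol\:*` matches depends on how the `text` analyzer tokenized "COBOL:" at index time, which the truncated message does not show.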

RE: upgrading to lucene 2.3

2008-02-12 Thread Fuad Efendi
I did the same: stopped SOLR-1.2, replaced the Lucene jars, started SOLR-1.2. No problems at all. > -Original Message- > From: Robert Young [mailto:[EMAIL PROTECTED] > Sent: Tuesday, February 12, 2008 9:25 AM > To: solr-user@lucene.apache.org > Subject: Re: upgrading to lucene 2.3 > > > ok, a

Re: upgrading to lucene 2.3

2008-02-12 Thread Robert Young
ok, and to make the change do I just replace the jar directly in solr/WEB-INF/lib and restart Tomcat? Thanks Rob On Feb 12, 2008 1:55 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Solr Trunk is using the latest Lucene version. Also note there are a > couple of edge cases in Lucene 2.3 that are causin

Strange behavior

2008-02-12 Thread Traut
Hi all Please take a look at this strange behavior (connected with stemming I suppose): type: field: I'm adding a document: 99Apple Querying "name:apple" - 0 results. Searching "

Re: upgrading to lucene 2.3

2008-02-12 Thread Grant Ingersoll
Solr Trunk is using the latest Lucene version. Also note there are a couple of edge cases in Lucene 2.3 that are causing problems if you use SOLR-342 with luceneAutoCommit == false. But, yes, you should be able to drop in 2.3, as that is one of the backward-compatibility goals for Lucene minor releas

RE: Commit performance problem

2008-02-12 Thread Jae Joo
Or, if you have multiple files to update, make sure you index all of the files first and commit only once, at the end of indexing. Jae -Original Message- From: Jae Joo [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 12, 2008 10:50 AM To: solr-user@lucene.apache.org Subject: RE: Commit pr
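Jae's "index everything, commit once" advice can be sketched with Solr's XML update messages (`<add>`, `<doc>`, `<field>`, `<commit/>`). A minimal sketch that only builds the messages; sending them (e.g. POSTing to the update handler) is left out, and the input "files" are hypothetical lists of field/value dicts.

```python
from xml.sax.saxutils import escape, quoteattr

def add_message(docs: list[dict]) -> str:
    """Build one Solr XML <add> message containing several documents."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

def update_plan(files: list[list[dict]]) -> list[str]:
    """One <add> per input file, then a single <commit/> at the very end."""
    messages = [add_message(docs) for docs in files]
    messages.append("<commit/>")  # commit once, not once per file
    return messages

# Two hypothetical input files, already parsed into field/value dicts.
plan = update_plan([[{"id": "1", "title": "A & B"}], [{"id": "2"}]])
for message in plan:
    print(message)
```

Because every commit makes Solr open (and autowarm) a new searcher, committing per file multiplies that cost; a single trailing commit pays it once.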

RE: Commit performance problem

2008-02-12 Thread Jae Joo
I have had the same experience. I have a 6.5G index and update it daily. Have you ever checked what happens when the update file contains no documents and you then run "commit"? I don't know why, but it takes very long (more than 10 minutes). Jae Joo -Original Message- From: Ken Krugler [mailto:[EMAIL PROTECTED]

Re: Commit performance problem

2008-02-12 Thread Ken Krugler
I have a large solr index that is currently about 6 GB and is suffering from severe performance problems during updates. A commit can take over 10 minutes to complete. I have tried increasing the JVM's max memory to over 6 GB, but without any improvement. I have also tried to turn off waitSearcher

Commit performance problem

2008-02-12 Thread Anders Arpteg
I have a large solr index that is currently about 6 GB and is suffering from severe performance problems during updates. A commit can take over 10 minutes to complete. I have tried increasing the JVM's max memory to over 6 GB, but without any improvement. I have also tried to turn off waitSearcher

Re: Strange behavior

2008-02-12 Thread Traut
Thank you, it works. Does the stemming filter work only with lowercased words? On Feb 12, 2008 4:29 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote: > Try putting the stemmer after the lowercase filter. > -Yonik > > On Feb 12, 2008 9:15 AM, Traut <[EMAIL PROTECTED]> wrote: > > Hi all > > > > Please take a loo

Re: Strange behavior

2008-02-12 Thread Yonik Seeley
Try putting the stemmer after the lowercase filter. -Yonik On Feb 12, 2008 9:15 AM, Traut <[EMAIL PROTECTED]> wrote: > Hi all > > Please take a look at this strange behavior (connected with stemming I > suppose): > > > type: > > stored="false"> > > > > >
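Yonik's fix is about filter order inside a fieldType's analyzer chain (Lucene's PorterStemFilter documentation notes that its input must already be lowercased). A minimal sketch of a hypothetical checker that parses a `<fieldType>` fragment from schema.xml and reports whether lowercasing happens before stemming. It is a heuristic: it only looks at `<filter>` elements and matches class names by substring, so a `LowerCaseTokenizerFactory` tokenizer would not count.

```python
import xml.etree.ElementTree as ET

def lowercase_before_stemmer(field_type_xml: str) -> bool:
    """Return False if any <analyzer> stems before (or without) lowercasing."""
    root = ET.fromstring(field_type_xml)
    for analyzer in root.iter("analyzer"):
        classes = [f.get("class", "") for f in analyzer.iter("filter")]
        lower = [i for i, c in enumerate(classes) if "LowerCase" in c]
        stem = [i for i, c in enumerate(classes) if "Porter" in c or "Stem" in c]
        if stem and (not lower or lower[0] > stem[0]):
            return False
    return True

GOOD = """<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>"""

# Same chain with the filters swapped: the stemmer sees "Apple", not "apple".
BAD = """<fieldType name="text_en" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>"""

print(lowercase_before_stemmer(GOOD), lowercase_before_stemmer(BAD))
```

As Yonik suggests, the analysis admin page is the authoritative way to see what each stage actually produces.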

Re: SolrJ and Unique Doc ID

2008-02-12 Thread Grant Ingersoll
On Feb 11, 2008, at 11:24 PM, Chris Hostetter wrote: : Another option is to add it to the responseHeader Or it could be a quick : add to the LukeRH. The former has the advantage that we wouldn't have to make adding the info to LukeRequestHandler makes sense. Honestly: i can't think

Setting the schema files

2008-02-12 Thread Aditi Goyal
Hi, I am using SOLR searching in my project. I am actually a little confused about how the schema works. Can you please point me to the documentation where I can define how my query should work? For example, I want "a, and, the etc" not to be searched. Also, it should not split on case chan
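Words like "a", "and", "the" are stop words; in Solr they are normally removed in the analyzer chain defined in schema.xml, typically with `solr.StopFilterFactory` reading a word list such as `stopwords.txt`. The toy Python chain below only illustrates the effect of such a stage; the stop list and the `analyze` function are illustrative, not Solr's implementation.

```python
STOP_WORDS = {"a", "an", "and", "the"}  # tiny illustrative stop list

def analyze(text: str) -> list[str]:
    """Toy analysis chain: whitespace split, lowercase, drop stop words.
    Splitting only on whitespace also leaves tokens like "PowerShot"
    in one piece instead of breaking them at the case change."""
    tokens = text.split()
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOP_WORDS]

print(analyze("The Cat and the Hat"))  # ['cat', 'hat']
```

Applying the same chain at index and query time is what keeps stop words out of both the index and the parsed query.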

upgrading to lucene 2.3

2008-02-12 Thread Robert Young
I have heard that upgrading to lucene 2.3 in Solr 1.2 is as simple as replacing the lucene jar and restarting. Is this the case? Has anyone had any experience with upgrading lucene to 2.3? Did you have any problems? Is there anything I should be looking out for? Thanks Rob

Re: upgrading to lucene 2.3

2008-02-12 Thread Yonik Seeley
On Feb 12, 2008 9:25 AM, Robert Young <[EMAIL PROTECTED]> wrote: > ok, and to make the change do I just replace the jar directly in > solr/WEB-INF/lib and restart Tomcat? That should work. -Yonik

Re: Performance help for heavy indexing workload

2008-02-12 Thread Erick Erickson
Well, the *first* sort to the underlying Lucene engine is expensive since it builds up the terms to sort. I wonder if you're closing and opening the underlying searcher for every request? This is a definite limiter. Disclaimer: I mostly do Lucene, not SOLR (yet), so don't *even* ask me how to chan

Re: Filter Query

2008-02-12 Thread Shalin Shekhar Mangar
Using q=NAME:Smith&fq=AGE:30 would be better because filter queries are cached separately and can be re-used regardless of the NAME query. So if you expect your filter queries to be re-used, you should use fq, otherwise performance would probably be the same for both "NAME:Smith AND AGE:30" and "q=
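Shalin's comparison can be made concrete by building both request forms. A minimal sketch using only `urllib.parse.urlencode`; the function names are illustrative. In the second form `AGE:30` becomes a filter query, which Solr caches on its own (in the filterCache) and which does not affect scoring.

```python
from urllib.parse import urlencode

def combined_query(name: str, age: int) -> str:
    # Both clauses are part of the main (scored) query.
    return urlencode({"q": f"NAME:{name} AND AGE:{age}"})

def filtered_query(name: str, age: int) -> str:
    # AGE moves to fq: cached separately, reusable with any other q.
    return urlencode({"q": f"NAME:{name}", "fq": f"AGE:{age}"})

print(combined_query("Smith", 30))  # q=NAME%3ASmith+AND+AGE%3A30
print(filtered_query("Smith", 30))  # q=NAME%3ASmith&fq=AGE%3A30
```

The payoff comes when the same `fq=AGE:30` is sent alongside many different `q` values, since the cached filter is reused each time.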