Incomplete search by using DisMaxRequestHandler
Hi,

I am using the DisMaxRequestHandler to boost my query on the basis of fields. There are 5 fields which contain the search string:

- statusName_s
- listOf_author
- prdMainTitle_s
- productDescription_s
- productURL_s

My query string is:

?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0

I don't need any boosting for productDescription_s and productURL_s, so I am not including those field names in the query string above. The results I get from this query do not contain documents where the search string is present in the productDescription_s and productURL_s fields.

Is there any configuration in solrconfig which can handle this scenario?

Let me know if you need more information.

Thanks,
Prerna

--
View this message in context: http://www.nabble.com/Incomplete-search-by-using-dixmasequesthandler-tp19810056p19810056.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Incomplete search by using DisMaxRequestHandler
On Oct 4, 2008, at 4:24 AM, prerna07 wrote:
> I don't need any boosting for productDescription_s and productURL_s,
> so I am not including those field names in the query string above.
> The results I get from this query do not contain documents where the
> search string is present in the productDescription_s and productURL_s
> fields.

You still need to specify the fields you want searched; just use a boost of 1.0.

	Erik
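Applying Erik's suggestion to the query above would look something like this (the two extra fields are added to qf with a neutral boost of 1.0; %5E is the URL-encoded ^):

```
?q=Shahrukh&qt=dismaxrequest&qf=listOf_author%5E5.0+statusName_s%5E6.0+prdMainTitle_s%5E3.0+productDescription_s%5E1.0+productURL_s%5E1.0
```

Fields not listed in qf are simply not searched by dismax, which is why the unboosted fields were missing from the results.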
Re: Incomplete search by using DisMaxRequestHandler
All these fields are dynamic fields, so we don't know the names of all of them; the number of dynamic fields is also large, and we want to search across all of these dynamic fields. Is there any other way of doing query field boosting?

prerna07 wrote:
> I don't need any boosting for productDescription_s and productURL_s,
> so I am not including those field names in the query string above.
> The results I get from this query do not contain documents where the
> search string is present in the productDescription_s and productURL_s
> fields.
Re: Incomplete search by using DisMaxRequestHandler
Currently there is no way to specify wildcards or "all" fields in a qf parameter. However, if the goal is to make a bunch of dynamic fields searchable, but without individual boosts, use copyField to merge all of your desired dynamic fields into a single searchable one.

	Erik

On Oct 4, 2008, at 5:36 AM, prerna07 wrote:
> All these fields are dynamic fields, so we don't know the names of all
> of them; the number of dynamic fields is also large, and we want to
> search across all of these dynamic fields. Is there any other way of
> doing query field boosting?
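A sketch of what that copyField setup could look like in schema.xml (the field name allText, the field type, and the *_s pattern are illustrative, not from the thread):

```xml
<!-- a single catch-all field to search against -->
<field name="allText" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- copy every matching dynamic field into it at index time -->
<copyField source="*_s" dest="allText"/>
```

The allText field can then be listed in qf with whatever boost is appropriate, alongside the individually boosted fields.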
Re: Commit in solr 1.3 can take up to 5 minutes
5 minutes for only one update is slow.

On Fri, Oct 3, 2008 at 8:13 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote:
> Hi Uwe,
>
> 5 minutes is not slow; commit can't be realtime... I do commit&optimize
> once a day at 3:00 AM. It takes 15-20 minutes, but I have several million
> daily updates...
>
>> Is there a way to see why commits are slow? Has anyone had the same
>> problem, and what was the solution that solved it?
>>
>> I can provide my schema.xml and solrconfig.xml if needed.
>>
>> Thanks in advance
>> Uwe
Re: Commit in solr 1.3 can take up to 5 minutes
Thanks Mike

The use of fsync() might be the answer to my problem: for lack of other possibilities, I have installed Solr in a zone on Solaris with ZFS, which slows down when many fsync() calls are made. This will be fixed in an upcoming release of Solaris, but I will move the Solr instances to another server with a different file system as soon as possible. Would the use of a different file system than ext3 boost the performance?

Uwe

On Fri, Oct 3, 2008 at 8:28 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
>> Since Solr's commit logic really hasn't changed, I wonder if this
>> could be lucene related somehow.
>
> Lucene's commit logic has changed: we now fsync() each file in the index
> to ensure all bytes are on stable storage, before returning.
>
> But I can't imagine that taking 5 minutes, unless there are somehow a
> great many files added to the index?
>
> Uwe, what filesystem are you using?
>
> Yonik, when Solr commits what does it actually do?
>
> Mike
Re: Commit in solr 1.3 can take up to 5 minutes
Hmm, OK, that seems like a possible explanation then. Still, it's spooky that it's taking 5 minutes. How many files are in the index at the time you call commit?

I wonder if you were to simply pause for, say, 30 seconds before issuing the commit, whether you'd then see the commit go faster? On Windows at least such a silly trick does seem to improve performance, I think because it allows the OS to move the bytes from its write cache onto stable storage "on its own schedule", whereas when we commit we are demanding the OS move the bytes on our [arbitrary] schedule.

I really wish OSs would add an API that would just block & return once the file has made it to stable storage (letting the OS sync on its own optimal schedule), rather than demanding the file be fsync'd immediately.

I really haven't explored the performance of fsync on different filesystems. I think I've read that ReiserFS may have issues, though it could have been addressed by now. I *believe* ext3 is OK (at least, it didn't show the strange "sleep to get better performance" issue above, in my limited testing).

Mike

Uwe Klosa wrote:
> The use of fsync() might be the answer to my problem: I have installed
> Solr in a zone on Solaris with ZFS, which slows down when many fsync()
> calls are made. Would the use of a different file system than ext3
> boost the performance?
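The per-file fsync Mike describes corresponds to forcing each written file to stable storage before commit returns. A minimal, self-contained Java sketch of that operation (illustrative only, not Lucene code; on a filesystem with slow fsync, the force() call is where the time goes):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SyncDemo {

    // Write one file and fsync it -- roughly the per-file work done
    // for every file in the index at commit time.
    public static long writeAndSync(byte[] data) throws IOException {
        Path p = Files.createTempFile("segment", ".bin");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            // force(true) blocks until data and metadata reach stable
            // storage -- this is the fsync that can be slow on some
            // filesystems, and it is paid once per file.
            ch.force(true);
        }
        long size = Files.size(p);
        Files.delete(p);
        return size;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAndSync("index data".getBytes()));
    }
}
```

With many files, the per-file force() costs add up, which is why the number of files in the index matters below.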
Re: Commit in solr 1.3 can take up to 5 minutes
There are around 35,000 files in the index. When I started indexing 5 weeks ago with only 2,000 documents I did not see this issue; I saw it for the first time with around 10,000 documents.

Before that I had been using the same instance on a Linux machine with up to 17,000 documents, and I never saw this issue at all. The original plan has always been to use Solr on Linux, but I'm still waiting for the new server.

Uwe

On Sat, Oct 4, 2008 at 12:06 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Hmm, OK, that seems like a possible explanation then. Still, it's spooky
> that it's taking 5 minutes. How many files are in the index at the time
> you call commit?
Re: Commit in solr 1.3 can take up to 5 minutes
Yikes! That's way too many files. Have you changed mergeFactor? Or implemented a custom DeletionPolicy or MergePolicy?

Or... does anyone know of something else in Solr's configuration that could lead to such an insane number of files?

Mike

Uwe Klosa wrote:
> There are around 35,000 files in the index. When I started indexing 5
> weeks ago with only 2,000 documents I did not see this issue; I saw it
> for the first time with around 10,000 documents.
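For reference, mergeFactor is set in solrconfig.xml; the stock example configs keep it small, something like the sketch below (check your own config rather than copying this verbatim):

```xml
<mainIndex>
  <!-- a lower mergeFactor means fewer segment files on disk,
       at the cost of more merging work during indexing -->
  <useCompoundFile>false</useCompoundFile>
  <mergeFactor>10</mergeFactor>
</mainIndex>
```

A very large mergeFactor lets many small segments accumulate, which multiplies the number of files that must be fsync'd at commit.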
Re: Commit in solr 1.3 can take up to 5 minutes
Oh, you meant index files. I misunderstood your question; sorry, now that I read it again I see what you meant. There are only 136 index files, so no problem there.

Uwe

On Sat, Oct 4, 2008 at 1:59 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Yikes! That's way too many files. Have you changed mergeFactor? Or
> implemented a custom DeletionPolicy or MergePolicy?
Re: Commit in solr 1.3 can take up to 5 minutes
Oh OK, phew. I misunderstood your answer too!

So it seems like fsync with ZFS can be very slow?

Mike

Uwe Klosa wrote:
> Oh, you meant index files. I misunderstood your question. There are
> only 136 index files, so no problem there.
Re: RequestHandler that passes along the query
Thanks Grant and Ryan, so far so good. But I am confused about one thing. When I set this up like:

public void process(ResponseBuilder rb) throws IOException {

and put it as the last-component on a distributed search (a default shards parameter is defined in the solrconfig for the handler), the component never does its thing. I looked at the TermVectorComponent implementation and it instead defines

public int distributedProcess(ResponseBuilder rb) throws IOException {

and when I implemented that method it works. Is there a way to define just one method that will work with both distributed and normal searches?

On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> No need to even write a new ReqHandler if you're using 1.3:
> http://wiki.apache.org/solr/SearchComponent
Re: Commit in solr 1.3 can take up to 5 minutes
On Fri, Oct 3, 2008 at 2:28 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
> Yonik, when Solr commits what does it actually do?

Less than it used to (Solr now uses Lucene to handle deletes). A solr-level commit closes the IndexWriter, calls some configured callbacks, opens a new IndexSearcher, warms it, and registers it.

We can tell where the time is taken by looking at the timestamps in the log entries. Here is what the log output should look like for a commit:

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
  // close the index writer
  // call any configured post-commit callbacks (to take a snapshot of the index, etc.)
  // open a new IndexSearcher (uses IndexReader.reopen() of the last opened reader)
INFO: Opening [EMAIL PROTECTED] main
INFO: end_commit_flush
  // in a different thread, warming of the new IndexSearcher will be done.
  // by default, the solr-level commit will wait for warming to be done and the
  // new searcher to be registered (i.e. any new searches will see the committed changes)
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
[...]
  // there will be multiple autowarming statements, and some could appear before
  // the end_commit_flush log entry because warming is done in another thread.
INFO: [] Registered new searcher [EMAIL PROTECTED] main
INFO: Closing [EMAIL PROTECTED] main
INFO: {commit=} 0 547
INFO: [] webapp=/solr path=/update params={} status=0 QTime=547

Uwe, can you verify that the bulk of the time is between "start commit" and "Opening Searcher"?

-Yonik
Re: *Very* slow Commit after upgrading to solr 1.3
Ben, see also http://www.nabble.com/Commit-in-solr-1.3-can-take-up-to-5-minutes-td19802781.html#a19802781

What type of physical drive is this, and what interface is used (SATA, etc.)? What is the filesystem (NTFS)? Did you add to an existing index from an older version of Solr, or start from scratch? If you add a single document to the index and commit, does it take a long time?

I notice your merge factor is 1000... this will create many files that need to be sync'd.

It may help to try the IndexWriter settings from the 1.3 example setup... the important changes being (the element names here are restored from the 1.3 example config; the archive stripped the XML tags, leaving only "10 32"):

<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>

-Yonik

On Mon, Sep 29, 2008 at 5:33 AM, Ben Shlomo, Yatir <[EMAIL PROTECTED]> wrote:
> Hi!
>
> I am running on Windows 64 bit ...
> I have upgraded to solr 1.3 in order to use the distributed search.
> I haven't changed the solrConfig and the schema xml files during the
> upgrade.
>
> I am indexing ~350K documents (each one is about 0.5 KB in size).
> The indexing takes a reasonable amount of time (350 seconds).
> See tomcat log:
>
> INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==,
> YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==,
> 9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875
>
> But when I commit it takes more than an hour! (5000 seconds! The
> optimize after the commit took 14 seconds.)
>
> INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
>
> P.S. It's not a machine problem; I moved to another machine and the
> same thing happened.
>
> I noticed something very strange during the time I wait for the commit:
> while the Solr index is 210MB in size, in the Windows task manager I
> noticed that the java process is making a HUGE amount of IO reads: it
> reads more than 350 GB! (which takes a lot of time). The process is
> constantly taking 25% of the CPU resources.
>
> All my autowarmCount settings in the solrconfig file do not exceed 256...
>
> Any more ideas to check?
> Thanks.
>
> Here is part of my solrConfig file (the XML element names were stripped
> by the mail archive; the values, in order, were):
>
>   false, 1000, 1000, 2147483647, 1, 1000, 1
>   false, 1000, 1000, 2147483647, 1
>   true
>
> Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com
> (Israel) | w: +972-9-892-1373 | email: [EMAIL PROTECTED]
Re: Commit in solr 1.3 can take up to 5 minutes
On Sat, Oct 4, 2008 at 9:35 AM, Michael McCandless <[EMAIL PROTECTED]> wrote: > So it seems like fsync with ZFS can be very slow? The other user that appears to have a commit issue is on Win64. http://www.nabble.com/*Very*-slow-Commit-after-upgrading-to-solr-1.3-td19720792.html#a19720792 -Yonik
Re: Commit in solr 1.3 can take up to 5 minutes
"Opening Searcher" always happens directly after "start commit" with no delay. But I can see many {commit=} entries with QTime around 280000 ms (four and a half minutes).

One difference I can see from your logging is that I have waitFlush=true. Could that have this impact?

Uwe

On Sat, Oct 4, 2008 at 4:36 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Uwe, can you verify that the bulk of the time is between "start
> commit" and "Opening Searcher"?
>
> -Yonik
Re: RequestHandler that passes along the query
Sorry for the extended question, but I am having trouble making a SearchComponent that can actually get at the returned response in a distributed setup. In my distributedProcess:

public int distributedProcess(ResponseBuilder rb) throws IOException {

how can I get at the returned results from all shards? I want to get at the rendered response right before it goes back to the client, so I can add some information based on what came back. The TermVector example seems to get at rb.resultIds (which is not public, and I can't use it in my plugin) and then sends a request back to the shards to get the stored fields (using ShardDoc.id, another field I don't have access to). Instead of doing all of that I'd like to just "peek" into the response that is about to be written to the client. I tried getting at rb.rsp, but the data is not filled in during the last stage (GET_FIELDS) that distributedProcess gets called for.

On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:
> Is there a way to define just one method that will work with both
> distributed and normal searches?
Re: Commit in solr 1.3 can take up to 5 minutes
On Sat, Oct 4, 2008 at 11:55 AM, Uwe Klosa <[EMAIL PROTECTED]> wrote:
> "Opening Searcher" is always happening directly after "start commit" with
> no delay.

Ah, so it doesn't look like it's the close of the IndexWriter then!
When do you see the "end_commit_flush"? Could you post everything in your
log between when the commit begins and when it ends?

Is this a live server (is query traffic continuing to come in while the
commit is happening)? If so, it would be interesting to see (and easier to
debug) if it happened on a server with no query traffic.

> But I can see many {commit=} with QTime around 280,000 ms (4 and a half
> minutes).
> One difference I could see to your logging is that I have waitFlush=true.
> Could that have this impact?

These parameters (waitFlush/waitSearcher) won't affect how long it takes to
get the new searcher registered, but they do affect at what point control is
returned to the caller (and hence when you see the response). If
waitSearcher==false, then you see the response before searcher warming;
otherwise it blocks until after.

waitFlush==false is not currently supported (it will always act as true),
so your change of that doesn't matter.

-Yonik
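The waitSearcher semantics Yonik describes - warming always happens, the flag only controls whether the caller blocks until it is done - can be modeled with a small latch. This is a toy illustration of the blocking behavior, not Solr code:

```java
import java.util.concurrent.CountDownLatch;

public class WaitSearcherDemo {
    // Toy model of commit(waitSearcher): "warming" runs on another thread
    // either way; waitSearcher only decides whether the caller waits for it.
    static String commit(boolean waitSearcher) {
        CountDownLatch warmed = new CountDownLatch(1);
        new Thread(() -> {
            // stand-in for autowarming the new searcher
            try { Thread.sleep(20); } catch (InterruptedException ignored) {}
            warmed.countDown();
        }).start();
        if (waitSearcher) {
            try {
                warmed.await(); // caller sees the response only after warming
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "returned after warming";
        }
        return "returned before warming"; // warming continues in background
    }

    public static void main(String[] args) {
        System.out.println(commit(true));
        System.out.println(commit(false));
    }
}
```

Either way the new searcher eventually registers; a long QTime with waitSearcher=true therefore usually points at slow warming rather than a slow flush.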
Re: RequestHandler that passes along the query
I'm not totally on top of how distributed components work, but check:
http://wiki.apache.org/solr/WritingDistributedSearchComponents

and:
https://issues.apache.org/jira/browse/SOLR-680

Do you want each of the shards to append values, or just the final result?
If appending the values is not a big resource hog, it may make sense to only
do that in the main "process" block. If that is the case, I *think* you just
implement: process(ResponseBuilder rb)

ryan

On Oct 4, 2008, at 1:06 PM, Brian Whitman wrote:
> Sorry for the extended question, but I am having trouble making a
> SearchComponent that can actually get at the returned response in a
> distributed setup.
>
> In my distributedProcess:
>
>   public int distributedProcess(ResponseBuilder rb) throws IOException {
>
> how can I get at the returned results from all shards? I really want to
> get at the rendered response right before it goes back to the client, so
> I can add some information based on what came back.
>
> The TermVector example seems to get at rb.resultIds (which is not public
> and I can't use in my plugin) and then sends a request back to the shards
> to get the stored fields (using ShardDoc.id, another field I don't have
> access to). Instead of doing all of that I'd like to just "peek" into the
> response that is about to be written to the client.
>
> I tried getting at rb.rsp, but the data is not filled in during the last
> stage (GET_FIELDS) that distributedProcess gets called for.
>
> On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:
>> Thanks Grant and Ryan, so far so good. But I am confused about one thing -
>> when I set this up like:
>>
>>   public void process(ResponseBuilder rb) throws IOException {
>>
>> and put it as the last-component on a distributed search (a defaults
>> shard is defined in the solrconfig for the handler), the component never
>> does its thing. I looked at the TermVectorComponent implementation and it
>> instead defines
>>
>>   public int distributedProcess(ResponseBuilder rb) throws IOException {
>>
>> and when I implemented that method it works. Is there a way to define
>> just one method that will work with both distributed and normal searches?
>>
>> On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>> No need to even write a new ReqHandler if you're using 1.3:
>>> http://wiki.apache.org/solr/SearchComponent
Re: RequestHandler that passes along the query
The issue, I think, is that process() is never called in my component, just
distributedProcess(). The server that hosts the component is a separate Solr
instance from the shards, so my guess is process() is only called when that
particular Solr instance has something to do with the index.
distributedProcess() is called for each of the stages, but the last stage it
is called for is GET_FIELDS.

But the WritingDistributedSearchComponents page did tip me off to a method,
finishStage, that is called *after* each stage is done and does exactly what
I want:

  @Override
  public void finishStage(ResponseBuilder rb) {
    if (rb.stage == ResponseBuilder.STAGE_GET_FIELDS) {
      SolrDocumentList sd =
          (SolrDocumentList) rb.rsp.getValues().get("response");
      for (SolrDocument d : sd) {
        rb.rsp.add("second-id-list", d.getFieldValue("id").toString());
      }
    }
  }

On Sat, Oct 4, 2008 at 1:37 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> I'm not totally on top of how distributed components work, but check:
> http://wiki.apache.org/solr/WritingDistributedSearchComponents
>
> and:
> https://issues.apache.org/jira/browse/SOLR-680
>
> Do you want each of the shards to append values, or just the final result?
> If appending the values is not a big resource hog, it may make sense to
> only do that in the main "process" block. If that is the case, I *think*
> you just implement: process(ResponseBuilder rb)
>
> ryan
>
> On Oct 4, 2008, at 1:06 PM, Brian Whitman wrote:
>> Sorry for the extended question, but I am having trouble making a
>> SearchComponent that can actually get at the returned response in a
>> distributed setup.
>>
>> In my distributedProcess:
>>
>>   public int distributedProcess(ResponseBuilder rb) throws IOException {
>>
>> how can I get at the returned results from all shards? I really want to
>> get at the rendered response right before it goes back to the client, so
>> I can add some information based on what came back.
>>
>> The TermVector example seems to get at rb.resultIds (which is not public
>> and I can't use in my plugin) and then sends a request back to the shards
>> to get the stored fields (using ShardDoc.id, another field I don't have
>> access to). Instead of doing all of that I'd like to just "peek" into the
>> response that is about to be written to the client.
>>
>> I tried getting at rb.rsp, but the data is not filled in during the last
>> stage (GET_FIELDS) that distributedProcess gets called for.
>>
>> On Sat, Oct 4, 2008 at 10:12 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:
>>> Thanks Grant and Ryan, so far so good. But I am confused about one
>>> thing - when I set this up like:
>>>
>>>   public void process(ResponseBuilder rb) throws IOException {
>>>
>>> and put it as the last-component on a distributed search (a defaults
>>> shard is defined in the solrconfig for the handler), the component never
>>> does its thing. I looked at the TermVectorComponent implementation and
>>> it instead defines
>>>
>>>   public int distributedProcess(ResponseBuilder rb) throws IOException {
>>>
>>> and when I implemented that method it works. Is there a way to define
>>> just one method that will work with both distributed and normal
>>> searches?
>>>
>>> On Fri, Oct 3, 2008 at 4:41 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
>>>> No need to even write a new ReqHandler if you're using 1.3:
>>>> http://wiki.apache.org/solr/SearchComponent