Re: paging size in SOLR
speaking about page sizes, what is the optimum page size that should be retrieved each time?
i understand it depends upon the data you are fetching back from each hit document ... but let's say whenever a document is hit i am fetching back 100 bytes worth of data from each of those docs in the indexes (along with the solr response statements).
this will make 100*x bytes worth of data in each page, if x is the page size ..
what is the optimum value of this x that solr can return each time without running into exceptions?

On 13 August 2011 19:59, Erick Erickson wrote:
> Jame:
>
> You control the number via settings in solrconfig.xml, so it's
> up to you.
>
> Jonathan:
> Hmmm, that seems right, after all the "deep paging" penalty is really
> about keeping a large sorted array in memory but at least you only
> pay it once per 10,000, rather than 100 times (assuming page size is
> 100)...
>
> Best
> Erick
>
> On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet wrote:
> > when you say queryResultCache, does it only cache n number of results for
> > the last query, or for more than one query?
> >
> > On 10 August 2011 20:14, simon wrote:
> >
> >> Worth remembering there are some performance penalties with deep
> >> paging, if you use the page-by-page approach. May not be too much of a
> >> problem if you really are only looking to retrieve 10K docs.
> >>
> >> -Simon
> >>
> >> On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson wrote:
> >> > Well, if you really want to you can specify start=0 and rows=1 and
> >> > get them all back at once.
> >> >
> >> > You can do page-by-page by incrementing the "start" parameter as you
> >> > indicated.
> >> >
> >> > You can keep from re-executing the search by setting your queryResultCache
> >> > appropriately, but this affects all searches so might be an issue.
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet wrote:
> >> >> hi,
> >> >> i want to retrieve all the data from solr (say 10,000 ids) and my page size
> >> >> is 1000 .
> >> >> how do i get back the data (pages) one after the other? do i have to increment
> >> >> the "start" value each time by the page size from 0 and do the iteration?
> >> >> In this case am i querying the index 10 times instead of once, or after the first
> >> >> query will the result be cached somewhere for the subsequent pages?
> >> >>
> >> >> JAME VAALET

--
-JAME
Re: paging size in SOLR
There isn't an "optimum" page size that I know of, it'll vary with lots of stuff, not the least of which is whatever servlet container limits there are.

But I suspect you can get quite a few (1000s) without too much problem, and you can always use the JSON response writer to pack in more pages with less overhead.

You pretty much have to try it and see.

Best
Erick

On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet wrote:
> speaking about page sizes, what is the optimum page size that should be
> retrieved each time?
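[Editorial note: for reference, the page-by-page approach discussed above (incrementing "start" by the page size on each request) looks roughly like this in SolrJ. This is only a minimal sketch, not code from the thread; the server URL, the *:* query and the page size are placeholder assumptions.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

public class PagingSketch {
    public static void main(String[] args) throws Exception {
        // placeholder URL; point this at your own Solr instance
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        int pageSize = 1000;            // the "x" being discussed; tune and measure
        long numFound = Long.MAX_VALUE;

        for (int start = 0; start < numFound; start += pageSize) {
            SolrQuery query = new SolrQuery("*:*");
            query.setStart(start);      // advance "start" by the page size each iteration
            query.setRows(pageSize);    // "rows" is the page size
            QueryResponse response = server.query(query);
            SolrDocumentList page = response.getResults();
            numFound = page.getNumFound(); // total hits, so the loop knows when to stop
            // process the documents in "page" here
        }
    }
}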
Re: Not update on duplicate key
Though I think you could get the Dedupe feature to do this:
http://wiki.apache.org/solr/Deduplication

On Aug 13, 2011, at 11:52 , Erick Erickson wrote:
> If you mean just throw the new document on the floor if
> the index already contains a document with that key, I don't
> think you can do that. You could write a custom updateHandler
> that checks first to see whether the particular uniqueKey is
> in the index, I suppose...
>
> Best
> Erick
>
> On Fri, Aug 12, 2011 at 7:31 AM, Rohit wrote:
>> Hi All,
>>
>> Please correct me if I am wrong, but when I am trying to insert a document
>> into Solr which was previously indexed, it overwrites the document with that key.
>>
>> Is there a way to change the behaviour?
>>
>> 1. I don't want Solr to overwrite; on the other hand it should ignore the
>> entry.
>>
>> 2. Also, if I could change the behaviour on the fly: update based on one flag
>> and ignore on another flag.
>>
>> Thanks and Regards,
>>
>> Rohit
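[Editorial note: lacking a custom updateHandler, one client-side approximation of "ignore on duplicate key" is to check for the uniqueKey before adding. The sketch below shows that idea in SolrJ; it is not the Deduplication feature linked above, the field name "id", the URL and the helper name are assumptions, and it is not atomic (a concurrent add can slip in between the check and the add).]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddIfAbsentSketch {
    // Index the document only if no document with the same uniqueKey already exists.
    static void addIfAbsent(SolrServer server, SolrInputDocument doc) throws Exception {
        String id = doc.getFieldValue("id").toString();    // "id" assumed to be the uniqueKey field
        SolrQuery query = new SolrQuery("id:\"" + id + "\"");
        query.setRows(0);                                   // only the hit count is needed
        long hits = server.query(query).getResults().getNumFound();
        if (hits == 0) {
            server.add(doc);                                // new key: index it
        }                                                   // existing key: silently ignore the new entry
    }

    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "example");
        addIfAbsent(server, doc);
        server.commit();
    }
}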
Re: paging size in SOLR
thanks erick ... that means it depends upon the memory allocated to the JVM.

going back to the queryResultCache factor, i have got this doubt ..
say i have got 10 threads with 10 different queries, and each of them in parallel is searching the same index with millions of docs in it (multi-sharded).
now each of the queries has a large number of results, hence they all have to be paged ..
which threads' (queries') result sets will be cached, so that subsequent pages can be retrieved quickly?

On 14 August 2011 17:40, Erick Erickson wrote:
> There isn't an "optimum" page size that I know of, it'll vary with lots of
> stuff, not the least of which is whatever servlet container limits there are.

--
-JAME
Re: paging size in SOLR
As many results will be cached as you ask. See solrconfig.xml, the queryResultCache. This cache is essentially a map of queries and result document IDs. The number of doc IDs cached for each query is controlled by queryResultWindowSize in solrconfig.xml.

Best
Erick

On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet wrote:
> thanks erick ... that means it depends upon the memory allocated to the JVM.
>
> going back to the queryResultCache factor, i have got this doubt ..
> say i have got 10 threads with 10 different queries, and each of them in
> parallel is searching the same index with millions of docs in it (multi-sharded).
Re: exceeded limit of maxWarmingSearchers ERROR
You either have to go to near real time (NRT), which is under development, but not committed to trunk yet, or just stop warming up searchers and let the first user to open a searcher pay the penalty for warmup (useColdSearchers as I remember).

Although I'd also ask whether this is a reasonable requirement, that the messages be searchable within milliseconds. Is 1 minute really too much time? 5 minutes? You can estimate the minimum time you can get away with by looking at the warmup times on the admin/stats page.

Best
Erick

On Sat, Aug 13, 2011 at 9:47 PM, Naveen Gupta wrote:
> Hi,
>
> Most of the settings are default.
>
> We have a single node (Memory 1 GB, Index Size 4 GB)
>
> We have a requirement where we are doing very fast commits. This is kind of a
> real time requirement where we are polling many threads from a third party and
> indexing them into our system.
>
> We want these results to be available soon.
>
> We are committing for each user (a user may have 10k threads and inside that 1
> thread may have 10 messages). So overall documents per user will be around
> 0.1 million.
>
> Earlier we were using commitWithin as 10 milliseconds inside the document,
> but that was slowing the indexing and we were not getting any error.
>
> As we removed the commitWithin, indexing became very fast. But after that
> we started seeing the error below in the system.
>
> As i read in many forums, everybody said that this is happening because of a very
> fast commit rate, but what is the solution for our problem?
>
> We are using CURL to post the data and commit.
>
> Also till now we are using the default solrconfig.
>
> Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Error opening new searcher.
> exceeded limit of maxWarmingSearchers=2, try again later.
> at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
> at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
> at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
> at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
> at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
> at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
> at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
> at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
> at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
> at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
> at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
> at java.lang.Thread.run(Thread.java:662)
Re: paging size in SOLR
my queryResultCache size = 0 and queryResultWindowSize = 50.
does this mean that i am not caching any results?

On 14 August 2011 18:27, Erick Erickson wrote:
> As many results will be cached as you ask. See solrconfig.xml,
> the queryResultCache. This cache is essentially a map of queries
> and result document IDs.

--
-JAME
Re: exceeded limit of maxWarmingSearchers ERROR
On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:
> You either have to go to near real time (NRT), which is under
> development, but not committed to trunk yet

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com
Results Group-By using SolrJ
Hi All,

I am trying to group results using SolrJ. According to https://issues.apache.org/jira/browse/SOLR-2637 the feature was added, so I upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for grouping in QueryResponse, which is getGroupResponse(). The only thing left that I don't understand is where I set which field to group on. There is no method that looks like it does so on the SolrQuery object..

Ideas anyone?

thanks,
Omri
Re: exceeded limit of maxWarmingSearchers ERROR
Naveen:

You should try NRT with Apache Solr 3.3 and RankingAlgorithm. You can update 10,000 documents / sec while also concurrently searching. You can set the commit freq to about 15 mins or as desired.

The 10,000 document update performance is with the MBArtists index on a dual core Linux system, so you may be able to see similar performance on your system.

You can get more details of the NRT implementation from here:
http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

You can download Apache Solr 3.3 with RankingAlgorithm from here:
http://solr-ra.tgels.org/

(There are no changes to your existing setup; everything should work as earlier except for adding the required tag to your solrconfig.xml.)

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/13/2011 6:47 PM, Naveen Gupta wrote:
> Hi,
>
> Most of the settings are default.
>
> We have a single node (Memory 1 GB, Index Size 4 GB)
Re: Results Group-By using SolrJ
Hi Omri,

SOLR-2637 was concerned with adding grouped response parsing. There is no convenience method for grouping, but you can use the normal SolrQuery#set(...) methods to enable grouping.
The following code should enable grouping via SolrJ api:

SolrQuery query = new SolrQuery();
query.set(GroupParams.GROUP, true);
query.set(GroupParams.GROUP_FIELD, "your_field");

Martijn

On 14 August 2011 16:53, Omri Cohen wrote:
> Hi All,
>
> I am trying to group results using SolrJ. According to
> https://issues.apache.org/jira/browse/SOLR-2637 the feature was added, so I
> upgraded to SolrJ-3.4-Snapshot and I can see the necessary method for
> grouping in QueryResponse, which is getGroupResponse().

--
Met vriendelijke groet,

Martijn van Groningen
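[Editorial note: to round this out, once grouping is enabled as above, the grouped results come back through QueryResponse#getGroupResponse(). The following is a minimal sketch of reading them, not code from the thread; the SolrServer instance, the query built above and the "id" field are assumptions.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.Group;
import org.apache.solr.client.solrj.response.GroupCommand;
import org.apache.solr.client.solrj.response.GroupResponse;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ReadGroupsSketch {
    // Executes a grouped query and prints each group and the documents inside it.
    static void printGroups(SolrServer server, SolrQuery query) throws Exception {
        QueryResponse response = server.query(query);
        GroupResponse groupResponse = response.getGroupResponse();
        for (GroupCommand command : groupResponse.getValues()) {   // one entry per group.field
            for (Group group : command.getValues()) {              // one entry per distinct field value
                System.out.println("group: " + group.getGroupValue());
                for (SolrDocument doc : group.getResult()) {       // documents in this group
                    System.out.println("  doc: " + doc.getFieldValue("id"));
                }
            }
        }
    }
}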
Re: Results Group-By using SolrJ
Thanks a lot!! Exactly what I was looking for.. That solved this!

On Sun, Aug 14, 2011 at 6:23 PM, Martijn v Groningen <martijn.v.gronin...@gmail.com> wrote:
> Hi Omri,
>
> SOLR-2637 was concerned with adding grouped response parsing. There is no
> convenience method for grouping, but you can use the normal
> SolrQuery#set(...) methods to enable grouping.
Re: solr-ruby: Error undefined method `closed?' for nil:NilClass
It is nothing special - just like this:

    conn = Solr::Connection.new("http://#{LOCAL_SHARD}",
                                {:timeout => 1000, :autocommit => :on})
    options[:shards] = HA_SHARDS
    response = conn.query(query, options)

Where LOCAL_SHARD points to a haproxy of a single shard and HA_SHARDS is an array of 18 shards (via haproxy).

Ian.

On Mon, Aug 8, 2011 at 12:50 PM, Erik Hatcher wrote:
> Ian -
>
> What does your solr-ruby using code look like?
>
> Solr::Connection is light-weight, so you could just construct a new one of
> those for each request. Are you keeping an instance around?
>
>    Erik
>
> On Aug 8, 2011, at 12:03 , Ian Connor wrote:
>
>> Hi,
>>
>> I have seen some of these errors come through from time to time. It looks
>> like:
>>
>> /usr/lib/ruby/1.8/net/http.rb:1060:in
>> `request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post'
>>
>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in
>> `post'
>>
>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:151:in
>> `send'
>>
>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:174:in
>> `create_and_send_query'
>>
>> /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:92:in
>> `query'
>>
>> It is as if the http object has gone away. Would it be good to create a new
>> one inside of the connection or is something more serious going on?
>> ubuntu 10.04
>> passenger 3.0.8
>> rails 2.3.11
>>
>> --
>> Regards,
>>
>> Ian Connor

--
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1 (770) 818 5697
Skype: ian.connor
Re: paging size in SOLR
Yep.

> my queryResultCache size = 0 and queryResultWindowSize = 50
> does this mean that i am not caching any results ?
Re: exceeded limit of maxWarmingSearchers ERROR
Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller wrote:
> NRT support is committed to trunk.
>
> - Mark Miller
> lucidimagination.com
Re: exceeded limit of maxWarmingSearchers ERROR
Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point of time, when we started the project almost 1 year ago; definitely i will try NRT and see the performance.

The current requirement was working fine till we were using commitWithin 10 millisecs in the XML document which we were posting to SOLR. But because of that, we were getting very poor performance (almost 3 mins for 15,000 docs) per user. There are many parallel users committing to our SOLR.

So we removed the commitWithin, and hence performance was much much better. But then we are getting this maxWarmingSearchers error, because we are committing separately with a curl request once the entire doc is submitted for indexing.

The question here is: what is the difference between commitWithin and commit (apart from the fact that commit takes memory and processes and additional hardware usage)?

Why do we want it to be visible as soon as possible? Since we are applying many business rules on top of the results (older indexes as well as new ones) and apply different filters. up to 5 mins is fine for us, but more than that and we need to think about other optimizations.

We will definitely try NRT. But please tell me other options which we can apply in order to optimize.

Thanks
Naveen

On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson wrote:
> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>
> Erick
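[Editorial note: for what it's worth, the practical difference on the client side looks roughly like the sketch below: commitWithin hands the visibility deadline to Solr (so many update requests can share one commit), while an explicit commit() opens and warms a new searcher right away. This is only an illustrative SolrJ sketch, not the curl-based code from this thread; it assumes a SolrJ version that exposes UpdateRequest#setCommitWithin, and the URL and field names are placeholders.]

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "msg-" + i);
            doc.addField("body", "message text " + i);
            batch.add(doc);
        }

        // commitWithin: Solr makes the batch visible within the given window and can
        // coalesce many such requests into one commit, so concurrent users don't each
        // open and warm a new searcher.
        UpdateRequest request = new UpdateRequest();
        request.add(batch);
        request.setCommitWithin(5 * 60 * 1000); // up to 5 minutes, per the requirement above
        request.process(server);

        // Explicit commit: forces a new searcher (plus warming) immediately. Doing this
        // once per user/request is what trips maxWarmingSearchers.
        // server.add(batch);
        // server.commit();
    }
}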
Re: exceeded limit of maxWarmingSearchers ERROR
It's somewhat confusing - I'll straighten it out though. I left the issue open to keep me from taking forever to doc it - hasn't helped much yet - but maybe later today...

On Aug 14, 2011, at 12:12 PM, Erick Erickson wrote:
> Ah, thanks, Mark... I must have been looking at the wrong JIRAs.
>
> Erick

- Mark Miller
lucidimagination.com
Re: Some questions about SolrJ
On 8/13/2011 9:59 AM, Michael Sokolov wrote:
> Shawn, my experience with SolrJ in that configuration (no autoCommit) is that
> you have control over commits: if you don't issue an explicit commit, it won't
> happen.
>
> Re lifecycle: we don't use a static instance; rather our app maintains a small
> pool of CommonsHttpSolrServer instances that we re-use across requests. I think
> that will be preferable since I don't think the underlying HttpClient is thread
> safe?
>
> Hmm, I just checked and actually CommonsHttpSolrServer uses
> MultiThreadedHttpConnectionManager so it should be thread-safe, and OK to use a
> static instance as per documentation. Sorry for the misinformation.

Thanks for the help! I've been able to muddle my way through part of my implementation on my own.

There doesn't seem to be any way to point to the base /solr/ url and then ask SolrJ to add a core when creating requests. I did see that you can set the URL for the server object after it's created, but if I ever make this thing multithreaded, I fear doing so will cause problems. I'm going with one server object (solrServer) for CoreAdmin and another object (solrCore) for requests against the core.

This new build system has an object representing one complete index, which uses a container of seven objects representing each of the shards. Each of the shard objects has two objects representing a build core and a live core. Each of the core objects contains the solrServer and solrCore already mentioned. Since I have two complete indexes, this means that the final product will initialize 56 server objects.

I couldn't use static server objects as recommended by the docs, because I have so many instances that all need different URLs. They are private class members that get created only once, so I think it will be OK. A static object would be a good idea for a search application, because it likely only needs to deal with one URL. Our webapp developers told me that they will be putting the server object into a bean in the application context.

When I've got everything done and debugged, I will use what I've learned to augment the SolrJ wiki page. Who is the best community person to coordinate with on that to make sure I put up good information?

Thanks,
Shawn
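[Editorial note: for illustration, the "one server object for CoreAdmin, one per core" arrangement described above can be sketched in SolrJ roughly as below. This is not Shawn's actual code; the URLs, the core name "build", and the use of CoreAdminRequest are assumptions for the example.]

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class CorePairSketch {
    public static void main(String[] args) throws Exception {
        // One instance pointed at the base /solr/ URL, used only for CoreAdmin requests...
        CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // ...and one pointed at a specific core, used for queries and updates against that core.
        CommonsHttpSolrServer solrCore = new CommonsHttpSolrServer("http://localhost:8983/solr/build");

        // Core administration goes through the base-URL instance,
        // e.g. reloading the "build" core after a config change.
        CoreAdminRequest.reloadCore("build", solrServer);

        // Searches (and updates) go through the per-core instance.
        solrCore.query(new SolrQuery("*:*"));
    }
}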
Re: solr-ruby: Error undefined method `closed?' for nil:NilClass
Does instantiating a Solr::Connection for each request make things better?

    Erik

On Aug 14, 2011, at 11:34 , Ian Connor wrote:
> It is nothing special - just like this:
>
>     conn = Solr::Connection.new("http://#{LOCAL_SHARD}",
>                                 {:timeout => 1000, :autocommit => :on})
>     options[:shards] = HA_SHARDS
>     response = conn.query(query, options)
>
> Where LOCAL_SHARD points to a haproxy of a single shard and HA_SHARDS is an
> array of 18 shards (via haproxy).
>
> Ian.
Re: exceeded limit of maxWarmingSearchers ERROR
It's worth noting that the fast commit rate is only an indirect part of the issue you're seeing. As the error comes from cache warming - a consequence of committing - it's not the fault of committing directly. It's well worth having a good close look at exactly what your caches are doing when they are warmed, and trying as much as possible to remove any unneeded facet/field caching etc. The time it takes to repopulate the caches causes the error - if it's slower than the commit rate, you'll get into the 'try again later' spiral.

There are a number of ways to help mitigate this - NRT is certainly the [hopefully near] future for this. Other strategies include distributed search/cloud/ZK - splitting the index into logical shards, so your commits and their associated caches are smaller and more targeted. You can also use two Solr instances - one optimized for writes/commits, one for reads (write commits are async of the 'read' instance) - plus there are customized solutions like RankingAlgorithm, Zoie etc.

On Sun, Aug 14, 2011 at 2:47 AM, Naveen Gupta wrote:
> Hi,
>
> Most of the settings are default.
>
> We have a single node (Memory 1 GB, Index Size 4 GB)
>
> We have a requirement where we are doing very fast commits. This is kind of a
> real time requirement where we are polling many threads from a third party and
> indexing them into our system.
Re: exceeded limit of maxWarmingSearchers ERROR
Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a document to become searchable. Any document that you add through update becomes immediately searchable. So no need to commit from within your update client code. Since there is no commit, the cache does not have to be cleared or the old searchers closed or new searchers opened and warmed (the error that you are facing).

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 10:37 AM, Naveen Gupta wrote:
> Hi Mark/Erick/Nagendra,
>
> I was not very confident about NRT at that point of time, when we started the
> project almost 1 year ago; definitely i will try NRT and see the performance.
Loggly support
How do you set up log4j to work with Loggly for SOLR logs? Anyone have this set up?

Bill
Re: exceeded limit of maxWarmingSearchers ERROR
OK, I'll ask the elephant in the room.

What is the difference between the new UpdateHandler from Mark and the SOLR-RA?

The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

Pros/Cons?

On 8/14/11 8:10 PM, "Nagendra Nagarajayya" wrote:
> Naveen:
>
> NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
> document to become searchable. Any document that you add through update
> becomes immediately searchable.
Re: exceeded limit of maxWarmingSearchers ERROR
Bill:

The technical details of the NRT implementation in Apache Solr with RankingAlgorithm (SOLR-RA) are available here:

http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

(Some changes for Solr 3.x, but for the most part it is as above)

Regarding support for 4.0 trunk, that should happen sometime soon.

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:11 PM, Bill Bell wrote:
> OK, I'll ask the elephant in the room.
>
> What is the difference between the new UpdateHandler from Mark and the
> SOLR-RA?
Re: Cache replication
OK. But SOLR has built-in caching. Do you not like the caching? What do you think we should change in the SOLR cache?

Bill

On 8/10/11 9:16 AM, "didier deshommes" wrote:
> Consider putting a cache (memcached, redis, etc) *in front* of your
> solr slaves. Just make sure to update it when replication occurs.
>
> didier
>
> On Tue, Aug 9, 2011 at 6:07 PM, arian487 wrote:
>> I'm wondering if the caches on all the slaves are replicated across (such as
>> queryResultCache). That is to say, if I hit one of my slaves and cache a
>> result, and I make a search later and that search happens to hit a different
>> slave, will that first cached result be available for use?
>>
>> This is pretty important because I'm going to have a lot of slaves and if
>> this isn't done, then I'd have a high chance of running a lot of uncached
>> queries.
>>
>> Thanks :)
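[Editorial note: for illustration, didier's suggestion amounts to checking an external cache before querying Solr and invalidating it when replication happens. The sketch below uses a plain in-process map in place of memcached/redis just to show the shape of it; it is not from the thread, and the class and method names are made up for the example.]

import java.util.concurrent.ConcurrentHashMap;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FrontCacheSketch {
    // In a real deployment this would be memcached/redis shared by all clients, so a
    // result cached after hitting one slave is reused no matter which slave serves
    // the next request.
    private final ConcurrentHashMap<String, QueryResponse> cache = new ConcurrentHashMap<String, QueryResponse>();
    private final SolrServer slaves; // e.g. a load-balanced URL in front of the slaves

    public FrontCacheSketch(SolrServer slaves) {
        this.slaves = slaves;
    }

    public QueryResponse query(SolrQuery query) throws Exception {
        String key = query.toString();      // the query's parameter string, assumed adequate as a cache key
        QueryResponse cached = cache.get(key);
        if (cached != null) {
            return cached;                  // served from the front cache, no Solr hit
        }
        QueryResponse fresh = slaves.query(query);
        cache.put(key, fresh);
        return fresh;
    }

    // Call this whenever replication updates the slaves, so stale results are dropped.
    public void invalidate() {
        cache.clear();
    }
}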
Re: exceeded limit of maxWarmingSearchers ERROR
I understand. Have you looked at Mark's patch? From his performance tests, it looks pretty good.

When would RA work better?

Bill

On 8/14/11 8:40 PM, "Nagendra Nagarajayya" wrote:
> Bill:
>
> The technical details of the NRT implementation in Apache Solr with
> RankingAlgorithm (SOLR-RA) are available here:
>
> http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf
Re: exceeded limit of maxWarmingSearchers ERROR
Bill:

I did look at Mark's performance tests. Looks very interesting. Here is the
Apache Solr 3.3 with RankingAlgorithm NRT performance:

http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:47 PM, Bill Bell wrote:

I understand. Have you looked at Mark's patch? From his performance tests,
it looks pretty good. When would RA work better?

Bill

On 8/14/11 8:40 PM, "Nagendra Nagarajayya" wrote:

Bill:

The technical details of the NRT implementation in Apache Solr with
RankingAlgorithm (SOLR-RA) are available here:

http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf

(Some changes for Solr 3.x, but for the most part it is as above)

Regarding support for 4.0 trunk, it should happen sometime soon.

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:11 PM, Bill Bell wrote:

OK,

I'll ask the elephant in the room...

What is the difference between the new UpdateHandler from Mark and SOLR-RA?

The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk?

Pros/Cons?

On 8/14/11 8:10 PM, "Nagendra Nagarajayya" wrote:

Naveen:

NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a
document to become searchable. Any document that you add through update
becomes immediately searchable. So there is no need to commit from within
your update client code. Since there is no commit, the cache does not have
to be cleared, nor do the old searchers need to be closed or new searchers
opened and warmed (the error that you are facing).

Regards

- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 10:37 AM, Naveen Gupta wrote:

Hi Mark/Erick/Nagendra,

I was not very confident about NRT at that point of time, when we started
the project almost 1 year ago; I will definitely try NRT and see the
performance.

The current requirement was working fine while we were using commitWithin
10 millisecs in the XML document we were posting to SOLR. But because of
that we were getting very poor performance (almost 3 mins for 15,000 docs)
per user. There are many parallel users committing to our SOLR. So we
removed the commitWithin, and performance was much, much better. But then
we are getting this maxWarmingSearcher error, because we are committing
separately as a curl request once the entire doc is submitted for indexing.

The question here is: what is the difference between commitWithin and
commit (apart from the fact that commit takes memory and processes and
additional hardware usage)?

Why do we want it to be visible as soon as possible? Because we are
applying many business rules on top of the results (older indexes as well
as new ones) and applying different filters. Up to 5 mins is fine for us,
but beyond that we need to think about other optimizations.

We will definitely try NRT. But please tell me what other options we can
apply in order to optimize.

Thanks
Naveen

On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson wrote:

Ah, thanks, Mark... I must have been looking at the wrong JIRAs.

Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller wrote:

On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote:

You either have to go to near real time (NRT), which is under development,
but not committed to trunk yet

NRT support is committed to trunk.

- Mark Miller
lucidimagination.com
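To make the commitWithin versus explicit commit distinction discussed in this
thread concrete, here is a rough sketch of the two XML update messages
involved (the document ID and field are placeholders); in a setup like
Naveen's they would be POSTed to the core's /update handler, for example with
curl:

  <!-- add with a visibility deadline: Solr commits on its own within 10 seconds -->
  <add commitWithin="10000">
    <doc>
      <field name="id">doc1</field>
    </doc>
  </add>

  <!-- separate explicit commit message: each one opens and warms a new searcher,
       which is what trips maxWarmingSearchers when many clients send it at once -->
  <commit/>

In principle, commitWithin lets Solr coalesce many adds into a single commit
before the deadline, whereas one explicit commit per client multiplies the
number of searchers being warmed.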
Re: Can Master push data to slave
Regarding point b, I mean that when the Slave server does a replication from
the Master, it creates a lock file in its index directory. How do I avoid
that?

On Tue, Aug 9, 2011 at 2:56 AM, Markus Jelsma wrote:

> Hi,
>
> > Hi
> >
> > I am using Solr 1.4 and doing a replication process where my slave is
> > pulling data from the Master. I have 2 questions:
> >
> > a. Can the Master push data to the slave?
>
> Not in current versions. Not sure about exotic patches for this.
>
> > b. How to make sure that a lock file is not created during replication?
>
> What do you mean?
>
> > Please help
> >
> > thanks
> > Pawan
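On point (a): in Solr 1.4, HTTP replication is pull-based; the slave polls the
master's /replication handler on a fixed interval, so the master cannot push.
A rough sketch of the two solrconfig.xml sections involved (host name, config
files, and poll interval are placeholders, not recommendations):

  <!-- master solrconfig.xml -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- slave solrconfig.xml: polls the master on a fixed interval -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://master-host:8983/solr/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>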
filtering non english text from my results
Hi All,

I am looking for a solution to filter out text that contains non-English
words. My goal is to present my English-speaking users with results in their
language.

Any ideas?

Thanks
Omri
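One possible approach, offered only as a sketch since the thread itself does
not settle on one: have the indexing client detect each document's language
(with whatever detection library it prefers), store the code in a dedicated
field, and filter queries on it. The field below is a hypothetical schema.xml
addition that assumes the standard string field type is defined:

  <!-- schema.xml (hypothetical): detected ISO language code per document -->
  <field name="lang" type="string" indexed="true" stored="true"/>

Queries for English-speaking users would then add a filter query such as
fq=lang:en so only documents tagged as English are returned.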