Solr on Several Machines Communication Fails
Hi, I'm running Solr on 4 machines with the following configuration:

Solr Version: 4.3, SolrCloud

Machine 1: running shard 1 with embedded ZooKeeper:

java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

Machine 2: running shard 2:

java -Djetty.port=7574 -DzkHost=shard1_Machine_IP:9983 -jar start.jar

Error:

ERROR - 2013-06-27 15:18:06.066; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: no servers hosting shard:
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

If I run the same 2 shards on one machine it works: I get correct responses for queries.

Thanks
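[Editorial note] One thing worth ruling out here is whether Machine 2 ever registers itself as a live node in the embedded ZooKeeper; "no servers hosting shard" typically means the cluster state lists no live replica for that shard. A minimal SolrJ 4.x sketch, reusing the zkHost placeholder from the commands above:

```java
// Hedged diagnostic sketch: ask ZooKeeper which Solr nodes it considers live.
// "shard1_Machine_IP" is the placeholder from the post, not a real host.
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ZkStateReader;

public class LiveNodesCheck {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("shard1_Machine_IP:9983");
        server.connect();
        ZkStateReader reader = server.getZkStateReader();
        // If Machine 2 is missing here, it never reached ZooKeeper (firewall,
        // wrong zkHost, or it registered under a hostname Machine 1 cannot resolve).
        System.out.println("Live nodes: " + reader.getClusterState().getLiveNodes());
        server.shutdown();
    }
}
```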
No date.gap on pivoted facets
Consider the following query:

select?q=*:*
&facet=true
&facet.date=added
&facet.date.start=2013-04-01T00:00:00Z
&facet.date.end=2013-06-30T00:00:00Z
&facet.date.gap=%2b7DAYS
&rows=0
&facet.pivot=added,provider

In this query, the facet.date.gap is ignored and each individual second is faceted on. The issue remains the same even when reversing the order of the pivot:

&facet.pivot=provider,added

Is this a Solr bug, or am I pivoting wrong? This is on Solr 4.1.0 running on OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) on Ubuntu Server 12.04.

Thank you!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Solr cloud shard goes down after many broken pipe exceptions
First, a ClientAbortException occurs. This is expected, as there is a timeout on the client side. The stack trace is as follows:

Jun 30, 2013 2:24:30 PM org.apache.solr.common.SolrException log
SEVERE: null:ClientAbortException: java.net.SocketException: Broken pipe
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:369)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:339)
    at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:392)
    at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:381)
    at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:89)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:190)
    at org.apache.solr.util.FastWriter.flush(FastWriter.java:141)
    at org.apache.solr.util.FastWriter.write(FastWriter.java:55)
    at org.apache.solr.response.JSONWriter.writeStr(JSONResponseWriter.java:449)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:124)
    at org.apache.solr.response.JSONWriter.writeSolrDocument(JSONResponseWriter.java:355)
    at org.apache.solr.response.TextResponseWriter.writeSolrDocumentList(TextResponseWriter.java:222)
    at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:184)
    at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:183)
    at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:299)
    at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:95)
    at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:60)
    at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:404)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:282)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
    at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:756)
    at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:448)
    at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:363)
    at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:780)
    at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:126)
    at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:593)
    at org.apache.coyote.Response.doWrite(Response.java:560)
    at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:364)
    ... 33 more

The above exception occurs a number of times, followed by a shard going down:

Jun 30, 2013 2:24:33 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:162)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:135)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.Fu
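[Editorial note] Since the broken pipe means the client hung up before Solr finished writing the response, one mitigation is to raise the client's read timeout so slow distributed queries are not aborted mid-response. A hedged sketch assuming the client uses SolrJ 4.x; the URL and timeout values are illustrative, not from the thread:

```java
// Hedged sketch: configure SolrJ to wait longer for slow responses instead of
// closing the socket (the close is what produces the Broken pipe above).
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClientTimeouts {
    public static HttpSolrServer newServer() {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
        server.setConnectionTimeout(5000); // ms to establish the TCP connection
        server.setSoTimeout(120000);       // ms to wait on reads before giving up
        return server;
    }
}
```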
Distributed search results in "SocketException: Connection reset"
Hi all,

We're sporadically getting the exception below when using distributed search (Solr 4.2.1). Note that 'core_3' is one of the cores mentioned in the 'shards' parameter.

Any ideas anyone?

Thanks,

Shahar.

Jun 03, 2013 5:27:38 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8210/solr/core_3
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:300)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8210/solr/core_3
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    ... 1 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:149)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffe
Re: No date.gap on pivoted facets
Sorry, but Solr pivot faceting is based solely on "field" facets, not "range" (or "date") facets.

You can approximate date gaps by making a copy of your raw date field and then manually "gapping" (truncating) the date values so that their discrete values correspond to your date gap. You can do that with an update processor, or do it before you send the data to Solr.

In the next release of my book I have a script for a StatelessScriptUpdateProcessor (with examples) that supports truncation of dates to a desired resolution, copying or modifying the input date as desired.

-- Jack Krupansky

-----Original Message-----
From: Dotan Cohen
Sent: Sunday, June 30, 2013 5:51 AM
To: solr-user@lucene.apache.org
Subject: No date.gap on pivoted facets

Consider the following query:

select?q=*:*
&facet=true
&facet.date=added
&facet.date.start=2013-04-01T00:00:00Z
&facet.date.end=2013-06-30T00:00:00Z
&facet.date.gap=%2b7DAYS
&rows=0
&facet.pivot=added,provider

In this query, the facet.date.gap is ignored and each individual second is faceted on. The issue remains the same even when reversing the order of the pivot:

&facet.pivot=provider,added

Is this a Solr bug, or am I pivoting wrong? This is on Solr 4.1.0 running on OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) on Ubuntu Server 12.04.

Thank you!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
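[Editorial note] A minimal sketch of the manual "gapping" Jack describes, assuming it is done client-side before indexing; the class and the epoch constants are illustrative, not taken from his book:

```java
// Floor each raw 'added' date to the start of its 7-day bucket, then index
// the result into a copy field; a plain field facet (or pivot) on that copy
// approximates facet.date.gap=+7DAYS. Assumes value >= start.
import java.util.Date;

public class DateGapper {
    static Date truncateToGap(Date value, Date start, long gapMillis) {
        long offset = value.getTime() - start.getTime();
        long bucketStart = (offset / gapMillis) * gapMillis; // integer floor
        return new Date(start.getTime() + bucketStart);
    }

    public static void main(String[] args) {
        long sevenDays = 7L * 24 * 60 * 60 * 1000;
        Date start = new Date(1364774400000L); // 2013-04-01T00:00:00Z
        Date added = new Date(1365000000000L); // a raw 'added' timestamp
        System.out.println(truncateToGap(added, start, sevenDays));
    }
}
```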
Re: Improving performance to return 2000+ documents
Thanks Erick/Peter.

This is an offline process, used by a relevancy engine implemented around Solr. The engine computes boost scores for related keywords based on clickstream data. i.e.: say the clickstream has: ipad=upc1,upc2,upc3. I query Solr with keyword "ipad" (to get 2000 documents) and then make 3 individual queries for upc1, upc2, upc3 (which are fast). The data is then used to compute related keywords to "ipad" with their boost values.

So, I cannot really replace that, since I need full-text search over my dataset to retrieve the top 2000 documents.

I tried paging: I retrieve 500 Solr documents 4 times (0-500, 500-1000...), but don't see any improvements.

Some questions:

1. Maybe the JVM size might help? This is what I see in the dashboard:
Physical Memory: 76.2%
Swap Space: NaN% (don't have any swap space, running on AWS EBS)
File Descriptor Count: 4.7%
JVM-Memory: 73.8%

Screenshot: http://i.imgur.com/aegKzP6.png

2. Will reducing the shards from 3 to 1 improve performance? (maybe increase the RAM from 30 to 60GB) The problem I will face in that case will be fitting 50M documents on 1 machine.

Thanks,
-Utkarsh

On Sat, Jun 29, 2013 at 3:58 PM, Peter Sturge wrote:

> Hello Utkarsh,
> This may or may not be relevant for your use-case, but the way we deal with
> this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time
> (user selectable). We can then page the results, changing the start
> parameter to return the next set. This allows us to 'retrieve' millions of
> documents - we just do it at the user's leisure, rather than make them wait
> for the whole lot in one go.
> This works well because users very rarely want to see ALL 2000 (or whatever
> number) documents at one time - it's simply too much to take in at one time.
> If your use-case involves an automated or offline procedure (e.g. running a
> report or some data-mining op), then presumably it doesn't matter so much
> it takes a bit longer (as long as it returns in some reasonable time).
> Have you looked at doing paging on the client-side - this will hugely
> speed-up your search time.
> HTH
> Peter
>
> On Sat, Jun 29, 2013 at 6:17 PM, Erick Erickson wrote:
>
> > Well, depending on how many docs get served
> > from the cache the time will vary. But this is
> > just ugly, if you can avoid this use-case it would
> > be a Good Thing.
> >
> > Problem here is that each and every shard must
> > assemble the list of 2,000 documents (just ID and
> > sort criteria, usually score).
> >
> > Then the node serving the original request merges
> > the sub-lists to pick the top 2,000. Then the node
> > sends another request to each shard to get
> > the full document. Then the node merges this
> > into the full list to return to the user.
> >
> > Solr really isn't built for this use-case, is it actually
> > a compelling situation?
> >
> > And having your document cache set at 1M is kinda
> > high if you have very big documents.
> >
> > FWIW,
> > Erick
> >
> > On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:
> >
> > > Also, I don't see a consistent response time from solr, I ran ab again and
> > > I get this:
> > >
> > > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> > > http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > "
> > >
> > > Benchmarking x.amazonaws.com (be patient)
> > > Completed 100 requests
> > > Completed 200 requests
> > > Completed 300 requests
> > > Completed 400 requests
> > > Completed 500 requests
> > > Finished 500 requests
> > >
> > > Server Software:
> > > Server Hostname:        x.amazonaws.com
> > > Server Port:            8983
> > >
> > > Document Path:          /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > Document Length:        1538537 bytes
> > >
> > > Concurrency Level:      10
> > > Time taken for tests:   10.858 seconds
> > > Complete requests:      500
> > > Failed requests:        8
> > >    (Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> > > Write errors:           0
> > > Total transferred:      769297992 bytes
> > > HTML transferred:       769268492 bytes
> > > Requests per second:    46.05 [#/sec] (mean)
> > > Time per request:       217.167 [ms] (mean)
> > > Time per request:       21.717 [ms] (mean, across all concurrent requests)
> > > Transfer rate:          69187.90 [Kbytes/sec] received
> > >
> > > Connection Times (ms)
> > >               min  mean[+/-sd] median   max
> > > Connect:        0    0   0.3      0       2
> > > Processing:   110  215  72.0    190     497
> > > Waiting:       91  180  70.5    152     473
> > > Total:        112  216  72.0    191     497
> > >
> > > Percentage of the requests served within a certain time (ms)
> > >   50%    191
> > >   66%    225
> > >   75%    252
> > >   80%    272
> > >   90%    319
> > >   95%    364
> > >   98%    420
> > >   99%    453
> > >  100%    497 (longest request)
> > >
> > > Som
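[Editorial note] Given the ~1.5 MB responses in the ab output above, one hedged experiment (not something proposed in the thread) is to fetch only the fields the relevancy engine actually consumes via the fl parameter. A SolrJ 4.x sketch; the field names "allText" and "upc" and the URL are assumptions based on the queries described:

```java
// Hedged sketch: request the top 2000 docs but only the fields used by the
// boost computation, instead of the full documents.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class TopDocsFetcher {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/prodinfo");
        SolrQuery q = new SolrQuery("allText:ipad");
        q.setRows(2000);
        q.setFields("upc", "score"); // trim the payload to what is actually used
        for (SolrDocument doc : solr.query(q).getResults()) {
            Object upc = doc.getFieldValue("upc"); // feed into the boost computation
            System.out.println(upc);
        }
        solr.shutdown();
    }
}
```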
Re: cores sharing an instance
I see. If I wanted to try the second option ("find a place inside Solr before the core is created"), then where would that place be in the flow of the app waking up?

Currently what I am doing is each core loads its app caches via a requestHandler (in solrconfig.xml) that initializes the Java class that does the loading. For instance: AppCaches. So each core has its own core-specific cachedResources handler. Where in Solr would I need to place the AppCaches code to make it visible to all other cores then?

Thank you, Roman.

On Jun 29, 2013, at 10:58 AM, Roman Chyla wrote:

> Cores can be reloaded, they are inside solrcore loader /I forgot the exact
> name/, and they will have different classloaders /that's a servlet thing/, so
> if you want singletons you must load them outside of the core, using a
> parent classloader - in the case of Jetty, this means writing your own Jetty
> initialization or config to force shared class loaders, or find a place
> inside Solr, before the core is created. Google for montysolr to see
> an example of the first approach.
>
> But, unless you really have no other choice, using singletons is IMHO a bad
> idea in this case.
>
> Roman
>
> On 29 Jun 2013 10:18, "Peyman Faratin" wrote:
>>
>> its the singleton pattern, where in my case i want an object (which is
>> RAM expensive) to be a centralized coordinator of application logic.
>>
>> thank you
>>
>> On Jun 29, 2013, at 1:16 AM, Shalin Shekhar Mangar wrote:
>>
>>> There is very little shared between multiple cores (instanceDir paths,
>>> logging config maybe?). Why are you trying to do this?
>>>
>>> On Sat, Jun 29, 2013 at 1:14 AM, Peyman Faratin wrote:
>>>> Hi
>>>> I have a multicore setup (in 4.3.0). Is it possible for one core to
>>>> share an instance of its class with other cores at run time? i.e. At run
>>>> time core 1 makes an instance of object O_i
>>>>
>>>> core 1 --> object O_i
>>>> core 2
>>>> ---
>>>> core n
>>>>
>>>> then can core K access O_i? I know they can share properties but is it
>>>> possible to share objects?
>>>>
>>>> thank you
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
Re: cores sharing an instance
That is what I had assumed, but it appears not to be the case. A class (and its properties) of one core is not visible to another class in another core - in the same JVM.

Peyman

On Jun 29, 2013, at 1:23 PM, Erick Erickson wrote:

> Well, the code is all in the same JVM, so there's no
> reason a singleton approach wouldn't work that I
> can think of. All the multithreaded caveats apply.
>
> Best
> Erick
>
> On Fri, Jun 28, 2013 at 3:44 PM, Peyman Faratin wrote:
>
>> Hi
>>
>> I have a multicore setup (in 4.3.0). Is it possible for one core to share
>> an instance of its class with other cores at run time? i.e.
>>
>> At run time core 1 makes an instance of object O_i
>>
>> core 1 --> object O_i
>> core 2
>> ---
>> core n
>>
>> then can core K access O_i? I know they can share properties but is it
>> possible to share objects?
>>
>> thank you
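[Editorial note] For reference, a minimal sketch of the singleton-holder approach discussed in this thread. As Roman's earlier reply explains, it only works when the class is loaded once by a classloader shared across cores (e.g. a jar on the servlet container's classpath rather than in each core's lib/ directory); the class name is illustrative:

```java
// Double-checked-locking singleton holder. This also explains the behavior
// Peyman observes: if the jar sits in each core's lib/, every core's
// classloader defines its own copy of this class, each with its own 'instance'.
public final class AppCaches {
    private static volatile AppCaches instance;

    private AppCaches() {
        // load the RAM-expensive shared resources here
    }

    public static AppCaches getInstance() {
        if (instance == null) {
            synchronized (AppCaches.class) {
                if (instance == null) {
                    instance = new AppCaches();
                }
            }
        }
        return instance;
    }
}
```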
Re: Distributed search results in "SocketException: Connection reset"
This usually means the end server timed out.

On 06/30/2013 06:31 AM, Shahar Davidson wrote:

Hi all,

We're getting the below exception sporadically when using distributed search (using Solr 4.2.1). Note that 'core_3' is one of the cores mentioned in the 'shards' parameter.

Any ideas anyone?

Thanks,

Shahar.

Jun 03, 2013 5:27:38 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8210/solr/core_3
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:300)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8210/solr/core_3
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
    at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    ... 1 more
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(Unknown Source)
    at java.net.SocketInputStream.read(Unknown Source)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:149)
    at org.apache.http.impl.io.Abst
Re: getting different search results for words with same meaning in Japanese language
The MappingCharFilter allows you to map both characters to one character. If you do this during indexing and querying, searching with one should find the other. This is sort of like synonyms, but on a character-by-character basis.

Lance

On 06/18/2013 11:08 PM, Yash Sharma wrote:
> Hi,
>
> we have two japanese words with the same meaning ソフトウェア and ソフトウエア (notice
> the difference in the capital-I-looking character - the word means 'software'
> in the english language). When ソフトウェア is searched, it gives around 8 search
> results but when ソフトウエア is searched, it gives only 2 search results.
>
> The japanese translator told that this is something called yugari (which
> means that the above words can be seen as authorise and authorize, so they
> should yield the same search results as they have the same meaning but are spelled
> differently).
>
> we have one solution to this issue - to use synonyms.txt and place all
> these similar words in this text file. This solved our problem to some
> extent but, in a real time scenario, we do not have all the japanese
> technical words like software, product, technology, and so on, and we cannot
> keep updating synonyms.txt on a daily basis.
>
> Is there any better solution, so that all the similar japanese words give
> the same search results?
> Any help is greatly appreciated.
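[Editorial note] A hedged sketch of what Lance describes, for schema.xml; the fieldType name and the mapping file name are assumptions, and only the one mapping from the thread is shown:

```xml
<!-- Sketch: normalize the small ェ to the full-size エ at both index and
     query time so that ソフトウェア and ソフトウエア analyze identically.
     mapping-ja.txt is an assumed file name; it would hold lines like:
     "ェ" => "エ" -->
<fieldType name="text_ja_norm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ja.txt"/>
    <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
  </analyzer>
</fieldType>
```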
Re: Improving performance to return 2000+ documents
50M documents, depending on a bunch of things, may not be unreasonable for a single node; only testing will tell.

But the question I have is whether you should be using standard Solr queries for this or building a custom component that goes at the base Lucene index and "does the right thing". Or even re-indexing your entire corpus periodically to add this kind of data.

FWIW,
Erick

On Sun, Jun 30, 2013 at 2:00 PM, Utkarsh Sengar wrote:

> Thanks Erick/Peter.
>
> This is an offline process, used by a relevancy engine implemented around
> solr. The engine computes boost scores for related keywords based on
> clickstream data.
> i.e.: say clickstream has: ipad=upc1,upc2,upc3
> I query solr with keyword: "ipad" (to get 2000 documents) and then make 3
> individual queries for upc1,upc2,upc3 (which are fast).
> The data is then used to compute related keywords to "ipad" with their
> boost values.
>
> So, I cannot really replace that, since I need full text search over my
> dataset to retrieve top 2000 documents.
>
> I tried paging: I retrieve 500 solr documents 4 times (0-500, 500-1000...),
> but don't see any improvements.
>
> Some questions:
> 1. Maybe the JVM size might help?
> This is what I see in the dashboard:
> Physical Memory 76.2%
> Swap Space NaN% (don't have any swap space, running on AWS EBS)
> File Descriptor Count 4.7%
> JVM-Memory 73.8%
>
> Screenshot: http://i.imgur.com/aegKzP6.png
>
> 2. Will reducing the shards from 3 to 1 improve performance? (maybe
> increase the RAM from 30 to 60GB) The problem I will face in that case will
> be fitting 50M documents on 1 machine.
>
> Thanks,
> -Utkarsh
>
> On Sat, Jun 29, 2013 at 3:58 PM, Peter Sturge wrote:
>
> > Hello Utkarsh,
> > This may or may not be relevant for your use-case, but the way we deal with
> > this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time
> > (user selectable). We can then page the results, changing the start
> > parameter to return the next set. This allows us to 'retrieve' millions of
> > documents - we just do it at the user's leisure, rather than make them wait
> > for the whole lot in one go.
> > This works well because users very rarely want to see ALL 2000 (or whatever
> > number) documents at one time - it's simply too much to take in at one time.
> > If your use-case involves an automated or offline procedure (e.g. running a
> > report or some data-mining op), then presumably it doesn't matter so much
> > it takes a bit longer (as long as it returns in some reasonable time).
> > Have you looked at doing paging on the client-side - this will hugely
> > speed-up your search time.
> > HTH
> > Peter
> >
> > On Sat, Jun 29, 2013 at 6:17 PM, Erick Erickson wrote:
> >
> > > Well, depending on how many docs get served
> > > from the cache the time will vary. But this is
> > > just ugly, if you can avoid this use-case it would
> > > be a Good Thing.
> > >
> > > Problem here is that each and every shard must
> > > assemble the list of 2,000 documents (just ID and
> > > sort criteria, usually score).
> > >
> > > Then the node serving the original request merges
> > > the sub-lists to pick the top 2,000. Then the node
> > > sends another request to each shard to get
> > > the full document. Then the node merges this
> > > into the full list to return to the user.
> > >
> > > Solr really isn't built for this use-case, is it actually
> > > a compelling situation?
> > >
> > > And having your document cache set at 1M is kinda
> > > high if you have very big documents.
> > >
> > > FWIW,
> > > Erick
> > >
> > > On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:
> > >
> > > > Also, I don't see a consistent response time from solr, I ran ab again and
> > > > I get this:
> > > >
> > > > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> > > > http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > > "
> > > >
> > > > Benchmarking x.amazonaws.com (be patient)
> > > > Completed 100 requests
> > > > Completed 200 requests
> > > > Completed 300 requests
> > > > Completed 400 requests
> > > > Completed 500 requests
> > > > Finished 500 requests
> > > >
> > > > Server Software:
> > > > Server Hostname:        x.amazonaws.com
> > > > Server Port:            8983
> > > >
> > > > Document Path:          /solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > > Document Length:        1538537 bytes
> > > >
> > > > Concurrency Level:      10
> > > > Time taken for tests:   10.858 seconds
> > > > Complete requests:      500
> > > > Failed requests:        8
> > > >    (Connect: 0, Receive: 0, Length: 8, Exceptions: 0)
> > > > Write errors:           0
> > > > Total transferred:      769297992 bytes
> > > > HTML transferred:       769268492 bytes
> > > > Requests per second:    46.05 [#/sec] (mean)
> > > > Time per request:       217.167 [ms] (mean)
Re: Improving performance to return 2000+ documents
Solrconfig.xml has entries you can tweak for your use case. One of them is queryResultWindowSize: you can try a value of 2000 and see if it helps performance. Please make sure you have enough memory allocated for the queryResultCache.

A combination of sharding and distributing the workload (requesting 2000 / number-of-shards from each shard) with an aggregator would be a good way to maximize performance.

Thanks,
Jagdish

On Sun, Jun 30, 2013 at 6:48 PM, Erick Erickson wrote:

> 50M documents, depending on a bunch of things,
> may not be unreasonable for a single node, only
> testing will tell.
>
> But the question I have is whether you should be
> using standard Solr queries for this or building a custom
> component that goes at the base Lucene index
> and "does the right thing". Or even re-indexing your
> entire corpus periodically to add this kind of data.
>
> FWIW,
> Erick
>
> On Sun, Jun 30, 2013 at 2:00 PM, Utkarsh Sengar wrote:
>
> > Thanks Erick/Peter.
> >
> > This is an offline process, used by a relevancy engine implemented around
> > solr. The engine computes boost scores for related keywords based on
> > clickstream data.
> > i.e.: say clickstream has: ipad=upc1,upc2,upc3
> > I query solr with keyword: "ipad" (to get 2000 documents) and then make 3
> > individual queries for upc1,upc2,upc3 (which are fast).
> > The data is then used to compute related keywords to "ipad" with their
> > boost values.
> >
> > So, I cannot really replace that, since I need full text search over my
> > dataset to retrieve top 2000 documents.
> >
> > I tried paging: I retrieve 500 solr documents 4 times (0-500, 500-1000...),
> > but don't see any improvements.
> >
> > Some questions:
> > 1. Maybe the JVM size might help?
> > This is what I see in the dashboard:
> > Physical Memory 76.2%
> > Swap Space NaN% (don't have any swap space, running on AWS EBS)
> > File Descriptor Count 4.7%
> > JVM-Memory 73.8%
> >
> > Screenshot: http://i.imgur.com/aegKzP6.png
> >
> > 2. Will reducing the shards from 3 to 1 improve performance? (maybe
> > increase the RAM from 30 to 60GB) The problem I will face in that case will
> > be fitting 50M documents on 1 machine.
> >
> > Thanks,
> > -Utkarsh
> >
> > On Sat, Jun 29, 2013 at 3:58 PM, Peter Sturge wrote:
> >
> > > Hello Utkarsh,
> > > This may or may not be relevant for your use-case, but the way we deal with
> > > this scenario is to retrieve the top N documents 5, 10, 20 or 100 at a time
> > > (user selectable). We can then page the results, changing the start
> > > parameter to return the next set. This allows us to 'retrieve' millions of
> > > documents - we just do it at the user's leisure, rather than make them wait
> > > for the whole lot in one go.
> > > This works well because users very rarely want to see ALL 2000 (or whatever
> > > number) documents at one time - it's simply too much to take in at one time.
> > > If your use-case involves an automated or offline procedure (e.g. running a
> > > report or some data-mining op), then presumably it doesn't matter so much
> > > it takes a bit longer (as long as it returns in some reasonable time).
> > > Have you looked at doing paging on the client-side - this will hugely
> > > speed-up your search time.
> > > HTH
> > > Peter
> > >
> > > On Sat, Jun 29, 2013 at 6:17 PM, Erick Erickson wrote:
> > >
> > > > Well, depending on how many docs get served
> > > > from the cache the time will vary. But this is
> > > > just ugly, if you can avoid this use-case it would
> > > > be a Good Thing.
> > > >
> > > > Problem here is that each and every shard must
> > > > assemble the list of 2,000 documents (just ID and
> > > > sort criteria, usually score).
> > > >
> > > > Then the node serving the original request merges
> > > > the sub-lists to pick the top 2,000. Then the node
> > > > sends another request to each shard to get
> > > > the full document. Then the node merges this
> > > > into the full list to return to the user.
> > > >
> > > > Solr really isn't built for this use-case, is it actually
> > > > a compelling situation?
> > > >
> > > > And having your document cache set at 1M is kinda
> > > > high if you have very big documents.
> > > >
> > > > FWIW,
> > > > Erick
> > > >
> > > > On Fri, Jun 28, 2013 at 8:44 PM, Utkarsh Sengar wrote:
> > > >
> > > > > Also, I don't see a consistent response time from solr, I ran ab again
> > > > > and I get this:
> > > > >
> > > > > ubuntu@ip-10-149-6-68:~$ ab -c 10 -n 500 "
> > > > > http://x.amazonaws.com:8983/solr/prodinfo/select?q=allText:huggies%20diapers%20size%201&rows=2000&wt=json
> > > > > "
> > > > >
> > > > > Benchmarking x.amazonaws.com (be patient)
> > > > > Completed 100 requests
> > > > > Completed 200 requests
> > > > > Completed 300 requests
> > > > > Completed 400 requests
> > > > > Complete
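[Editorial note] A sketch of the two settings Jagdish mentions, as they would appear in solrconfig.xml; the sizes are illustrative assumptions, not tested values:

```xml
<!-- Cache a window of up to 2000 result ids per query entry, so repeated or
     paged requests for the same query can be served from the queryResultCache. -->
<queryResultWindowSize>2000</queryResultWindowSize>

<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="0"/>
```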
RE: Distributed search results in "SocketException: Connection reset"
Thanks, Lance. If that is the case, are there any timeout mechanisms defined by Solr other than the Jetty timeout definitions?

Thanks,

Shahar.

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Monday, July 01, 2013 4:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Distributed search results in "SocketException: Connection reset"

This usually means the end server timed out.

On 06/30/2013 06:31 AM, Shahar Davidson wrote:
> Hi all,
>
> We're getting the below exception sporadically when using distributed
> search. (using Solr 4.2.1) Note that 'core_3' is one of the cores mentioned
> in the 'shards' parameter.
>
> Any ideas anyone?
>
> Thanks,
>
> Shahar.
>
> Jun 03, 2013 5:27:38 PM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://127.0.0.1:8210/solr/core_3
>     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:300)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>     at org.eclipse.jetty.server.Server.handle(Server.java:365)
>     at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
>     at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>     at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
>     at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
>     at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
>     at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
>     at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>     at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://127.0.0.1:8210/solr/core_3
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>     at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
>     at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
>     at java.util.concurrent.FutureTask$Sync.
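[Editorial note] One knob to look at (a hedged sketch; verify it against your Solr 4.2.1 configuration) is the shard handler's own timeouts for the inter-shard HTTP requests, configurable per request handler in solrconfig.xml:

```xml
<!-- Sketch: timeouts for the requests a node sends to the other shards
     during distributed search. Values are illustrative assumptions. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <int name="socketTimeout">60000</int> <!-- ms read timeout per shard request -->
    <int name="connTimeout">15000</int>   <!-- ms connect timeout per shard request -->
  </shardHandlerFactory>
</requestHandler>
```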