strange error on closing server
Hi all,

I am getting a strange error whenever I close my index by calling server.close(). The error is shown below. I am not sure where I should look: the configuration file, the code, or the index fragments, or somewhere else? The code causing the error is nothing more than the close() call. Many thanks!

CachingDirectoryFactory:184 - Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<>
2016-02-21 11:09:33 ERROR CachingDirectoryFactory:150 - Error closing directory: org.apache.solr.common.SolrException: Timeout waiting for all directory ref counts to be released - gave up waiting on CachedDir<>
        at org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:187)
        at org.apache.solr.core.SolrCore.close(SolrCore.java:1257)
        at org.apache.solr.core.SolrCores.close(SolrCores.java:124)
        at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:562)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.shutdown(EmbeddedSolrServer.java:263)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.close(EmbeddedSolrServer.java:268)
        at uk.ac.shef.dcs.jate.app.App.extract(App.java:276)
        at uk.ac.shef.dcs.jate.app.AppTermEx.main(AppTermEx.java:35)

Line 276 of the App class is: solrServer.close();
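[Editor's note] This timeout usually means something still held a reference to the index directory when close() ran, most often a searcher or core reference obtained earlier and never released. That diagnosis is an assumption (the rest of the App code is not shown); a minimal sketch of a cleanup order that avoids it, where 'core' and 'server' are placeholders for whatever the App class created:

import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// A searcher checked out from a core must be returned, or the core's
// directory ref count never reaches zero and close() times out.
RefCounted<SolrIndexSearcher> searcherRef = core.getSearcher();
try {
    // ... run queries via searcherRef.get() ...
} finally {
    searcherRef.decref();   // release the searcher reference
}
core.close();               // release the reference taken by CoreContainer.getCore()
server.close();             // CachingDirectoryFactory can now release all directories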
Does solr remove "\r" from text content for indexing?
Hi,

I am trying to pinpoint a mismatch between the offsets produced by the Solr indexing process and the original document content, which shows up when I use those offsets to take substrings of the original text. It seems that if the text contains "\r" (the Windows carriage return), Solr silently removes it, so "ok\r\nthis is the text\r\nand..." becomes "ok\nthis is the text\nand...", and as a result the offsets created by Solr indexing no longer work against the original content. I asked about this on the Lucene mailing list and was told it is likely Solr that causes it.

To reproduce this issue, here is what I have done:

1. Compile OpenNLPTokenizer.java and OpenNLPTokenizerFactory.java (in attachment), which I use to analyse a text field. OpenNLPTokenizer.java is almost identical to the one at https://issues.apache.org/jira/browse/LUCENE-6595, except that I adapted it to Lucene 5.3.0. Line 74 of OpenNLPTokenizer takes the "input" variable (of type Reader) from its superclass Tokenizer and tokenizes its content. Debugging at runtime, I can see that the string held by this variable has already had "\r" removed (details below).

2. Configure solrconfig.xml and schema.xml to use the above tokenizer. In solrconfig.xml, add a <lib dir="..."/> entry and place the compiled classes into that folder. In schema.xml, define a new field type:

<fieldType name="testFieldType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.opennlp.OpenNLPTokenizerFactory"
               sentenceModel=".../your_path/en-sent.bin"
               tokenizerModel=".../your_path/en-token.bin"/>
  </analyzer>
</fieldType>

Download "en-sent.bin" and "en-token.bin" from the links below, place them somewhere, and change the sentenceModel and tokenizerModel params above to point to them:

http://opennlp.sourceforge.net/models-1.5/en-token.bin
http://opennlp.sourceforge.net/models-1.5/en-sent.bin

Then define a new field of that type in the schema:

<field name="content" type="testFieldType" multiValued="false" termVectors="true" termPositions="true" termOffsets="true"/>

3. Run the test class TestIndexing.java (in attachment) in debug mode; you need to place a breakpoint on line 74 of OpenNLPTokenizer.

To see the problem, notice that:

- Line 19 of TestIndexing.java passes the raw string "ok\r\nthis is the text\r\nand..." to be added to the field "content", which is analyzed by the "testFieldType" defined above, so it triggers the OpenNLPTokenizer class.
- When you stop at line 74 of OpenNLPTokenizer, inspect the value of the variable "input". It is instantiated as a ReusableStringReader, and its value is now "ok\nthis is the text\nand..." - all "\r" has been removed.

In an attempt to solve the problem, I have learnt that:

- (suggested by a Lucene developer) the ReusableStringReader I see comes from the way Solr sets the field contents (as String). If that StringReader already has no \r, then it is Solr's fault.
- Following the debugger, I got as far as line 299 of DefaultIndexingChain, shown below:

for (IndexableField field : docState.doc) {
  fieldCount = processField(field, fieldGen, fieldCount);
}

Again during debugging, I can see that the field "content" is encapsulated in an IndexableField object whose content already has "\r" removed. However, at this point I cannot trace further to find out how such IndexableFields are created by Solr, or Lucene...

Any thoughts on this would be much appreciated!

The attached OpenNLPTokenizer.java begins:

package org.apache.lucene.analysis.opennlp;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.util.Span;
import org.apache.commons.io.IOUtils;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.util.AttributeFactory;

/**
 * Run OpenNLP SentenceDetector and Tokenizer.
 * Must have Sentence and/or Tokenizer.
 */
public final class OpenNLPTokenizer extends Tokenizer {
  private static final int DEFAULT_BUFFER_SIZE = 256;
  private int
[attachment truncated here]
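[Editor's note] One way to narrow down where the \r disappears, sketched under the assumption that the tokenizer itself is not the culprit: drive a tokenizer directly with a plain StringReader, bypassing Solr's update chain entirely. WhitespaceTokenizer stands in for OpenNLPTokenizer here; if the offsets line up with the raw \r\n string in this standalone setting, the stripping happens on the Solr side before analysis.

import java.io.StringReader;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class CarriageReturnCheck {
  public static void main(String[] args) throws Exception {
    // Feed the raw string, \r\n intact, straight to the tokenizer.
    Tokenizer tok = new WhitespaceTokenizer();
    tok.setReader(new StringReader("ok\r\nthis is the text\r\nand more"));
    CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
    OffsetAttribute off = tok.addAttribute(OffsetAttribute.class);
    tok.reset();
    while (tok.incrementToken()) {
      // Offsets here are relative to the string as the tokenizer saw it;
      // compare them against the original \r\n content.
      System.out.println(term + " [" + off.startOffset() + "," + off.endOffset() + "]");
    }
    tok.end();
    tok.close();
  }
}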
EmbeddedSolrServer problem when using one-jar-with-dependency including solr
Hi,

I am using Solr, Solrj 6.1, and Maven to manage my project. I use Maven to build a jar-with-dependencies and run a Java program with its classpath pointing at this jar. However, I keep getting errors even when I just try to create an instance of EmbeddedSolrServer:

/code/
String solrHome = "/home/solr/";
String solrCore = "fw";
solrCores = new EmbeddedSolrServer(
        Paths.get(solrHome), solrCore
).getCoreContainer();
///

My project has the dependencies defined in the pom shown below (Block A is explained afterwards):

/pom/
<dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>jena-arq</artifactId>
    <version>3.0.1</version>
</dependency>

<!-- BLOCK A -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>
<!-- BLOCK A ENDS -->

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>6.1.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-jdk14</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>6.1.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-jdk14</artifactId>
        </exclusion>
    </exclusions>
</dependency>
///

Block A was added because, when it is not present, running the Java code above throws the following error:

/ERROR 1/
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/http/impl/client/CloseableHttpClient
        at org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:167)
        at org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:47)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:404)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.load(EmbeddedSolrServer.java:84)
        at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:70)
        at uk.ac.ntu.sac.sense.SenseProperty.initSolrServer(SenseProperty.java:103)
        at uk.ac.ntu.sac.sense.SenseProperty.getClassIndex(SenseProperty.java:81)
        at uk.ac.ntu.sac.sense.kb.indexer.IndexMaster.<init>(IndexMaster.java:31)
        at uk.ac.ntu.sac.sense.test.TestIndexer.main(TestIndexer.java:14)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: org.apache.http.impl.client.CloseableHttpClient
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 14 more

So, following suggestions I found online, I added Block A to the pom, ran "mvn clean install" to build a jar-with-dependencies, and started the program with that jar as the classpath. Now I get this error from the same Java code:

/ERROR 2/
Exception in thread "main" org.apache.solr.common.SolrException: SolrCore 'class' is not available due to init failure: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: []
        at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1066)
        at uk.ac.ntu.sac.sense.SenseProperty.getClassIndex(SenseProperty.java:84)
        at uk.ac.ntu.sac.sense.kb.indexer.IndexMaster.<init>(IndexMaster.java:31)
        at uk.ac.ntu.sac.sense.test.TestIndexer.main(TestIndexer.java:14)
Caused by: org.apache.solr.common.SolrException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: []
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:773)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.ja
Re: EmbeddedSolrServer problem when using one-jar-with-dependency including solr
Thanks. I am not sure Steve's suggestion was the right solution for me: even when I did not explicitly declare a dependency on Lucene, I can see that the packaged jar still contains org.apache.lucene. What solved my problem was not packing a single jar, but using a folder of individual jars instead. I am not sure why, though.

Regards

On 02/08/2016 21:53, Rohit Kanchan wrote:
> We also faced the same issue when we were running an embedded Solr 6.1 server. I hit it in our integration environment after deploying the project. Solr 6.1 uses httpclient 4.4.1, which I think the embedded Solr server is looking for. I think when the Solr core gets loaded, an old httpclient is picked up from somewhere else in your Maven build. Check the dependency tree of your pom.xml and see if you can exclude that jar wherever it is pulled in from. Just exclude it in your pom.xml. I hope this solves your issue.
>
> Thanks
> Rohit
>
> On Tue, Aug 2, 2016 at 9:44 AM, Steve Rowe wrote:
>> The solr-core[1] and solr-solrj[2] POMs have parent POM solr-parent[3], which in turn has parent POM lucene-solr-grandparent[4], which has a <dependencyManagement> section that specifies dependency versions & exclusions *for all direct dependencies*. The intent is for all of Lucene/Solr's internal dependencies to be managed directly, rather than through Maven's transitive dependency mechanism. For background, see the summary & comments on JIRA issue LUCENE-5217[5].
>>
>> I haven't looked into how this affects systems that depend on Lucene/Solr artifacts, but it appears to be the case that you can't use Maven's transitive dependency mechanism to pull in all required dependencies for you.
>>
>> BTW, if you look at the grandparent POM, the httpclient version for Solr 6.1.0 is declared as 4.4.1. I don't know if depending on version 4.5.2 is causing problems, but if you don't need a feature in 4.5.2, I suggest that you depend on the same version as Solr does.
>>
>> For error #2, you should depend on lucene-core[6].
>>
>> My suggestion as a place to start: copy/paste the dependencies from the solr-core[1] and solr-solrj[2] POMs, and leave out the stuff you know you won't need.
>>
>> [1] <https://repo1.maven.org/maven2/org/apache/solr/solr-core/6.1.0/solr-core-6.1.0.pom>
>> [2] <https://repo1.maven.org/maven2/org/apache/solr/solr-solrj/6.1.0/solr-solrj-6.1.0.pom>
>> [3] <https://repo1.maven.org/maven2/org/apache/solr/solr-parent/6.1.0/solr-parent-6.1.0.pom>
>> [4] <https://repo1.maven.org/maven2/org/apache/lucene/lucene-solr-grandparent/6.1.0/lucene-solr-grandparent-6.1.0.pom>
>> [5] <https://issues.apache.org/jira/browse/LUCENE-5217>
>> [6] <http://search.maven.org/#artifactdetails|org.apache.lucene|lucene-core|6.1.0|jar>
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> On Aug 2, 2016, at 12:03 PM, Ziqi Zhang wrote:
>>> Hi, I am using Solr, Solrj 6.1, and Maven to manage my project. I use Maven to build a jar-with-dependencies and run a Java program with its classpath pointing at this jar.
>>> [...]
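[Editor's note] A likely explanation for why the folder of individual jars works, offered as an assumption rather than a confirmed diagnosis: Lucene discovers codecs and postings formats such as 'Lucene50' through java.util.ServiceLoader, which reads META-INF/services/* files from every jar on the classpath. The assembly plugin's jar-with-dependencies keeps only one file per path, so service entries from all but one jar are lost, which matches the "The current classpath supports the following names: []" symptom. If a single jar is still wanted, a sketch using maven-shade-plugin, whose ServicesResourceTransformer merges those entries instead of overwriting them:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Concatenates META-INF/services files across jars, so Lucene's
               SPI lookup can still find every PostingsFormat name. -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>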
Re: how to improve concurrent request performance and stress testing
Thanks Yonik,

> It uses a thread per request, simultaneously (up to any limit
> configured by the app server)

How can I change this setting, then? I suppose it is done through Jetty or Tomcat, whichever hosts the Solr application, and not through solrconfig?

I still do not understand why sending 100 requests (of the same query) from 100 threads reduces the Solr server to silence - is it because of the computational cost of handling the same query in 100 separate threads? I noticed that disabling facet counts improves things a bit, but not significantly.

Thanks in advance!
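[Editor's note] The request-thread limit does live in the servlet container rather than in solrconfig.xml. As a sketch, for Tomcat the cap is the Connector's maxThreads in server.xml (the numbers below are illustrative, not tuning advice); Jetty has an equivalent thread-pool setting in jetty.xml:

<!-- server.xml (Tomcat): maxThreads caps concurrent request-handling threads. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="200"
           acceptCount="100"
           connectionTimeout="20000"/>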
Re: how to improve concurrent request performance and stress testing
Thanks!

> Also make sure that common filters, sort fields, and facets have been warmed.

I assume these are achieved by setting a large cache size and a large autowarmCount in the Solr configuration? Specifically filterCache, queryResultCache, and documentCache?

Thanks!

--
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Sent: Wednesday, February 06, 2008 7:50 AM
To: solr-user@lucene.apache.org
Subject: Re: how to improve concurrent request performance and stress testing

> On Feb 6, 2008 6:37 PM, Ziqi Zhang <[EMAIL PROTECTED]> wrote:
>> I still do not understand why sending 100 requests (of the same query) from
>> 100 threads reduces the Solr server to silence - is it because of the
>> computational cost of handling the same query in 100 separate threads?
>
> Yes... sending a large number of requests at once can cause one to
> start hitting synchronization bottlenecks.
>
> -Yonik
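[Editor's note] For reference, the three caches named above are configured in solrconfig.xml. A sketch with placeholder sizes to be tuned per index, not recommendations; note that documentCache cannot be autowarmed, because internal document ids change from searcher to searcher:

<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
<!-- autowarmCount is pointless here: doc ids are not stable across searchers -->
<documentCache    class="solr.LRUCache" size="512" initialSize="512"/>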
Re: how to improve concurrent request performance and stress testing
Thanks Otis! I think I now have a clearer picture of the issue and its causes. Could you please elaborate on "warming up" the searcher prior to exposing it to real requests? Does this mean running through as many of the most often used queries as possible, so that results are cached, and also using as much cache as possible?

Thanks!

--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, February 06, 2008 1:09 PM
To: solr-user@lucene.apache.org
Subject: Re: how to improve concurrent request performance and stress testing

> Imagine this type of code:
>
>   synchronized (someGlobalObject) {
>     // search
>   }
>
> What happens when 100 threads hit this spot? The first one to get there gets in and runs the search, and 99 of them wait. What happens if that "// search" also involves expensive operations, lots of IO, warming up, cache population, etc.? Those 99 threads will have to wait a while :)
>
> That's why it is recommended to warm up the searcher ahead of time before exposing it to real requests. However, even if you warm things up, that sync block will remain there, and at some point it will become a bottleneck. What that point is depends on the hardware, index size, query complexity and rate, even the JVM.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Ziqi Zhang <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, February 6, 2008 6:37:40 PM
> Subject: Re: how to improve concurrent request performance and stress testing
> [...]
Re: how to improve concurrent request performance and stress testing
Thank you so much! I will look into the firstSearcher configuration next!

Thanks

--
From: "Chris Hostetter" <[EMAIL PROTECTED]>
Sent: Wednesday, February 06, 2008 8:56 PM
To: solr-user@lucene.apache.org
Subject: Re: how to improve concurrent request performance and stress testing

> : > Also make sure that common filters, sort fields, and facets have been
> : > warmed.
> :
> : I assume these are achieved by setting large cache size and large
> : autowarmCount number in solr configuration? specifically
>
> Autowarming seeds the caches of a new searcher using the keys of an old searcher -- it does nothing to help you when you first start up Solr and all of the caches are empty. For that, you need to either manually trigger some sample queries externally (before your stress test) or configure something using the firstSearcher event listener in solrconfig.xml.
>
> If you saw all of your requests block until the first one finished, then I suspect your queries involve a sort (or faceting) that uses the FieldCache, which is initialized in single-threaded mode (and can't be auto-warmed; you can put some simple queries that use those sort fields in the newSearcher listener to ensure that they get reinitialized for each new searcher).
>
> -Hoss
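[Editor's note] A sketch of the firstSearcher / newSearcher listeners Hoss describes, for solrconfig.xml. The query, sort field, and facet field are placeholders; substitute the ones your application actually uses so the FieldCache and caches get seeded before real traffic arrives:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Warm a sort field and a facet on cold startup. -->
    <lst>
      <str name="q">solr</str>
      <str name="sort">mySortField asc</str>
      <str name="facet">true</str>
      <str name="facet.field">myFacetField</str>
    </lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- Re-seed sort structures every time a new searcher is opened. -->
    <lst>
      <str name="q">solr</str>
      <str name="sort">mySortField asc</str>
    </lst>
  </arr>
</listener>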
Migrating from Solr 6.6 getStatistics() to Solr 7.x
Hi all,

In my Solr 6.6 based code, I have the following line that gets the total number of documents in a collection:

totalDocs = indexSearcher.getStatistics().get("numDocs");

where indexSearcher is an instance of SolrIndexSearcher. With Solr 7.2.1, getStatistics() is no longer available, and it seems to have been replaced by collectionStatistics() or termStatistics():

https://lucene.apache.org/solr/7_2_1/solr-core/org/apache/solr/search/SolrIndexSearcher.html?is-external=true

So my question is: what is the equivalent statement in Solr 7.2.1? Is it:

solrIndexSearcher.collectionStatistics("numDocs").maxDoc();

The API warns that it is still experimental and might change in incompatible ways in the next release. Is there more 'stable' code for getting this done?

Thanks
Re: Migrating from Solr 6.6 getStatistics() to Solr 7.x
Thank you!

On Fri, Apr 6, 2018 at 10:34 PM, Chris Hostetter wrote:
>
> : In my Solr 6.6 based code, I have the following line that gets the total
> : number of documents in a collection:
> :
> : totalDocs = indexSearcher.getStatistics().get("numDocs");
> ...
> : With Solr 7.2.1, getStatistics() is no longer available, and it seems that
> : it has been replaced by collectionStatistics() or termStatistics():
> ...
> : So my question is: what is the equivalent statement in Solr 7.2.1? Is it:
> :
> : solrIndexSearcher.collectionStatistics("numDocs").maxDoc();
>
> Uh... no, that's not quite right.
>
> In the 6.x code line, getStatistics() was part of the SolrInfoMBean API
> that SolrIndexSearcher and many other Solr objects implemented...
>
> http://lucene.apache.org/solr/6_6_0/solr-core/org/apache/solr/search/SolrIndexSearcher.html#getStatistics--
>
> In 7.0, SolrInfoMBean was replaced with SolrInfoBean as part of the switch
> over to the new, more robust Metrics API...
>
> https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html#jmx-support-and-mbeans
> https://lucene.apache.org/solr/guide/7_0/metrics-reporting.html
> http://lucene.apache.org/solr/7_0_0/solr-core/org/apache/solr/core/SolrInfoBean.html
>
> (The collectionStatistics() and termStatistics() methods are lower-level
> Lucene concepts.)
>
> IIRC the closest 7.x equivalent to "indexSearcher.getStatistics()" is
> "indexSearcher.getMetricsSnapshot()" ... but the keys in that map will
> have slightly different/longer names than they did before; you can use
> "indexSearcher.getMetricNames()" to see the full list.
>
> ...but frankly, that's all a very complicated way to get "numDocs".
> If you're writing a Solr plugin that has direct access to a
> SolrIndexSearcher instance ... you can just call the
> "solrIndexSearcher.numDocs()" method and make your life a lot simpler.
>
> -Hoss
> http://www.lucidworks.com/
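[Editor's note] For quick reference, a minimal sketch of the simpler route; the getIndexReader() calls are the plain Lucene path underneath SolrIndexSearcher, so they sidestep the experimental statistics API entirely. Variable names are illustrative, and the snippet assumes code (e.g. a plugin) that already holds a SolrIndexSearcher:

// Live documents (deletions excluded) - the usual meaning of "numDocs":
int totalDocs = solrIndexSearcher.getIndexReader().numDocs();
// Highest doc id + 1, deleted docs included:
int maxDoc = solrIndexSearcher.getIndexReader().maxDoc();

// If the metrics route is preferred, list the available key names first,
// since they are longer than the old SolrInfoMBean ones:
solrIndexSearcher.getMetricNames().forEach(System.out::println);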