strange error on closing server

2016-02-21 Thread Ziqi Zhang
Hi all

I am having a strange error whenever I close my index (calling server.close()).
The error is shown below. I am not sure where I should look - the configuration 
file? The code? The index fragments? Or something else? The code causing the 
error is very simple, just the "close()" method.

Many thanks!


CachingDirectoryFactory:184 - Timeout waiting for all directory ref counts to 
be released - gave up waiting on CachedDir<>
2016-02-21 11:09:33 ERROR CachingDirectoryFactory:150 - Error closing 
directory:org.apache.solr.common.SolrException: Timeout waiting for all 
directory ref counts to be released - gave up waiting on 
CachedDir<>
at 
org.apache.solr.core.CachingDirectoryFactory.close(CachingDirectoryFactory.java:187)
at org.apache.solr.core.SolrCore.close(SolrCore.java:1257)
at org.apache.solr.core.SolrCores.close(SolrCores.java:124)
at org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:562)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.shutdown(EmbeddedSolrServer.java:263)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.close(EmbeddedSolrServer.java:268)
at uk.ac.shef.dcs.jate.app.App.extract(App.java:276)
at uk.ac.shef.dcs.jate.app.AppTermEx.main(AppTermEx.java:35)


Line 276 of the App class is:

solrServer.close();
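For context, the lifecycle around that call looks roughly like the sketch 
below (the solr home path and core name are placeholders; the commit before 
close and the try-with-resources are defensive habits, not a confirmed fix):

    import java.nio.file.Paths;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;

    public class CloseDemo {
        public static void main(String[] args) throws Exception {
            // placeholder solr home and core name
            try (EmbeddedSolrServer server =
                         new EmbeddedSolrServer(Paths.get("/home/solr"), "mycore")) {
                // ... index documents ...
                server.commit(); // flush pending updates before shutdown
            } // close() shuts down the core container and releases directories
        }
    }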





Does solr remove "\r" from text content for indexing?

2015-10-04 Thread Ziqi Zhang

Hi

I am trying to pin down a mismatch between the offsets produced by the solr 
indexing process and the original document content, which shows up when I use 
those offsets to take substrings from the original text. It seems that if the 
text content contains "\r" (the Windows carriage return), solr automatically 
removes it, so "ok\r\nthis is the text\r\nand..." becomes "ok\nthis is the 
text\nand..." and as a result the offsets created by solr indexing do not 
work with the original content.
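A tiny self-contained illustration of the drift (plain string arithmetic, 
nothing solr-specific; the offsets in the comments are for this exact string):

    public class OffsetDrift {
        public static void main(String[] args) {
            String original   = "ok\r\nthis is the text\r\nand...";
            String normalized = original.replace("\r", ""); // what the tokenizer sees

            // each "\r" dropped before a token shifts its offset left by one
            System.out.println(original.indexOf("text"));   // 16 in the raw document
            System.out.println(normalized.indexOf("text")); // 15 after "\r" removal
        }
    }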


I asked about this issue on the lucene mailing list and was told that it is 
likely solr that causes this.


*To reproduce this issue, here is what I have done:*

1. Compile OpenNLPTokenizer.java and OpenNLPTokenizerFactory.java (in the 
attachment), which I use to analyse a text field. OpenNLPTokenizer.java is 
almost identical to the one at 
https://issues.apache.org/jira/browse/LUCENE-6595 except that I adapted it to 
lucene 5.3.0. If you look at line 74 of OpenNLPTokenizer, it takes the 
"input" variable (of type Reader) from its superclass Tokenizer and tokenizes 
its content. At runtime, by debugging, I can see that the string content held 
by this variable already has "\r" removed (details below).


2. Configure solrconfig.xml and schema.xml to use the above tokenizer.

In solrconfig.xml, add a lib directive along the lines below (the dir value 
is a placeholder for wherever you place the compiled code):

<lib dir=".../your_path/lib" regex=".*\.jar"/>
In schema.xml define a new field type:

<fieldType name="testFieldType" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.opennlp.OpenNLPTokenizerFactory"
               sentenceModel=".../your_path/en-sent.bin"
               tokenizerModel=".../your_path/en-token.bin"/>
  </analyzer>
</fieldType>

Download "en-sent.bin" and "en-token.bin" from below and place it 
somewhere and then change the sentenceModel and tokenizerModel params 
above to point to them:

http://opennlp.sourceforge.net/models-1.5/en-token.bin
http://opennlp.sourceforge.net/models-1.5/en-sent.bin

Then define a new field in the schema:

<field name="content" type="testFieldType" indexed="true" stored="true"
       multiValued="false" termVectors="true" termPositions="true"
       termOffsets="true"/>


3. Run the testing class TestIndexing.java (attachment) in debug mode; *you 
need to place a breakpoint on line 74 of OpenNLPTokenizer*.


*To see the problem, notice that:*

- Line 19 of TestIndexing.java passes the raw string "ok\r\nthis is the 
text\r\nand..." to be added to the field "content", which is analyzed by the 
"testFieldType" defined above, so it triggers the OpenNLPTokenizer class.
- When you are at line 74 of OpenNLPTokenizer, inspect the value of the 
variable "input". It is instantiated as a *ReusableStringReader*, and its 
value is now "ok\nthis is the text\nand..." - every "\r" has been removed.



*In an attempt to solve the problem, I have learnt that:*

- (suggested by a lucene developer) the ReusableStringReader I see is caused 
by the way Solr sets the field contents (as a String). If the StringReader no 
longer contains \r, then it is Solr's fault.
- following the debugger, I pinpointed line 299 of DefaultIndexingChain, 
shown below:


  for (IndexableField field : docState.doc) {
    fieldCount = processField(field, fieldGen, fieldCount);
  }

And again during debugging, I can see that the field "content" is 
encapsulated in an "IndexableField" object whose content already has "\r" 
removed. However, at this point I cannot trace further to find out how such 
IndexableFields are created by solr, or lucene...



Any thoughts on this would be much appreciated!

package org.apache.lucene.analysis.opennlp;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.IOException;
import java.io.Reader;
import java.util.Arrays;

import opennlp.tools.sentdetect.SentenceDetector;
import opennlp.tools.util.Span;

import org.apache.commons.io.IOUtils;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.AttributeFactory;

/**
 * Run OpenNLP SentenceDetector and Tokenizer.
 * Must have Sentence and/or Tokenizer.
 */
public final class OpenNLPTokenizer extends Tokenizer {

  private static final int DEFAULT_BUFFER_SIZE = 256;

  private int

EmbeddedSolrServer problem when using one-jar-with-dependency including solr

2016-08-02 Thread Ziqi Zhang
Hi, I am using Solr, Solrj 6.1, and Maven to manage my project. I use maven 
to build a jar-with-dependencies and run a java program with its classpath 
pointing to this jar. However, I keep getting errors even when I just try to 
create an instance of EmbeddedSolrServer:


/* code */
String solrHome = "/home/solr/";
String solrCore = "fw";
solrCores = new EmbeddedSolrServer(
        Paths.get(solrHome), solrCore
).getCoreContainer();


My project has the dependencies defined in the pom fragment below. **When 
Block A is not present**, running the code above throws ERROR 1 (shown after 
the fragment).

/* pom */

<dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>jena-arq</artifactId>
    <version>3.0.1</version>
</dependency>

<!-- BLOCK A -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>
<!-- BLOCK A ENDS -->

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-core</artifactId>
    <version>6.1.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-jdk14</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>6.1.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-jdk14</artifactId>
        </exclusion>
    </exclusions>
</dependency>


Block A was added because, when it is missing, the following error is thrown 
by the java code above:


/* ERROR 1 */

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/http/impl/client/CloseableHttpClient
at 
org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:167)
at 
org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:47)

at org.apache.solr.core.CoreContainer.load(CoreContainer.java:404)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.load(EmbeddedSolrServer.java:84)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.<init>(EmbeddedSolrServer.java:70)
at 
uk.ac.ntu.sac.sense.SenseProperty.initSolrServer(SenseProperty.java:103)
at 
uk.ac.ntu.sac.sense.SenseProperty.getClassIndex(SenseProperty.java:81)
at 
uk.ac.ntu.sac.sense.kb.indexer.IndexMaster.<init>(IndexMaster.java:31)

at uk.ac.ntu.sac.sense.test.TestIndexer.main(TestIndexer.java:14)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)
at 
com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: 
org.apache.http.impl.client.CloseableHttpClient

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 14 more



So I looked this up online, added Block A to the pom, ran maven clean install 
to build a jar-with-dependencies, and then started the program pointing to 
that jar as the classpath. Now I get this error from the same java code:


/* ERROR 2 */
Exception in thread "main" org.apache.solr.common.SolrException: 
SolrCore 'class' is not available due to init failure: An SPI class of 
type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does 
not exist.  You need to add the corresponding JAR file supporting this 
SPI to your classpath.  The current classpath supports the following 
names: []
at 
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1066)
at 
uk.ac.ntu.sac.sense.SenseProperty.getClassIndex(SenseProperty.java:84)
at 
uk.ac.ntu.sac.sense.kb.indexer.IndexMaster.<init>(IndexMaster.java:31)

at uk.ac.ntu.sac.sense.test.TestIndexer.main(TestIndexer.java:14)
Caused by: org.apache.solr.common.SolrException: An SPI class of 
type org.apache.lucene.codecs.PostingsFormat with name 'Lucene50' does 
not exist.  You need to add the corresponding JAR file supporting this 
SPI to your classpath.  The current classpath supports the following 
names: []

at org.apache.solr.core.SolrCore.<init>(SolrCore.java:773)
at org.apache.solr.core.SolrCore.<init>(SolrCore.ja

Re: EmbeddedSolrServer problem when using one-jar-with-dependency including solr

2016-08-03 Thread Ziqi Zhang

Thanks

I am not sure if Steve's suggestion was the right solution. Even when I did 
not explicitly define the dependency on lucene, I can see that the packaged 
jar still contains org.apache.lucene.


What solved my problem was to not pack a single jar but to use a folder of 
individual jars. I am not sure why, though.
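(A plausible explanation, and a sketch of a workaround: the assembly plugin's 
jar-with-dependencies format keeps only one copy of each duplicate file, so 
the META-INF/services SPI registration files from lucene-core and friends get 
overwritten, and codecs such as the 'Lucene50' PostingsFormat become 
invisible - whereas a folder of jars keeps every registration file intact. 
The maven-shade-plugin can merge those files when building a single jar; 
untested in this project, so treat it as a sketch:)

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.4.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <transformers>
              <!-- concatenate META-INF/services files instead of overwriting
                   them, so SPI registrations survive in the single jar -->
              <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
            </transformers>
          </configuration>
        </execution>
      </executions>
    </plugin>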


Regards


On 02/08/2016 21:53, Rohit Kanchan wrote:

We also faced the same issue when we were running an embedded solr 6.1 
server; I hit it in our integration environment after deploying the project. 
Solr 6.1 uses http client 4.4.1, which I think is the version the embedded 
solr server is looking for. I suspect that when the solr core is loaded, an 
older http client is being picked up from somewhere else in your maven 
dependencies. Check the dependency tree of your pom.xml, see where else that 
jar is being pulled in, and exclude it there in your pom.xml. I hope this 
solves your issue.


Thanks
Rohit


On Tue, Aug 2, 2016 at 9:44 AM, Steve Rowe wrote:


solr-core[1] and solr-solrj[2] POMs have parent POM solr-parent[3], which
in turn has parent POM lucene-solr-grandparent[4], which has a
<dependencyManagement> section that specifies dependency versions &
exclusions *for all direct dependencies*.

The intent is for all Lucene/Solr’s internal dependencies to be managed
directly, rather than through Maven’s transitive dependency mechanism.  For
background, see summary & comments on JIRA issue LUCENE-5217[5].

I haven’t looked into how this affects systems that depend on Lucene/Solr
artifacts, but it appears to be the case that you can’t use Maven’s
transitive dependency mechanism to pull in all required dependencies for
you.

BTW, if you look at the grandparent POM, the httpclient version for Solr
6.1.0 is declared as 4.4.1.  I don’t know if depending on version 4.5.2 is
causing problems, but if you don’t need a feature in 4.5.2, I suggest that
you depend on the same version as Solr does.

For error #2, you should depend on lucene-core[6].
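For instance (the coordinates from [6]):

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>6.1.0</version>
    </dependency>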

My suggestion as a place to start: copy/paste the dependencies from
solr-core[1] and solr-solrj[2] POMs, and leave out stuff you know you won’t
need.

[1] <https://repo1.maven.org/maven2/org/apache/solr/solr-core/6.1.0/solr-core-6.1.0.pom>
[2] <https://repo1.maven.org/maven2/org/apache/solr/solr-solrj/6.1.0/solr-solrj-6.1.0.pom>
[3] <https://repo1.maven.org/maven2/org/apache/solr/solr-parent/6.1.0/solr-parent-6.1.0.pom>
[4] <https://repo1.maven.org/maven2/org/apache/lucene/lucene-solr-grandparent/6.1.0/lucene-solr-grandparent-6.1.0.pom>
[5] <https://issues.apache.org/jira/browse/LUCENE-5217>
[6] <http://search.maven.org/#artifactdetails|org.apache.lucene|lucene-core|6.1.0|jar>
--
Steve
www.lucidworks.com


On Aug 2, 2016, at 12:03 PM, Ziqi Zhang wrote:

> [...]

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Ziqi Zhang

Thanks Yonik,



> It uses a thread per request, simultaneously (up to any limit
> configured by the app server)


How can I change this setting then? I suppose it is to do with Jetty or 
Tomcat, whichever hosts the solr application, not with the solrconfig?
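(For reference, a jetty.xml sketch of that container-level limit, assuming 
the Jetty 6 that Solr shipped with at the time; the numbers are illustrative:)

    <!-- jetty.xml: bounds the pool of request-handling threads -->
    <Set name="ThreadPool">
      <New class="org.mortbay.thread.BoundedThreadPool">
        <Set name="minThreads">10</Set>
        <Set name="maxThreads">250</Set>
      </New>
    </Set>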


I still do not understand why sending 100 requests (of the same query) from 
100 threads reduces the solr server to silence - is it because of the 
computational cost of dealing with the same query in 100 separate threads?


I noticed that disabling facet counts improves things a bit, but not 
significantly.


Thanks in advance! 



Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Ziqi Zhang

Thanks!

> Also make sure that common filters, sort fields, and facets have been
> warmed.


I assume these are achieved by setting a large cache size and a large 
autowarmCount number in the solr configuration? Specifically:


filterCache
queryResultCache
documentCache
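(Each of these is configured in solrconfig.xml roughly as in this sketch; 
the numbers are illustrative, not recommendations:)

    <filterCache
      class="solr.LRUCache"
      size="16384"
      initialSize="4096"
      autowarmCount="4096"/>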

Thanks!


--
From: "Yonik Seeley" <[EMAIL PROTECTED]>
Sent: Wednesday, February 06, 2008 7:50 AM
To: 
Subject: Re: how to improve concurrent request performance and stress 
testing



On Feb 6, 2008 6:37 PM, Ziqi Zhang <[EMAIL PROTECTED]> wrote:
> I still do not understand why sending 100 requests (of the same query)
> from 100 threads reduces the solr server to silence - is it because of
> the computational cost of dealing with the same query in 100 separate
> threads?

Yes... sending a large number of requests at once can cause one to
start hitting synchronization bottlenecks.




-Yonik



Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Ziqi Zhang

Thanks Otis!

I think I now got a clearer picture of the issue and its causes, thanks.

Could you please elaborate on "warming up" the searcher prior to exposing it 
to real requests? Does this mean running through as many of the most often 
used queries as possible so that results are cached, and also using as much 
cache as possible?


Thanks!



--
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
Sent: Wednesday, February 06, 2008 1:09 PM
To: 
Subject: Re: how to improve concurrent request performance and stress 
testing



Imagine this type of code:

synchronized (someGlobalObject) {
 // search
}

What happens when 100 threads hit this spot?  The first one to get there 
gets in and runs the search, and 99 of them wait.
What happens if that "// search" also involves expensive operations, lots 
of IO, warming up, cache population, etc.?  Those 99 threads will have to 
wait a while :)


That's why it is recommended to warm up the searcher ahead of time before 
exposing it to real requests.  However, even if you warm things up, that 
sync block will remain there, and at some point it will become a 
bottleneck.  Where that point is depends on the hardware, index size, query 
complexity and rate, even the JVM.


Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message 

From: Ziqi Zhang <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, February 6, 2008 6:37:40 PM
Subject: Re: how to improve concurrent request performance and stress 
testing


Thanks Yonik,

> It uses a thread per request, simultaneously (up to any limit
> configured by the app server)

How can I change this setting then? I suppose it is to do with Jetty or
Tomcat, whichever hosts the solr application, not with the solrconfig?

I still do not understand why sending 100 requests (of the same query) from
100 threads reduces the solr server to silence - is it because of the
computational cost of dealing with the same query in 100 separate threads?

I noticed that disabling facet counts improves things a bit, but not
significantly.

Thanks in advance!

Re: how to improve concurrent request performance and stress testing

2008-02-07 Thread Ziqi Zhang

Thank you so much! I will look into the firstSearcher configuration next. Thanks!

--
From: "Chris Hostetter" <[EMAIL PROTECTED]>
Sent: Wednesday, February 06, 2008 8:56 PM
To: 
Subject: Re: how to improve concurrent request performance and stress 
testing



: > Also make sure that common filters, sort fields, and facets have been
: > warmed.
:
: I assume these are achieved by setting large cache size and large
: autowarmcount number in solr configuration? specifically

autowarming seeds the caches of a new Searcher using the keys of an old
searcher -- it does nothing to help you when you first start up Solr and
all of the caches are empty.

for that you either need to manually trigger some sample queries
externally (before your stress test) or configure something using the
firstSearcher event listener in solrconfig.xml.

If you saw all of your requests block until the first one finished, then
I suspect your queries involve a sort (or faceting) that uses the
FieldCache, which is initialized in single-threaded mode (and can't be
auto-warmed; you can put some simple queries that use those sort fields in
the newSearcher listener to ensure that they get reinitialized for each
new searcher)
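A minimal sketch of such a listener in solrconfig.xml (the queries are 
placeholders - use ones that exercise your real sort fields and facets):

    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <!-- placeholder warming queries -->
        <lst>
          <str name="q">solr</str>
          <str name="sort">price asc</str>
        </lst>
        <lst>
          <str name="q">*:*</str>
          <str name="facet">true</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>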

-Hoss




Migrating from Solr 6.6 getStatistics() to Solr 7.x

2018-04-06 Thread ziqi zhang
Hi all

In my Solr 6.6 based code, I have the following line that gets the total
number of documents in a collection:

totalDocs = indexSearcher.getStatistics().get("numDocs");

where indexSearcher is an instance of "SolrIndexSearcher".


With Solr 7.2.1, 'getStatistics' is no longer available, and it seems that
it is replaced by 'collectionStatistics' or 'termStatistics':
https://lucene.apache.org/solr/7_2_1/solr-core/org/apache/solr/search/SolrIndexSearcher.html?is-external=true

So my question is: what is the equivalent statement in solr 7.2.1? Is it:

solrIndexSearcher.collectionStatistics("numDocs").maxDoc();

The API warns that it is still experimental and might change in
incompatible ways in the next release. Is there more 'stable' code for
getting this done?

Thanks


Re: Migrating from Solr 6.6 getStatistics() to Solr 7.x

2018-04-06 Thread ziqi zhang
Thank you!

On Fri, Apr 6, 2018 at 10:34 PM, Chris Hostetter 
wrote:

>
> : In my Solr 6.6 based code, I have the following line that gets the total
> : number of documents in a collection:
> :
> : totalDocs = indexSearcher.getStatistics().get("numDocs");
> ...
> : With Solr 7.2.1, 'getStatistics' is no longer available, and it seems
> that
> : it is replaced by 'collectionStatistics' or 'termStatistics':
> ...
> : So my question is: what is the equivalent statement in solr 7.2.1? Is it:
> :
> : solrIndexSearcher.collectionStatistics("numDocs").maxDoc();
>
> Uh... no.  that's not quite true.
>
> In the 6.x code line, getStatistics() was part of the SolrInfoMBean API
> that SolrIndexSearcher and many other Solr objects implemented...
>
> http://lucene.apache.org/solr/6_6_0/solr-core/org/apache/solr/search/SolrIndexSearcher.html#getStatistics--
>
> In 7.0, SolrInfoMBean was replaced with SolrInfoBean as part of the switch
> over to the new, more robust Metrics API...
>
> https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html#jmx-support-and-mbeans
> https://lucene.apache.org/solr/guide/7_0/metrics-reporting.html
> http://lucene.apache.org/solr/7_0_0/solr-core/org/apache/solr/core/SolrInfoBean.html
>
> (The collectionStatistics() and termStatistics() methods are lower level
> Lucene concepts)
>
> IIRC the closest 7.x equivalent to "indexSearcher.getStatistics()" is
> "indexSearcher.getMetricsSnapshot()" ... but the keys in that map will
> have slightly different/longer names than they did before; you can use
> "indexSearcher.getMetricNames()" to see the full list.
>
> ...but frankly that's all a very complicated way to get "numDocs" --
> if you're writing a solr plugin that has direct access to a
> SolrIndexSearcher instance, you can just call the
> "solrIndexSearcher.numDocs()" method and make your life a lot simpler.
>
>
>
> -Hoss
> http://www.lucidworks.com/
>
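For anyone migrating similar code, a minimal sketch of the simple route 
described above (a hypothetical plugin helper; getIndexReader().numDocs() is 
the underlying Lucene call):

    import org.apache.lucene.index.IndexReader;
    import org.apache.solr.search.SolrIndexSearcher;

    public class NumDocsHelper {
        // hypothetical helper for a Solr 7.x plugin that already holds a searcher
        static int totalDocs(SolrIndexSearcher searcher) {
            IndexReader reader = searcher.getIndexReader();
            // numDocs() counts live (non-deleted) documents;
            // maxDoc() would include deleted ones
            return reader.numDocs();
        }
    }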