Re: Log message "zkClient has disconnected".

2014-10-28, Modassar Ather
Thanks, Shawn, for your response and the link on GC tuning. Regards, Modassar On Tue, Oct 28, 2014 at 7:01 PM, Shawn Heisey wrote: > On 10/28/2014 1:48 AM, Modassar Ather wrote: > > These SolrCloud instances are 8-core machines with 24 GB of RAM each > > assigned to Tomcat. The Indexer machine …

RE: Sharding configuration

2014-10-28, Will Martin
Informational only. FYI, machine parallelism has been empirically proven to be application-dependent. See the use of the DaCapo benchmarks (Lucene indexing and Lucene searching) in http://dx.doi.org/10.1145/2479871.2479901, "Parallelism profiling and wall-time prediction for multi-threaded applicat…

Re: Indexing documents/files for production use

2014-10-28, Erick Erickson
And one other consideration, in addition to the two excellent responses so far: in a SolrCloud environment, SolrJ via CloudSolrServer will automatically route the documents to the correct shard leader, saving some additional overhead. post.jar and cURL send the docs to a node, which in turn forw…
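To make the routing point concrete, here is a simplified Java sketch of what client-side routing means: the client hashes each document ID, works out which shard's hash range it falls into, and can then send each group straight to that shard's leader instead of relying on a receiving node to forward it. Assumptions: `String.hashCode()` stands in for the MurmurHash3 hash that Solr's compositeId router actually uses, the 32-bit hash space is split into equal contiguous ranges, and the class and method names are hypothetical, so the shard numbers it produces will not match a real cluster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClientSideRoutingSketch {

    // Assumption: String.hashCode() stands in for Solr's MurmurHash3-based
    // compositeId hash, so results will not match a real SolrCloud cluster.
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        // Shift the signed 32-bit hash into the range 0 .. 2^32 - 1.
        long offset = (long) docId.hashCode() - Integer.MIN_VALUE;
        // Split that range into numShards equal, contiguous slices.
        return (int) (offset * numShards / (1L << 32));
    }

    // Groups document IDs by target shard so each group could be sent
    // directly to that shard's leader, avoiding a forwarding hop.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (String id : docIds) {
            groups.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4");
        System.out.println(groupByShard(ids, 2));
    }
}
```

The saving Erick describes comes from the grouping step: once the client knows the target shard, no intermediate node has to re-route the document.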

Re: Indexing documents/files for production use

2014-10-28, Jürgen Wagner (DVT)
Hello Olivier, for real production use, you won't really want to use toys like post.jar or cURL. You want a decent connector to whatever data source there is, one that fetches data, possibly massages it a bit, and then feeds it into Solr, by means of SolrJ or directly into the web service of Sol…

Re: Indexing documents/files for production use

2014-10-28, Alexandre Rafalovitch
What is your production use? You have to answer that for yourself. post.jar makes a couple of things easy. If your production use fits into those (e.g. no cluster), great, use it. It is certainly not any worse than cURL. But if you are running a cluster and have specific requirements, then yes, …

Indexing documents/files for production use

2014-10-28, Olivier Austina
Hi All, I am reading the Solr documentation. I have understood that post.jar is not meant for production use and that cURL is not recommande…

Re: Sharding configuration

2014-10-28, Ramkumar R. Aiyengar
As far as the second option goes, unless you are using a large amount of memory and reach a point where a JVM can't sensibly deal with the GC load, having multiple JVMs wouldn't buy you much. With a 26 GB index, you probably haven't reached that point. There are also other shared resources at an i…

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28, Michael Della Bitta
No, you do not, although you may consider it, because you'd be getting a sort of integrated stack. But really, the decision to switch to running Solr on HDFS should not be taken lightly. Unless you are on a team familiar with running a Hadoop stack, or you're willing to devote a lot of effort t…

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28, S.L
Yeah, I get that not using the MapReduceIndexerTool could be more resource-intensive, but the way this issue is manifesting, resulting in disjoint SolrCloud replicas, perplexes me. While you were tuning your SolrCloud environment to cater to the Hadoop indexing requirements, did you ever …

Slow forwarding requests to collection leader

2014-10-28, Matt Hilt
I have three identical machines, each running SolrCloud (4.8). I have multiple collections that are replicated but not sharded. I also have document generation processes running on these nodes, which involve querying the collection ~5 times per document generated. Node 1 has a replica of collection…

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28, S.L
I'm using Apache Hadoop and Solr; do I need to switch to Cloudera? On Tue, Oct 28, 2014 at 1:27 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote: > We index directly from mappers using SolrJ. It does work, but you pay the > price of having to instantiate all those sockets vs. th…

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28, Michael Della Bitta
We index directly from mappers using SolrJ. It does work, but you pay the price of having to instantiate all those sockets vs. the way MapReduceIndexerTool works, where you're writing to an EmbeddedSolrServer directly in the Reduce task. You don't *need* to use MapReduceIndexerTool, but it's m…

Re: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-28, S.L
Will, I think in one of your other emails (which I am not able to find) you had asked whether I was indexing directly from MapReduce jobs. Yes, I am indexing directly from the map task, using SolrJ with a CloudSolrServer initialized with the ZK ensemble URLs. Do I need to use something lik…

Re: unable to build solr 4.10.1

2014-10-28, Chris Hostetter
: I am getting below error while doing "ant dist". The build system (up to 4.10.1) was unintentionally requiring that javadoc jars exist, and this recently manifested as a problem when this particular javadoc jar somehow vanished from maven.org. This issue tracks the fix, which will be in 4…

Re: unable to build solr 4.10.1

2014-10-28, Steve Rowe
Hi Karunakar, 4.10.2 (which will be released some time this week) has a fix for this: https://issues.apache.org/jira/browse/LUCENE-6007 The build failure is caused by solr/contrib/dataimporthandler-extras/ivy.xml. Apply this patch to make the build succeed: …

Re: Collapse and Expand Results in Solr 4.10 / Highlighting

2014-10-28, Joel Bernstein
You are correct. Highlighting works from the DocList, which only includes the collapsed set when using Collapse/Expand. Joel Bernstein Search Engineer at Heliosearch On Tue, Oct 28, 2014 at 9:46 AM, Michael Hagström wrote: > Hello! > I'm testing the »Collapse and Expand« functionalit…

Re: [ANN] Heliosearch 0.08 released

2014-10-28, Yonik Seeley
On Tue, Oct 28, 2014 at 10:10 AM, Bernd Fehling wrote: > Is the new faceted search module the reason I don't have > any lucene-facet-hs_0.08.jar in the binary distribution? Solr has never used that (and Heliosearch doesn't either). ES never has either AFAIK. > And what about lucene-classi…

Re: [ANN] Heliosearch 0.08 released

2014-10-28, Bernd Fehling
Is the new faceted search module the reason I don't have any lucene-facet-hs_0.08.jar in the binary distribution? And what about lucene-classification and lucene-replicator? How can I build from source, with solr/hs.xml? Regards Bernd On 27.10.2014 at 17:25, Yonik Seeley wrote: > http:/…

Define default Shard in implicit collection

2014-10-28, nabil Kouici
Hi All, I have a collection with the implicit router. When I try to load a document whose route (target shard) doesn't exist, I get the error: org.apache.solr.common.SolrException: No shard called =2015 in DocCollection(COL)={… Is it possible to define a default shard where documents will be loaded if their target shard doesn't exist?
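The thread does not say whether Solr 4.x offers a built-in default shard for the implicit router. One client-side workaround, sketched here under stated assumptions, is to check the route value against the shard names the client already knows about (for example, read from the cluster state) and substitute a catch-all shard when there is no match. The helper is hypothetical, not a Solr API, and the shard names, including the catch-all `others` shard, are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class DefaultShardFallback {

    // Returns the shard a document should be sent to: the route value itself
    // when a shard with that name exists, otherwise the catch-all shard.
    // Hypothetical client-side helper; not a Solr API.
    static String resolveShard(String routeValue, Set<String> existingShards, String defaultShard) {
        if (!existingShards.contains(defaultShard)) {
            throw new IllegalArgumentException("default shard '" + defaultShard + "' does not exist");
        }
        if (routeValue != null && existingShards.contains(routeValue)) {
            return routeValue;
        }
        return defaultShard;
    }

    public static void main(String[] args) {
        // Hypothetical year-based shard names plus a catch-all shard.
        Set<String> shards = new HashSet<>(Arrays.asList("2013", "2014", "others"));
        System.out.println(resolveShard("2014", shards, "others")); // existing shard is kept
        System.out.println(resolveShard("2015", shards, "others")); // falls back to the catch-all
    }
}
```

Failing fast when the catch-all shard itself is missing keeps the helper from silently reproducing the original "No shard called" error.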

Re: Total term frequency in solr includes deleted documents

2014-10-28, Alexandre Rafalovitch
The merge policy would probably affect how often _some_ of the deleted documents are purged, at a lower cost than a full optimization. https://cwiki.apache.org/confluence/display/solr/IndexConfig+in+SolrConfig#IndexConfiginSolrConfig-MergingIndexSegments But it is still not a 100% solution. Reg…

Collapse and Expand Results in Solr 4.10 / Highlighting

2014-10-28, Michael Hagström
Hello! I'm testing the »Collapse and Expand« functionality of Solr 4.10. Collapsing and expanding results is working pretty well, but it seems that there's no way to get highlighting snippets for the expanded results. Highlighting is only available for the result name=»response». Am I ri…
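For reference, a request combining the features discussed in this thread can be sketched as follows: `fq={!collapse field=...}` collapses each group to one document, `expand=true` adds the expanded section, and `hl=true` with `hl.fl` turns on highlighting. Per Joel's reply, the highlighting section then covers only the collapsed documents in the main `response`, not the expanded ones. The field names `group_id` and `content` are hypothetical, the sketch only builds the query string without sending it, and it assumes Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class CollapseExpandQuerySketch {

    // Builds the query string for a collapsed, expanded, highlighted search.
    // Field names passed in are hypothetical; this does not contact Solr.
    static String buildQuery(String q, String collapseField, String hlField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("fq", "{!collapse field=" + collapseField + "}"); // one doc per group
        params.put("expand", "true"); // add the expanded section for each group
        params.put("hl", "true");     // highlighting, applied to the main (collapsed) result
        params.put("hl.fl", hlField);
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // prints q=solr&fq=%7B%21collapse+field%3Dgroup_id%7D&expand=true&hl=true&hl.fl=content
        System.out.println(buildQuery("solr", "group_id", "content"));
    }
}
```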

Re: Total term frequency in solr includes deleted documents

2014-10-28, Shawn Heisey
On 10/28/2014 7:16 AM, nutchsolruser wrote: > How can we get the exact term frequency, excluding the term frequency of deleted documents, and that without optimization, because optimization is > expensive in our case? > Is there any other way we can get the term frequency for the entire collection in > Solr?

Re: SolrCloud config question and zookeeper

2014-10-28, Shawn Heisey
On 10/28/2014 3:42 AM, Bernd Fehling wrote: > Thanks for the explanations. > > My idea about 4 zookeepers is a result of having the same software > (java, zookeeper, solr, ...) installed on all 4 servers. > But yes, I don't need to start a zookeeper on the 4th server. > > 3 other machines outside…

Re: Log message "zkClient has disconnected".

2014-10-28, Shawn Heisey
On 10/28/2014 1:48 AM, Modassar Ather wrote: > These SolrCloud instances are 8-core machines with 24 GB of RAM each > assigned to Tomcat. The Indexer machine starts with -Xmx16g. > All these machines are connected to the same switch. If you have not tuned your garbage collection, a 16 GB heap wil…

Total term frequency in solr includes deleted documents

2014-10-28, nutchsolruser
Currently I am working on getting the term frequency (not document frequency) of a term in a particular field for the whole index. For that I am using the function query ttf(field_name,'term'). This returns the total occurrences of the term in that field. But it seems it is also counting deleted documents while cal…
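A toy model of why `ttf` can count deleted documents: in Lucene, a delete only marks a document as deleted inside its segment, and per-segment statistics such as the total term frequency are not recomputed until a merge rewrites that segment without the deleted documents, which is why the replies point to the merge policy and optimization. The classes below are hypothetical and illustrate the bookkeeping only; they are not Lucene's data structures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TermStatsWithDeletes {

    // One document: how often the term occurs in the field, and whether the
    // document has been marked deleted. Hypothetical toy type, not Lucene's.
    static final class Doc {
        final int termCount;
        final boolean deleted;

        Doc(int termCount, boolean deleted) {
            this.termCount = termCount;
            this.deleted = deleted;
        }
    }

    // Like a stored per-segment statistic: counts every document still
    // physically present in a segment, including ones marked deleted.
    static long storedTtf(List<List<Doc>> segments) {
        long total = 0;
        for (List<Doc> segment : segments) {
            for (Doc d : segment) {
                total += d.termCount;
            }
        }
        return total;
    }

    // The figure the question actually wants: live documents only.
    static long liveTtf(List<List<Doc>> segments) {
        long total = 0;
        for (List<Doc> segment : segments) {
            for (Doc d : segment) {
                if (!d.deleted) {
                    total += d.termCount;
                }
            }
        }
        return total;
    }

    // A merge rewrites the segments and drops deleted documents, after which
    // the stored statistic and the live count agree again.
    static List<List<Doc>> mergeAll(List<List<Doc>> segments) {
        List<Doc> merged = new ArrayList<>();
        for (List<Doc> segment : segments) {
            for (Doc d : segment) {
                if (!d.deleted) {
                    merged.add(d);
                }
            }
        }
        List<List<Doc>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    public static void main(String[] args) {
        List<List<Doc>> segments = Arrays.asList(
                Arrays.asList(new Doc(3, false), new Doc(2, true)),
                Arrays.asList(new Doc(5, false)));
        System.out.println(storedTtf(segments) + " vs " + liveTtf(segments)); // 10 vs 8
        System.out.println(storedTtf(mergeAll(segments)));                    // 8
    }
}
```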

Re: SolrCloud config question and zookeeper

2014-10-28, Markus Jelsma
On Tuesday 28 October 2014 10:42:11 Bernd Fehling wrote: > Thanks for the explanations. > > My idea about 4 zookeepers is a result of having the same software > (java, zookeeper, solr, ...) installed on all 4 servers. > But yes, I don't need to start a zookeeper on the 4th server. > > 3 other mac…

RE: suggestion for new custom atomic update

2014-10-28, Elran Dvir
Shalin and Matthew, Thank you very much. -----Original Message----- From: Matthew Nigl [mailto:matthew.n...@gmail.com] Sent: Monday, October 27, 2014 7:24 PM To: solr-user@lucene.apache.org Subject: Re: suggestion for new custom atomic update No problem, Elran. As Shalin mentioned, you will need…

Re: SolrCloud config question and zookeeper

2014-10-28, Bernd Fehling
Thanks for the explanations. My idea about 4 ZooKeepers is a result of having the same software (Java, ZooKeeper, Solr, ...) installed on all 4 servers. But yes, I don't need to start a ZooKeeper on the 4th server. 3 other machines outside the cloud for ZK seem a bit oversized. And you have anot…

Sharding configuration

2014-10-28, Anca Kopetz
Hi, We have a SolrCloud configuration of 10 servers, no sharding, 20 million documents; the index is 26 GB. As the number of documents has increased recently, the performance of the cluster has decreased. We thought of sharding the index in order to measure the latency. What is the best approa…

Re: SolrCloud config question and zookeeper

2014-10-28, Daniel Collins
As Michael says, you really want an odd number of ZooKeepers in order to meet the quorum requirements (which, based on your comments, you seem to be aware of). There is nothing "wrong" with 4 ZKs as such; it's just that it doesn't buy you anything above having 3, so it's one more that might go wrong and c…
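The quorum arithmetic behind the odd-number advice can be checked directly. ZooKeeper needs a strict majority of the ensemble, floor(n/2) + 1 servers, to stay available, so the number of failures it survives is n minus that majority. A minimal Java sketch (the class name is hypothetical):

```java
public class ZkQuorumMath {

    // Smallest strict majority of an ensemble of n servers.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // How many servers can fail while a majority is still reachable.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("%d servers: quorum %d, tolerates %d failure(s)%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```

Running it shows that 3 and 4 servers both tolerate exactly one failure while 5 tolerates two, so a fourth ZooKeeper adds one more machine that can fail without adding any fault tolerance.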

Re: Log message "zkClient has disconnected".

2014-10-28, Modassar Ather
Hi Will, Thanks for your response. These SolrCloud instances are 8-core machines with 24 GB of RAM each assigned to Tomcat. The Indexer machine starts with -Xmx16g. All these machines are connected to the same switch. The batch size is 5000 documents and there are 8 threads that add 5000 docu…
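The indexer described here (batches of 5000 documents, 8 threads) can be sketched with a fixed thread pool; with those figures, up to 8 × 5000 = 40,000 documents can be in flight at once. Assumptions: `sendBatch` is a hypothetical placeholder for whatever actually sends a batch to Solr (for example a SolrJ `add` call), and the class name is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class BatchedIndexerSketch {

    // Splits the documents into consecutive batches of at most batchSize.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return batches;
    }

    // Sends every batch through a fixed pool of `threads` workers and waits
    // for all of them; sendBatch is a hypothetical stand-in for the real
    // Solr client call.
    static <T> void indexAll(List<T> docs, int batchSize, int threads,
                             Consumer<List<T>> sendBatch) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                pending.add(pool.submit(() -> sendBatch.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows (wrapped) any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 12000; i++) docs.add(i);
        indexAll(docs, 5000, 8, batch ->
                System.out.println(Thread.currentThread().getName() + " sent " + batch.size()));
    }
}
```

Waiting on each `Future` means a failed batch surfaces as an exception instead of being silently dropped.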

Re: unable to build solr 4.10.1

2014-10-28, Modassar Ather
The following link might help. https://wiki.apache.org/solr/HowToCompileSolr You might need to run "ant ivy-bootstrap" as described in the link above. On Tue, Oct 28, 2014 at 12:20 PM, Karunakar Reddy wrote: > Hi Martin, > Thanks for your quick response. Yes, the specified file is not present in th…

Re: SolrCloud config question and zookeeper

2014-10-28, Bernd Fehling
Yes, garbage collection is a very good argument for having external ZooKeepers. I hadn't thought about that. But does this also mean a separate server for each ZooKeeper, or can they live side by side with Solr on the same server? What is the problem with 4 ZooKeepers, besides that I have no real gain a…