Re: Auto replication mechanism in SolrCloud 5.1 not working

2015-04-27 Thread mihaela olteanu
Thanks for the reply. Now it's working, but I'm not sure what change fixed this. It might have been a communication error with ZooKeeper, although I could not see anything of the sort in the logs. I found that ZooKeeper was, for example, generating some trace files in a location that was running out o

stats component performance

2015-04-27 Thread Matteo Grolla
Hi, is there any public benchmark or description of how the solr stats component works? Matteo

Replication not triggered

2015-04-27 Thread Michael Lackhoff
We have old-fashioned replication configured between one master and one slave. Everything used to work, but today I noticed that recent records were not present on the slave (the same query gives hits on the master but none on the slave). The replication communication seems to work. This is what I get in the log

FW: TIKA OCR not working

2015-04-27 Thread Allison, Timothy B.
Trung, I haven't experimented with our OCR parser yet, but this should give a good start: https://wiki.apache.org/tika/TikaOCR . Have you installed tesseract? Tika colleagues, Any other tips? What else has to be configured and how? -Original Message- From: trung.ht [mailto:trung...@

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
It should work out of the box in Solr as long as Tesseract is installed and on the class path. Solr had an issue with it since Tika sends 2 startDocument calls, but I fixed that with Uwe and it was shipped in 4.10.4 and in 5.x I think? ++

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Yes that is fixed. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] > Sent: Monday, April 27, 2015 4:29 PM > To: u...@tika.apache.org > Cc:

RE: TIKA OCR not working

2015-04-27 Thread Uwe Schindler
Hi, TIKA OCR is definitely working automatically with Solr 5.x. It is just important to install TesseractOCR on the path (it is a native tool that does the actual work). On Ubuntu Linux, this should be quite simple ("apt-get install tesseract-ocr" or something like that). You may also need to install add

Field attribute default value

2015-04-27 Thread Steven White
Hi Everyone, I'm looking at https://cwiki.apache.org/confluence/display/solr/Defining+Fields and https://wiki.apache.org/solr/SchemaXml but cannot find an answer, so maybe it is someplace else? I need to know what is the default value for each field attribute (when that attribute is missing). Fo

Integrating Solr with an existing web application - and SolrJ

2015-04-27 Thread O. Olson
I can get the standard Solr example to run within Jetty and I can use it through the velocity templates. I'm now thinking of integrating Solr with a couple of existing websites. In this regard, I have the following questions: 1. For a medium sized website (about 100+ concurrent users), what is the

Re: Integrating Solr with an existing web application - and SolrJ

2015-04-27 Thread Doug Turnbull
1. Unless usage is very light, you likely want Solr to be on a different server. It's going to have different caching and system needs than your web app. You may also want to scale Solr independently from your web app. Think of it just like you think of a database-- do you want your MySQL instance o

Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Gopal Jee
We have a 26-node SolrCloud cluster. During heavy re-indexing, some of the nodes go into recovering state. As per the current config, soft commit is set to 15 minutes and hard commit to 30 sec. Moreover, zkClientTimeout is set to 30 sec on the Solr nodes. Please advise. Thanks, Gopal

Re: Odp.: solr issue with pdf forms

2015-04-27 Thread Erick Erickson
We're still not quite there. There should be a "load term info" button on that page. Clicking that button will show you the terms in your index (as opposed to the raw stored input which is what you get when you look at results in the browser). My bet is that you'll see perfectly normal tokens in th

Re: Field attribute default value

2015-04-27 Thread Erick Erickson
I'd just define these in the fieldType definition explicitly. Then you're absolutely sure what each field has and can override as needed. Best, Erick On Mon, Apr 27, 2015 at 7:56 AM, Steven White wrote: > Hi Everyone, > > I'm looking at > https://cwiki.apache.org/confluence/display/solr/Defining

Re: Integrating Solr with an existing web application - and SolrJ

2015-04-27 Thread O. Olson
Thank you very much Doug. I was thinking of putting Solr on a separate server, but I did not expect you to so strongly recommend Jetty. I think I would stick to the embedded Jetty, because I don't need the security. I'm using Solr 4.10.3 at the moment, so I'm not familiar with Solr 5. Thanks agai

Re: Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Rajesh Hazari
Our production Solr nodes had a similar issue: with 4 nodes everything is normal, but when we tried to increase the replicas (nodes) to 10, most of them went into recovery. Our config params: nodes: 20 (replica in each node), soft commit is 6 sec, hard commit is 5 min, indexing scheduled time : ever

Re: and stopword in user query is being change to q.op=AND

2015-04-27 Thread Rajesh Hazari
I did go through the documentation of edismax (Solr 5.1 documentation), which suggests using the "stopwords" query param to signal the parser to respect StopFilterFactory while parsing; still, I did not see this happening. My final query looks like this http://host/solr/collection/select?q=ter

Re: Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Shawn Heisey
On 4/27/2015 9:15 AM, Gopal Jee wrote: > We have a 26 node solr cloud cluster. During heavy re-indexing, some of > nodes go into recovering state. > as per current config, soft commit is set to 15 minute and hard commit to > 30 sec. Moreover, zkClientTimeout is set to 30 sec in solr nodes. > Please

Re: Solr node going to recovering state during heavy reindexing

2015-04-27 Thread Rajesh Hazari
Thanks, I am sure that we have missed this command line property; this gives me more information on how to use the latest Solr scripts more effectively. Thanks, Rajesh. On Mon, Apr 27, 2015 at 12:04 PM, Shawn Heisey wrote: > On 4/27/2015 9:15 AM, Gopal Jee wrote: > > We have a 26 node solr c

Re: TIKA OCR not working

2015-04-27 Thread Konstantin Gribov
JFYI, there's no tesseract & leptonica for centos6/rhel6 (even in epel), so I have specs for building tesseract and leptonica (its dependency) on github (https://github.com/grossws/tesseract-ocr-specs). Feel free to use if you're on centos/rhel. Also, tesseract language packs are trained for one l

Re: TIKA OCR not working

2015-04-27 Thread Mattmann, Chris A (3980)
Thanks Konstantin! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@na

Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Stephan Schubert
Hi everyone, how is it possible to start Solr with an external set of ZooKeeper instances (quorum of 3 servers) on a Windows server (2008R2)? From the wiki I got ( https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble ) bin\solr restart -c -p 8983 -z sampl

Re: Field attribute default value

2015-04-27 Thread Chris Hostetter
the defaults for a <field> come from the <fieldType> specified by the "type" attribute. From that point, the default behavior of a <fieldType> can vary by the individual FieldType "class" implementation (ie: most fields default to omitTermFreqAndPositions="true" but TextField defaults to false) or by the "version" attri

Re: /suggest through SolrJ?

2015-04-27 Thread Alessandro Benedetti
Just had the very same problem, and I confirm that it is currently quite a mess to manage suggestions in SolrJ! I have to go with manual JSON parsing. Cheers 2015-02-02 12:17 GMT+00:00 Jan Høydahl : > Using the /suggest handler wired to SuggestComponent, the > SpellCheckResponse objects are not pop

Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Erick Erickson
What version of Solr are you using? 4.10.3? 5.1? And can we see the full output of your attempt to start Solr? There might be some more informative bits above the help response. Best, Erick On Mon, Apr 27, 2015 at 9:49 AM, Stephan Schubert wrote: > Hi everyone, > > how is it possible to start s

Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Timothy Potter
Can you try defining the ZK_HOST in bin\solr.in.cmd instead of passing it on the command-line? On Mon, Apr 27, 2015 at 12:10 PM, Erick Erickson wrote: > What version of Solr are you using? 4.10.3? 5.1? > > And can we see the full output of your attempt to start Solr? There > might be some more in

Why are these two queries different?

2015-04-27 Thread Frank li
We ran two Solr queries that were supposed to return the same results but did not: Query 1: all_text:(US 4,568,649 A) "parsedquery": "(+((all_text:us ((all_text:4 all_text:568 all_text:649 all_text:4568649)~4))~2))/no_coord", Result: "numFound": 0, Query 2: all_text:(US 4568649) "parsedquery": "(
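
The parsedquery for Query 1 shows the single term 4,568,649 turning into four tokens (4, 568, 649, 4568649). A minimal Python sketch of that kind of splitting follows, assuming the field's analyzer uses a word-delimiter-style filter that splits on punctuation and also emits the catenated number. The post does not show the field type, so this is an assumption, and split_and_catenate is a hypothetical helper, not a Solr API.

```python
import re


def split_and_catenate(token):
    """Illustrative only: split a token on non-alphanumeric characters and,
    if it was split, also emit the parts joined back together.
    This mimics the tokens seen in the parsedquery; it is not Solr code."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if len(parts) > 1:
        # Assumed "catenate" behaviour: add the joined form as an extra token.
        return parts + ["".join(parts)]
    return parts


if __name__ == "__main__":
    print(split_and_catenate("4,568,649"))
    print(split_and_catenate("4568649"))
```

Under this assumption the first query carries four tokens for the number while the second carries only one, which is one way the two parsed queries can end up with different matching requirements.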

Re: Solr Cloud reclaiming disk space from deleted documents

2015-04-27 Thread Gili Nachum
To prevent it from recurring, you could monitor the index size and, once it is above a certain threshold, add another machine and split the shard between the existing and the new machine. On Apr 20, 2015 9:10 PM, "Rishi Easwaran" wrote: > So is there anything that can be done from a tuning perspective, to > r
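
The monitoring idea above could be sketched roughly as follows. This is only an illustration: the index directory path and the threshold are hypothetical values you would choose yourself, and the functions index_size_bytes and needs_split are made up for this example (they are not part of Solr). The actual shard split would still be done with Solr's own tooling.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all files under index_dir (0 if it does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # Segment files can disappear during merges; skip them.
                pass
    return total


def needs_split(size_bytes, threshold_bytes):
    """True once the index has grown past the chosen threshold."""
    return size_bytes > threshold_bytes


if __name__ == "__main__":
    # Hypothetical path and threshold, for illustration only.
    size = index_size_bytes("/var/solr/data/collection1/data/index")
    if needs_split(size, 50 * 1024 ** 3):
        print("Index above threshold: consider adding a node and splitting the shard")
```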

SolrCloud Replication Issue

2015-04-27 Thread Amit L
Hi, A few days ago I deployed a Solr 4.9.0 cluster, which consists of 2 collections. Each collection has 1 shard with 3 replicas on 3 different machines. On the first day I noticed this error appear on the leader. Full Log - http://pastebin.com/wcPMZb0s 4/23/2015, 2:34:37 PM SEVERE SolrCmdDist

Re: SolrCloud Replication Issue

2015-04-27 Thread Anshum Gupta
Looks like LeaderInitiatedRecovery or LIR. When a leader receives a document (update) but fails to successfully forward it to a replica, it marks that replica as down and asks the replica to recover (hence the name, Leader Initiated Recovery). It could be due to multiple reasons e.g. network issue/

Solr + RDF = SolRDF

2015-04-27 Thread Andrea Gazzarini
Hi guys, I'd like to share with you a project (actually a hobby for me) where I'm spending my free time, maybe someone could get some idea or benefit from it. https://github.com/agazzarini/SolRDF I called it SolRDF (Solr + RDF): It is a set of Solr extensions for managing (indexing and querying)

Re: SolrCloud Replication Issue

2015-04-27 Thread Amit L
Appreciate the response. To answer your questions: * Do you see this happen often? How often? It has happened twice in five days. The first two days after deployment. * Are there any known network issues? There are no obvious network issues but as these instances reside in AWS I cannot rule it ou

Re: SolrCloud default shard assignment order not correct

2015-04-27 Thread spillane
Shawn, The 4.2 cloud graph ordering turned out not to be a problem, after the first startup of my 5 leaders and 5 replicas their shard assignments were 'fixed' in Zookeeper. I can now start them in any order and get the same graph.

Load balancer for indexing?

2015-04-27 Thread spillane
I manage a SolrCloud with 5 shards. Queries go thru an AWS load balancer but indexing does not, so my leader1 is getting clobbered. Should my SolrJ app be pointing at a load balancer and if so will indexing via the ConcurrentUpdateSolrServer class still work?

Re: Load balancer for indexing?

2015-04-27 Thread Shawn Heisey
On 4/27/2015 3:44 PM, spillane wrote: > I manage a SolrCloud with 5 shards. Queries go thru an AWS load balancer but > indexing does not, so my leader1 is getting clobbered. Should my SolrJ app > be pointing at a load balancer and if so will indexing via the > ConcurrentUpdateSolrServer class sti

Re: Load balancer for indexing?

2015-04-27 Thread Chris Hostetter
: I manage a SolrCloud with 5 shards. Queries go thru an AWS load balancer but : indexing does not, so my leader1 is getting clobbered. Should my SolrJ app : be pointing at a load balancer and if so will indexing via the : ConcurrentUpdateSolrServer class still work? The "Concurrent" part

more like this generated query

2015-04-27 Thread alxsss
Hello, I am using solr-4.10.4 with mlt. I noticed that mlt constructs a query that is missing some words. For example, for doc with title: Jennnifer Lopez keywords: Jennifer, concert, Hollywood the parsedquery generated by mlt for this doc is title:lopez keywords:jennifer keywords:concert

Re: Start Solr with multiple external zookeepers on Windows Server?

2015-04-27 Thread Zheng Lin Edwin Yeo
For my version, Solr 5.0.0, I use this command to start: java -DzkHost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar For my setup, the 3 ZooKeeper servers are running on the same machine, but you can replace 'localhost' with your server IP addresses, and also replace the ports a
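
As a small illustration of the comma-separated zkHost value used in the command above, here is a Python sketch that builds the string from a list of (host, port) pairs and assembles the same java command line. The helper names zk_host_string and start_command are hypothetical, and the hosts and ports are just the ones quoted in the message; substitute your own ensemble.

```python
def zk_host_string(servers):
    """Join (host, port) pairs into the comma-separated form used for zkHost."""
    return ",".join(f"{host}:{port}" for host, port in servers)


def start_command(servers):
    """Build the argument list for the start command quoted in the message."""
    return ["java", f"-DzkHost={zk_host_string(servers)}", "-jar", "start.jar"]


if __name__ == "__main__":
    ensemble = [("localhost", 2181), ("localhost", 2182), ("localhost", 2183)]
    print(" ".join(start_command(ensemble)))
```

Building the value from a list makes it harder to drop a comma or a port by accident when the ensemble members change.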

Re: SolrCloud Replication Issue

2015-04-27 Thread Erick Erickson
Amit: The fact that "all instances are using no more than 30%" isn't really indicative of whether or not GC pauses are a problem. If you have a large heap allocated to Java, then the to-be-collected objects will build up and _eventually_ you'll have a stop-the-world GC pause even though each t

Re: TIKA OCR not working

2015-04-27 Thread trung.ht
Hi Uwe, Thanks for the answer, but it looks like it does not work on my machine. I use Mac OS 10.10.3, tesseract is installed through homebrew, and I tested it with the same file I posted to Solr. I think tesseract is on the path since I ran this command successfully: "tesseract test_tesseract.png output"