Re: Solr Integration tests

2015-12-09 Thread Andrea Gazzarini
Hi Sathyakumar, even better, since the previous example talks about RDF and other things that are irrelevant in this context: this [1] is the first chapter of my book, where I run a sample integration test using Solr 4.10.x, and this [2] is the GitHub project that contains the code described in the book. HTH,

Re: Nested Docs issue

2015-12-09 Thread Bogdan Marinescu
Any suggestions about this? On 12/04/2015 08:26 AM, Bogdan Marinescu wrote: Hi Mikhail, I would expect the same behaviour as for a database. Meaning if I have a field declared as a uniqueKey, then there should only be one document with that key, regardless of whether it has a child or not. If you

Indexing of annotated corpora

2015-12-09 Thread Emmanuel CARTIER
Hi, I am a newbie in Solr and I would like to know 1. The most efficient way(s?) to index annotated corpora with linguistic information at the token and chunk levels. My documents are in XML and have the following structure: I am a weak newbie ... My main use case is to be able to sear

Re: Solr Integration tests

2015-12-09 Thread Andrea Gazzarini
In that case it is even easier: you can use the Cargo Maven plugin and Failsafe. This is an example [1] but the point is: - configure Cargo to start a container (e.g. jetty, tomcat) with Solr deployed, in the pre-integration-test phase - execute any *IT test case class - stop the container in the

Re: Solr Integration tests

2015-12-09 Thread Sathyakumar Seshachalam
I am giving up on this and resorting to my own test framework, using JettySolrRunner before a suite runs. Facing quite a few hurdles - 1. When I run tests from the IDE I get java.lang.AssertionError: fix your classpath to have tests-framework.jar before lucene-core.jar 2. Gradle "test" works OK, but ge

Re: Solr Integration tests

2015-12-09 Thread Sathyakumar Seshachalam
Hello Andrea, Thanks for the link. Am running 4.10.3. However the test-framework classes haven't changed much. I will give this a try. On 09/12/15, 4:11 PM, "Andrea Gazzarini" wrote: >Hi Sathyakumar, >check this post [1] (assuming you're using Solr 5.x), maybe it can help. > >Andrea > >[1] >ht

Re: JVM error v ~StubRoutines::jbyte_disjoint_arraycopy

2015-12-09 Thread Binoy Dalal
According to this post on stackoverflow: http://stackoverflow.com/questions/18136108/java-jvm-crashes-before-running-my-program the SIGBUS (0x7) error has to do with insufficient disk space in the /tmp directory. See if there's enough space there, else try making some. On Thu, Dec 10, 2015 at 5:59
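A quick way to check the free space the post refers to is sketched below. This is only an illustration: the 64 MB threshold is an arbitrary assumption, not a JVM requirement, and the helper names are made up for this example.

```python
import shutil


def free_space_mb(path="/tmp"):
    """Return the free space on the filesystem holding `path`, in megabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 * 1024)


def has_enough_space(path="/tmp", min_free_mb=64):
    """True if at least `min_free_mb` MB are free (threshold is an assumption)."""
    return free_space_mb(path) >= min_free_mb


if __name__ == "__main__":
    print("free MB in /tmp:", round(free_space_mb("/tmp")))
    print("enough space:", has_enough_space("/tmp"))
```

If this reports very little free space, cleaning up /tmp before restarting Solr is the first thing to try.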

JVM error v ~StubRoutines::jbyte_disjoint_arraycopy

2015-12-09 Thread abhayd
Hi, we are using Solr 4.10. Java version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13) We have been running this setup for more than a year now. Suddenly we started getting errors during startup # # A fatal error has been detected by the Java Runtime Environment: # # SIGBUS

Re: Fully automated replica creation in AWS

2015-12-09 Thread Jeff Wartes
It’s a pretty common misperception that since solr scales, you can just spin up new nodes and be done. Amazon ElasticSearch and older solrcloud getting-started docs encourage this misperception, as does the HDFS-only autoAddReplicas flag. I agree that auto-scaling should be approached carefully,

Re: Fully automated replica creation in AWS

2015-12-09 Thread Erick Erickson
bq: As a side note, we do this for our customers as that's baked into our cloud provisioning software, Exactly, nothing OOB is there, but all the data is available, you "just" have to write a tool that knows where to look ;) That said, this would certainly be something that would have to be option

Re: Fully automated replica creation in AWS

2015-12-09 Thread Jean-Sebastien Vachon
Not sure if this will meet all your needs but you can probably do most of the work using AWS lambda. I haven't used it personally but it is supposed to launch custom code following some events. I guess you could create a small Java class to do the required work following the birth of a new serv

Re: Fully automated replica creation in AWS

2015-12-09 Thread Sameer Maggon
Erick, Typically, while creating collections, a replicationFactor is specified. Thus, the meta data about the collection does have information about what the "desired" replicationFactor is for the collection. If that's the case, when a Solr node joins the cluster, there could be a pro-active add-r
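The idea of comparing the desired replicationFactor with what is actually live could be sketched roughly as follows. The cluster-state dictionary here is a hypothetical, simplified shape (not the real clusterstate.json layout), and the helper names are invented for illustration. The only real Solr piece is the Collections API ADDREPLICA action with its collection, shard and node parameters.

```python
from urllib.parse import urlencode


def under_replicated(cluster_state):
    """Yield (collection, shard, missing_count) for shards below replicationFactor.

    `cluster_state` is a simplified, hypothetical structure:
    {collection: {"replicationFactor": int, "shards": {shard: [live node names]}}}
    """
    for collection, info in cluster_state.items():
        wanted = info["replicationFactor"]
        for shard, live_nodes in info["shards"].items():
            missing = wanted - len(live_nodes)
            if missing > 0:
                yield (collection, shard, missing)


def add_replica_url(base_url, collection, shard, node):
    """Build a Collections API ADDREPLICA request URL for the given node."""
    params = {
        "action": "ADDREPLICA",
        "collection": collection,
        "shard": shard,
        "node": node,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    state = {
        "products": {
            "replicationFactor": 2,
            "shards": {"shard1": ["a:8983_solr", "b:8983_solr"],
                       "shard2": ["a:8983_solr"]},
        }
    }
    for collection, shard, missing in under_replicated(state):
        # The new node name is an assumption for the example.
        print(add_replica_url("http://localhost:8983/solr",
                              collection, shard, "c:8983_solr"))
```

A tool like this only builds the request; deciding when it is safe to send it (for example, whether the missing node is really gone) is the hard part discussed in this thread.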

Re: secure solr 5.3.1

2015-12-09 Thread Ishan Chattopadhyaya
I don't have much personal experience with setting up a kerberos server on a Windows machine, but I remember things being painful when I tried and failed once. If you have an option to use a VM, I suggest trying to set up the KDC in a GNU/Linux VM (through VirtualBox). In that case, make sure the Win

Re: Grouping by simhash signature

2015-12-09 Thread Nickolay41189
Maybe there is some way to override equals function of grouping (change "==" to strdist)?

Re: secure solr 5.3.1

2015-12-09 Thread Ishan Chattopadhyaya
Alternatively, you could also try using BasicAuth, which doesn't require this additional setup of a KDC. On Thu, Dec 10, 2015 at 1:04 AM, Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > I don't have much personal experience with setting up a kerberos server on > a Windows machine, but

Re: Fully automated replica creation in AWS

2015-12-09 Thread Erick Erickson
Not that I know of. The two systems are somewhat disconnected. AWS doesn't know that Solr lives on those nodes, it's just spinning one up, right? Albeit with Solr running. There's nothing in Solr that auto-detects the existence of a new Solr node and automagically assigns collections and/or repli

Re: secure solr 5.3.1

2015-12-09 Thread kostali hassan
I installed MIT Kerberos for Windows 4.0.1. 2015-12-09 19:05 GMT+00:00 Ishan Chattopadhyaya : > What exactly is your confusion? Do you have access to a KDC? > > Briefly: > Login to your KDC server, do a kadmin.local: > Then, > addprinc HTTP/192.168.0.107 > ktadd -k /tmp/107.keytab HTTP/192.168.0.107

Re: secure solr 5.3.1

2015-12-09 Thread Ishan Chattopadhyaya
What exactly is your confusion? Do you have access to a KDC? Briefly: Login to your KDC server, do a kadmin.local: Then, addprinc HTTP/192.168.0.107 ktadd -k /tmp/107.keytab HTTP/192.168.0.107 Then copy the keytab file to your solr node to the appropriate places. On Thu, Dec 10, 2015 at 12:08 A

Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-09 Thread Susheel Kumar
Yes, either look into the log files as Erick suggested or run with -f and see the startup error on the console. Kill any existing instance or remove any old PID file before starting with -f. Thnx On Wed, Dec 9, 2015 at 12:46 PM, Erick Erickson wrote: > What does the Solr log file say? Often this is
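Checking for a leftover PID file can be sketched like this. It is a POSIX-only illustration; the file name solr-8983.pid and its location are assumptions about your install, so adjust the path to wherever your start script writes it.

```python
import os


def read_pid(pid_file):
    """Return the PID stored in `pid_file`, or None if it is unreadable or invalid."""
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def process_is_running(pid):
    """True if a process with this PID exists (signal 0 only checks existence)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def is_stale_pid_file(pid_file):
    """True if the file exists but does not point at a running process."""
    if not os.path.exists(pid_file):
        return False
    pid = read_pid(pid_file)
    return pid is None or not process_is_running(pid)


if __name__ == "__main__":
    path = "bin/solr-8983.pid"  # assumed location, for illustration only
    print(path, "is stale:", is_stale_pid_file(path))
```

If the file is stale it can be removed before starting with -f; if the process is still running, stop that instance first.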

Re: secure solr 5.3.1

2015-12-09 Thread kostali hassan
I followed these two resources and I am stuck at: Create service principals and keytab files. 2015-12-09 18:06 GMT+00:00 Ishan Chattopadhyaya : > The kerberos plugin is available for use with Solr out of the box. The two > resources which Bosco mentioned should get you up and running. > > On Wed

kerberos and solr5 Service Principals and Keytab Files

2015-12-09 Thread kostali hassan
I am trying to secure Solr using the Kerberos plugin. I want to test Kerberos on localhost, but I don't know how to create a Kerberos principal at the KDC server, or where to generate the keytab file from the KDC server’s /tmp/107.keytab.

Fully automated replica creation in AWS

2015-12-09 Thread Ugo Matrangolo
Hi, I was trying to set up a SolrCloud cluster in AWS backed by an ASG (auto scaling group) serving a replicated collection. I have just come across a case where one of the Solr nodes became unresponsive, with AWS killing it and spinning up a new one. Unfortunately, this new Solr node did not join as a

Re: secure solr 5.3.1

2015-12-09 Thread Ishan Chattopadhyaya
The kerberos plugin is available for use with Solr out of the box. The two resources which Bosco mentioned should get you up and running. On Wed, Dec 9, 2015 at 11:34 PM, Don Bosco Durai wrote: > There are two resources available: > > > https://cwiki.apache.org/confluence/display/solr/Kerberos+A

Re: secure solr 5.3.1

2015-12-09 Thread Don Bosco Durai
There are two resources available: https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin https://cwiki.apache.org/confluence/display/RANGER/How+to+configure+Solr+Cloud+with+Kerberos+for+Ranger+0.5 Bosco On 12/9/15, 3:14 AM, "kostali hassan" wrote: >how I settin

Re: Increasing Solr5 time out from 30 seconds while starting solr

2015-12-09 Thread Erick Erickson
What does the Solr log file say? Often this is the result of Solr being in a weird state due to startup problems. Best, Erick On Tue, Dec 8, 2015 at 8:43 PM, Debraj Manna wrote: > . After failed attempt to start solr if I try to start solr again on same > port it says solr is already running. Tr

Re: capacity of storage a single core

2015-12-09 Thread Erick Erickson
I object to the question. And the advice. And... ;). Practically, IMO guidance that "the entire index should fit into memory" is misleading, especially for newbies. Let's break it down: 1> "the entire index". What's this? The size on disk? 90% of that size on disk may be stored data which uses v

Re: capacity of storage a single core

2015-12-09 Thread Susheel Kumar
Thanks, Jack for quick reply. With Replica / Shard I mean to say on a given machine there may be two/more replicas and all of them may not fit into memory. On Wed, Dec 9, 2015 at 11:00 AM, Jack Krupansky wrote: > Yes, there are nuances to any general rule. It's just a starting point, and > your

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Jack Krupansky
You can also use Solr Cell to send entire PDF or office documents: https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika -- Jack Krupansky On Wed, Dec 9, 2015 at 3:09 AM, subinalex wrote: > Hi, > > I am a solr newbie,just got a quick question. > > SOLR

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Walter Underwood
Often Solr documents are “semi-structured”. They have some structured fields and some free-text fields. e-mail messages are like that, with structured headers and an unstructured body. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 9, 2015, at

Re: capacity of storage a single core

2015-12-09 Thread Jack Krupansky
Yes, there are nuances to any general rule. It's just a starting point, and your own testing will confirm specific details for your specific app and data. For example, maybe you don't query all fields commonly, so each field-specific index may not require memory or not require it so commonly. And,

Re: capacity of storage a single core

2015-12-09 Thread Susheel Kumar
Hi Jack, Just to add, the OS disk cache will still keep queries performant even though the entire index can't be loaded into memory. How much more latency there is, compared to when the index is completely loaded into memory, may vary depending on index size etc. I am trying to clarify this here because a lot of folks take

RE: Solr Heap memory vs. OS memory

2015-12-09 Thread Markus Jelsma
Yes. This is still accurate, Lucene still relies on memory mapped files. And Solr usually doesn't require that much RAM, except if you have lots of massive cache entries. Markus -Original message- > From:Kelly, Frank > Sent: Wednesday 9th December 2015 16:19 > To: solr-user@lucene.apac
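As a rough back-of-the-envelope illustration of the heap vs. OS memory trade-off: whatever RAM is not taken by the JVM heap or other processes is what the OS can use to cache the memory-mapped index files. The numbers below are made up, the 1 GB allowance for other processes is an assumption, and on-disk index size overstates what actually needs caching, so treat this as a starting point only.

```python
def page_cache_budget_gb(total_ram_gb, jvm_heap_gb, other_processes_gb=1.0):
    """RAM left for the OS page cache after the JVM heap and other processes."""
    return max(total_ram_gb - jvm_heap_gb - other_processes_gb, 0.0)


def cacheable_fraction(index_size_gb, total_ram_gb, jvm_heap_gb,
                       other_processes_gb=1.0):
    """Fraction of the on-disk index that could sit in the page cache (capped at 1)."""
    if index_size_gb <= 0:
        return 1.0
    budget = page_cache_budget_gb(total_ram_gb, jvm_heap_gb, other_processes_gb)
    return min(budget / index_size_gb, 1.0)


if __name__ == "__main__":
    # Hypothetical machine: 32 GB RAM, 8 GB heap, 46 GB index on disk.
    print(page_cache_budget_gb(32, 8))    # 23.0
    print(cacheable_fraction(46, 32, 8))  # 0.5
```

Raising the heap in this example directly shrinks the cacheable fraction, which is the reason for giving the JVM only as much heap as it really needs.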

Solr Heap memory vs. OS memory

2015-12-09 Thread Kelly, Frank
Hi Folks, I was wondering if this link I found recommended by Erick is still accurate (for Solr 5.3.1) "For configuring your Java VM, you should rethink your memory requirements: Give only the really needed amount of heap space and leave as much as possible to the O/S. As a rule of thumb: Don

RE: Solr memory usage

2015-12-09 Thread Markus Jelsma
Steven - this fluctuation is normal, it is eating memory when documents are indexed or when searches are handled, this makes the meter go up. The garbage collector then frees the memory again. You can start to worry if there is a lot of activity but no fluctuation. M. -Original message---

Re: Solr memory usage

2015-12-09 Thread Steven White
Thanks Erick!! Your summary and the blog by Uwe (thank you too Uwe) are very helpful. A follow up question. I also noticed the "JVM-Memory" report off Solr's home page is fluctuating. I expect some fluctuation, but it kinda worries me when it fluctuates up / down in a range of 4 GB and maybe mo

Re: Unstructured/Structured data for indexing

2015-12-09 Thread Alexandre Rafalovitch
Don't think about indexing so much, think about searching. Say you are searching a video? What does that mean? Do you want to match random sequence of binary values that represent inter-frame change? Probably not. When you answer what you want to actually search (title? length? subscripts?), you w

Re: secure solr 5.3.1

2015-12-09 Thread kostali hassan
How do I set up Solr to use Kerberos? I have to download Kerberos and put the plug-in implementation in the classpath (/server/solr). 2015-12-08 22:19 GMT+00:00 Ishan Chattopadhyaya : > Right, as Bosco said, this has been tested well and supported on SolrCloud. > It should be possible to run it i

Re: Solr Integration tests

2015-12-09 Thread Andrea Gazzarini
Hi Sathyakumar, check this post [1] (assuming you're using Solr 5.x), maybe it can help. Andrea [1] http://andreagazzarini.blogspot.it/2015/10/how-to-do-integration-tests-with-solr.html 2015-12-09 11:32 GMT+01:00 Sathyakumar Seshachalam < sathyakumar_seshacha...@trimble.com>: > Are there any do

Solr Integration tests

2015-12-09 Thread Sathyakumar Seshachalam
Is there any documentation on the Solr test framework? (http://mvnrepository.com/artifact/org.apache.solr/solr-test-framework) I am looking to do integration tests to just check if I am able to add a document and search for it via JUnit tests. There does seem to be a test-framework from solr, but th

Re: integrate solr with preprocessor tools

2015-12-09 Thread Emir Arnautovic
Hi Sara, You need to wrap your code in a tokenizer or token filter: https://wiki.apache.org/solr/SolrPlugins If you want to improve an existing one and believe others can benefit from the improvement, you can open a ticket and submit a patch. Thanks, Emir On 09.12.2015 10:41, sara hajili wrote: hi i wanna t

integrate solr with preprocessor tools

2015-12-09 Thread sara hajili
Hi, I want to use Solr, and the language of the documents I store in Solr is Persian. Solr doesn't support Persian as well as I want, so I found preprocessing tools such as a normalizer, tokenizer, etc. I don't want to use Solr's Persian filters, like the Persian tokenizer; I mean I want to improve i

Re: Unstructured/Structured data for indexing

2015-12-09 Thread subinalex
Thanks jurgen...for clarifying...:-) On 9 Dec 2015 2:06 pm, "Jürgen Wagner (DVT) [via Lucene]" < ml-node+s472066n4244411...@n3.nabble.com> wrote: > Subin, > Only the envelope is structured. What's inside the individual fields of > the structure may be single values (possibly considered structure

Re: Unstructured/Structured data for indexing

2015-12-09 Thread DVT
Subin, Only the envelope is structured. What's inside the individual fields of the structure may be single values (possibly considered structured meta-data) or unstructured (like free text or other fields with informal semantics). Even if you pass a 5-hour video as a major case of unstructured d

Unstructured/Structured data for indexing

2015-12-09 Thread subinalex
Hi, I am a Solr newbie and just have a quick question. Solr is designed for querying unstructured data, but then why do we have to send it in a structured form (JSON, XML) for indexing? Thanks & Regards, S Subin