Union and intersection methods in solr DocSet

2015-05-04 Thread Gajendra Dadheech
I have a requirement where I need to find matching DocSets for different queries and then do either a union or an intersection on those DocSets, e.g.: DocSet docset1 = Searcher.getDocSet(query1); DocSet docset2 = Searcher.getDocSet(query2); DocSet finalDocset = docset1.intersection(docset2); Is this a

Re: Solr Cloud

2015-05-04 Thread Anirudha Jadhav
The JMX metrics are good; you can start there. Let's talk offline for more. -Ani On Mon, May 4, 2015 at 10:51 PM, Jilani Shaik wrote: > Thanks Shawn, It has provided the pointers of open source, I am really > interested to look for open source solution, I have basic knowledge of > Ganglia and Nag

Re: Solr Cloud

2015-05-04 Thread Jilani Shaik
Thanks Shawn, that provided pointers to open source options. I am really interested in finding an open source solution, and I have basic knowledge of Ganglia and Nagios. I have gone through "sematext", and our company is already using "newrelic" in this space. But I am interested in open source similar to

Re: SolrCloud+HDFS disappointed indexing performance

2015-05-04 Thread xinwu
Can someone help me?

Re: Injecting synonymns into Solr

2015-05-04 Thread Zheng Lin Edwin Yeo
Yes, the underlying mechanism uses Java. But the collection isn't able to load when Solr starts up, so it didn't return anything even when I used a URL. Is it just due to my machine not having enough memory? Regards, Edwin On 4 May 2015 20:12, "Roman Chyla" wrote: > It shouldn't matter. Btw try

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Chris Hostetter
XY-ish problem -- if you are deleting a bunch of documents by id, why have you switched from using delete-by-id to using delete-by-query? What drove that decision? Did you try using delete-by-query in your 3.6 setup? : my f1 field is my key field. It is unique. ... : On my old solr 3
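
To make the distinction in the message above concrete, here is a minimal Python sketch of the two kinds of delete request body in Solr's JSON update format. The helper names are hypothetical, the field name f1 and the value 58644 are taken from the thread, and the sketch only builds the payloads; it does not contact a Solr server or reproduce the poster's actual batch file.

```python
import json


def delete_by_id_body(doc_ids):
    """Build a JSON update body that deletes documents by uniqueKey value."""
    # A list under "delete" is read as a list of uniqueKey values.
    return json.dumps({"delete": [str(doc_id) for doc_id in doc_ids]})


def delete_by_query_body(query):
    """Build a JSON update body that deletes every document matching a query."""
    # A {"query": ...} object under "delete" is a delete-by-query.
    return json.dumps({"delete": {"query": query}})


if __name__ == "__main__":
    # f1 is the unique key in the thread, so both bodies target the same doc.
    print(delete_by_id_body([58644]))
    print(delete_by_query_body("f1:58644"))
```

When the field being matched is the unique key, the first form addresses the document directly, while the second has to run a query to find it.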

Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Shawn, Thanks for your inputs. The 12GB is for solr. I did read through your wiki and your G1 related recommended settings are already included. Tried a lower memory config (7G) as well and it did not result in any better results. Right now, in the process of changing the updates to use Solrj

Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina
Hello Chris, yes, I confirm that on my SOLR3.6 it has worked fine for several years, and each doc added with the same code is updated, not added. To be clearer, I receive docs with a field named "pn", which is the uniqueKey, and it is always in uppercase, so I must define in my schema.xml required="tru

Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Shawn Heisey
On 5/4/2015 2:36 PM, Vinay Pothnis wrote: > But nonetheless, we will give the latest solrJ client + cloudSolrServer a > try. > > * Yes, the documents are pretty small. > * We are using G1 collector and there are no major GCs, but however, there > are a lot of minor GCs sometimes going upto 2s per m

Re: Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Chris Hostetter
: On SOLR3.6, I defined a string_ci field like this: [...] I'm really surprised that field would have worked for you (reliably) as a uniqueKey field even in Solr 3.6. The best practice for something like what you describe has always (going back to S

Solr 5.0 - uniqueKey case insensitive ?

2015-05-04 Thread Bruno Mannina
Dear Solr users, I have a problem with SOLR5.0 (and not on SOLR3.6). What kind of field can I use for my uniqueKey field named "code" if I want it to be case insensitive? On SOLR3.6, I defined a string_ci field like this: [...] and it works fine. - If I add a document with

Re: Answer engine - NLP related question

2015-05-04 Thread Upayavira
What you seem to be asking for is POS (parts of speech) analysis. You can use OpenNLP to do that for you, likely outside of Solr. OpenNLP will identify nouns, verbs, etc in your sentences. The question is, can you identify certain of those types to be filtered out from your queries? A simple bit

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Chris Hostetter
: I need to run solr 5.1.0 on port 80 with some basic apache authentication. : Normally, under earlier versions of solr I would set it up to run under : tomcat, then connect it to apache web server using mod_jk. the general gist of what you should look into is running Solr (via ./bin/solr) on so

Re: Optimal configuration for high throughput indexing

2015-05-04 Thread Vinay Pothnis
Hi Erick, Thanks for your inputs. I think long before we had made a conscious decision to skip solrJ client and use plain http. I think it might have been because at the time solrJ client was queueing update in its memory or something. But nonetheless, we will give the latest solrJ client + clou

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Shawn Heisey
On 5/4/2015 1:50 PM, Tim Dunphy wrote: > However it sounds like you're sure it's supposed to work this way. Can > I get some advice on this error? If you tried copying JUST the .war file with any version from 4.3 on, something similar would happen. At the request of many of our more advanced user

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
> > The container in the default 5.x install is a completely unmodified > Jetty 8.x (soon to be Jetty 9.x) with a stripped and optimized config. > The config for Jetty is similar to tomcat, you just need to figure out > how to make it work with Apache like you would with Tomcat. > > Incidentially,

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
OK, I have noted all this information, thanks! I will update if needed. 2 GB seems to be OK. On 04/05/2015 18:46, Shawn Heisey wrote: On 5/4/2015 10:28 AM, Bruno Mannina wrote: solr@linux:~$ java -version java version "1.7.0_79" OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubunt

Re: apache 5.1.0 under apache web server

2015-05-04 Thread Shawn Heisey
On 5/4/2015 1:04 PM, Tim Dunphy wrote: > I need to run solr 5.1.0 on port 80 with some basic apache authentication. > Normally, under earlier versions of solr I would set it up to run under > tomcat, then connect it to apache web server using mod_jk. > > However 5.1.0 seems totally different. I see

apache 5.1.0 under apache web server

2015-05-04 Thread Tim Dunphy
Hey all, I need to run solr 5.1.0 on port 80 with some basic apache authentication. Normally, under earlier versions of solr I would set it up to run under tomcat, then connect it to apache web server using mod_jk. However 5.1.0 seems totally different. I see that tomcat support has been removed

Answer engine - NLP related question

2015-05-04 Thread bbarani
Hi, Note: I have very basic knowledge on NLP.. I am working on an answer engine prototype where when the user enters a keyword and searches for it we show them the answer corresponding to that keyword (rather than displaying multiple documents that match the keyword) For Ex: When user searches

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Shawn Heisey
On 5/4/2015 10:28 AM, Bruno Mannina wrote: > solr@linux:~$ java -version > java version "1.7.0_79" > OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2) > OpenJDK Server VM (build 24.79-b02, mixed mode) > solr@linux:~$ > > solr@linux:~$ uname -a > Linux linux 3.13.0-51-generic

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
Shawn, thanks a lot for this comment. So, I have this information, with nothing about 32 or 64 bits: solr@linux:~$ java -version java version "1.7.0_79" OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2) OpenJDK Server VM (build 24.79-b02, mixed mode) solr@linux:~$ sol

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Shawn Heisey
On 5/4/2015 9:09 AM, Bruno Mannina wrote: > Yes ! it works !!! > > Scott perfect > > For my config 3g do not work, but 2g yes ! If you can't start Solr with a 3g heap, chances are that you are running a 32-bit version of Java. A 32-bit Java cannot go above a 2GB heap. A 64-bit JVM requires
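
As a rough illustration of the 32-bit versus 64-bit check discussed above, here is a small Python heuristic that inspects the text printed by `java -version`. It assumes that HotSpot/OpenJDK builds include the marker "64-Bit" in the VM line when the JVM is 64-bit; the function name is hypothetical, and the second sample string is an invented 64-bit counterpart of the output quoted in the thread.

```python
def looks_like_64bit_jvm(version_output: str) -> bool:
    """Heuristic: True if `java -version` output mentions a 64-bit VM."""
    # Assumption: 64-bit HotSpot/OpenJDK VMs print "64-Bit" in the VM line,
    # and 32-bit builds print no bitness marker at all.
    return "64-Bit" in version_output


# Output quoted in the thread (no "64-Bit" marker).
THREAD_OUTPUT = (
    'java version "1.7.0_79"\n'
    "OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)\n"
    "OpenJDK Server VM (build 24.79-b02, mixed mode)\n"
)

# Hypothetical output for a 64-bit build of the same version.
HYPOTHETICAL_64BIT_OUTPUT = THREAD_OUTPUT.replace(
    "OpenJDK Server VM", "OpenJDK 64-Bit Server VM"
)

if __name__ == "__main__":
    print(looks_like_64bit_jvm(THREAD_OUTPUT))
    print(looks_like_64bit_jvm(HYPOTHETICAL_64BIT_OUTPUT))
```

Under that assumption, the output posted in the thread would be classified as a 32-bit JVM, which is consistent with the 3g heap failing to start.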

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Yes, it was that! I increased SOLR_JAVA_MEM to 2g (with 8 GB of RAM I cannot do more; 3g fails to run Solr on my brand new computer). Thanks! On 04/05/2015 17:03, Shawn Heisey wrote: On 5/4/2015 8:38 AM, Bruno Mannina wrote: ok I have this OOM error in the log file ... # # java.lang.OutOfMemoryEr

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
Yes! It works! Scott, perfect. For my config 3g does not work, but 2g does! Thanks. On 04/05/2015 16:50, Scott Dawson wrote: Bruno, You have the wrong kind of dash (a long dash) in front of the Xmx flag. Could that be causing a problem? Regards, Scott On Mon, May 4, 2015 at 5:06 AM,

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Shawn Heisey
On 5/4/2015 8:38 AM, Bruno Mannina wrote: > ok I have this OOM error in the log file ... > > # > # java.lang.OutOfMemoryError: Java heap space > # -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh > 8983/home/solr/solr-5.0.0/server/logs" > # Executing /bin/sh -c "/home/solr/solr-5.0.0

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
I increased formdataUploadLimitInKB to 2048000 and the problem is the same, with the same error. Any ideas? On 04/05/2015 16:38, Bruno Mannina wrote: ok I have this OOM error in the log file ... # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/

Re: Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Scott Dawson
Bruno, You have the wrong kind of dash (a long dash) in front of the Xmx flag. Could that be causing a problem? Regards, Scott On Mon, May 4, 2015 at 5:06 AM, Bruno Mannina wrote: > Dear Solr Community, > > I have a recent computer with 8Go RAM, I installed Ubuntu 14.04 and SOLR > 5.0, Java 7 >
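
The "wrong kind of dash" problem is easy to miss by eye, so here is a hedged Python sketch that flags any dash-like character other than the ASCII hyphen-minus in a configuration line. The function name and the sample line are hypothetical (the exact line from bin/solr.in.sh is not shown in this snippet); the check relies on the Unicode "Pd" (dash punctuation) category from the standard library.

```python
import unicodedata


def find_suspicious_dashes(line):
    """Return (index, Unicode name) for each dash that is not ASCII '-'."""
    hits = []
    for index, ch in enumerate(line):
        # Category "Pd" covers hyphen-minus, en dash, em dash and similar.
        if ch != "-" and unicodedata.category(ch) == "Pd":
            hits.append((index, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # Hypothetical setting where a word processor replaced the second dash.
    bad_line = 'SOLR_JAVA_MEM="-Xms2g \u2014Xmx2g"'
    good_line = 'SOLR_JAVA_MEM="-Xms2g -Xmx2g"'
    print(find_suspicious_dashes(bad_line))
    print(find_suspicious_dashes(good_line))
```

Running a check like this over an edited shell script can catch dashes introduced by copy and paste from formatted text.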

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
ok I have this OOM error in the log file ... # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError="/home/solr/solr-5.0.0/bin/oom_solr.sh 8983/home/solr/solr-5.0.0/server/logs" # Executing /bin/sh -c "/home/solr/solr-5.0.0/bin/oom_solr.sh 8983/home/solr/solr-5.0.0/server/lo

Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Thanks Shawn.. yeah regular optimize might be the route we take, if this becomes a recurring issue. I remember in our old multicore deployment CPU used to spike and the core almost became non responsive. My guess with solr cloud architecture, any slack by leader while optimizing is picked up

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Rishi Easwaran
Walter, unless I am missing something here, I completely get that when a few segments merge, Solr requires 2x the space of those segments to accomplish this. Usually an index has multiple segment files, so this fragmented 2x space consumption is not an issue, even as merged segments grow bigger. But w

How to get exact match along with text edge_ngram

2015-05-04 Thread Vishal Swaroop
We have item_name indexed as text edge_ngram which returns like results... Please suggest what will be the best approach (like "string" index (in addition to "...edge_ngram"... or using copyField...) to search ALSO for exact matches? e.g. url should return item_name as "abc" entries only... I tri

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-05-04 Thread Shawn Heisey
On 5/4/2015 6:29 AM, Steven White wrote: > Thanks Doug. This is extremely helpful. It is much appreciated that you > took the time to write it all. > > Do we have a Solr / Lucene wiki with such "did you know?" write ups? If > not, just having this kind of knowledge in an email isn't good enough

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Walter Underwood
One segment is in-use, being searched. That segment (and others) are merged into a new segment. After the new segment is ready, searches are directed to the new copy and the old copies are deleted. That is how two copies are needed. If you cannot provide 2X the disk space, you will not have a s
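
The arithmetic behind the 2X figure described above can be sketched in a few lines of Python. This is a simplified model, not Solr's actual accounting: it assumes the merged segment is as large as the sum of its inputs (no space reclaimed from deleted documents) and ignores any other temporary files. The function name and the segment sizes are hypothetical.

```python
def peak_disk_during_merge(segment_sizes_gb, merging):
    """Estimate peak disk usage (GB) while the given segments are merged.

    segment_sizes_gb: sizes of all segments currently in the index.
    merging: indices of the segments taking part in the merge.
    """
    total = sum(segment_sizes_gb)
    # The old segments stay on disk until the new merged segment is ready,
    # so the merged copy is counted on top of everything already there.
    merged_copy = sum(segment_sizes_gb[i] for i in merging)
    return total + merged_copy


if __name__ == "__main__":
    sizes = [40, 30, 20, 10]  # hypothetical 100 GB index in four segments
    print(peak_disk_during_merge(sizes, [2, 3]))        # small partial merge
    print(peak_disk_during_merge(sizes, [0, 1, 2, 3]))  # merge of all segments
```

In this model a partial merge needs only modest headroom, while a merge that involves every segment, such as an optimize, briefly needs about twice the index size.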

Re: Solr Cloud

2015-05-04 Thread Shawn Heisey
On 5/4/2015 6:16 AM, Jilani Shaik wrote: > Do we have any monitoring tools for Apache Solr Cloud? similar to Apache > Ambari which is used for Hadoop Cluster. > > Basically I am looking for tool similar to Apache Ambari, which will give > us various metrics in terms of graphs and charts along with

Re: Injecting synonymns into Solr

2015-05-04 Thread Shawn Heisey
On 5/4/2015 12:07 AM, Zheng Lin Edwin Yeo wrote: > Would like to check, will this method of splitting the synonyms into > multiple files use up a lot of memory? > > I'm trying it with about 10 files and that collection is not able to be > loaded due to insufficient memory. > > Although currently

Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Shawn Heisey
On 5/4/2015 4:55 AM, Rishi Easwaran wrote: > Sadly with the size of our complex, splitting and adding more HW is not a > viable long term solution. > I guess the options we have are to run optimize regularly and/or become > aggressive in our merges proactively even before solr cloud gets into thi

Re: Delete document stop my solr 5.0 ?!

2015-05-04 Thread Shawn Heisey
On 5/4/2015 3:19 AM, Bruno Mannina wrote: > All work fine but each Tuesday I need to delete some docs inside, so I > create a batch file > with inside line like this: > /home/solr/solr-5.0.0/bin/post -c docdb -commit no -d > "f1:58644" > /home/solr/solr-5.0.0/bin/post -c docdb -commit no -d > "f1

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-05-04 Thread Steven White
Thanks Doug. This is extremely helpful. It is much appreciated that you took the time to write it all. Do we have a Solr / Lucene wiki with such "did you know?" write ups? If not, just having this kind of knowledge in an email isn't good enough as it won't be as searchable as a wiki. Steve On

Solr Cloud

2015-05-04 Thread Jilani Shaik
Hi All, Do we have any monitoring tools for Apache Solr Cloud? similar to Apache Ambari which is used for Hadoop Cluster. Basically I am looking for tool similar to Apache Ambari, which will give us various metrics in terms of graphs and charts along with deep details for each node in Hadoop clu

Re: Injecting synonymns into Solr

2015-05-04 Thread Roman Chyla
It shouldn't matter. Btw try a url instead of a file path. I think the underlying loading mechanism uses java File , it could work. On May 4, 2015 2:07 AM, "Zheng Lin Edwin Yeo" wrote: > Would like to check, will this method of splitting the synonyms into > multiple files use up a lot of memory?

Re: Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Toke Eskildsen
On Mon, 2015-05-04 at 10:03 +0100, Vijay Bhoomireddy wrote: > Just wondering whether there is a provision to store SolrCloud index data on > Amazon S3? Please let me know any pointers. Not to my knowledge. From what I can read, Amazon S3 is intended for bulk data and has really poor latency. For

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Rishi Easwaran
Thanks for the responses Mark and Ramkumar. The question I had was, why does Solr need 2 copies at any given time, leading to 2x disk space usage. Not sure if this information is not published anywhere, and makes HW estimation almost impossible for large scale deployment. Even if the copies

Re: Solr Cloud reclaiming disk space from deleted documents

2015-05-04 Thread Rishi Easwaran
Sadly, with the size of our complex, splitting and adding more HW is not a viable long term solution. I guess the options we have are to run optimize regularly and/or become aggressive in our merges proactively even before solr cloud gets into this situation. Thanks, Rishi. -Orig

Delete document stop my solr 5.0 ?!

2015-05-04 Thread Bruno Mannina
Dear Solr Users, I have a brand new computer with 8 GB of RAM where I installed Ubuntu 14.04, SOLR 5.0, and Java 7. I indexed 92,000,000 docs (small text files, ~2 KB each) with around 30 fields. Everything works fine, but each Tuesday I need to delete some docs, so I created a batch file with lines like t

Solr 5.0, Ubuntu 14.04, SOLR_JAVA_MEM problem

2015-05-04 Thread Bruno Mannina
Dear Solr Community, I have a recent computer with 8 GB of RAM, on which I installed Ubuntu 14.04, SOLR 5.0, and Java 7. This is a brand new installation. Everything works fine, but I would like to increase SOLR_JAVA_MEM (to 40% of the total RAM available). So I edited bin/solr.in.sh # Increase Java Min/Max Heap as

Storing SolrCloud index data in Amazon S3

2015-05-04 Thread Vijay Bhoomireddy
Hi, Just wondering whether there is a provision to store SolrCloud index data on Amazon S3? Please let me know any pointers. Regards Vijay

Editing the Solr Wiki

2015-05-04 Thread Nicole Butterfield
Dear Solr Admins, I'm writing on behalf of Manning Publications regarding the Solr wiki page:  https://wiki.apache.org/solr/.  I would like to edit the book listings on the Solr wiki to include our new MEAP "Taming Search": http://www.manning.com/turnbull/. I have already set up an account with t