Recommended ZooKeeper topology in Production

2014-06-09 Thread Gili Nachum
Is there a recommended ZooKeeper topology for production Solr environments? I was planning: 3 ZK nodes, each on its own dedicated machine. Thinking that dedicated machines, separate from Solr servers, would keep ZK isolated from resource contention spikes that may occur on Solr. Also, if a Solr m

Re: solr4 optimization

2014-06-09 Thread Vineet Mishra
As Otis mentioned, it's obviously good to run optimization once in a while, or when you are done with most of your heavy indexing. The concern is not disk capacity but IO and seeking across segments. With comparatively fewer segments to query, the IO operation will be less

Re: Collection communication internally

2014-06-09 Thread Vineet Mishra
Then is there some other alternative by which we can achieve the goal? Querying this way with a set of foreign ids is really going to make the query very large, and the response also takes a long time (previously tested with a standalone Solr core with master-slave architecture). Thanks!

Re: Integrate solr with openNLP

2014-06-09 Thread Vivekanand Ittigi
Hi Aman, Yeah, we are also thinking the same. Using UIMA is better. And thanks to everyone. You guys really showed us the way (UIMA). We'll work on it. Thanks, Vivek On Fri, Jun 6, 2014 at 5:54 PM, Aman Tandon wrote: > Hi Vivek, > > As everybody in the mail list mentioned to use UIMA you shou

How to simplifying my query for appropriate scoring.

2014-06-09 Thread kritarth.anand
Hi all, I need help simplifying my query. The doc structure is as follows:

id A  cat: p, q, r
id B  cat: m, n, o
id C  cat: l, b, o

Now given this structure my job is to find documents which have cat ids belonging to a list. Right now this is achieved in this fashion using OR of mu

Re: accessing individual elements of a multivalued field

2014-06-09 Thread kritarth.anand
Thanks for the response Jack -- View this message in context: http://lucene.472066.n3.nabble.com/accessing-individual-elements-of-a-multivalued-field-tp4140862p4140911.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: accessing individual elements of a multivalued field

2014-06-09 Thread Jack Krupansky
Not currently. You could have separate explicit fields for the categories such as "cat_1", "cat_2", etc. The data would need to be replicated (possibly using a ), but redundancy to facilitate access is a reasonable approach. -- Jack Krupansky -Original Message- From: kritarth.anand

Re: writing logs of a speicific solr posting to a file

2014-06-09 Thread Sameer Maggon
Check out the patch on the issue below. We hit the same issue and posted a patch. None of the committers has picked it up yet, but it would be good to get some feedback on it and get this into the next dot release. If it works for you, please vote it up. https://issues.apache.org/jira/browse/SOLR-59

Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Gili Nachum
> the incoming document rate could be as high as 20k/second... That sounds like a lot of CPU-intensive indexing work. Given the 128 CPU cores available, from an indexing speed perspective: would you recommend having a similar number of Solr cores created, or Solr does just a when several with a small numb

Re: solr4 optimization

2014-06-09 Thread Otis Gospodnetic
Hi, I don't remember the last time I ran optimize. Sure, yes, things will work faster if you optimize an index and reduce the number of segments, but if you are regularly writing to that index and performance is OK, leave it to Lucene segment merges to purge deletes. Otis -- Performance Monitoring *

SolrCloud collection create / delete failure

2014-06-09 Thread John Smodic
Hey guys, I'm trying to simply create collection foo in SolrCloud (a collection that failed to create once due to a badly formatted schema). I try the following: createCollection foo -> could not create a new core solr/foo_shard1_replica1 as another core is already defined there deleteColle

solr4 optimization

2014-06-09 Thread Joshi, Shital
Hi, We have a SolrCloud cluster (5 shards and 2 replicas) on 10 boxes. On some of the boxes we have about 5 million deleted docs and we have never run optimization since the beginning. Does the number of deleted docs have anything to do with query performance? Should we consider optimization at all i

accessing individual elements of a multivalued field

2014-06-09 Thread kritarth.anand
Hi,

prod: p  cat: catA, catB, catC
prod: q  cat: catB, catC, catD

My schema consists of documents with uid 'prod'; each can belong to multiple categories called 'cat', which are represented as a multivalued field. For a particular kind of query I need to access individual elements se

RE: COMMERCIAL: RE: SolrCloud: facet range option f..facet.mincount=1 omits buckets on response

2014-06-09 Thread Ronald Matamoros
Hi Chris, Created ticket https://issues.apache.org/jira/browse/SOLR-6154 Attached to the ticket are the data.xml and a PDF with instructions on how to replicate. Sending different updates to different ports was just how the confluence tutorial made the steps; it does not affect the result of the te

Re: SOLR Performance Benchmarking

2014-06-09 Thread Shawn Heisey
On 6/8/2014 12:09 PM, rashi gandhi wrote: > I am using SolrMeter for performance benchmarking. I am able to > successfully test my solr setup up to 1000 queries per min while > searching. > But when I am exceeding this limit say 1500 search queries per min, > facing "Server Refused Connection" in S

Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Shawn Heisey
On 6/8/2014 4:17 PM, shushuai zhu wrote: > I would like to get some advice to setup a Solr Cloud on a set of powerful > machines. The average size of the documents handled by the Solr Cloud is > about 0.5 KB, and the number of documents stored in Solr Cloud could reach > billions. When indexing,

How use gorup and facet ?

2014-06-09 Thread Phi Hoang Hai
Dear Solr experts, I have 2 problems that need your help. 1) I have to group a list with group.limit=1&group.main=true&group.sort=Date desc (many groups, each with 1 element, the newest). Then from the list of groups (each group has 1 element), I want to filter in order to remove items (in groups) not matches

Re: Collection communication internally

2014-06-09 Thread Erick Erickson
My first answer is "don't do it that way" :). Solr works best with flattened (de-normalized) data. If at all possible, you _really_ would be better off combining the two collections and flattening the data even though there would be more data. Whenever I see a question like this, I wonder if you'r
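
A minimal sketch of the flattening suggested above, in Python. Everything named here is hypothetical and not taken from the thread: the two record lists, the field names (`id`, `parent_id`, `name`, `title`) and the helper `flatten`. It only illustrates copying the fields of the referenced record onto each referencing record before indexing into a single collection, instead of resolving the foreign key at query time.

```python
def flatten(parents, children, key="parent_id"):
    """Return one denormalized document per child, with parent fields copied in.

    Parent fields are prefixed with 'parent_' so they cannot clash with
    the child's own fields. Children whose parent is missing are kept as-is.
    """
    parents_by_id = {p["id"]: p for p in parents}
    docs = []
    for child in children:
        parent = parents_by_id.get(child[key], {})
        doc = dict(child)  # copy, so the input records are not modified
        for field, value in parent.items():
            if field != "id":  # the parent's id is already held in child[key]
                doc["parent_" + field] = value
        docs.append(doc)
    return docs
```

The trade-off is the one named in the reply: the index holds more (redundant) data, but a single query can filter on parent and child fields together.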

Re: Deepy nested structure

2014-06-09 Thread harikrishna
thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Deepy-nested-structure-tp4140397p4140803.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Deepy nested structure

2014-06-09 Thread harikrishna
Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Deepy-nested-structure-tp4140397p4140802.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Any way to view lucene files

2014-06-09 Thread Aman Tandon
Yeah, just got it, thanks François :) With Regards Aman Tandon On Mon, Jun 9, 2014 at 8:20 PM, François Schiettecatte < fschietteca...@gmail.com> wrote: > Just click the 'Releases' link: > > https://github.com/DmitryKey/luke/releases > > François > > On Jun 9, 2014, at 10:43 AM, Aman Tando

Re: Setup a Solr Cloud on a set of powerful machines

2014-06-09 Thread Erick Erickson
Well, you've omitted information about the most precious resource for Solr, memory. That said, this question is impossible to answer in the abstract, see: http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Sun, Jun 8, 2014 at 3:1

Re: Any way to view lucene files

2014-06-09 Thread François Schiettecatte
Just click the 'Releases' link: https://github.com/DmitryKey/luke/releases François On Jun 9, 2014, at 10:43 AM, Aman Tandon wrote: > No, Anyways thanks Alex, but where is the luke jar? > > With Regards > Aman Tandon > > > On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch > wro

Re: ANN: Solr Next

2014-06-09 Thread Yonik Seeley
On Tue, Jan 7, 2014 at 1:53 PM, Yonik Seeley wrote: [...] > Next major feature: Native Code Optimizations. > In addition to moving more large data structures off-heap(like > UnInvertedField?), I am planning to implement native code > optimizations for certain hotspots. Native code faceting would

Re: Any way to view lucene files

2014-06-09 Thread Aman Tandon
No, Anyways thanks Alex, but where is the luke jar? With Regards Aman Tandon On Mon, Jun 9, 2014 at 6:54 AM, Alexandre Rafalovitch wrote: > Have you looked at: > https://github.com/DmitryKey/luke > > Regards, >Alex. > Personal website: http://www.outerthoughts.com/ > Current project: http:

Collection communication internally

2014-06-09 Thread Vineet Mishra
Hi All, I was curious to know whether communication between multiple collections can be achieved, and if so, by what means. The use case: having multiple collections, I need to query the first collection and use the unique ids from it to query the second one (a foreign key relation). Now if the no.

RE: Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Dyer, James
I believe it will return the terms that are most similar to the queried terms but have a greater term frequency than the queried terms. It doesn't actually care what the term frequencies are, only that they are greater than the frequencies of the terms you queried on. I do not know your use ca
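
To make the behaviour described above concrete, here is a small Python illustration. It is not Solr's implementation; the function name `more_popular` and the `(term, frequency, similarity)` tuples are assumptions made for the example. It keeps only candidates whose term frequency is greater than that of the queried term, without caring how much greater, and orders the survivors by similarity.

```python
def more_popular(query_freq, candidates):
    """Keep suggestions whose term frequency exceeds that of the queried term.

    candidates: list of (term, frequency, similarity) tuples.
    There is no absolute threshold: any frequency above query_freq passes.
    Survivors are returned most similar first.
    """
    kept = [c for c in candidates if c[1] > query_freq]
    kept.sort(key=lambda c: c[2], reverse=True)
    return [term for term, _freq, _sim in kept]
```

For instance, with a queried term that occurs 10 times, a candidate occurring 11 times passes and one occurring 5 times is dropped, however similar it is.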

Re: Solr Scale Toolkit Access Denied Error

2014-06-09 Thread Mark Gershman
Thanks, Tim. Worked like a charm. Appreciate your timely assistance. On Sat, Jun 7, 2014 at 9:13 PM, Timothy Potter wrote: > Hi Mark, > > Sorry for the trouble! I've now made the ami-1e6b9d76 AMI public; > total oversight on my part :-(. Please try again. Thanks Hoss for > trying to help out o

Re: Customizing Solr; Where to draw the line?

2014-06-09 Thread Jorge Luis Betancourt Gonzalez
I’d certainly go for the 2nd option. Depending on what you need, you won’t need to modify Solr itself but can extend it using different plugins. You’ll need to write different components depending on your specific requirements. I definitely recommend the talks from Trey Grainger, f

Re: How Can I modify the DocList and DocSet in solr

2014-06-09 Thread Alexandre Rafalovitch
Can you make a custom Component? They are pluggable. Regards, Alex On 09/06/2014 6:24 pm, "Vishnu Mishra" wrote: > I am using solr 4.6 and I am using solr Sharding (Distributed Search). I > have > situation where I like to modify the solr search result (DocList and > DocSet) > inside solr Q

Re: Large disjunction query practices

2014-06-09 Thread Jack Krupansky
Are they expecting relevancy ranking or merely seeking to do a bulk read of those documents? Please detail what the user is trying to accomplish with such a monster list of IDs. Generally, queries of more than a few dozen terms are a bad idea. If for no other reason than that if you need to debug

Large disjunction query practices

2014-06-09 Thread Joe Gresock
I'm wondering what the best practice for large disjunct queries in Solr is. A user wants to submit a query for several hundred thousand terms, like: (term1 OR term2 OR ... term500,000) I know it might be better to break this up into multiple queries that can be merged on the user's end, but I'm w
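
The "break it up into multiple queries" idea mentioned above could be sketched like this in Python. The function name `chunked_or_queries`, the field name, and the batch size are illustrative assumptions, and the terms are assumed to be already escaped for the query parser. The default batch size of 1000 is chosen only to stay under the 1024 default that Solr's `maxBooleanClauses` setting usually ships with; check your own solrconfig.xml.

```python
def chunked_or_queries(terms, field, batch_size=1000):
    """Yield one '<field>:(t1 OR t2 ...)' query string per batch of terms."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(terms), batch_size):
        batch = terms[start:start + batch_size]
        yield "%s:(%s)" % (field, " OR ".join(batch))
```

Each yielded string would be sent as its own request and the results merged on the client; this sketch does not address relevancy ranking across batches.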

Re: SOLR Performance Benchmarking

2014-06-09 Thread Shalin Shekhar Mangar
To be of any help we'd need to know what your documents look like, what your queries look like, and what the specifications of your server are. How much heap is dedicated to Solr, and how much free memory is available for the OS file cache? You have to figure out the bottleneck. Is it CPU or RAM or Disk? Ma

How Can I modify the DocList and DocSet in solr

2014-06-09 Thread Vishnu Mishra
I am using Solr 4.6 with sharding (distributed search). I have a situation where I would like to modify the Solr search result (DocList and DocSet) inside Solr's QueryComponent right after the following method is called from the process() method. searcher.search(result, cmd);

writing logs of a speicific solr posting to a file

2014-06-09 Thread pshahukhal
Hi, I am using SimplePostTool to post the XML files to Solr like: java -Durl=http://localhost:8080/solr/collection1/update -jar /var/lib/tomcat6/solr/collection1/dump/xmlinput/post.jar /var/lib/tomcat6/solr/collection1/dump/xmlinput/solr.xml When there are certain errors, the response fro

Re: slow performance on simple filter

2014-06-09 Thread mizayah
I'm really at a dead end. My index is 5.6 GB with about 8 million documents. The field I'm using for the filter is as simple as it gets. Can other fields affect my search if I only do a filter query? solr/puls-objects-prod/select?q=*%3A*&fq=class_name:License My results: 831 *:* class_name:
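
For reproducing a request like the one above from a script, a small Python sketch follows. The helper name `select_url` and the base URL used in the usage note are assumptions; only the core name, `q=*:*` and `fq=class_name:License` come from the message. The parameters are URL-encoded with the standard library, so `:` and `*` are escaped consistently.

```python
from urllib.parse import urlencode

def select_url(base, query, filter_query):
    """Build a /select URL with a main query (q) and one filter query (fq)."""
    params = {"q": query, "fq": filter_query}
    return base + "/select?" + urlencode(params)
```

Calling `select_url("http://localhost:8983/solr/puls-objects-prod", "*:*", "class_name:License")` gives a URL that can be timed repeatedly, which helps separate a cold first request from later ones served from the filter cache.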

Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Alistair
Hello all, I was wondering what the "onlyMorePopular" option for spellchecking uses as its threshold. Will it always pick the suggestion that returns the most queries, or does it base its result on some threshold that can be configured? Thanks! Ali. -- View this message in conte

Re: Provide value to uniqueID

2014-06-09 Thread ienjreny
Thanks, it is working fine but I had to change the following line to On Mon, Jun 9, 2014 at 9:29 AM, Shalin Shekhar Mangar [via Lucene] < ml-node+s472066n4140715...@n3.nabble.com> wrote: > You can specify the file name as the id by adding a TemplateTransformer on > the entity "x" and specif

Re: Documents Added Not Available After Commit (Both Soft and Hard)

2014-06-09 Thread Shalin Shekhar Mangar
I think this may be the same bug as LUCENE-5289 which was fixed in 4.5.1. Can you upgrade to 4.5.1 and see if that solves the problem? On Fri, Jun 6, 2014 at 7:17 PM, Justin Sweeney wrote: > Hi, > > An application I am working on indexes documents to a Solr index. This Solr > index is setup a