Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
1. Yes, the default for mm is 1. 2. It depends on what you are really trying to do - you haven't told us. Generally, mm=1 is equivalent to q.op=OR, and mm=100% is equivalent to q.op=AND. Generally, use q.op unless you really know what you are doing. Generally, the intent of mm is to set the
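To make the mm=1 versus mm=100% comparison concrete, here is a minimal sketch that builds the two request query strings. The field name `text` in qf is a hypothetical placeholder, not something taken from this thread; `defType`, `q`, `qf`, and `mm` are standard edismax request parameters.

```python
from urllib.parse import urlencode


def edismax_params(user_query, mm):
    """Build an edismax query string with the given mm value.

    The qf field "text" is a hypothetical placeholder.
    """
    return urlencode({
        "defType": "edismax",
        "q": user_query,
        "qf": "text",
        "mm": mm,
    })


# mm=1: one matching term is enough (behaves like q.op=OR)
loose = edismax_params("siberian ginseng", "1")

# mm=100%: every term must match (behaves like q.op=AND)
strict = edismax_params("siberian ginseng", "100%")
```

Note that the percent sign has to be URL-encoded (`100%25`) when the value travels in a query string; `urlencode` takes care of that.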

Re: eDismax parser and the mm parameter

2014-03-30 Thread Ahmet Arslan
Hi, Using mm=1 with (e)dismax is not a good idea. Your users will be unhappy, because there is no coord factor with this parser. The coord factor captures the following idea: "Typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms." I suggest yo

Re: eDismax parser and the mm parameter

2014-03-30 Thread simpleliving...@gmail.com
Thanks Ahmet. So if it's a single-term query like 'Ginseng', what does mm=3 do to the query? I am guessing it would be reduced to 1 automatically in this case.
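The guess above can be checked with a small model of how a simple mm value is resolved against the number of optional clauses. This is a simplified sketch of the documented mm rules, not Solr's actual code: it handles only the plain integer and percentage forms (no conditional specs such as `2<75%`), and the function name `effective_mm` and the lower bound of 0 are assumptions made for illustration.

```python
def effective_mm(spec, optional_clauses):
    """Approximate the number of clauses that must match for a simple mm spec.

    Supports "3", "-1", "75%", "-25%". Conditional specs are not handled.
    """
    spec = spec.strip()
    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Percentages are rounded down, using integer arithmetic.
        portion = (optional_clauses * abs(percent)) // 100
        # Negative percent: that share of the clauses may be missing.
        result = optional_clauses - portion if percent < 0 else portion
    else:
        count = int(spec)
        # Negative integer: that many clauses may be missing.
        result = optional_clauses + count if count < 0 else count
    # Never require more clauses than the query has.
    # The floor of 0 is an assumption of this sketch.
    return max(0, min(result, optional_clauses))


single_term = effective_mm("3", 1)   # query: Ginseng
three_terms = effective_mm("3", 3)   # query: Red Siberian Ginseng
```

Under this model a single-term query with mm=3 ends up requiring just that one term, which matches the guess in the message.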

Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks Jack! I understand the intent of the mm parameter. My question is that, since the query terms being provided are not of fixed length, I do not know what the mm should look like. For example, "Ginseng" and "Siberian Ginseng" are my search terms. The first one can have an mm up to 1 and the second one can have

Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
It still depends on your objective - which you haven't told us yet. Show us some use cases and detail what your expectations are for each use case. The edismax phrase boosting is probably a lot more useful than messing around with mm. Take a look at pf, pf2, and pf3. See: http://wiki.apache.o

Re: Context-aware suggesters in Solr

2014-03-30 Thread Alan Woodward
Thanks Areek. So looking at the code in trunk, exposing it to Solr looks to be pretty straightforward - just extending DocumentDictionaryFactory to take a 'contextField' parameter as well, and passing that on to the DocumentDictionary constructor. I'll give it a go! Thanks again. Alan Woodwa

SolrCloud OR distributed Solr

2014-03-30 Thread Priti Solanki
Hello Member, Is there any difference between distributed Solr and SolrCloud? Consider that I have products for three countries. I have indexed one country's data and its index size is 160 GB+. Now we have two other countries and I am confused! My client asks me what the difference is if we procure ano

Re: SolrCloud OR distributed Solr

2014-03-30 Thread Gora Mohanty
On 30 March 2014 23:12, Priti Solanki wrote: > Hello Member, > Is there any difference between distributed solr & solrCloud ? You might be confusing the older Solr distributed search with the new SolrCloud: * Older distributed search: https://wiki.apache.org/solr/DistributedSearch * SolrCloud

Re: SolrCloud OR distributed Solr

2014-03-30 Thread Erick Erickson
Distributed solr is simply the ability for Solr to take the incoming query and send it to multiple shards, then aggregate the response. Here a "shard" is a physical partition of a single logical index. The assumption is that you can't fit the entire index on a single machine and still get the perfo
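As a rough illustration of the older, manually configured distributed search described here, the sketch below builds a request that names the shards explicitly through the `shards` request parameter. The host names and the core name `products` are hypothetical; only the `shards` parameter itself comes from Solr's distributed search feature.

```python
from urllib.parse import urlencode

# Hypothetical shard locations, one per physical partition of the index.
SHARDS = [
    "host1:8983/solr/products",
    "host2:8983/solr/products",
]


def distributed_query_url(base_url, query, shards):
    """Build a select URL that asks one node to fan the query out to all shards."""
    params = urlencode({"q": query, "shards": ",".join(shards)})
    return f"{base_url}/select?{params}"


url = distributed_query_url("http://host1:8983/solr/products", "ginseng", SHARDS)
```

The node that receives the request sends the query to every listed shard and merges the responses. With SolrCloud the shard list is tracked by the cluster, so the client does not have to maintain it by hand.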

Re: zookeeper reconnect failure

2014-03-30 Thread Mark Miller
We don’t currently retry, but I don’t think it would hurt much if we did - at least briefly. If you want to file a JIRA issue, that would be the best way to get it in a future release. --  Mark Miller about.me/markrmiller On March 28, 2014 at 5:40:47 PM, Michael Della Bitta (michael.della.bi.

Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-30 Thread Rishi Easwaran
RAM shouldn't be a problem. I have a box with 144GB RAM, running 12 instances with 4GB Java heap each. There are 9 instances writing to 1TB of SSD disk space. The other 3 are writing to SATA drives, and have autoSoftCommit disabled.

Re: SOLR Cloud 4.6 - PERFORMANCE WARNING: Overlapping onDeckSearchers=2

2014-03-30 Thread Shawn Heisey
On 3/30/2014 2:59 PM, Rishi Easwaran wrote: > RAM shouldn't be a problem. > I have a box with 144GB RAM, running 12 instances with 4GB Java heap each. > There are 9 instances writing to 1TB of SSD disk space. > Other 3 are writing to SATA drives, and have autosoftcommit disabled. This brought up

Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Jack, thanks again. I am searching Chinese medicine documents. As in the example I gave earlier, a user can search for "Ginseng", "Siberian Ginseng", or "Red Siberian Ginseng". I certainly want to use the pf parameter (which is not driven by the mm parameter); however, for giving a higher score to documents th

Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
If you use pf, pf2, and pf3 and boost appropriately, the effects of mm will be dwarfed. The general goal is to assure that the top documents really are the best, not to necessarily limit the total document count. Focusing on the latter could be a real waste of time. It's still not clear why
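A minimal sketch of what phrase boosting with pf, pf2, and pf3 could look like as request parameters. The field names `title` and `body` and all boost values are hypothetical and would need tuning against real documents; pf boosts documents containing the whole query as a phrase, while pf2 and pf3 boost matches on word pairs and word triples.

```python
from urllib.parse import urlencode


def phrase_boost_params(user_query):
    """Return edismax parameters with phrase boosts.

    Field names and boost values are illustrative assumptions.
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title body",        # fields searched for individual terms
        "pf": "title^10 body^5",   # boost for the full query as one phrase
        "pf3": "title^6 body^3",   # boost for any three adjacent query words
        "pf2": "title^4 body^2",   # boost for any two adjacent query words
    }


params = phrase_boost_params("white siberian ginseng")
query_string = urlencode(params)
```

With boosts ordered like this, documents containing the full phrase tend to rank above those matching only two words, which in turn tend to rank above single-word matches, without mm having to restrict the result set.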

Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Jack, I misstated the problem. I am not using the OR operator as the default now (now that I think about it, it does not make sense to use the default operator OR along with the mm parameter). The reason I want to use pf and mm in conjunction is because of my understanding of the edismax parser and

Re: eDismax parser and the mm parameter

2014-03-30 Thread Jack Krupansky
The mm parameter is really only relevant when the default operator is OR or explicit OR operators are used. Again: Please provide your use case examples and your expectations for each use case. It really doesn't make a lot of sense to prematurely focus on a solution when you haven't clearly de

Re: eDismax parser and the mm parameter

2014-03-30 Thread S.L
Thanks Jack, my use cases are as follows. 1. Search for "Ginseng": everything related to ginseng should show up. 2. Search for "White Siberian Ginseng": results with the whole phrase show up first, followed by results with 2 words from the phrase, followed by results with a single word from the phrase. 3. Fuzzy S

how to index 20 MB plain-text xml

2014-03-30 Thread Floyd Wu
I have many plain-text XML files that I convert to the Solr XML format. But every time I send them to Solr, I hit an OOM exception. How can I configure Solr to "eat" these big XML files? Please point me to a way. Thanks, Floyd

Re: how to index 20 MB plain-text xml

2014-03-30 Thread Alexandre Rafalovitch
Without digging too deep into why exactly this is happening, here are the general options: 0. Are you actually committing? Check the messages in the logs and see if the records show up when you expect them to. 1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's HTTP buffer that's blo

Re: how to index 20 MB plain-text xml

2014-03-30 Thread Floyd Wu
Hi Alex, Thanks for your response. Personally, I don't want to feed these big XML files to Solr, but the users want it. I'll try your suggestions later. Many thanks. Floyd 2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch : > Without digging too deep into why exactly this is happening, here are > the ge

Re: how to index 20 MB plain-text xml

2014-03-30 Thread primoz . skale
Hi! I had the same issue with XML files. Even small XML files produced an OOM exception. I read that the way XML is parsed can sometimes blow up memory requirements to the point that Java runs out of heap. My solution was: 1. Don't parse XML files 2. Parse only small XML files and hope for th
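One way to end up with only small XML payloads is to split a large update file on the client before sending it. The sketch below assumes the file is a Solr `<add>` message containing many `<doc>` elements; that is an assumption, and it will not help if the 20 MB is a single document. The function names are hypothetical. It streams the file with `iterparse` and yields the documents in small batches.

```python
import io
import xml.etree.ElementTree as ET


def iter_doc_batches(source, batch_size=100):
    """Yield lists of serialized <doc> elements, batch_size at a time."""
    batch = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "doc":
            elem.tail = None  # drop whitespace that follows the element
            batch.append(ET.tostring(elem, encoding="unicode"))
            elem.clear()      # free the fields of the document just handled
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch


def to_add_message(batch):
    """Wrap one batch of serialized documents in its own <add> message."""
    return "<add>" + "".join(batch) + "</add>"


sample = (
    '<add>'
    '<doc><field name="id">1</field></doc>'
    '<doc><field name="id">2</field></doc>'
    '<doc><field name="id">3</field></doc>'
    '</add>'
)
batches = list(iter_doc_batches(io.BytesIO(sample.encode("utf-8")), batch_size=2))
messages = [to_add_message(b) for b in batches]
```

Each message can then be posted to the update handler as a separate request, so neither the client nor Solr has to hold the whole file at once.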