Re: SolrJ and Lucene queries

2014-10-21 Thread Ramzi Alqrainy
Yeah, it's a shame such a ser/deser feature isn't available in Lucene. My idea is to have a separate module that the Query classes can delegate to for serialization and deserialization, handling recursion for nested query objects, and then have modules for XML, JSON, and a pseudo-Java functiona

Re: Suggester on Dynamic fields

2014-10-21 Thread Ramzi Alqrainy
Use query() to have Solr search for results. You have to pass a SolrQuery object that describes the query, and you will get back a QueryResponse (from the org.apache.solr.client.solrj.response package). SolrQuery has methods that make it easy to add parameters to choose a request handler and send p

Re: mark solr documents as duplicates on hashing the combination of some fields

2014-10-21 Thread Chris Hostetter
You can still use the SignatureUpdateProcessorFactory for your use case, just don't configure the signatureField to be the same as your uniqueKey field. Configure some other field name (e.g. "signature") instead. : Date: Tue, 14 Oct 2014 12:08:26 +0330 : From: Ali Nazemian : Reply-To: solr-user@l
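
To illustrate the idea of hashing a combination of fields into a separate signature field, here is a minimal conceptual sketch in plain Java. It is not the code of SignatureUpdateProcessorFactory; the class name, the field names (id, title, body) and the choice of MD5 are assumptions made only for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: NOT Solr's SignatureUpdateProcessorFactory implementation.
class SignatureSketch {

    // Hash the selected field values, in a fixed order, into a hex string.
    static String signature(Map<String, String> doc, String... fields) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String field : fields) {
                String value = doc.getOrDefault(field, "");
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                md5.update((byte) 0); // separator so "ab"+"c" differs from "a"+"bc"
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JVM", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical documents: different ids, same title and body.
        Map<String, String> first = new HashMap<>();
        first.put("id", "1");
        first.put("title", "Solr");
        first.put("body", "dedupe me");
        Map<String, String> second = new HashMap<>(first);
        second.put("id", "2");

        // The id is left out of the hash, so both documents share a signature.
        String a = signature(first, "title", "body");
        String b = signature(second, "title", "body");
        System.out.println(a.equals(b)); // prints true
    }
}
```

Because the unique key is not part of the hash, two documents with different ids but identical content end up with the same value in the signature field and can be marked as duplicates without overwriting each other.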

Re: SolrJ and Lucene queries

2014-10-21 Thread mmastroianni
Thanks for the reply. The issue I have is trying to figure out how to either translate my large programmatically generated Lucene query to a string I can set as the q parameter (which is non-trivial, since the toString methods on Lucene queries don't necessarily produce a parseable string), or ge

Re: SolrJ and Lucene queries

2014-10-21 Thread Ramzi Alqrainy
Use query() to have Solr search for results. You have to pass a SolrQuery object that describes the query, and you will get back a QueryResponse (from the org.apache.solr.client.solrj.response package). SolrQuery has methods that make it easy to add parameters to choose a request handler and send p

Re: Facet counts and RankQuery

2014-10-21 Thread Erick Erickson
This is contrived I admit, but let's say you have a query with 100 hits with a score distribution of: 1 doc with a score of 100, 98 docs with a score of 91, and 1 doc with a score of 1. Now I get 99 docs in my results set. Next I delete the doc that scored 1 and my returned doc set _for the exact same que

SolrJ and Lucene queries

2014-10-21 Thread mmastroianni
I have an existing application using raw Lucene that does some entity extraction on a raw query and mixes in some other params to augment or replace pieces of a large boolean query that it then constructs, which is a mix of term queries, range queries, and recursiveprefixtree queries. I'm now swi

Re: Nested documents in Solr

2014-10-21 Thread Ramzi Alqrainy
If I have your question right, you can use a custom query syntax. Unless you explicitly specify an alternative query parser such as DisMax or eDisMax, you're using the standard Lucene query parser by default. In your case, I think I can solve it by using this query chapter_title:Introducti

Re: Facet counts and RankQuery

2014-10-21 Thread Parvesh Garg
Hi Joel, Thanks for the pointer. Can you point me to any example implementation? Parvesh Garg, Founding Architect http://www.zettata.com On Tue, Oct 21, 2014 at 9:32 PM, Joel Bernstein wrote: > The RankQuery cannot be used as filter. It is designed for custom > ordering/ranking of results on

Re: Facet counts and RankQuery

2014-10-21 Thread Parvesh Garg
Hi Erick, Thanks for the input. We have other requirements regarding precision and recall, especially when other sorts are specified. So we need to suppress docs based on thresholds. Parvesh Garg, Founding Architect http://www.zettata.com On Tue, Oct 21, 2014 at 8:20 PM, Erick Erickson wrote:

Re: Some clarification needed on "migrate" command in Collections API

2014-10-21 Thread Ramzi Alqrainy
The MIGRATE command is a synchronous operation and therefore keeping a large read timeout on the invocation is advised. The request may still timeout due to inherent limitations of the Collection APIs but that doesn’t necessarily mean that the operation has failed. Users should check logs, cluster

Re: unstable results on refresh

2014-10-21 Thread Erick Erickson
Giovanni: To see how this happens, consider a shard with a leader and two followers. Assume your autocommit interval is 60 seconds on each. This interval can expire at slightly different "wall clock" times. Even if the servers started perfectly in synch, they can get slightly out of sync. So, you

Re: Facet counts and RankQuery

2014-10-21 Thread Joel Bernstein
The RankQuery cannot be used as filter. It is designed for custom ordering/ranking of results only. If it's used as filter the facet counts will not match up. If you need a filter collector then you need to use a PostFilter. Joel Bernstein Search Engineer at Heliosearch On Tue, Oct 21, 2014 at 10

Re: unstable results on refresh

2014-10-21 Thread Giovanni Bricconi
I noticed the problem again, and this time I was able to collect some data. In my paste http://pastebin.com/nVwf327c you can see the result of the same query issued twice; the 2nd and 3rd groups are swapped. I also pasted the clusterstate and the core state for each core. The logs didn't show any problem rel

Re: Facet counts and RankQuery

2014-10-21 Thread Erick Erickson
I _very strongly_ recommend that you do _not_ do this. First, the "problem" of having documents in the results list with, say, scores < 20% of the max takes care of itself; users stop paging pretty quickly. You're arbitrarily denying the users any chance of finding some documents that _do_ match t

Re: How to properly use Levenstein distance with ~ in Java

2014-10-21 Thread Erick Erickson
When used on bare terms, ~ is indeed "fuzzy matching" rather than proximity, it's an overloaded operator in that sense. If I had to guess, I'd guess that your analysis chain for the field is doing "interesting" things for "taveranx" and the resulting token is far enough "away" (in the Levenshtein
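
To make the notion of a token being "far away" concrete, here is a minimal sketch of the classic Levenshtein edit distance in plain Java. It is not Lucene's FuzzyQuery code, and the comparison term "tavern" is a hypothetical indexed token chosen only for illustration.

```java
// Conceptual sketch only: textbook Levenshtein distance, not Lucene's implementation.
class EditDistanceSketch {

    // Minimum number of single-character inserts, deletes and substitutions
    // needed to turn a into b, computed with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "tavern" is a hypothetical indexed term; two deletions are needed.
        System.out.println(levenshtein("taveranx", "tavern")); // prints 2
    }
}
```

If the analysis chain changes the query term before matching (stemming, for instance), the distance is measured against the analyzed token, so a term that looks close on paper can end up outside the allowed number of edits.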

Some clarification needed on "migrate" command in Collections API

2014-10-21 Thread Tassi Pierluigi
Ciao to all! We're testing Collections API on our SolrCloud test cluster (4.10.1) managed by a standalone Zookeeper server (3.4.6). We're following Collection API documentation and Yonik Seeley's blog post about Migration feature available since 4.7.x you can read also at: http://helios

Re: Is there a problem with -Infinity as boost?

2014-10-21 Thread O. Olson
Thank you Walter. I liked your solution! This is what I was looking for i.e. &boost=log(sum(1,qty)) O. O. Walter Underwood wrote > The usual fix for this is log(1+qty). If you might have negative values, > you can use log(max(1,qty)). > > wunder > Walter Underwood > wunder@ > http://observer
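
A small sketch of why the sum(1,qty) form avoids the -Infinity boost, written in plain Java rather than as a Solr function query. The class and method names are made up for the example; Solr's log() function is base 10, which Math.log10 mirrors here.

```java
// Conceptual sketch only: mirrors the arithmetic of the function queries, not Solr code.
class BoostSketch {

    // Equivalent of log(qty): blows up to -Infinity when qty is 0.
    static double naiveBoost(double qty) {
        return Math.log10(qty);
    }

    // Equivalent of log(sum(1,qty)): qty = 0 gives a boost of 0 instead.
    static double safeBoost(double qty) {
        return Math.log10(1.0 + qty);
    }

    public static void main(String[] args) {
        System.out.println(naiveBoost(0)); // prints -Infinity
        System.out.println(safeBoost(0));  // prints 0.0
        System.out.println(safeBoost(9));  // prints 1.0
    }
}
```

Note that a qty of 0 now yields a boost of exactly 0, which may still need handling if the boost is applied multiplicatively.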

Re: Shared Directory for two Solr Clouds(Writer and Reader)

2014-10-21 Thread Erick Erickson
Hmmm, I sure hope you have _lots_ of shards. At that rate, a single shard is probably going to run up against internal limits in a _very_ short time (the most docs I've seen successfully served on a single shard run around 300M). It seems, to handle any reasonable retention period, you need lots a

RE: Word Break Spell Checker Implementation algorithm

2014-10-21 Thread Dyer, James
David, I do not know of a published algorithm for this. All it does is, in the case of terms with 0 frequency, check the document frequency of the various parts that can be made from the terms by breaking them and/or by combining adjacent terms. There are tuning parameters available that le
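
The break-and-combine check described above can be sketched roughly as follows. This is a simplified illustration in plain Java, not the WordBreakSpellChecker source; the document-frequency map stands in for the index, only two-part breaks are tried, and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch only: NOT the WordBreakSpellChecker implementation.
class WordBreakSketch {

    // Try every two-part split of the term; keep splits where both parts occur in the index.
    static List<String> breakSuggestions(String term, Map<String, Integer> docFreq) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 1; i < term.length(); i++) {
            String left = term.substring(0, i);
            String right = term.substring(i);
            if (docFreq.getOrDefault(left, 0) > 0 && docFreq.getOrDefault(right, 0) > 0) {
                suggestions.add(left + " " + right);
            }
        }
        return suggestions;
    }

    // Join each pair of adjacent terms; keep joins that occur in the index.
    static List<String> combineSuggestions(List<String> terms, Map<String, Integer> docFreq) {
        List<String> suggestions = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            String joined = terms.get(i) + terms.get(i + 1);
            if (docFreq.getOrDefault(joined, 0) > 0) {
                suggestions.add(joined);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies standing in for a real index.
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("word", 4);
        docFreq.put("break", 2);
        docFreq.put("spellchecker", 3);

        System.out.println(breakSuggestions("wordbreak", docFreq));
        System.out.println(combineSuggestions(Arrays.asList("spell", "checker"), docFreq));
    }
}
```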

Re: Recovering from Out of Mem

2014-10-21 Thread Shawn Heisey
On 10/21/2014 1:24 AM, Salman Akram wrote: > Yes so the most imp thing is what's the best way to 'know' that there is > OOM? Some script of a ping with 1-2 mins time? To touch on both your question and that posed by Toke Eskildsen: Java itself has a configuration option to call a program or scrip

Re: Strange behaviour with negatives queries

2014-10-21 Thread lansing
Thank you for your reply

Re: unstable results on refresh

2014-10-21 Thread Giovanni Bricconi
Nice! I will monitor the index and try this if the problem comes back. Actually the problem was due to small differences in score, so I think the problem has the same origin. 2014-10-21 8:10 GMT+02:00 lboutros : > Hi Giovanni, > > we had this problem as well. > The cause was that the different nod

Re: unstable results on refresh

2014-10-21 Thread Giovanni Bricconi
I noticed the problem while looking at a group query: the groups returned were sorted on the score field of the first result, and then shown to the user. Repeating the same query, I noticed that the order of two groups started switching. Thank you, I will look for the thread you mentioned. 2014-10-20 22:07 GM

Re: Recovering from Out of Mem

2014-10-21 Thread Salman Akram
Yes so the most imp thing is what's the best way to 'know' that there is OOM? Some script of a ping with 1-2 mins time? The reason I want auto restart or at least some error (so that it can switch to another slave) is I want to have a good sleep if something goes wrong at night so that the systems

Re: Recovering from Out of Mem

2014-10-21 Thread Toke Eskildsen
On Mon, 2014-10-20 at 16:25 +0200, Shawn Heisey wrote: > In general, once OOME happens, program operation (and in some cases the > status of the most recently indexed documents) is completely > undetermined. We can be sure that the data which has already been > written to disk will be correct, but

Facet counts and RankQuery

2014-10-21 Thread Parvesh Garg
Hi All, We have written a RankQuery plugin with a custom TopDocsCollector to suppress documents below a certain threshold w.r.t. the maxScore for that query. It works fine and is reflected well with numFound and start parameters. Our problem lies with facet counts. Even though the docs numFoun
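
For readers following the thread, the kind of thresholding described here can be sketched as a cutoff relative to the top score. This is plain Java over an array of scores, not the poster's plugin and not a TopDocsCollector; the scores and the 20% fraction are hypothetical.

```java
// Conceptual sketch only: NOT the poster's RankQuery plugin or a Lucene collector.
class ThresholdSketch {

    // Count the scores that reach at least the given fraction of the maximum score.
    static int countAtOrAbove(float[] scores, float fraction) {
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float cutoff = max * fraction;
        int kept = 0;
        for (float s : scores) {
            if (s >= cutoff) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical scores: with a 20% cutoff, the doc scoring 1 is suppressed.
        float[] scores = {100f, 91f, 91f, 1f};
        System.out.println(countAtOrAbove(scores, 0.2f)); // prints 3
    }
}
```

The cutoff only changes which documents are returned by the ranking step, so any count computed separately over the full set of matching documents would still include the suppressed ones.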