Re: how to sampling search result

2016-09-27 Thread Alexandre Rafalovitch
I am not sure I understand what the business case is. However, you might be able to do something with a custom post-filter. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 27 September 2016 at 22:29, Yongtao Liu wrote: > Mikhai

Re: Configuring a custom Analyzer for the SynonymFilter

2016-09-27 Thread Alexandre Rafalovitch
Before you go down this rabbit hole, are you actually sure this does what you think it does? As far as I can tell, that parameter is for analyzing/parsing the synonym entries in the synonym file. Not the incoming search queries or text actually being indexed. Did you get it to work with the simple

Re: Solr suddenly starts creating .cfs (compound) segments during indexing

2016-09-27 Thread Tomás Fernández Löbbe
By default, TieredMergePolicy uses CFS for segments that are less than 10% of the index[1]. If you set the "useCompoundFile" element in solrconfig (to either true or false) you can override this[2]. TMP also has some other limits and logic on when to (and when not to) use CFS. You can take a look a
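
The 10% rule described above can be pictured with a small sketch. This is a toy model only, not Lucene's code: the function name, the byte sizes, and the `no_cfs_ratio` parameter name are assumptions made for illustration, and it ignores the other limits the message mentions.

```python
def uses_compound_file(segment_bytes, index_bytes, no_cfs_ratio=0.10):
    """Toy rule: a new segment is written as .cfs when it is smaller
    than `no_cfs_ratio` of the whole index (assumed default: 10%)."""
    return segment_bytes < no_cfs_ratio * index_bytes


# Early in a build the index is small, so a 50 MB segment is a large share of it.
print(uses_compound_file(50, 200))     # 50 MB segment, 200 MB index
# Late in a build the same 50 MB segment is a tiny share of the index.
print(uses_compound_file(50, 20000))   # 50 MB segment, 20 GB index
```

Under this reading, segments of a similar size would start to qualify for the compound format as the index grows, which would be consistent with .cfs files appearing only towards the end of a long build.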

Re: Faceting search issues

2016-09-27 Thread Tomás Fernández Löbbe
I wonder why in the "facet_field" section of the first query it says: "facet_fields": {"id": []} when it should be saying "facet_fields": {"name": []} Also, why is the second query not including the fq in the echoParams section. What is that other query with fq=aggregationname:story? This is not

Solr suddenly starts creating .cfs (compound) segments during indexing

2016-09-27 Thread simon
Our index builds take around 6 hours, and I've noticed recently that segments created towards the end of the build (in the last hour or so) use the compound file format (.cfs). I assumed that this might be due to the number of open files approaching a maximum, but both the hard and soft open file

RE: how to remove duplicate from search result

2016-09-27 Thread Yongtao Liu
Shamik, Thanks a lot. The Collapsing query parser solved the issue. Thanks, Yongtao -Original Message- From: shamik [mailto:sham...@gmail.com] Sent: Tuesday, September 27, 2016 3:09 PM To: solr-user@lucene.apache.org Subject: RE: how to remove duplicate from search result Did you take a look

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Solr User
Further testing indicates that any performance difference is not due to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes. The times appear to converge on an optimized index. Below are the details. Not sure what else to make of this at this point other than moving forward w

RE: how to remove duplicate from search result

2016-09-27 Thread shamik
Did you take a look at Collapsing Query Parser? https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-remove-duplicate-from-search-result-tp4298272p4298305.html Sent from the Solr - User mailing l
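
As a rough illustration of the suggestion above, a collapse filter is passed as an `fq` parameter. The sketch below only builds the request URL; the `guid` field comes from the original question in this thread, while the host, the core name `core`, and the helper names are assumptions for the example.

```python
from urllib.parse import urlencode


def collapse_params(query, field):
    """Query parameters that keep one document per distinct value of `field`."""
    return {"q": query, "fq": "{!collapse field=%s}" % field}


def select_url(base_url, params):
    """Append URL-encoded parameters to the /select handler of a core."""
    return base_url + "/select?" + urlencode(params)


params = collapse_params("*:*", "guid")
print(select_url("http://localhost:8983/solr/core", params))
```

Because collapsing is applied as a filter on the result set, facet and stats components would then be computed over the collapsed documents, which appears to be what the original poster needed and what grouping did not provide.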

Re: How to retrieve parent documents without a nested structure (block-join)

2016-09-27 Thread shamik
Thanks again Alex. I should have clarified the use of the browse request handler. The reason is that I'm simulating the request handler parameters of my production system using browse. I used a separate request handler, stripped down all properties to match "select". I finally narrowed down the issue to Minim

JSON Facet "allBuckets" behavior

2016-09-27 Thread Karthik Ramachandran
While performing JSON faceting with "allBuckets" and "mincount", I am not sure whether I am expecting a wrong result or there is a bug. By definition, "allBuckets" is a bucket in the response representing the union of all of the buckets. Schema: Dataset: 1filename11 2filename21 3filename31 4filename41 5fil

Re: Faceting search issues

2016-09-27 Thread Jan Høydahl
Please tell us some more: - Solr version - Add to your query: &debugQuery=true&echoParams=all and paste the result - How is “string_ci” defined ()? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 26 Sep 2016 at 23:59, Beyene, Iyob wrote: > > Hi, > > When I query solr

RE: how to remove duplicate from search result

2016-09-27 Thread Yongtao Liu
David, Thanks for your reply. Grouping cannot solve the issue. We also need to run facet and stats on the search result. With grouping, the facet and stats results still count duplicates. Thanks, Yongtao -Original Message- From: David Santamauro [mailto:david.santama...@gmail.com] Sent: Tuesday, Sep

Re: Configuring a custom Analyzer for the SynonymFilter

2016-09-27 Thread Raf
On Tue, Sep 27, 2016 at 4:22 PM, Alexandre Rafalovitch wrote: > Looking at the code (on GitHub is easiest), it can take either > analyzer or tokenizer but definitely not any chain definitions. This > seems to be the same all the way to 6.2.1. > Thanks for your answer Alex. Does anyone know if i

RE: how to sampling search result

2016-09-27 Thread Yongtao Liu
Mikhail, Thanks for your reply. The random field is assigned at index time. We want to sample based on the search result. For example, if the random field has values 1 - 100, the documents matched by the query may all fall in the range 90 - 100, so the random field will not help. Is it possible to sample based on searc

Re: how to remove duplicate from search result

2016-09-27 Thread David Santamauro
Have a look at https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 09/27/2016 11:03 AM, googoo wrote: hi, We want to provide a function to remove duplicates from search results. For example, we have the documents below. id(uniqueKey) guid doc1 G1 doc2 G2

Re: Faceting search issues

2016-09-27 Thread Beyene, Iyob
Here is the result from running the first query, i.e http://localhost:8983/solr/core/select?q=*:*&facet=true&facet.field=name&rows=0&facet.mincount=2&echoParams=all

Re: how to sampling search result

2016-09-27 Thread Mikhail Khludnev
Perhaps you can apply a filter on a random field. On Tue, Sep 27, 2016 at 5:57 PM, googoo wrote: > Hi, > > Is it possible to sample based on the "search result"? > For example, run the query first, and the search result returns 1 million documents. > With random sampling, 50% (500K) of the documents are returned for facet,

how to remove duplicate from search result

2016-09-27 Thread googoo
hi, We want to provide a function to remove duplicates from search results. For example, we have the documents below. id(uniqueKey) guid doc1 G1 doc2 G2 doc3 G3 doc4 G1 A user runs one query and hits doc1, doc2 and doc4. The user wants to remove

how to sampling search result

2016-09-27 Thread googoo
Hi, Is it possible to sample based on the "search result"? For example, run the query first, and the search result returns 1 million documents. With random sampling, 50% (500K) of the documents are returned for facet and stats. The sampling needs to be based on the "search result". Thanks, Yongtao -- View this message in context:

Re: Faceting search issues

2016-09-27 Thread Alexandre Rafalovitch
That's weird. Could you rerun both queries with echoParams=all and see if some additional conditions will show up unexpectedly. Specifically, an 'fq' in the first query that the second query overrides. Alternatively, do you definitely have 316544 documents in the index? That's the number that's y

Re: Migrating to Solr 6.1.0 from 5.5.0

2016-09-27 Thread William Bell
the documentation is not good on this. Not sure how to fix it either. On Tue, Sep 27, 2016 at 3:41 AM, M, Arjun (Nokia - IN/Bangalore) < arju...@nokia.com> wrote: > Hi, > > We are getting the below errors when migrating Solr from 5.5.0 to > 6.1.0. Could anyone help in resolving the issue,

Re: Faceting search issues

2016-09-27 Thread Beyene, Iyob
Alessandro, thanks for your quick reply. When I say duplicates I meant to say how many documents the term appears in. All that I wanted to see is the number of times a particular name is appearing in documents in solr. thanks From: Alessandro Benedetti Sent:

Re: Configuring a custom Analyzer for the SynonymFilter

2016-09-27 Thread Alexandre Rafalovitch
Looking at the code (on GitHub is easiest), it can take either analyzer or tokenizer but definitely not any chain definitions. This seems to be the same all the way to 6.2.1. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On 27 Se

Re: How to retrieve parent documents without a nested structure (block-join)

2016-09-27 Thread Alexandre Rafalovitch
I am not sure why you are even trying this against /browse, but the easy way to compare handlers is to add echoParams=all to both requests and compare all parameters side-by-side. Regards, Alex. Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/ On
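
The side-by-side comparison suggested above can be done mechanically once the echoed parameters of both responses are available. This is a minimal sketch, assuming the parameters have already been copied out of each response into plain dictionaries; the helper name and the sample values are hypothetical and are not taken from the thread.

```python
def diff_params(left, right):
    """Return {name: (left_value, right_value)} for every parameter that
    is missing on one side or has a different value on the two sides."""
    differences = {}
    for name in sorted(set(left) | set(right)):
        if left.get(name) != right.get(name):
            differences[name] = (left.get(name), right.get(name))
    return differences


# Hypothetical echoed parameters from two handlers.
select_params = {"q": "*:*", "defType": "lucene", "rows": "10"}
browse_params = {"q": "*:*", "defType": "edismax", "rows": "10", "mm": "100%"}

for name, (a, b) in diff_params(select_params, browse_params).items():
    print(name, "select:", a, "browse:", b)
```

Parameters that a handler injects through its defaults show up only in its own echo, so they appear here with `None` on the other side.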

RE: Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Markus Jelsma
Ok, thanks. I've added SOLR-9573 https://issues.apache.org/jira/browse/SOLR-9573 Regards, Markus -Original message- > From:Shalin Shekhar Mangar > Sent: Tuesday 27th September 2016 16:04 > To: solr-user@lucene.apache.org > Subject: Re: Results not ordered by score and debug info is i

Configuring a custom Analyzer for the SynonymFilter

2016-09-27 Thread Raf
Hi, is it possible to configure a custom analysis for synonyms the same way we do for index/query field analysis? Reading the *SynonymFilter* documentation[0], I have found I can specify a custom analyzer by writing its class name. Example:

Re: Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Shalin Shekhar Mangar
Wow, it took me some time to realize what you were referring to :-) The manual (or the reference guide page) is under construction and I didn't finish that page. The sentence about faceting is just a copy/paste relic :) I'll go fix that asap. I remember a bunch of Jira issues w.r.t NPE and distri

Re: How to retrieve parent documents without a nested structure (block-join)

2016-09-27 Thread shamik
Sorry to bump this up, but can someone please explain the parsing behaviour of a join query (shown above) with respect to different request handlers? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-retrieve-parent-documents-without-a-nested-structure-block-join-tp4297510

RE: Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Markus Jelsma
Shalin, that does the trick indeed! Also noted that the manual is incorrect, setting it to a blank or false value, did not disable my facets: If set to "true," this parameter enables facet counts in the query response. If set to "false" to a blank or missing value, this parameter disables faceti

Re: Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Shalin Shekhar Mangar
This may be relevant or not, I am not sure but one difference between fl=title_nl,score,id and fl=score,id is that the former executes a two pass distributed search i.e. get ids, merge, get fields for top N docs but the latter skips the "get fields" phase because it already has all the right inform
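
The difference between the two `fl` values described above can be sketched with a toy model. This is not Solr's implementation: shards are plain lists of dictionaries, the function name is hypothetical, and the only behaviour modelled is "get ids, merge, then fetch fields for the top N unless id and score are all that was asked for".

```python
def distributed_search(shards, rows, fl):
    """Toy two-pass search. Returns (documents, number_of_passes)."""
    # Pass 1: each shard reports (score, id) for its own top `rows` documents.
    candidates = []
    for shard_no, shard in enumerate(shards):
        top = sorted(shard, key=lambda d: d["score"], reverse=True)[:rows]
        candidates.extend((d["score"], d["id"], shard_no) for d in top)

    # Merge: keep the global top `rows` by score.
    merged = sorted(candidates, key=lambda c: c[0], reverse=True)[:rows]

    # Pass 2 is only needed when fl asks for more than id and score.
    needs_fields = any(name not in ("id", "score") for name in fl)
    results = []
    for score, doc_id, shard_no in merged:
        doc = {"id": doc_id, "score": score}
        if needs_fields:
            stored = next(d for d in shards[shard_no] if d["id"] == doc_id)
            for name in fl:
                if name in stored:
                    doc[name] = stored[name]
        results.append(doc)
    return results, (2 if needs_fields else 1)


shards = [
    [{"id": "a", "score": 2.0, "title_nl": "A"}, {"id": "b", "score": 0.5, "title_nl": "B"}],
    [{"id": "c", "score": 1.0, "title_nl": "C"}],
]
print(distributed_search(shards, 2, ["title_nl", "score", "id"]))
print(distributed_search(shards, 2, ["score", "id"]))
```

In this model the ordering is fixed by the first pass in both cases; only the extra field fetch differs, which is the code path the message points at as a possible source of the discrepancy.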

Results not ordered by score and debug info is incorrect, crazy

2016-09-27 Thread Markus Jelsma
Hi, I just spotted something weird, again. A regular search popped up a weird candidate for first result, so I've reproduced it on our production system. Digging deeper, it appears that the fl parameter has something to do with it. Not the order of results but the scores / explain in the debug

Re: JNDI settings

2016-09-27 Thread Jamie Jackson
In 4, with a separate servlet container, I used externally-defined, container-managed jndi data sources. However, I'm in the middle of an upgrade to 5 (before moving on to 6), and I had to add the DIH lib to each core's solrconfig.xml and add connection details to each core's DIH config xml. See th

Migrating to Solr 6.1.0 from 5.5.0

2016-09-27 Thread M, Arjun (Nokia - IN/Bangalore)
Hi, We are getting the below errors when migrating Solr from 5.5.0 to 6.1.0. Could anyone help in resolving the issue, if you have come across this? org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:41569/solr/collection1

Re: Faceting search issues

2016-09-27 Thread Alessandro Benedetti
When you say "check for duplicates", what do you mean? No duplicate tokens are in the index per field. What is your definition of duplicate for a term? Do you consider lowercase and uppercase versions duplicates? Maybe you have an analysis problem. MinCount=2 means: "include only terms appearing a
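
The effect of a minimum count on a field facet can be illustrated with a few lines of plain Python. This is a sketch of the counting idea only, assuming a single-valued string field and exact term matching; the function name and sample documents are made up for the example.

```python
from collections import Counter


def facet_counts(docs, field, mincount=0):
    """Count how many documents carry each value of `field`, then keep
    only the values whose count is at least `mincount`."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return {term: n for term, n in counts.items() if n >= mincount}


docs = [{"name": "alpha"}, {"name": "beta"}, {"name": "alpha"}, {"name": "Alpha"}]
print(facet_counts(docs, "name"))
print(facet_counts(docs, "name", mincount=2))
```

Note that "alpha" and "Alpha" are counted as different terms here; whether they are merged in a real index depends on the field's analysis, which is why the field type definition was requested elsewhere in this thread.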

Re: Faceting and Grouping Performance Degradation in Solr 5

2016-09-27 Thread Alessandro Benedetti
Hi ! At the time we didn't investigate the deletion implication at all. This can be interesting. if you proceed with your investigations and discover what changed in the deletion approach, I would be more than happy to help! Cheers On Mon, Sep 26, 2016 at 10:59 PM, Solr User wrote: > Thanks aga

Re: Issue with Solr & Kerberos

2016-09-27 Thread Loïc Chanel
Hi Davide, Thanks for your quick answer. But I just updated all JCE packages and restarted Solr, and I still get the same error. Any other ideas? How did you solve it? Regards, Loïc Loïc CHANEL System Big Data engineer MS&T - WASABI - Worldline (Villeurbanne, France) 2016-09-27 10:31 GMT+02:00 Davi

Re: json.facet without a facet ...

2016-09-27 Thread Bram Van Dam
On 26/09/16 17:06, Yonik Seeley wrote: > Statistics are now fully integrated into faceting. Since we start off > with a single facet bucket with a domain defined by the main query and > filters, we can even ask for statistics for this top level bucket, > before breaking up into further buckets via

R: Issue with Solr & Kerberos

2016-09-27 Thread Davide Isoardi
Hi, I had the same issue. You should try downloading the latest version of JCE and of the JCE policy files. Davide Isoardi eCube S.r.l. isoa...@ecubecenter.it http://www.ecubecenter.it Tel. +390113999301 Mobile +393288204915 Fax. +390113999309 Privacy notice pursuant to Legislative Decree no. 196/200

Issue with Solr & Kerberos

2016-09-27 Thread Loïc Chanel
Hi all, I'm trying to install Solr to use it with Apache Ranger logs, but it seems it can't decrypt its Kerberos key. Indeed, when I try a curl : "curl -v -u solr: --noproxy '*' http://:8983/solr/ --negotiate", I get the following error : Error 403 GSSException: Failure unspecified at GSS-API leve