Nested geofilt query for LTR feature

2019-03-14 Thread Kamuela Lau
Hello, I'm currently using Solr 7.2.2 and trying to use the LTR contrib module to rerank queries. For my LTR model, I would like to use a feature that is essentially a "normalized distance," a value between 0 and 1 which is based on distance. When using geodist() to define a feature in the featur
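One way to read "normalized distance" is a reciprocal mapping that sends a distance of zero to 1 and large distances toward 0. The sketch below is only an illustration of that idea, not the poster's feature definition: the choice of 1 / (1 + d) and both helper names are assumptions. Solr's `recip(x,m,a,b)` function computes `a / (m*x + b)`, so `recip(geodist(),1,1,1)` would express the same mapping as a function query; whether that form works inside an LTR feature definition is not confirmed by the thread.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * earth_radius_km * math.asin(min(1.0, math.sqrt(a)))


def normalized_distance(distance_km):
    """Map a non-negative distance to (0, 1]; 1.0 means zero distance.

    Hypothetical normalization: 1 / (1 + d), the same shape as
    Solr's recip(x, 1, 1, 1).
    """
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance_km)
```

A nearby document scores close to 1 and a distant one close to 0, which keeps the feature in a fixed range for the model.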

Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
Thanks Shalin, Shawn. I ended up getting guidance from Anshum on this and we did indeed use the delete-replica api to delete all but one of the replicas, and bouncing the last replica to let it lead. I will let anshum share a post on the details of how to recover leader shards. > On Mar 14, 2

Re: Authorization fails but api still renders

2019-03-14 Thread Zheng Lin Edwin Yeo
Hi, I don't quite follow your question. Are you getting the 401 error on all the clusters or just one of them? Also, which Solr version are you using? Regards, Edwin On Fri, 15 Mar 2019 at 05:15, Branham, Jeremy (Experis) wrote: > I’ve discovered the authorization works properly if I use the FQD

Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
correction: Thanks Shalin, Shawn. I ended up getting guidance from Anshum on this and we did indeed use the delete-replica api to delete all but one of the replicas, and bouncing the last replica to let it lead. I will let anshum share a post on the details of how to recover leaderless shard

Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Shalin Shekhar Mangar
What Shawn said. DeleteShard API is supposed to be used either when using implicit routing or when you have compositeId router but the shard has already been split and therefore in an inactive state. Delete Replica API is what you need if you want to delete an individual replica. On Fri, Mar 15,
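For reference, a Delete Replica call goes through the same Collections API endpoint as the DELETESHARD command quoted in this thread, with `action=DELETEREPLICA` plus `collection`, `shard`, and `replica` parameters. The sketch below only builds the URL; the base URL and the replica name `core_node3` are placeholders, not values from the thread.

```python
from urllib.parse import urlencode


def delete_replica_url(base_url, collection, shard, replica):
    """Build a Collections API URL that deletes one replica of a shard."""
    params = {
        "action": "DELETEREPLICA",
        "collection": collection,
        "shard": shard,
        "replica": replica,  # replica (core node) name as shown in cluster state
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Placeholder host and replica name; collection and shard echo the thread.
url = delete_replica_url("http://localhost:8983/solr", "x", "shard84", "core_node3")
```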

Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Thanks, Nish. It turned out to be a different issue: I had not restarted one of the nodes in the cluster, which had become leader in the meantime. It is good to know, though, that there is malformed XML in the example. I will try to submit a documentation fix soon. On Thu, Mar 14, 2019 at 5:37 PM Nish Karve wrote: >

Re: Solr/Tika config question

2019-03-14 Thread Erick Erickson
Tika is already distributed with Solr. It should “just work” since the path is already in solrconfig.xml. Other PDF converters? I’m sure there are, but Tika is free… But I wouldn’t really recommend that you just ship the docs to Solr; I’d recommend that you build a little program to do the e

Solr/Tika config question

2019-03-14 Thread Paul Buiocchi
Greetings, I am setting up Solr 8 on a vanilla Ubuntu Linux server (16.04). The whole reason for the setup is to index 1000s of PDF files (newspaper scans). - I created my core and have Solr up and running. - I am assuming that I need Apache Tika to index the files. - Do I tie Tika into Solr via the SO

Re: Bidirectional CDCR not working

2019-03-14 Thread Nish Karve
Arnold, Have you copied the configuration from the Solr docs? The bidirectional cluster configuration (for cluster 1) contains malformed XML: it is missing the closing tag for the updateLogSynchronizer under the request handler configuration. Please disregard if you have already considered that in

Re: Commits and new document visibility

2019-03-14 Thread Christopher Schultz
Shawn, On 3/14/19 10:46, Shawn Heisey wrote: > On 3/14/2019 8:23 AM, Christopher Schultz wrote: >> I believe that the only thing I want to do is to set the >> autoSoftCommit value to something "reasonable". I'll probably >> start with maybe 15000 (

Re: Authorization fails but api still renders

2019-03-14 Thread Branham, Jeremy (Experis)
I’ve discovered the authorization works properly if I use the FQDN to access the Solr node, but the short hostname completely circumvents it. They are all internal server clusters, so I’m using self-signed certificates [the same exact certificate] on each. The SAN portion of the cert contains the

Re: Help with a DIH config file

2019-03-14 Thread Jörn Franke
Sorry for my late reply, and thanks for sharing. Yes, this is possible. Maybe my last mail was confusing; I hope the examples below help. Alternative 1 - Use only DIH without update processor tika-data-config-2xml - add transformer in entity and the transformation in field (here done for id and for fu

Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Configuration is almost identical for both clusters in terms of cdcr except for zkHost parameter configuration. On Thu, Mar 14, 2019 at 3:45 PM Arnold Bronley wrote: > Exactly. I have it defined in both clusters. I am following the > instructions from here . > https://lucene.apache.org/solr/guid

Re: Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Exactly. I have it defined in both clusters. I am following the instructions from here . https://lucene.apache.org/solr/guide/7_7/cdcr-config.html#bi-directional-updates On Thu, Mar 14, 2019 at 3:40 PM Amrit Sarkar wrote: > Hi Arnold, > > You need "cdcr-processor-chain" definitions in solrconfig

Re: ExactStatsCache not working for distributed IDF

2019-03-14 Thread Arnold Bronley
Hi, I tried that as well. No change in scores. On Thu, Mar 14, 2019 at 3:37 PM Michael Gibney wrote: > Are you basing your conclusion (that it's not working as expected) on the > scores as reported in the debug output? If you haven't already, try adding > "score" to the "fl" param -- if differe

Re: Bidirectional CDCR not working

2019-03-14 Thread Amrit Sarkar
Hi Arnold, You need "cdcr-processor-chain" definitions in solrconfig.xml on both clusters' collections. Both clusters need to act as source and target. Amrit Sarkar Search Engineer Lucidworks, Inc. 415-589-9269 www.lucidworks.com Twitter http://twitter.com/lucidworks LinkedIn: https://www.linkedi

Re: ExactStatsCache not working for distributed IDF

2019-03-14 Thread Michael Gibney
Are you basing your conclusion (that it's not working as expected) on the scores as reported in the debug output? If you haven't already, try adding "score" to the "fl" param -- if different (for a given doc) than the score as reported in debug, then it's probably working as intended ... just a lit

Bidirectional CDCR not working

2019-03-14 Thread Arnold Bronley
Hi, I used unidirectional CDCR in SolrCloud (7.7.1) without any issues. But after setting up bidirectional cdcr configuration, I am not able to index a document. Following is the error that I am getting: Async exception during distributed update: Error from server at http://host1:8983/solr/techp

ExactStatsCache not working for distributed IDF

2019-03-14 Thread Arnold Bronley
Hi, I am using ExactStatsCache in SolrCloud (7.7.1) by adding the following to the solrconfig.xml file for all collections. I restarted and indexed the documents of all collections after this change just to be sure. However, when I run a multi-collection query, the scores do not change before and after ad

Re: Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Shawn Heisey
On 3/14/2019 12:47 PM, Aroop Ganguly wrote: I am trying to delete a shard from a collection using the collections api for the same. On the solr ui,  all the replicas are in “downed” state. However, when I run the delete shard command: /solr/admin/collections?action=DELETESHARD&collection=x&sha

Solr 7.5 DeleteShard not working when all cores are down

2019-03-14 Thread Aroop Ganguly
Hi All I am trying to delete a shard from a collection using the collections api for the same. On the solr ui, all the replicas are in “downed” state. However, when I run the delete shard command: /solr/admin/collections?action=DELETESHARD&collection=x&shard=shard84 I get this exception: {

Re: Boolean Searches?

2019-03-14 Thread David Hastings
oh, thought it was implied with this: " and also use the edismax query parser" On Thu, Mar 14, 2019 at 11:38 AM Andy C wrote: > Dave, > > You don't mention what query parser you are using, but with the default > query parser you can field qualify all the terms entered in a text box by > surrou

RE: Duplicate values in Multi Value Fields

2019-03-14 Thread Gerald Bonfiglio
I've used this before, by specifying the chain as the default processor chain by putting the following directly under the entry: uniq-fields Not sure if this is the best way, but since our app is the only one using Solr, we want every update to use the chain across all ou

Re: Boolean Searches?

2019-03-14 Thread Andy C
Dave, You don't mention what query parser you are using, but with the default query parser you can field-qualify all the terms entered in a text box by surrounding them with parentheses. So if you want to search against the 'title' field and they entered: train OR dragon You could generate the S
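The parenthesis technique described here can be sketched as a small helper that wraps whatever the user typed and prefixes it with a field name. The function name is hypothetical, and the sketch does not escape query-syntax characters in the user input, which a real application would need to consider.

```python
def field_qualify(field, user_input):
    """Apply one field to every term by wrapping the input in parentheses."""
    return f"{field}:({user_input.strip()})"


# The example terms come from the thread.
query = field_qualify("title", "train OR dragon")
```

With the thread's example, the resulting query string is `title:(train OR dragon)`.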

Authorization fails but api still renders

2019-03-14 Thread Branham, Jeremy (Experis)
I’m using Basic Auth on 3 different clusters. On 2 of the clusters, authorization works fine. A 401 is returned when I try to access the core/collection apis. On the 3rd cluster I can see the authorization failed, but the api results are still returned. Solr.log 2019-03-14 09:25:47.680 INFO (q

Re: Duplicate values in Multi Value Fields

2019-03-14 Thread Alexis Aravena Silva
Does anyone know how to config this in solrconfig?, the idea is that solr uses it when I execute the data import: _nombreArea_ uniq-fields From: Alexis Aravena Silva Sent: Thursday, March 14, 2019 11:26:07

RE: FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
OK, I think I'm getting it. At index/query time the analyzers fire and "do stuff". Ex: "the sheep jumped over the MOON" could be tokenized on spaces, lowercased, etc., and that is what is stored in the inverted index, something you probably can't really see. In Solr the string above is what you see in
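The distinction drawn here, analyzed tokens in the inverted index versus the untouched stored value, can be illustrated with a toy model. This is a conceptual sketch only: the whitespace tokenizer plus lowercasing stands in for a real analyzer chain, and none of the names correspond to Solr classes.

```python
def analyze(text):
    """Toy analyzer: split on whitespace, then lowercase each token."""
    return [token.lower() for token in text.split()]


def index_document(inverted_index, doc_id, text):
    """Add analyzed tokens to the inverted index; return the stored value."""
    for token in analyze(text):
        inverted_index.setdefault(token, set()).add(doc_id)
    # The stored value is the original input, untouched by analysis.
    return text


inverted_index = {}
stored = index_document(inverted_index, 1, "the sheep jumped over the MOON")
```

Searching matches against the lowercased token `moon`, while the document returned to the client still shows `MOON`, which is why a lowercase filter does not change what appears in query results.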

Re: FieldTypes and LowerCase

2019-03-14 Thread Shawn Heisey
On 3/14/2019 8:49 AM, Moyer, Brett wrote: Thanks Shawn, " Analysis only happens to indexed data" Being the case when the data gets Indexed, then wouldn't the Analyzer kickoff and lowercase the URL? The analyzer I have defined is not set for Index or Query, so as I understand it will fire during

Re: Boolean Searches?

2019-03-14 Thread David Hastings
If you set your default operator (q.op) to "OR" and also use the edismax query parser, you can use the qf parameter to boost the title heavily compared to the default field you are using. For example, I use something like this, which may be overkill: title^100 description^50 topic^30 text I also ha
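The settings mentioned in this reply map onto ordinary request parameters. The sketch below assembles them for a select request; the helper name is hypothetical, and the field names and boosts are the ones quoted in the message, which may not match another schema.

```python
from urllib.parse import urlencode


def edismax_params(user_query):
    """Request parameters for an edismax search that favours the title field."""
    return {
        "defType": "edismax",
        "q": user_query,
        "q.op": "OR",
        # Boosts as quoted in the thread; tune for your own data.
        "qf": "title^100 description^50 topic^30 text",
    }


query_string = urlencode(edismax_params("train dragon"))
```

The encoded string can be appended to a collection's `/select` handler URL.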

Re: Solr collection indexed to pdf in hdfs throws error during solr restart

2019-03-14 Thread Shawn Heisey
On 3/14/2019 1:13 AM, VAIBHAV SHUKLA shuklavaibha...@yahoo.in wrote: When I restart Solr it throws the following error. Solr collection indexed to pdf in hdfs throws error during solr restart. Error Caused by: org.apache.lucene.store.LockObtainFailedException: Index dir 'hdfs://192.168.1.

Boolean Searches?

2019-03-14 Thread Dave Beckstrom
Hi Everyone, I'm building a SOLR search application and the customer wants the search to work like Google search. They want the user to be able to enter boolean searches like: train OR dragon, which would find any matches that have the word "train" or the word "dragon" in the title. I know tha

RE: FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
Thanks Shawn, " Analysis only happens to indexed data" Being the case when the data gets Indexed, then wouldn't the Analyzer kickoff and lowercase the URL? The analyzer I have defined is not set for Index or Query, so as I understand it will fire during both events. If that is the case I still d

Re: Commits and new document visibility

2019-03-14 Thread Shawn Heisey
On 3/14/2019 8:23 AM, Christopher Schultz wrote: I believe that the only thing I want to do is to set the autoSoftCommit value to something "reasonable". I'll probably start with maybe 15000 (15sec) to match the hard-commit setting and see if we get any complaints about delays between "save" and
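The 15000 ms soft-commit interval discussed here can be set in solrconfig.xml or, as sketched below, through the Config API's `set-property` command. The snippet only builds the JSON body; sending it as a POST to the collection's `/config` endpoint is left out, and the helper name is an assumption.

```python
import json


def soft_commit_payload(max_time_ms):
    """Config API body that sets the autoSoftCommit interval in milliseconds."""
    return json.dumps(
        {"set-property": {"updateHandler.autoSoftCommit.maxTime": max_time_ms}}
    )


# 15000 ms (15 s) is the starting value proposed in the thread.
payload = soft_commit_payload(15000)
```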

Re: FieldTypes and LowerCase

2019-03-14 Thread Shawn Heisey
On 3/14/2019 7:47 AM, Moyer, Brett wrote: I'm using the below FieldType/Field but when I index my documents, the URL is not being lower case. Any ideas? Do I have the below wrong? Example: http://connect.rightprospectus.com/RSVP/TADF Expect: http://connect.rightprospectus.com/rsvp/tadf

Re: [ANNOUNCE] Apache Solr 8.0.0 released

2019-03-14 Thread Toke Eskildsen
On Thu, 2019-03-14 at 13:16 +0100, jim ferenczi wrote: > http://lucene.apache.org/solr/8_0_0/changes/Changes.html Thank you for the hard work of rolling the release! Looking forward to upgrading. - Toke Eskildsen, Royal Danish Library

Re: Duplicate values in Multi Value Fields

2019-03-14 Thread Alexis Aravena Silva
I've tried with the following, but it doesn't work, it seems like solr doesn't take the configuration: _nombreArea_ uniq-fields From: MUNENDRA S.N Sent: Thursday, March 14, 2019 11:17:40 AM To: solr-user@luc

Commits and new document visibility

2019-03-14 Thread Christopher Schultz
All, I recently had a situation where a document wasn't findable in a fairly small Solr core/collection and I didn't see any errors in either the application using Solr or within Solr itself. A Solr service restart caused the document to become visi

Re: Duplicate values in Multi Value Fields

2019-03-14 Thread MUNENDRA S.N
You could probably use the add-distinct operation to keep values unique in multivalued fields: https://lucene.apache.org/solr/guide/7_3/updating-parts-of-documents.html On Thu, Mar 14, 2019, 7:40 PM Jörn Franke wrote: > With an update request processor > > https://lucene.apache.org/solr/7_4_0//solr-core/org

Re: Duplicate values in Multi Value Fields

2019-03-14 Thread Jörn Franke
With an update request processor https://lucene.apache.org/solr/7_4_0//solr-core/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactory.html > Am 14.03.2019 um 15:01 schrieb Alexis Aravena Silva : > > Hello, > > > I'm indexing data into some MultiValueFields, but I have duplicates,
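Conceptually, the effect being asked for is the removal of repeated values from a multivalued field before the document is indexed. The sketch below shows that effect on the client side as a plain function; it is not the processor's implementation, and keeping first-seen order is an assumption of this sketch.

```python
def unique_values(values):
    """Drop repeated values while keeping the first occurrence of each."""
    return list(dict.fromkeys(values))


# Values taken from the sample in the original question.
areas = ["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "MICROBIOLOGÍA", "QUÍMICA"]
deduplicated = unique_values(areas)
```

Cleaning the values in the indexing client like this is an alternative when configuring an update processor chain is not an option.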

Duplicate values in Multi Value Fields

2019-03-14 Thread Alexis Aravena Silva
Hello, I'm indexing data into some multivalued fields, but I get duplicates. How can I remove the duplicate values at indexing time? I'm using Solr 7. Sample: "_nombreArea_":["MICROBIOLOGÍA", "QUÍMICA", "MICROBIOLOGÍA", "MICROBIOLOGÍA", "MICROBIOLOGÍA", "QUÍMICA", "QUÍMICA", "MICROBIOLOGÍA

FieldTypes and LowerCase

2019-03-14 Thread Moyer, Brett
I'm using the below FieldType/Field but when I index my documents, the URL is not being lower case. Any ideas? Do I have the below wrong? Example: http://connect.rightprospectus.com/RSVP/TADF Expect: http://connect.rightprospectus.com/rsvp/tadf Brett Moyer

Re: NPE deleting expired docs (SOLR-13281)

2019-03-14 Thread Christine Poerschke (BLOOMBERG/ LONDON)
Thank you for sharing that 7.6 has the same issue. If anyone is interested in delving into the code to investigate further, I've added short steps on https://issues.apache.org/jira/browse/SOLR-13281 as to how one could potentially make a start on that. From: solr-user@lucene.apache.org At: 03/1

[ANNOUNCE] Apache Solr 8.0.0 released

2019-03-14 Thread jim ferenczi
14 March 2019, Apache Solr™ 8.0.0 available The Lucene PMC is pleased to announce the release of Apache Solr 8.0.0 Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted s

Re: Solr collection indexed to pdf in hdfs throws error during solr restart

2019-03-14 Thread Jason Gerlowski
> When I restart Solr How exactly are you restarting Solr? Are you running a "bin/solr restart"? Or is Solr already shut down and you're just starting it back up with a "bin/solr start "? Depending on how Solr was shut down, you might be running into a bit of a known-issue with Solr's HDFS supp

Solr collection indexed to pdf in hdfs throws error during solr restart

2019-03-14 Thread VAIBHAV SHUKLA shuklavaibha...@yahoo.in
When I restart Solr it throws the following error. Solr collection indexed to pdf in hdfs throws error during solr restart. Error java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [PDFIndex] at java.util.concurrent.FutureTask.report(Futur

Re: solr search Ontology based data set

2019-03-14 Thread Charlie Hull
On 13/03/2019 17:01, Jie Luo wrote: Hi all, I have several ontology-based data sets, and I would like to use Solr as the search engine. A Solr document is a flat document. I would like to know the best way to handle the search. Simple search is fine. One possible search I will need to retrieve