Questions for SynonymGraphFilter and WordDelimiterGraphFilter

2019-01-04 Thread Wei
Hello, We are upgrading to Solr 7.6.0 and noticed that SynonymFilter and WordDelimiterFilter have been deprecated. The Solr docs recommend using SynonymGraphFilter and WordDelimiterGraphFilter instead. In our current schema, we have a text field type defined as

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Erick Erickson
Ashish: Deleting and re-adding a replica is not a solution. Even if you did, that would then be identical only until you started indexing again, then the stats could skew a bit. When you index to NRT replicas, the wall clock times that cause the commits to trigger will be different due to network

Warnings in Zookeeper Server Logs

2019-01-04 Thread Joe Lerner
Hi (yes again): We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2), and 3 zookeeper instances (on servers #1, #2, and #3). Things appear to work fine, and I have confirmed that our basic configuration is correct. But we are seeing TONS of the following warnings in all of ou

Re: Time consuming for insert record

2019-01-04 Thread Shawn Heisey
On 12/25/2018 11:23 PM, jay harkhani wrote: We are using the add method of CloudSolrClient to insert data into the SolrCloud index. In a specific scenario we need to insert a document of around 3 MB into Solr, which takes 5-6 seconds. Is this a single document that's 3 MB in size, or many documen

Re: SOLR v7 Security Issues Caused Denial of Use - Sonatype Application Composition Report

2019-01-04 Thread Shawn Heisey
On 1/3/2019 11:15 AM, Bob Hathaway wrote: We want to use SOLR v7 but Sonatype scans past v6.5 show dozens of critical and severe security issues and dozens of licensing issues. None of the images that you attached to your message are visible to us.  Attachments are regularly stripped by Apache

Re: SOLR v7 Security Issues Caused Denial of Use - Sonatype Application Composition Report

2019-01-04 Thread Jörn Franke
Jackson-databind is actually not such an old version. The problem with Jackson-databind is that, for deserialization, it has only a blacklist of objects not to deserialize, and it is impossible to keep that blacklist up to date. For version 3.0 they seem to be changing to a whitelist approach, which w
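
To make the blacklist-versus-whitelist distinction concrete, here is a minimal, hypothetical Python sketch. The type names are invented and this is not Jackson's API: a denylist only rejects gadget classes someone has already catalogued, so a newly discovered one slips through, while an allowlist rejects anything the application did not explicitly expect.

```python
"""Hypothetical sketch: denylist vs allowlist checks for polymorphic type names.

The class names are made up for illustration; this is not Jackson's API.
"""

# Denylist: reject only the gadget types we already know are dangerous.
KNOWN_BAD_TYPES = {"com.example.gadgets.CommandRunner"}

# Allowlist: accept only the types the application actually expects.
EXPECTED_TYPES = {"com.example.model.Order", "com.example.model.Customer"}


def denylist_permits(type_name: str) -> bool:
    """Permit anything that is not explicitly known to be dangerous."""
    return type_name not in KNOWN_BAD_TYPES


def allowlist_permits(type_name: str) -> bool:
    """Permit only explicitly expected types."""
    return type_name in EXPECTED_TYPES


if __name__ == "__main__":
    # A gadget class discovered after the denylist was last updated.
    new_gadget = "com.example.gadgets.NewlyDiscoveredGadget"
    print(denylist_permits(new_gadget))   # True: slips through
    print(allowlist_permits(new_gadget))  # False: rejected by default
```

The maintenance burden is the point: the denylist must grow every time a new gadget is found, while the allowlist only changes when the application's own model changes.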

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Ashish Bisht
Hi Erick, I mentioned in an update that I am not facing this problem in a new collection. As per 3), I can try deleting a replica and adding it again, but the confusion is which one of the two I should delete (I'm wondering which replica is giving the correct score for the query). Both replicas give the same number of d

Re: SOLR v7 Security Issues Caused Denial of Use - Sonatype Application Composition Report

2019-01-04 Thread Gus Heck
Hi Bob, With respect to licensing, keep in mind that multi-licensed software allows you to choose which license you use the software under. Also there's some good detail on the Apache policy here: https://www.apache.org/legal/resolved.html#what-can-we-not-include-in-an-asf-project-category-x One has to

RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

2019-01-04 Thread Davis, Daniel (NIH/NLM) [C]
DIH is also not designed to multi-thread very well. One way I've handled this is to have a DIH XML that breaks up a database query into multiple processes by taking the modulo of a row, as follows: This allows me to do sub-queries within the entity, but it is often better to just write
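
The XML from this message did not survive in the archive. As a hedged illustration of the modulo idea only, the Python sketch below generates one SELECT per partition so that each row lands in exactly one partition. The table name `documents`, the key column `id`, and the use of SQL `MOD()` (available in MySQL, PostgreSQL, and Oracle; SQL Server uses `%` instead) are assumptions, not the poster's actual configuration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_queries(table: str, key_column: str, num_partitions: int) -> List[str]:
    """Build one SELECT per partition; together they cover every row exactly once.

    `table` and `key_column` are hypothetical names, and MOD() is assumed to be
    supported by the source database.
    """
    return [
        f"SELECT * FROM {table} WHERE MOD({key_column}, {num_partitions}) = {i}"
        for i in range(num_partitions)
    ]


def bucket_rows(row_ids: Iterable[int], num_partitions: int) -> Dict[int, List[int]]:
    """Show which partition each non-negative integer key would fall into."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for row_id in row_ids:
        buckets[row_id % num_partitions].append(row_id)
    return dict(buckets)


if __name__ == "__main__":
    for query in partition_queries("documents", "id", 4):
        print(query)
    print(bucket_rows(range(1, 11), 4))
```

Each query could then back its own entity or import request so the partitions run in parallel; how that is wired into DIH is not shown here.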

Re: [solr-solrcloud] How does DIH work when there are multiple nodes?

2019-01-04 Thread Shawn Heisey
On 1/4/2019 1:04 AM, 유정인 wrote: The reader was looking for a way to do 'DIH' automatically. The reason was for HA configuration. If you send a DIH request to the collection (as opposed to a specific core), that request will be load balanced across the cloud.  You won't know which replica/cor

Re: Regarding Shards - Composite / Implicit , Replica Type - NRT / TLOG

2019-01-04 Thread Shawn Heisey
On 1/3/2019 11:26 PM, Doss wrote: We are planning to set up a SolrCloud cluster with 6 nodes for 3 million records (expected to grow to 5 million in a year), with 150 fields, and the overall index would come to around 120GB. We plan to use NRT with a 5 sec soft commit and a 1 min hard commit. Five seconds is lik
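
The sizing numbers in this message lend themselves to a quick back-of-the-envelope check. The sketch below assumes index size grows linearly with document count and that replicas are spread evenly across nodes; the replication factor of 2 is a hypothetical value, not one stated in the thread.

```python
def per_doc_kb(index_gb: float, num_docs: int) -> float:
    """Average on-disk size per document, in decimal units (1 GB = 1,000,000 KB)."""
    return index_gb * 1_000_000 / num_docs


def projected_index_gb(index_gb: float, current_docs: int, future_docs: int) -> float:
    """Scale the index size linearly with document count (an assumption)."""
    return index_gb * future_docs / current_docs


def per_node_gb(total_index_gb: float, replication_factor: int, num_nodes: int) -> float:
    """Disk per node if all replicas of all shards are spread evenly across nodes."""
    return total_index_gb * replication_factor / num_nodes


if __name__ == "__main__":
    print(per_doc_kb(120, 3_000_000))  # about 40 KB per document
    future = projected_index_gb(120, 3_000_000, 5_000_000)
    print(future)  # about 200 GB
    print(per_node_gb(future, replication_factor=2, num_nodes=6))  # about 67 GB
```

Real indexes rarely scale perfectly linearly (merges, deleted documents, and field-level settings all matter), so treat the output as a starting estimate only.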

Re: Regarding Shards - Composite / Implicit , Replica Type - NRT / TLOG

2019-01-04 Thread Erick Erickson
It's usually best to use compositeId routing. That distributes the load evenly. Otherwise, _you_ have to be responsible for making sure that the docs are reasonably evenly distributed, which can be a pain. Implicit routing is usually best in situations where you index to a particular shard for a w
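
The even distribution comes from hashing. The Python sketch below is a simplified model of compositeId routing, offered as an illustration rather than Solr's actual code: it hashes the UTF-8 id with 32-bit MurmurHash3 (seed 0), takes the top 16 bits from the prefix hash and the bottom 16 bits from the document hash when the id has the form `prefix!docid`, and maps the result onto equal slices of the signed 32-bit range. It handles at most one `!`, ignores Solr's multi-level and `/bits` prefix syntax, and the equal slicing only approximates how Solr assigns shard ranges.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant); returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix ("fmix32").
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def _signed32(x: int) -> int:
    return x - (1 << 32) if x & 0x80000000 else x


def composite_id_hash(doc_id: str) -> int:
    """Simplified compositeId hash: 'prefix!doc' keeps 16 bits from each part."""
    if "!" not in doc_id:
        return _signed32(murmur3_x86_32(doc_id.encode("utf-8")))
    prefix, doc = doc_id.split("!", 1)
    high = murmur3_x86_32(prefix.encode("utf-8")) & 0xFFFF0000
    low = murmur3_x86_32(doc.encode("utf-8")) & 0x0000FFFF
    return _signed32(high | low)


def shard_for(hash_value: int, num_shards: int) -> int:
    """Map a signed 32-bit hash onto num_shards equal slices of the hash range."""
    offset = hash_value + (1 << 31)  # shift from [-2**31, 2**31) to [0, 2**32)
    return (offset * num_shards) >> 32


if __name__ == "__main__":
    from collections import Counter

    counts = Counter(shard_for(composite_id_hash(f"doc{i}"), 4) for i in range(10_000))
    print(sorted(counts.items()))  # roughly 2,500 documents per shard
    # Documents sharing a prefix share the top 16 bits, so they co-locate.
    print({shard_for(composite_id_hash(f"tenantA!{i}"), 4) for i in range(100)})
```

Because the prefix controls the high bits, all `tenantA!...` documents land in the same slice, which is what makes prefix-based co-location work without giving up hash-based balancing for plain ids.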

Re: How to debug empty ParsedQuery from Edismax Query Parser

2019-01-04 Thread Kay Wrobel
I'd like to follow up on this post here because it has become relevant to me now. I have set up a debugging environment and took a deep dive into the SOLR 7.6.0 source code with Eclipse as my IDE of choice for this task. I have isolated the exact line where things fall apart for my two sa

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Mikhail Khludnev
Replicated segments might have different deleted documents by design. Precise numbers can be achieved via exact stats. see https://lucene.apache.org/solr/guide/6_6/distributed-requests.html#DistributedRequests-ConfiguringstatsCache_DistributedIDF_ On Fri, Jan 4, 2019 at 2:40 PM AshB wrote: > Ve
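
To see why unmerged deletions alone can shift scores between replicas, here is a hedged Python sketch of the BM25 IDF term used by Lucene's BM25Similarity, `log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))`. Deleted documents still count toward these statistics until a merge expunges them; the document counts below are invented for illustration.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25 IDF in the form used by Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical replicas holding the same 1,000 live documents, 100 of which match.
live_docs, live_matches = 1_000, 100

# Replica A has merged away its deletions; replica B still carries 200 deleted
# documents in its segments, 50 of which contained the query term.
idf_a = bm25_idf(doc_freq=live_matches, doc_count=live_docs)
idf_b = bm25_idf(doc_freq=live_matches + 50, doc_count=live_docs + 200)

if __name__ == "__main__":
    print(f"replica A idf = {idf_a:.4f}")
    print(f"replica B idf = {idf_b:.4f}")  # differs, so scores differ too
```

Both replicas return the same live documents, yet the IDF (and therefore every score that uses it) differs until the deleted documents are merged out.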

Re: So Many Zookeeper Warnings--There Must Be a Problem

2019-01-04 Thread Erick Erickson
How brave are you? ;) I'll defer to Scott on the internals of ZK and why it might be necessary to delete the ZK data dirs, but what happens if you just correct your configuration and drive on? If that doesn't work, here's something to try: shut down your Solr instances, then - bin/solr zk

Re: So Many Zookeeper Warnings--There Must Be a Problem

2019-01-04 Thread Shawn Heisey
On 1/4/2019 5:24 AM, Joe Lerner wrote: server #1 = myid#1 server #2 = myid#2 server #3 = myid#2 My plan would be to do the following, while users are still online (it's a big [bad] deal if we need to take search offline): 1. Take zk #3 down. 2. Fix zk #3 by deleting the contents of the zk data
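
The underlying problem here is two ZooKeeper nodes sharing `myid` 2. ZooKeeper expects each node's `myid` file (in its dataDir) to hold a unique number that matches a `server.N=host:port:port` line in `zoo.cfg`. The Python sketch below checks that pairing; the hostnames `zk1`, `zk2`, `zk3` and the port numbers are hypothetical stand-ins.

```python
import re
from typing import Dict, List

SERVER_LINE = re.compile(r"^\s*server\.(\d+)\s*=\s*([^:\s]+)")


def parse_servers(zoo_cfg: str) -> Dict[int, str]:
    """Map each server.N id in zoo.cfg text to its hostname."""
    servers = {}
    for line in zoo_cfg.splitlines():
        match = SERVER_LINE.match(line)
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def myid_problems(servers: Dict[int, str], myids: Dict[str, int]) -> List[str]:
    """Check myid files; `myids` maps hostname -> number in that host's myid file."""
    problems = []
    owner: Dict[int, str] = {}
    for host, myid in myids.items():
        if myid in owner:
            problems.append(f"{host} and {owner[myid]} both have myid {myid}")
        else:
            owner[myid] = host
        if servers.get(myid) != host:
            problems.append(
                f"{host} has myid {myid}, but zoo.cfg assigns server.{myid} to {servers.get(myid)}"
            )
    return problems


# Hypothetical zoo.cfg server list.
ZOO_CFG = """
server.1=zk1:2888:3888
server.2=zk2:2888:3888
server.3=zk3:2888:3888
"""

if __name__ == "__main__":
    servers = parse_servers(ZOO_CFG)
    print(myid_problems(servers, {"zk1": 1, "zk2": 2, "zk3": 2}))  # two problems for zk3
    print(myid_problems(servers, {"zk1": 1, "zk2": 2, "zk3": 3}))  # no problems
```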

Re: SOLR v7 Security Issues Caused Denial of Use - Sonatype Application Composition Report

2019-01-04 Thread Bob Hathaway
The most important feature of any software running today is that it can be run at all. Security vulnerabilities can preclude software from running in enterprise environments. Today software must be free of critical and severe security vulnerabilities or they can't be run at all from Information Sec

Re: Solr relevancy score different on replicated nodes

2019-01-04 Thread Erick Erickson
See particularly point 3 here and to a lesser extent point 2. https://support.lucidworks.com/s/question/0D5803LRpijCAD/the-number-of-results-returned-is-not-constant-every-time-i-query-solr For point two (the internal Lucene doc IDs are different) you can easily correct it by adding sort=score
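
Point two is easy to reproduce: when two documents have the same score, Lucene orders them by internal document ID, and those IDs can differ between replicas. The hypothetical Python sketch below ranks the same tied documents on two replicas, first by score and internal ID, then with a secondary sort on a unique key (assumed here to be the `id` field).

```python
from typing import List, Tuple

# (internal Lucene doc ID, unique key, score); values are invented for illustration.
Hit = Tuple[int, str, float]

replica_a: List[Hit] = [(0, "doc-1", 2.5), (1, "doc-2", 2.5), (2, "doc-3", 1.0)]
# Same documents and scores, but indexed in a different order on this replica.
replica_b: List[Hit] = [(0, "doc-2", 2.5), (1, "doc-1", 2.5), (2, "doc-3", 1.0)]


def rank(hits: List[Hit], tiebreak_on_key: bool = False) -> List[str]:
    """Return unique keys by descending score.

    Without a tiebreak, ties fall back to the internal doc ID (Lucene's default);
    with one, ties are broken by the unique key, which is the same on every replica.
    """
    if tiebreak_on_key:
        return [key for _, key, _ in sorted(hits, key=lambda h: (-h[2], h[1]))]
    return [key for _, key, _ in sorted(hits, key=lambda h: (-h[2], h[0]))]


if __name__ == "__main__":
    print(rank(replica_a), rank(replica_b))              # tied docs swap places
    print(rank(replica_a, True), rank(replica_b, True))  # identical order
```

In a Solr request, the equivalent is a sort with score first and the unique key second, for example `sort=score desc,id asc` if `id` is the uniqueKey field.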

Continuous Zookeeper Client Warnings

2019-01-04 Thread Joe Lerner
Hi, We have a simple architecture: 2 SOLR Cloud servers (on servers #1 and #2), and 3 zookeeper instances (on servers #1, #2, and #3). Things appear to work fine but: We are getting *TONS* of continuous log warnings from our client applications. From one server it shows this: [MYAPP-WEB]

Re: So Many Zookeeper Warnings--There Must Be a Problem

2019-01-04 Thread Joe Lerner
wrt, "You'll probably have to delete the contents of the zk data directory and rebuild your collections." Rebuild my *SOLR* collections? That's easy enough for us. If this is how we're incorrectly configured now: server #1 = myid#1 server #2 = myid#2 server #3 = myid#2 My plan would be to do t

Solr relevancy score different on replicated nodes

2019-01-04 Thread AshB
Version: Solr 7.4.0, ZooKeeper 3.4.11. Architecture: two boxes, Machine-1 and Machine-2, each holding a single Solr instance. We have a collection which was single shard and single replica, i.e. s=1 and rf=1. A few days back we tried to add a replica to it, but the score for the same query is coming out different from di

Inconsistent debugQuery score with multiplicative boost

2019-01-04 Thread Thomas Aglassinger
Hi! When debugging a query using multiplicative boost based on the product() function I noticed that the score computed in the explain section is correct while the score in the actual result is wrong. As an example here’s a simple query that boosts a field name_text_de (containing German produ

RE: [solr-solrcloud] How does DIH work when there are multiple nodes?

2019-01-04 Thread 유정인
Hi, The reader was looking for a way to run DIH automatically, for an HA configuration. Thank you for the answer. If you know how, please reply. -Original Message- From: Doss Sent: Friday, January 04, 2019 3:59 PM To: solr-user@lucene.apache.org Subject: RE: [solr-solrcloud] H