solrj howto update documents with expungeDeletes

2017-10-04 Thread Bernd Fehling
A simple question about solrj (Solr 6.4.2), how to update documents with expungeDeletes true/false? In org.apache.solr.client.solrj.SolrClient there are many add, commit, delete, optimize, ... but no "update". What is the best way to "update"? - just "add" the same docid with new content as upda

Re: Time to Load a Solr Core with Hdfs Directory Factory

2017-10-04 Thread Rick Leir
Shashank, I had a quick look at: https://lucene.apache.org/solr/guide/6_6/running-solr-on-hdfs.html Did you enable the Block Cache and the solr.hdfs.nrtcachingdirectory? cheers -- Rick On 2017-10-03 09:22 PM, Shashank Pedamallu wrote: Hi, I’m trying an experiment in which, I’m loading a core

Re: length of indexed value

2017-10-04 Thread alessandro.benedetti
Are the norms a good approximation for you ? If you preserve norms at indexing time ( it is a configuration that you can operate in the schema.xml) you can retrieve them with this specific function query : *norm(field)* Returns the "norm" stored in the index for the specified field. This is the pr

Re: length of indexed value

2017-10-04 Thread John Blythe
interesting idea. the field in question is one that can have a good deal of stray zeros based on distributor skus for a product and bad entries from those entering them. part of the matching logic for some operations looks for these discrepancies by having a simple regex that removes zeroes. so 400
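
The zero-stripping comparison described above can be sketched roughly as follows. This is only an illustration: the SKU values and the function names are hypothetical, and the poster's actual regex is not shown in the preview.

```python
import re


def strip_zeros(sku):
    """Remove every '0' character from a SKU string."""
    return re.sub(r"0", "", sku)


def differ_only_by_zeros(sku_a, sku_b):
    """True when two SKUs are not identical but match once zeros are removed."""
    return sku_a != sku_b and strip_zeros(sku_a) == strip_zeros(sku_b)


# Hypothetical SKUs, invented purely for illustration.
print(differ_only_by_zeros("AB0012", "AB12"))
print(differ_only_by_zeros("AB0012", "AC12"))
```

A comparison like this treats all zeros as noise, including meaningful ones, which is presumably why the length of the indexed value matters as an extra signal.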

Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Emir Arnautović
Hi Bernd, When it comes to updating, it does not exist because indexed documents are not updatable - you can add a new document with the same id and the old one will be flagged as deleted. No need to delete explicitly. When it comes to expungeDeletes - that is a flag that can be set when committing.
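
As a rough sketch of "a flag that can be set when committing": Solr's HTTP update handler accepts commit and expungeDeletes as request parameters. The helper below only builds such a URL; the host, port, and collection name are placeholders, and the function name is made up for this example. Other replies in this thread advise against calling expungeDeletes routinely, so treat this as an illustration of the mechanism, not a recommendation.

```python
from urllib.parse import urlencode


def commit_url(base_url, collection, expunge_deletes=False):
    """Build an update-handler URL that triggers a commit.

    base_url and collection are placeholders supplied by the caller;
    expungeDeletes is only added when explicitly requested.
    """
    params = {"commit": "true"}
    if expunge_deletes:
        params["expungeDeletes"] = "true"
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Placeholder host and collection name, assumed for illustration.
print(commit_url("http://localhost:8983/solr", "mycollection", expunge_deletes=True))
```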

Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Hello, Using 6.6.0, i just spotted one of our collections having a core in which over 80 % of the total number of documents were deleted documents. It is configured with no non-default settings. Is this supposed to happen? How can i prevent these kinds of numbers? Thanks, Markus

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus, You can set reclaimDeletesWeight in merge settings to some higher value than default (I think it is 2) to favor segments with deleted docs when merging. HTH, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sem

Re: Very high number of deleted docs

2017-10-04 Thread Amrit Sarkar
Hi Markus, Emir already mentioned tuning *reclaimDeletesWeight*, which affects the priority of segments about to merge. Optimising the index from time to time, preferably scheduled weekly / fortnightly / ..., at a low-traffic period, keeps you from ever being in such an odd position of 80% deleted docs in the total index. Amrit Sarkar Se

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
I really doubt that is going to do anything, TieredMergePolicyFactory does not pass the settings from Solr to TieredMergePolicy. Thanks, Markus -Original message- > From:Emir Arnautović > Sent: Wednesday 4th October 2017 14:33 > To: solr-user@lucene.apache.org > Subject: Re: Very hi

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Do you mean a periodic forceMerge? That is usually considered a bad habit on this list (i agree). It is just that i am actually very surprised this can happen at all with default settings. This factory, unfortunately does not seem to support settings configured in solrconfig. Thanks, Markus -

Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Bernd Fehling
Hi Emir, can you point out which commit you are using for expungeDeletes true/false? My commit has only commit(String collection, boolean waitFlush, boolean waitSearcher, boolean softCommit) Or is expungeDeletes true/false a special combination of the boolean parameters? Regards, Bernd Am 04.

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
Did you _ever_ do a forceMerge/optimize or expungeDeletes? Here's the problem TieredMergePolicy (TMP) has a maximum segment size it will allow, 5G by default. No segment is even considered for merging unless it has < 2.5G (or half whatever the default is) non-deleted docs, the logic being that to
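
The eligibility rule described above can be modelled in a few lines. This is a simplified sketch of the rule as stated in the message, not Lucene's TieredMergePolicy code; the function names are invented here, and pro-rating a segment's size by its deleted fraction is an assumption of the model.

```python
# 5G default maximum merged segment size, as stated in the message above.
MAX_MERGED_SEGMENT_BYTES = 5 * 1024**3


def live_bytes(segment_bytes, deleted_fraction):
    """Estimate the size of the non-deleted portion of a segment (assumed pro-rated)."""
    return segment_bytes * (1.0 - deleted_fraction)


def eligible_for_merge(segment_bytes, deleted_fraction,
                       max_segment_bytes=MAX_MERGED_SEGMENT_BYTES):
    """A segment is considered only if its live data is under half the maximum size."""
    return live_bytes(segment_bytes, deleted_fraction) < max_segment_bytes / 2


five_gb = 5 * 1024**3
print(eligible_for_merge(five_gb, 0.40))  # about 3G live: not considered
print(eligible_for_merge(five_gb, 0.60))  # about 2G live: considered
```

Under this model a full-size segment has to accumulate more than 50% deletions before it is even looked at, which is how a large share of deleted documents can linger in big segments.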

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus, It is passed but not explicitly - it uses reflection to pass arguments - take a look at the parent factory class. When it comes to force merging - you have an extreme case - 80% is deleted (my guess: frequent updates) and extreme cases require some extreme measures - it can be either periodi

Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Erick Erickson
Do not use expungeDeletes even if you find a way to call it in the scenario you're talking about. First of all, I think you'll run into the issue here: https://issues.apache.org/jira/browse/LUCENE-7976 Second, it is a very heavyweight operation. It potentially rewrites _all_ of your index and it so

Re: Solr cloud planning

2017-10-04 Thread gatanathoa
There is a very large amount of data and there will be a constant addition of more data. There will be hundreds of millions if not billions of items. We have to be able to index items constantly while also allowing for searching. Sadly there is no way to know the amount of searching th

Re: solrj howto update documents with expungeDeletes

2017-10-04 Thread Emir Arnautović
Hi Bernd, I guess it is not exposed in Solrj. Maybe for good reason - it is rarely good to call it. You might better set reclaimDeletesWeight in your merge config and keep number of deleted docs under control that way. Regards, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection S

Re: length of indexed value

2017-10-04 Thread Erick Erickson
Check. The problem is they don't encode the exact length. I _think_ this patch shows you'd be OK with shorter lengths, but check: https://issues.apache.org/jira/browse/LUCENE-7730. Note it's not the patch that counts here, just look at the table of lengths. Best, Erick On Wed, Oct 4, 2017 at 4:2

Re: Solr cloud planning

2017-10-04 Thread Erick Erickson
You'll almost certainly have to shard then. First of all Lucene has a hard limit of 2^31 docs in a single index so there's a 2B limit. There's no such limit on the number of docs in the collection (i.e. 5 shards each can have 2B docs for 10B docs total in the collection). But nobody that I know of
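
The hard-limit arithmetic above can be checked with a small calculation. This sketch uses the round 2^31 figure from the message (Lucene's real cap is slightly below that) and gives only the absolute minimum shard count imposed by the limit; it says nothing about what shard size is practical. The function name is invented for the example.

```python
# Round per-index document limit quoted in the message; the real cap is slightly lower.
MAX_DOCS_PER_INDEX = 2**31


def min_shards(total_docs, max_docs_per_shard=MAX_DOCS_PER_INDEX):
    """Smallest number of shards that keeps every shard at or under the limit."""
    if total_docs <= 0:
        return 1
    # Ceiling division using integers only.
    return -(-total_docs // max_docs_per_shard)


print(min_shards(10_000_000_000))  # 10B docs need at least 5 shards
```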

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
rapid updates aren't the cause of a large percentage of deleted documents. See the JIRA I referenced for the probable cause: https://issues.apache.org/jira/browse/LUCENE-7976 If my suspicion is correct you'll see one or more of your segments occupy way more than 5G. Assuming my suspicion is correc

Re: Solr 5.4.0: Colored Highlight and multi-value field ?

2017-10-04 Thread Erick Erickson
How does it not work for you? Details matter, an example set of values and the response from Solr are good bits of info for us to have. On Tue, Oct 3, 2017 at 3:59 PM, Bruno Mannina wrote: > Dear all, > > > > Is it possible to have a colored highlight in a multi-value field ? > > > > I’m succeed

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Ah thanks for that! -Original message- > From:Emir Arnautović > Sent: Wednesday 4th October 2017 15:03 > To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs > > Hi Markus, > It is passed but not explicitly - it uses reflection to pass arguments - take > a lo

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
No, that collection never receives a forceMerge nor expungeDeletes. Almost all (99.999%) documents are overwritten every 90 minutes. A single shard has 16k docs (97k total) but is only 300 MB large. Maybe that's a problem there. I can simply turn a switch to forceMerge after the periodic update

ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Ascot Moss
Hi, I am trying to use hbase-indexer to index hbase table to Solr, Solr 6.6 Hbase-Indexer 1.6 Hbase 1.2.5 with Kerberos enabled, After putting new test rows into the Hbase table, I got the following error from hbase-indexer thus it cannot write the data to solr : WARN ipc.AbstractRpcClient: Ex

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
Hmmm, OK, I stand corrected. This is odd, though. I suspect a quirk in the merging algorithm when you have a small index.. Ahh, wait. What happens if you modify the segments per tier parameter of TMP? The default is 10, and perhaps because this is such a small index you don't have very many like

Solr boost function taking precedence over relevance boosting

2017-10-04 Thread ruby
I have a use case where: if a document has the search string in its name_property field, then I want to show that document on top. If multiple documents have the search string in their name_property field then I want to sort them by creation date. Following is my query: q={!boost+b=recip(ms(NOW,crea

RE: Default value from another field?

2017-10-04 Thread jimi.hullegard
Thank you Alexandre! It worked great. :) And here is how it is configured, if someone else wants to do this, but is too busy to read the documentation for these classes: source_field target_field target_field

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Well, that made a difference! Now we're back at 64 MB per replica. Thanks, Markus -Original message- > From:Erick Erickson > Sent: Wednesday 4th October 2017 16:19 > To: solr-user > Subject: Re: Very high number of deleted docs > > Hmmm, OK, I stand corrected. > > This is odd, tho

Solr test runs: test skipping logic

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi, I am seeing that in different test runs (e.g., by executing 'ant test' on the root folder in 'lucene-solr') a different subset of tests is skipped. Where can I find more about it? I am trying to create parity between test successes before and after my changes and this is causing confusion.

Jenkins setup for continuous build

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi, I have some custom code in solr (which is not of good quality for contributing back) so I need to set up my own continuous build solution. I tried jenkins and was hoping that ant build (ant clean compile) in Execute Shell textbox will work, but I am stuck at this ivy-fail error: To work around

RE: Moving to Point, trouble with IntPoint.newRangeQuery()

2017-10-04 Thread Chris Hostetter
: Ok, it has been resolved. I was lucky to have spotted i was looking at : the wrong schema file! The one the test actually used was not yet : updated from Trie to Point! And boom goes the dynamite. This is a prime example of where having assumptions in your code (that the field type will by

Re: length of indexed value

2017-10-04 Thread John Blythe
ah, thanks for the link. -- John Blythe On Wed, Oct 4, 2017 at 9:23 AM, Erick Erickson wrote: > Check. The problem is they don't encode the exact length. I _think_ > this patch shows you'd be OK with shorter lengths, but check: > https://issues.apache.org/jira/browse/LUCENE-7730. > > Note it's

Complexphrase treats wildcards differently than other query parsers

2017-10-04 Thread Bjarke Buur Mortensen
Hi list, I'm trying to search for the term funktionsnedsättning* In my analyzer chain I use a MappingCharFilterFactory to change ä to a. So I would expect that funktionsnedsättning* would translate to funktionsnedsattning*. If I use e.g. the lucene query parser, this is indeed what happens: ...de
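
The expected normalization can be illustrated outside Solr with a plain character mapping. This does not reproduce MappingCharFilterFactory or any query parser behaviour; it only shows the outcome the poster expects when the ä-to-a mapping is applied to the wildcard term. The mapping table and function name are assumptions for this example.

```python
# Stand-in for the mapping configured in the analyzer chain (assumed: only ä -> a).
CHAR_MAPPING = str.maketrans({"\u00e4": "a"})


def normalize_term(term):
    """Apply the character mapping to a query term, leaving the wildcard untouched."""
    return term.translate(CHAR_MAPPING)


print(normalize_term("funktionsneds\u00e4ttning*"))
```

A parser that skips this step for wildcard terms would search for the unmapped prefix and miss the indexed, mapped tokens.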

Re: Solr Spatial Query Problem Hk.

2017-10-04 Thread David Smiley
Hi, Firstly, if Solr returns an error referencing an exception then you can look in Solr's logs for the stack trace, which helps debugging problems a ton (at least for Solr devs). I suspect that the problem here is that your schema might have a dynamic field where *coordinates is defined to be a

Re: ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Ascot Moss
Does anyone use hbase-indexer to index a Kerberos-enabled Hbase to solr? Please help! On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss wrote: > Hi, > > I am trying to use hbase-indexer to index hbase table to Solr, > > Solr 6.6 > Hbase-Indexer 1.6 > Hbase 1.2.5 with Kerberos enabled, > > > After putting new test

Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Nawab Zada Asad Iqbal
So, i looked at this setup https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console which is using Maven, so i switched to maven too. I am hitting the following error with maven build: Is that expected? Can someone share with me the details about how https://builds.apache.org/job/Lucene-Solr-Mav

Re: Jenkins setup for continuous build

2017-10-04 Thread Nawab Zada Asad Iqbal
I looked at https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console and decided to switch to maven. However my maven build (without jenkins) is failing with this error: [INFO] Scanning classes for violations... [ERROR] Forbidden class/interface use: org.bouncycastle.util.Strings [non-p

Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Steve Rowe
Hi Nawab, > On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal wrote: > > I am hitting following error with maven build: > Is that expected? No. What commands did you use? > Can someone share me the details about how > https://builds.apache.org/job/Lucene-Solr-Maven-master is configured. The

Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Nawab Zada Asad Iqbal
Hi Steve, I did this: ant get-maven-poms cd maven-build/ mvn -DskipTests install On Wed, Oct 4, 2017 at 4:56 PM, Steve Rowe wrote: > Hi Nawab, > > > On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal > wrote: > > > > I am hitting following error with maven build: > > Is that expected? > >

Re: ERROR ipc.AbstractRpcClient: SASL authentication failed

2017-10-04 Thread Rick Leir
Ascot, At the risk of ...   Can you disable Kerberos in Hbase? If not, then you will have to provide a password! Rick On 2017-10-04 07:32 PM, Ascot Moss wrote: Does anyone use hbase indexer in index kerberos Hbase to solr? Pls help! On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss wrote: Hi

Re: Maven build error (Was: Jenkins setup for continuous build)

2017-10-04 Thread Steve Rowe
When I run those commands (on Debian Linux 8.9, with Maven v3.0.5 and Oracle JDK 1.8.0.77), I get: - [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Tota

Re: Solr test runs: test skipping logic

2017-10-04 Thread Erick Erickson
There are some tests annotated @Nightly or @Weekly, or @Slow, is there a correlation to those? Best, Erick On Wed, Oct 4, 2017 at 8:59 AM, Nawab Zada Asad Iqbal wrote: > Hi, > > I am seeing that in different test runs (e.g., by executing 'ant test' on > the root folder in 'lucene-solr') a differ

Re: tipping point for using solrcloud—or not?

2017-10-04 Thread Shawn Heisey
On 9/29/2017 6:34 AM, John Blythe wrote: complete noob as to solrcloud here. almost-non-noob on solr in general. we're experiencing growing pains in our data and am thinking through moving to solrcloud as a result. i'm hoping to find out if it seems like a good strategy or if we need to get othe

FilterCache size should reduce as index grows?

2017-10-04 Thread S G
Hi, Here is a discussion we had recently with a fellow Solr user. It seems reasonable to me and wanted to see if this is an accepted theory. The bit-vectors in filterCache are as long as the maximum number of documents in a core. If there are a billion docs per core, every bit vector will have a
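
The memory argument above can be made concrete with a back-of-the-envelope calculation. It assumes the worst case in which every cached filter is stored as a full bit vector of one bit per document; the cache size of 512 entries is an assumed figure chosen for illustration, and the function names are invented here.

```python
def bitset_bytes(max_doc):
    """Bytes needed for one bit per document, rounded up to a whole byte."""
    return (max_doc + 7) // 8


def filter_cache_bytes(max_doc, entries):
    """Worst-case memory if every cache entry is a full-length bit vector."""
    return bitset_bytes(max_doc) * entries


one_billion = 1_000_000_000
print(bitset_bytes(one_billion))             # 125000000 bytes, about 119 MiB per entry
print(filter_cache_bytes(one_billion, 512))  # 64000000000 bytes for 512 entries (assumed size)
```

Since the per-entry cost grows linearly with the number of documents in the core, keeping the same memory budget means allowing fewer entries as the index grows.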