A simple question about SolrJ (Solr 6.4.2):
how do I update documents with expungeDeletes true/false?
In org.apache.solr.client.solrj.SolrClient there are many add,
commit, delete, optimize, ... but no "update".
What is the best way to "update"?
- just "add" the same docid with new content as upda
Shashank,
I had a quick look at:
https://lucene.apache.org/solr/guide/6_6/running-solr-on-hdfs.html
Did you enable the Block Cache and the solr.hdfs.nrtcachingdirectory?
cheers -- Rick
On 2017-10-03 09:22 PM, Shashank Pedamallu wrote:
Hi,
I’m trying an experiment in which, I’m loading a core
Are the norms a good approximation for you?
If you preserve norms at indexing time (it is a configuration that you can
set in schema.xml) you can retrieve them with this specific function
query:
*norm(field)*
Returns the "norm" stored in the index for the specified field. This is the
pr
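(For reference, a minimal SolrJ sketch of pulling that norm back as a pseudo-field; the core URL and the field name "title" are placeholders, and it assumes norms were not omitted for that field in schema.xml.)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class NormExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFields("id", "norm(title)");   // the function query comes back as a pseudo-field
            QueryResponse rsp = client.query(q);
            for (SolrDocument doc : rsp.getResults()) {
                System.out.println(doc.getFieldValue("id") + " -> " + doc.getFieldValue("norm(title)"));
            }
        }
    }
}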
interesting idea.
the field in question is one that can have a good deal of stray zeros based
on distributor skus for a product and bad entries from those entering them.
part of the matching logic for some operations looks for these discrepancies
by having a simple regex that removes zeroes. so 400
Hi Bernd,
When it comes to updating, it does not exist because indexed documents are not
updatable - you can add a new document with the same id and the old one will be
flagged as deleted. There is no need to delete explicitly.
When it comes to expungeDeletes - that is a flag that can be set when
committing.
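(A minimal SolrJ sketch of that "update by re-add" pattern, assuming a standalone core at a placeholder URL and illustrative field names.)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ReAddExample {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-42");          // same uniqueKey as the existing document
            doc.addField("title", "new content");  // replacement content
            client.add(doc);                       // the old version is only flagged as deleted
            client.commit();                       // plain commit; no explicit delete needed
        }
    }
}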
Hello,
Using 6.6.0, I just spotted one of our collections having a core in which
over 80% of the total number of documents were deleted documents.
It is configured with no
non-default settings.
Is this supposed to happen? How can I prevent these kinds of numbers?
Thanks,
Markus
Hi Markus,
You can set reclaimDeletesWeight in merge settings to a value higher than the
default (I think it is 2) to favor segments with deleted docs when merging.
HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sem
Hi Markus,
Emir already mentioned tuning *reclaimDeletesWeight*, which affects the merge
priority of segments. You could also optimise the index from time to time,
preferably scheduled weekly / fortnightly / ..., at a low-traffic period, so you
are never in such an odd position of 80% deleted docs in the total index.
Amrit Sarkar
Se
I really doubt that is going to do anything; TieredMergePolicyFactory does not
pass the settings from Solr to TieredMergePolicy.
Thanks,
Markus
-Original message-
> From:Emir Arnautović
> Sent: Wednesday 4th October 2017 14:33
> To: solr-user@lucene.apache.org
> Subject: Re: Very hi
Do you mean a periodic forceMerge? That is usually considered a bad habit on
this list (I agree). It is just that I am actually very surprised this can
happen at all with default settings. This factory, unfortunately, does not seem
to support settings configured in solrconfig.
Thanks,
Markus
-
Hi Emir,
Can you point out which commit you are using for expungeDeletes true/false?
My commit method has only
commit(String collection, boolean waitFlush, boolean waitSearcher, boolean
softCommit)
Or is expungeDeletes true/false a special combination of the boolean parameters?
Regards, Bernd
Am 04.
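(There is no commit overload that takes expungeDeletes, but the flag can be sent as a plain request parameter on an UpdateRequest that carries the commit action - a hedged sketch only, and note the warnings against actually using it further down in the thread; the base URL and collection name are illustrative.)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.UpdateRequest;

public class ExpungeDeletesCommit {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            UpdateRequest req = new UpdateRequest();
            req.setAction(AbstractUpdateRequest.ACTION.COMMIT, /* waitFlush */ true, /* waitSearcher */ true);
            req.setParam("expungeDeletes", "true");   // sent as a request parameter alongside the commit
            req.process(client, "mycollection");
        }
    }
}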
Did you _ever_ do a forceMerge/optimize or expungeDeletes?
Here's the problem: TieredMergePolicy (TMP) has a maximum segment size
it will allow, 5G by default. No segment is even considered for
merging unless it has < 2.5G (or half of whatever the max is) of
non-deleted docs, the logic being that to
Hi Markus,
It is passed, but not explicitly - it uses reflection to pass arguments - take a
look at the parent factory class.
When it comes to force merging - you have an extreme case - 80% is deleted (my
guess: frequent updates) - and extreme cases require some extreme measures - it
can be either periodi
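(For reference, a hedged sketch of how the earlier reclaimDeletesWeight suggestion would look in solrconfig.xml on Solr 6.x - the other values shown are just the usual defaults, and the factory hands settings like this one through to TieredMergePolicy as described above.)

<indexConfig>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- default is 2.0; higher values favor merging segments that carry deleted docs -->
    <double name="reclaimDeletesWeight">4.0</double>
  </mergePolicyFactory>
</indexConfig>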
Do not use expungeDeletes even if you find a way to call it in the
scenario you're talking about. First of all, I think you'll run into
the issue here: https://issues.apache.org/jira/browse/LUCENE-7976
Second, it is a very heavyweight operation. It potentially rewrites
_all_ of your index and it so
There is a very large amount of data and there will be a constant addition of
more data. There will be hundreds of millions if not billions of items.
We have to be able to be constantly indexing items but also allow
for searching. Sadly there is no way to know the amount of searching th
Hi Bernd,
I guess it is not exposed in SolrJ. Maybe for good reason - it is rarely good
to call it. You might be better off setting reclaimDeletesWeight in your merge
config and keeping the number of deleted docs under control that way.
Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
S
Check. The problem is they don't encode the exact length. I _think_
this patch shows you'd be OK with shorter lengths, but check:
https://issues.apache.org/jira/browse/LUCENE-7730.
Note it's not the patch that counts here, just look at the table of lengths.
Best,
Erick
On Wed, Oct 4, 2017 at 4:2
You'll almost certainly have to shard then. First of all Lucene has a
hard limit of 2^31 docs in a single index so there's a 2B limit.
There's no such limit on the number of docs in the collection (i.e. 5
shards each can have 2B docs for 10B docs total in the collection).
But nobody that I know of
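(If it helps, a hedged SolrJ sketch of creating a sharded collection through the Collections API; the base URL, collection and config names, shard count, and replication factor are placeholders only - sizing them needs the usual prototyping.)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateShardedCollection {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            CollectionAdminRequest.Create create =
                CollectionAdminRequest.createCollection("bigcollection", "myconfig", 8, 2);
            create.setMaxShardsPerNode(4);   // adjust to the number of nodes available
            create.process(client);
        }
    }
}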
Rapid updates aren't the cause of a large percentage of deleted
documents. See the JIRA I referenced for the probable cause:
https://issues.apache.org/jira/browse/LUCENE-7976
If my suspicion is correct you'll see one or more of your segments
occupy way more than 5G. Assuming my suspicion is correc
How does it not work for you? Details matter, an example set of values and
the response from Solr are good bits of info for us to have.
On Tue, Oct 3, 2017 at 3:59 PM, Bruno Mannina
wrote:
> Dear all,
>
> Is it possible to have a colored highlight in a multi-value field?
>
> I’m succeed
Ah thanks for that!
-Original message-
> From:Emir Arnautović
> Sent: Wednesday 4th October 2017 15:03
> To: solr-user@lucene.apache.org
> Subject: Re: Very high number of deleted docs
>
> Hi Markus,
> It is passed but not explicitly - it uses reflection to pass arguments - take
> a lo
No, that collection never receives a forceMerge nor expungeDeletes. Almost all
(99.999%) documents are overwritten every 90 minutes.
A single shard has 16k docs (97k total) but is only 300 MB large. Maybe that's
the problem there.
I can simply turn a switch to forceMerge after the periodic update
Hi,
I am trying to use hbase-indexer to index an HBase table to Solr,
Solr 6.6
HBase-Indexer 1.6
HBase 1.2.5 with Kerberos enabled,
After putting new test rows into the HBase table, I got the following error
from hbase-indexer, so it cannot write the data to Solr:
WARN ipc.AbstractRpcClient: Ex
Hmmm, OK, I stand corrected.
This is odd, though. I suspect a quirk in the merging algorithm when
you have a small index.
Ahh, wait. What happens if you modify the segments per tier parameter
of TMP? The default is 10, and perhaps because this is such a small
index you don't have very many like
I have a use case where:
if a document has the search string in its name_property field, then I want
to show that document on top. If multiple documents have the search string in
their name_property field, then I want to sort them by creation date.
Following is my query:
q={!boost+b=recip(ms(NOW,crea
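(A hedged SolrJ sketch of one way to get that ordering - boost the name_property clause and break score ties on the creation-date field. The date field name created_date and the catch-all text field are assumptions, since the original query is cut off.)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NameFirstThenNewest {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build()) {
            SolrQuery q = new SolrQuery("name_property:(searchterm)^10 OR text:(searchterm)");
            q.addSort("score", SolrQuery.ORDER.desc);        // boosted name matches score higher, so they come first
            q.addSort("created_date", SolrQuery.ORDER.desc); // equal scores fall back to newest first
            QueryResponse rsp = client.query(q);
            System.out.println(rsp.getResults());
        }
    }
}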
Thank you Alexandre! It worked great. :)
And here is how it is configured, if someone else wants to do this, but is too
busy to read the documentation for these classes:
source_field
target_field
target_field
Well, that made a difference! Now we're back at 64 MB per replica.
Thanks,
Markus
-Original message-
> From:Erick Erickson
> Sent: Wednesday 4th October 2017 16:19
> To: solr-user
> Subject: Re: Very high number of deleted docs
>
> Hmmm, OK, I stand corrected.
>
> This is odd, tho
Hi,
I am seeing that in different test runs (e.g., by executing 'ant test' on
the root folder in 'lucene-solr') a different subset of tests are skipped.
Where can I find more about it? I am trying to create parity between test
successes before and after my changes and this is causing confusion.
Hi,
I have some custom code in Solr (which is not of good quality for
contributing back) so I need to set up my own continuous build solution. I
tried Jenkins and was hoping that an ant build (ant clean compile) in the
Execute Shell textbox would work, but I am stuck at this ivy-fail error:
To work around
: Ok, it has been resolved. I was lucky to have spotted I was looking at
: the wrong schema file! The one the test actually used was not yet
: updated from Trie to Point!
And boom goes the dynamite.
This is a prime example of where having assumptions in your code (that the
field type will by
ah, thanks for the link.
--
John Blythe
On Wed, Oct 4, 2017 at 9:23 AM, Erick Erickson
wrote:
> Check. The problem is they don't encode the exact length. I _think_
> this patch shows you'd be OK with shorter lengths, but check:
> https://issues.apache.org/jira/browse/LUCENE-7730.
>
> Note it's
Hi list,
I'm trying to search for the term funktionsnedsättning*
In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
So I would expect that funktionsnedsättning* would translate to
funktionsnedsattning*.
If I use e.g. the lucene query parser, this is indeed what happens:
...de
Hi,
Firstly, if Solr returns an error referencing an exception then you can
look in Solr's logs for the stack trace, which helps debugging problems a
ton (at least for Solr devs).
I suspect that the problem here is that your schema might have a dynamic
field where *coordinates is defined to be a
Does anyone use hbase-indexer to index a Kerberos-enabled HBase to Solr?
Please help!
On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss wrote:
> Hi,
>
> I am trying to use hbase-indexer to index hbase table to Solr,
>
> Solr 6.6
> Hbase-Indexer 1.6
> Hbase 1.2.5 with Kerberos enabled,
>
>
> After putting new test
So, I looked at this setup
https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console which
is using Maven, so I switched to Maven too.
I am hitting the following error with the Maven build:
Is that expected? Can someone share the details about how
https://builds.apache.org/job/Lucene-Solr-Mav
I looked at
https://builds.apache.org/job/Lucene-Solr-Maven-master/2111/console and
decided to switch to Maven. However, my Maven build (without Jenkins) is
failing with this error:
[INFO] Scanning classes for violations...
[ERROR] Forbidden class/interface use: org.bouncycastle.util.Strings
[non-p
Hi Nawab,
> On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal wrote:
>
> I am hitting following error with maven build:
> Is that expected?
No. What commands did you use?
> Can someone share me the details about how
> https://builds.apache.org/job/Lucene-Solr-Maven-master is configured.
The
Hi Steve,
I did this:
ant get-maven-poms
cd maven-build/
mvn -DskipTests install
On Wed, Oct 4, 2017 at 4:56 PM, Steve Rowe wrote:
> Hi Nawab,
>
> > On Oct 4, 2017, at 7:39 PM, Nawab Zada Asad Iqbal
> wrote:
> >
> > I am hitting following error with maven build:
> > Is that expected?
>
>
Ascot,
At the risk of ... Can you disable Kerberos in Hbase? If not, then you
will have to provide a password!
Rick
On 2017-10-04 07:32 PM, Ascot Moss wrote:
Does anyone use hbase indexer in index kerberos Hbase to solr?
Pls help!
On Wed, Oct 4, 2017 at 10:18 PM, Ascot Moss wrote:
Hi
When I run those commands (on Debian Linux 8.9, with Maven v3.0.5 and Oracle
JDK 1.8.0.77), I get:
-
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Tota
There are some tests annotated @Nightly, @Weekly, or @Slow; is there
a correlation to those?
Best,
Erick
On Wed, Oct 4, 2017 at 8:59 AM, Nawab Zada Asad Iqbal wrote:
> Hi,
>
> I am seeing that in different test runs (e.g., by executing 'ant test' on
> the root folder in 'lucene-solr') a differ
On 9/29/2017 6:34 AM, John Blythe wrote:
complete noob as to solrcloud here. almost-non-noob on solr in general.
we're experiencing growing pains in our data and am thinking through moving
to solrcloud as a result. i'm hoping to find out if it seems like a good
strategy or if we need to get othe
Hi,
Here is a discussion we had recently with a fellow Solr user.
It seems reasonable to me and wanted to see if this is an accepted theory.
The bit-vectors in filterCache are as long as the maximum number of
documents in a core. If there are a billion docs per core, every bit vector
will have a
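(A quick back-of-envelope check of that, as a hedged sketch - a cached filter is at worst one bit per document in the core.)

public class FilterCacheMath {
    public static void main(String[] args) {
        long maxDoc = 1_000_000_000L;        // a billion docs in the core
        long bytesPerEntry = maxDoc / 8;     // one bit per doc -> 125,000,000 bytes
        System.out.println(bytesPerEntry / (1024 * 1024) + " MB per filterCache entry"); // ~119 MB
    }
}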