Re: Deleted Docs increasing in Solr 6.1.0

2019-11-04 Thread Erick Erickson
First of all, I wouldn’t worry about it unless you have a _significant_ number of deleted docs. As of around Solr 7.5, the default TieredMergePolicy (TMP) should let deleted docs accumulate up to around 33%. Prior to 7.5, the number of deleted docs could hover around 50%, depending on the access pattern. expungeDeletes
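It helps to turn a deleted-docs count into a percentage before comparing it with the rough 33% / 50% figures above. The sketch below assumes you read numDocs (live docs) and maxDoc (all doc slots, including deleted ones) from the core's statistics. The 100,000 maxDoc value is hypothetical, chosen so the deleted count matches the 18749 reported in the question below.

```python
def deleted_stats(num_docs: int, max_doc: int):
    """Return (deleted_docs, deleted_percent) from numDocs and maxDoc.

    maxDoc counts every document slot in the index, including ones
    marked deleted; numDocs counts only live documents.
    """
    if max_doc <= 0:
        return 0, 0.0
    deleted = max_doc - num_docs
    return deleted, 100.0 * deleted / max_doc


# Hypothetical core: 100,000 doc slots, 81,251 of them live.
deleted, pct = deleted_stats(num_docs=81_251, max_doc=100_000)
print(deleted, round(pct, 2))  # 18749 18.75
```

The 33% and 50% levels are rules of thumb from the post, not limits that Solr enforces.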

Deleted Docs increasing in Solr 6.1.0

2019-11-04 Thread vishal patel
We have 2 shards and 2 replicas in a testing environment. Deleted Docs is 18749 for one collection [documents]. I have attached a screenshot of the Solr admin panel. (1) Would there be any impact on disk size if deleted docs increase? (2) We tried to remove deleted docs by executing the command: curl
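For question (2), one way to ask Solr for a commit that also expunges deletes is to pass commit=true and expungeDeletes=true to the collection's update handler. The sketch below only builds the URL you would hand to curl. The host and port (localhost:8983) are assumptions; the collection name is taken from the post.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base: str, collection: str) -> str:
    """Build an update URL that commits and requests expungeDeletes.

    base is assumed to be the Solr root, e.g. http://localhost:8983.
    """
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base.rstrip('/')}/solr/{collection}/update?{params}"


url = expunge_deletes_url("http://localhost:8983", "documents")
print(url)
# http://localhost:8983/solr/documents/update?commit=true&expungeDeletes=true
```

Usage would be something like `curl "<that URL>"`. expungeDeletes does not guarantee that every deleted doc disappears.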

RE: Very high number of deleted docs, part 2

2018-01-11 Thread Markus Jelsma
Erick Erickson > Sent: Wednesday 10th January 2018 22:41 > To: solr-user > Subject: Re: Very high number of deleted docs, part 2 > > There's some background here: > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ > > the 2.5 "live"

Re: Very high number of deleted docs, part 2

2018-01-10 Thread Erick Erickson
> expungeDeletes did not do the job in testing. Surprising. What actually happened? Do note that expungeDeletes does not promise to remove all deleted docs; it only merges segments with more than (some percentage of) deleted documents. Best, Erick On Wed, Jan 10, 2018 at 9:45 AM, Markus Jelsma wrote: > Wel
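Why expungeDeletes can leave deletes behind is easier to see in a simplified model: only segments whose deleted percentage exceeds a threshold are rewritten. The threshold here stands in for Lucene TieredMergePolicy's forceMergeDeletesPctAllowed, which I believe defaults to 10%. The real policy also applies size limits that this sketch ignores, and the segment names and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    max_doc: int    # all doc slots, live + deleted
    del_count: int  # docs marked deleted

    @property
    def del_pct(self) -> float:
        return 100.0 * self.del_count / self.max_doc if self.max_doc else 0.0


def expunge_candidates(segments, pct_allowed=10.0):
    """Names of segments an expungeDeletes pass would rewrite (simplified).

    Segments at or below pct_allowed keep their deleted docs.
    """
    return [s.name for s in segments if s.del_pct > pct_allowed]


segs = [Segment("_a", 1000, 50), Segment("_b", 1000, 300), Segment("_c", 1000, 100)]
print(expunge_candidates(segs))  # ['_b']
```

Segment _c sits exactly at 10% and is skipped, so its 100 deleted docs survive the pass.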

RE: Very high number of deleted docs, part 2

2018-01-10 Thread Markus Jelsma
> Subject: Re: Very high number of deleted docs, part 2 > > I'm not 100% sure that playing with maxSegments will work. > > What will work is to re-index everything. You can re-index into the > existing collection, no need to start with a new collection. Eventually > you

Re: Very high number of deleted docs, part 2

2018-01-05 Thread Erick Erickson
optimizing it again, with maxSegments set to ten, it should > recover, right? > > -Original message- > > From:Shawn Heisey > > Sent: Friday 5th January 2018 14:34 > > To: solr-user@lucene.apache.org > > Subject: Re: Very high number of deleted docs, part

RE: Very high number of deleted docs, part 2

2018-01-05 Thread Markus Jelsma
> To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs, part 2 > > On 1/5/2018 5:33 AM, Markus Jelsma wrote: > > Another collection, now on 7.1, also shows this problem and has default TMP > > settings. This time the size is different: each shard of t

Re: Very high number of deleted docs, part 2

2018-01-05 Thread Shawn Heisey
you never optimize, then there will never be a segment larger than 5GB, and the deleted document percentage would be less likely to get out of control. The optimize operation ignores the maximum segment size and reduces the index to a single large segment with zero deleted docs. TMP's b
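Shawn's point about optimize can be shown with a toy model. Merging everything into one segment keeps only the live data and drops the deletes, but the result is not capped at the 5GB maximum segment size. The segment sizes below are hypothetical, and real segment sizes do not split this cleanly into "live" and "deleted" bytes.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    live_gb: float     # size attributable to live docs
    deleted_gb: float  # size still occupied by deleted docs


MAX_MERGED_GB = 5.0  # TMP default maximum segment size, per the thread


def optimize(segments):
    """Simplified forceMerge down to one segment: keep live data, drop deletes.

    The result is deliberately NOT capped at MAX_MERGED_GB, because
    optimize ignores the maximum segment size.
    """
    return Segment(live_gb=sum(s.live_gb for s in segments), deleted_gb=0.0)


index = [Segment(4.0, 1.0), Segment(4.5, 0.5), Segment(3.5, 1.5)]
merged = optimize(index)
print(merged.live_gb, merged.live_gb > MAX_MERGED_GB)  # 12.0 True
```

The 12GB result is more than twice the cap, which is what later makes it hard for normal merging to reclaim deletes from it.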

Very high number of deleted docs, part 2

2018-01-05 Thread Markus Jelsma
Markus [1] http://lucene.472066.n3.nabble.com/Very-high-number-of-deleted-docs-td4357327.html

Re: max docs, deleted docs optimization

2017-11-01 Thread kshitij tyagi
In an > index with only 10 lakh (1,000,000) docs, it's unlikely that even having 50% deleted > documents is going to make much of a difference. > > 3> Yes, the deleted docs stay in the segment until it's merged away. Lucene > is very efficient (according to Mike McCandless) at skipping d

Re: max docs, deleted docs optimization

2017-10-31 Thread Erick Erickson
3> Yes, the deleted docs stay in the segment until it's merged away. Lucene is very efficient (according to Mike McCandless) at skipping deleted docs. 4> It rewrites all segments, purging deleted documents. However, it has some pitfalls, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-

max docs, deleted docs optimization

2017-10-31 Thread kshitij tyagi
Hi, I am using atomic updates to update one of the fields. I want to know: 1. If the total docs in the core are 10 lakh and I partially update 2 lakh docs, what will the number of deleted docs be? 2. Does a higher number of deleted docs affect query time? That is, does query time increase if
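Question 1 can be answered with simple arithmetic, because an atomic (partial) update rewrites the whole document: the old copy is marked deleted and a new copy is added. The sketch assumes each of the 2 lakh docs is updated exactly once and that no merge runs in between. A doc updated twice would leave two stale copies, and any merge would reclaim some of them.

```python
def after_updates(total_docs: int, updated_docs: int) -> dict:
    """Index counts right after updating existing docs, before any merge.

    Assumes every updated doc already exists and is updated exactly once.
    """
    return {
        "numDocs": total_docs,                # live docs: unchanged
        "maxDoc": total_docs + updated_docs,  # old copies + new copies
        "deletedDocs": updated_docs,          # one stale copy per update
    }


# 10 lakh = 1,000,000 docs; 2 lakh = 200,000 partially updated.
print(after_updates(1_000_000, 200_000))
# {'numDocs': 1000000, 'maxDoc': 1200000, 'deletedDocs': 200000}
```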

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Well, that made a difference! Now we're back at 64 MB per replica. Thanks, Markus -Original message- > From:Erick Erickson > Sent: Wednesday 4th October 2017 16:19 > To: solr-user > Subject: Re: Very high number of deleted docs > > Hmmm, OK, I stand corr

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
the periodic update cycle, but > I preferred Lucene to do it for me. > > Thanks, > Markus > > -Original message- >> From:Erick Erickson >> Sent: Wednesday 4th October 2017 14:56 >> To: solr-user >> Subject: Re: Very high number of deleted docs >

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
periodic update cycle, but I preferred Lucene to do it for me. Thanks, Markus -Original message- > From:Erick Erickson > Sent: Wednesday 4th October 2017 14:56 > To: solr-user > Subject: Re: Very high number of deleted docs > > Did you _ever_ do a forceMerge/optimize or expu

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Ah, thanks for that! -Original message- > From:Emir Arnautović > Sent: Wednesday 4th October 2017 15:03 > To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs > > Hi Markus, > It is passed but not explicitly - it uses reflection to pass

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
suspicion is correct, you have to either periodically optimize/forceMerge or run expungeDeletes regularly. At that point, though, you might as well optimize/forceMerge. expungeDeletes would only save you rewriting segments with < 20% deleted docs (at least I think that's the cutoff). Or reindex from

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Markus > > -Original message- >> From:Amrit Sarkar >> Sent: Wednesday 4th October 2017 14:42 >> To: solr-user@lucene.apache.org >> Subject: Re: Very high number of deleted docs >> >> Hi Markus, >> >> Emir already mentioned tuning *reclaimDeletesWei

Re: Very high number of deleted docs

2017-10-04 Thread Erick Erickson
Did you _ever_ do a forceMerge/optimize or expungeDeletes? Here's the problem: TieredMergePolicy (TMP) has a maximum segment size it will allow, 5G by default. No segment is even considered for merging unless it has < 2.5G (or half of whatever the maximum is) of non-deleted docs, the logic being
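Erick's rule explains the 80% figure. A segment is only considered for natural merging once its live portion drops below half the maximum segment size, so a large optimized segment must lose most of its docs first. The sketch models just that simplified rule from the post, using the 5G default. The 20 GB segment is a hypothetical example, and the rest of TMP's selection logic is not modeled.

```python
MAX_SEGMENT_GB = 5.0  # TMP default max segment size, per the post


def eligible_for_natural_merge(segment_gb: float, deleted_fraction: float) -> bool:
    """Simplified TMP rule: a segment qualifies only if its live
    (non-deleted) portion is under half the maximum segment size."""
    live_gb = segment_gb * (1.0 - deleted_fraction)
    return live_gb < MAX_SEGMENT_GB / 2


def deletes_needed(segment_gb: float) -> float:
    """Deleted fraction a segment must exceed before it qualifies."""
    return max(0.0, 1.0 - (MAX_SEGMENT_GB / 2) / segment_gb)


# A hypothetical 20 GB segment left behind by an optimize:
print(deletes_needed(20.0))                    # 0.875
print(eligible_for_natural_merge(20.0, 0.80))  # False
```

At 80% deleted, such a segment still holds about 4 GB of live data and remains untouched by normal merging.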

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
-Original message- > From:Amrit Sarkar > Sent: Wednesday 4th October 2017 14:42 > To: solr-user@lucene.apache.org > Subject: Re: Very high number of deleted docs > > Hi Markus, > > Emir already mentioned tuning *reclaimDeletesWeight*, which affects segments

RE: Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Subject: Re: Very high number of deleted docs > > Hi Markus, > You can set reclaimDeletesWeight in merge settings to some higher value than > default (I think it is 2) to favor segments with deleted docs when merging. > > HTH, > Emir

Re: Very high number of deleted docs

2017-10-04 Thread Amrit Sarkar
Hi Markus, Emir already mentioned tuning *reclaimDeletesWeight*, which affects the priority of segments about to be merged. Optimise the index from time to time, preferably on a weekly / fortnightly / ... schedule during a low-traffic period, so that you never end up in such an odd position of 80% deleted docs in the total index. Amrit Sarkar

Re: Very high number of deleted docs

2017-10-04 Thread Emir Arnautović
Hi Markus, You can set reclaimDeletesWeight in merge settings to some higher value than default (I think it is 2) to favor segments with deleted docs when merging. HTH, Emir
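Emir's suggestion can be made concrete. As I understand the Lucene 6.x/7.x TieredMergePolicy, each candidate merge gets a score where lower is better. That score is multiplied by nonDelRatio ** reclaimDeletesWeight, with nonDelRatio being the fraction of the candidate's bytes that are live. The sketch isolates only that factor; the other parts of the score (size skew, total size) are omitted. The formula should be checked against the Lucene version in use before tuning on the basis of it.

```python
def delete_factor(live_bytes: float, total_bytes: float,
                  reclaim_deletes_weight: float = 2.0) -> float:
    """Assumed TMP score multiplier for a candidate merge (lower = preferred).

    nonDelRatio = live bytes / total bytes; raising it to a larger
    reclaimDeletesWeight makes delete-heavy merges look cheaper.
    """
    non_del_ratio = live_bytes / total_bytes
    return non_del_ratio ** reclaim_deletes_weight


# Two hypothetical 10 GB candidates: one 10% deleted, one 50% deleted.
for w in (2.0, 4.0):
    light = delete_factor(9.0, 10.0, w)  # 10% deleted
    heavy = delete_factor(5.0, 10.0, w)  # 50% deleted
    print(f"weight={w}: heavy/light = {heavy / light:.3f}")
# weight=2.0: heavy/light = 0.309
# weight=4.0: heavy/light = 0.095
```

Raising the weight widens the gap in favor of the delete-heavy candidate, which is the effect Emir describes.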

Very high number of deleted docs

2017-10-04 Thread Markus Jelsma
Hello, Using 6.6.0, I just spotted one of our collections having a core in which over 80% of the total number of documents were deleted documents. It is configured with default settings only. Is this supposed to happen? How can I prevent these kinds of numbers? Thanks, Markus

Re: Solr Deleted Docs Issue

2015-03-19 Thread Shawn Heisey
On 3/19/2015 12:24 AM, vicky desai wrote: > I fail to understand why these deleted docs are not removed from the index on > merging. Is there good documentation which explains how exactly merging is > done? > > What can I do to solve this problem other than optimization? Deleted doc

Re: Solr Deleted Docs Issue

2015-03-18 Thread vicky desai
have observed is that though the merge factor seems to work, we always end up with around 6 lakh deleted docs in the index daily. On optimizing, all these deleted docs are removed. We benefit in memory as well as query speed after optimization. But as I understand it, it's a short-term gain and the situation repeats itself

Re: Solr Deleted Docs Issue

2015-03-16 Thread Erick Erickson
bq: If this operation is continuously done I would end up with a large set of deleted docs which will affect the performance of the queries I hit on this solr. No, you won't. They'll be "merged away" as background segments are merged. Here's a great visualization of t
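"Merged away" means deleted docs disappear only from the segments that actually take part in a merge. Segments the merge policy does not pick keep theirs, which is why a non-zero deleted count can persist after merges run. A toy model, with each segment as a hypothetical (live, deleted) pair and an arbitrary choice of which segments merge:

```python
def merge(segments):
    """Merge (live, deleted) segments: only live docs are copied over."""
    return (sum(live for live, _ in segments), 0)


def total_deleted(segments):
    return sum(deleted for _, deleted in segments)


index = [(90, 10), (95, 5), (80, 20), (100, 0)]
# Suppose the merge policy picks only the first two segments.
after = [merge(index[:2])] + index[2:]
print(total_deleted(index), total_deleted(after))  # 35 20
```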

Re: Solr Deleted Docs Issue

2015-03-16 Thread Shawn Heisey
after every 10th update, and so the max segment count I can > have is 10, which is fine. However, even when merging happens, deleted docs are > not cleared and I end up with 100 deleted docs in the index. > > If this operation is continuously done I would end up with a large set of > deleted docs whi

Solr Deleted Docs Issue

2015-03-16 Thread vicky desai
after every 10th update, and so the max segment count I can have is 10, which is fine. However, even when merging happens, deleted docs are not cleared and I end up with 100 deleted docs in the index. If this operation is continuously done, I would end up with a large set of deleted docs which will affect the

Re: Caches contain deleted docs (?)

2013-11-27 Thread Roman Chyla
I understand that changes would be expensive, but shouldn't the cache simply skip the deleted docs, in the same way as the cache for multivalued fields (which accepts liveDocs bits)? Thanks, roman On Wed, Nov 27, 2013 at 6:26 PM, Erick Erickson wrote: > Yep, it's expected. Segme

Re: Caches contain deleted docs (?)

2013-11-27 Thread Erick Erickson
ld); > FieldCache.DEFAULT.getInts(reader, idField, false); > > > the resulting arrays *will* contain entries for deleted docs, so to filter > them out, one has to manually check livedocs. Is this the expected > behaviour? I don't understand why the cache would be bothering to load data > for deleted docs. This is on SOLR4.0 > > Thanks! > > roman >

Caches contain deleted docs (?)

2013-11-27 Thread Roman Chyla
*will* contain entries for deleted docs, so to filter them out, one has to manually check livedocs. Is this the expected behaviour? I don't understand why the cache would be bothering to load data for deleted docs. This is on SOLR4.0 Thanks! roman
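Roman's manual check amounts to masking a per-segment value array with the segment's live-docs bits. This Python analogue is not Lucene's API. It assumes values[i] holds the cached value for internal doc id i, deleted docs included, and live_docs[i] says whether doc i is live. None stands for "no deletions", mirroring Lucene returning null liveDocs for a segment without deletes.

```python
def live_values(values, live_docs):
    """Keep cached per-document values only for live docs.

    values[i]    -- cached field value for internal doc id i (deleted included)
    live_docs[i] -- True if doc i is live; None means no deletions at all
    """
    if live_docs is None:
        return dict(enumerate(values))
    return {doc: v for doc, v in enumerate(values) if live_docs[doc]}


cache = [7, 3, 9, 3]             # like an int[] from a per-segment field cache
live = [True, False, True, True]
print(live_values(cache, live))  # {0: 7, 2: 9, 3: 3}
```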

Re: Deleted Docs

2013-07-09 Thread Shawn Heisey
On 7/9/2013 3:38 PM, Katie McCorkell wrote: I am curious about the "Deleted Docs:" statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when Optimizing, however my instance had 726 Deleted Docs, but then after adding some

Re: Deleted Docs

2013-07-09 Thread Jack Krupansky
Jack Krupansky -Original Message- From: Katie McCorkell Sent: Tuesday, July 09, 2013 5:38 PM To: solr-user@lucene.apache.org Subject: Deleted Docs Hello, I am curious about the "Deleted Docs:" statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexin

Deleted Docs

2013-07-09 Thread Katie McCorkell
Hello, I am curious about the "Deleted Docs:" statistic on the solr/#/collection1 Overview page. Does Solr remove docs while indexing? I thought it only did that when optimizing; however, my instance had 726 Deleted Docs, but then after adding some documents that number decreased, eventu

Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-18 Thread Nagendra Nagarajayya
View this message in context: http://lucene.472066.n3.nabble.com/Deleted-docs-in-IndexWriter-Cache-NRT-related-tp3177877p3178179.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-18 Thread Grijesh
optimize ensures that deleted docs and terms will not be displayed. - Thanx: Grijesh www.gettinhahead.co.in

Re: Deleted docs in IndexWriter Cache (NRT related)

2011-07-17 Thread pravesh
commit would be the safest way for making sure the deleted content doesn't show up. Thanx Pravesh

Deleted docs in IndexWriter Cache (NRT related)

2011-07-17 Thread Nagendra Nagarajayya
Hi! If a document with a unique id is added again, the older doc is marked as deleted and the new document is added. So when a search is made with an IndexReader obtained from the IndexWriter (for NRT), both docs show up: the older doc and the newer updated doc. To prevent the olde

Re: Remove the deleted docs from the Solr Index

2010-01-04 Thread Shalin Shekhar Mangar
On Wed, Dec 30, 2009 at 12:10 AM, Mohamed Parvez wrote: > Ditto. There should have been a DIH command to re-sync the Index with the > DB. > But there is such a command; it is called full-import. -- Regards, Shalin Shekhar Mangar.

Re: Remove the deleted docs from the Solr Index

2010-01-03 Thread Ravi Gidwani
Lance: At times we don't have the freedom to make these database changes. Currently I am in this situation. Hence the requirement on the DIH. ~Ravi. On Sat, Jan 2, 2010 at 3:44 PM, Lance Norskog wrote: > The other option is to have a 'deleted' column in your table, and have > the applica

Re: Remove the deleted docs from the Solr Index

2010-01-02 Thread Lance Norskog
The other option is to have a 'deleted' column in your table, and have the application 'delete' operation set that field. In the DIH you query this column with 'deletedPkQuery'. Or, you can use triggers to maintain a new table with the IDs of deleted rows. This will allow you to have a batch job t
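The batch-job idea can be sketched as turning flagged rows into one Solr delete request. This is a hypothetical helper, not part of DIH. Rows are assumed to be dicts with an "id" key and the 'deleted' flag from Lance's post. The output is a JSON update body of the form {"delete": [ids...]}, which would be POSTed to the collection's /update handler with Content-Type application/json on Solr versions that accept JSON updates.

```python
import json


def delete_command(rows):
    """Build a JSON update body deleting docs whose DB row is flagged.

    Assumes each row is a dict with an "id" key and an optional
    boolean "deleted" flag (hypothetical schema).
    """
    ids = [str(r["id"]) for r in rows if r.get("deleted")]
    return json.dumps({"delete": ids})


rows = [
    {"id": 100, "deleted": True},
    {"id": 101, "deleted": False},
    {"id": 102, "deleted": True},
]
print(delete_command(rows))  # {"delete": ["100", "102"]}
```

Inside DIH itself, the documented route is the deletedPkQuery mentioned above; this sketch is for the external batch-job variant.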

Re: Remove the deleted docs from the Solr Index

2009-12-29 Thread Mohamed Parvez
Ditto. There should have been a DIH command to re-sync the Index with the DB. Right now it looks like a one-way street from DB to Index. On Tue, Dec 29, 2009 at 3:07 AM, Ravi Gidwani wrote: > Hi Shalin: > > > I get your point about not knowing what has been deleted from > the database.
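Without soft-delete flags or triggers, one way to re-sync is a set difference. Any ID present in the index but missing from the table is stale and can be deleted. This sketch assumes you can pull the full ID list from both sides, which costs a full scan of each. The IDs are hypothetical; 100 plays the role of the deleted record in Ravi's example.

```python
def stale_ids(db_ids, index_ids):
    """IDs present in the Solr index but no longer in the database."""
    return sorted(set(index_ids) - set(db_ids))


db = [101, 102, 104]
indexed = [100, 101, 102, 103, 104]
print(stale_ids(db, indexed))  # [100, 103]
```

The resulting IDs could then be sent as a delete request. Shalin's full-import answer achieves a similar end by rebuilding the index instead.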

Re: Remove the deleted docs from the Solr Index

2009-12-29 Thread Ravi Gidwani
Hi Shalin: > I get your point about not knowing what has been deleted from the > database. So this is what even I am looking for: > > 0) A document (id=100) is currently part of the Solr index. > 1) Let's say the application deleted a record with id=100 from the database. > > 2) Now I need to e

Re: Remove the deleted docs from the Solr Index

2009-12-28 Thread Shalin Shekhar Mangar
On Tue, Dec 29, 2009 at 3:03 AM, Mohamed Parvez wrote: > I had looked at that thread earlier. But there is no option there for > a > solution from the Solr side. > > I mean the two other options there are > 1] Use database triggers instead of DIH to manage updating the index :- > This is out of ques

Re: Remove the deleted docs from the Solr Index

2009-12-28 Thread Mohamed Parvez
I had looked at that thread earlier, but there is no option there for a solution from the Solr side. I mean the two other options there are: 1] Use database triggers instead of DIH to manage updating the index: this is out of the question, as we can't run 1000-odd triggers every hour to delete. 2] Some s

Re: Remove the deleted docs from the Solr Index

2009-12-28 Thread Mauricio Scheffer
Here are a couple more options: http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents/ Cheers, Mauricio On Mon, Dec 28, 2009 at 5:51 PM, Mohamed Parvez wrote: > I am using Solr 1.

Remove the deleted docs from the Solr Index

2009-12-28 Thread Mohamed Parvez
I am using Solr 1.4 and DIH to build the index from a table. I use a full import once to create the index and then I keep using delta imports to update the index. All works fine as long as only new rows get added to the table. If some rows in the table get deleted, then the index doe