Re: Frequent deletions

2015-01-13 Thread Shawn Heisey
On 1/13/2015 12:10 AM, ig01 wrote: > Unfortunately this is the case, we do have hundreds of millions of documents on one Solr instance/server. All our configs and schema use the default configurations. Our index size is 180G; does that mean that we need at least 180G heap size? If you ha

Re: Frequent deletions

2015-01-12 Thread ig01
Hi, Unfortunately this is the case, we do have hundreds of millions of documents on one Solr instance/server. All our configs and schema use the default configurations. Our index size is 180G; does that mean that we need at least 180G heap size? Thanks.

Re: Frequent deletions

2015-01-12 Thread Shawn Heisey
On 1/10/2015 11:46 PM, ig01 wrote: > Thank you all for your responses. The thing is that we have a 180G index while half of it is deleted documents. We tried to run an optimization in order to shrink the index size, but it crashes with ‘out of memory’ when the process reaches 120G. Is it possibl

Re: Frequent deletions

2015-01-12 Thread ig01
Hi, We gave 120G to the JVM, while we have 140G of memory on this machine. We use the default merge policy ("TieredMergePolicy"), and there are 54 segments in our index. We tried to perform an optimization with different values of maxSegments (53 and lower); it didn't help. How much memory do we need for 180G

Re: Frequent deletions

2015-01-11 Thread David Santamauro
[ disclaimer: this worked for me, ymmv ... ] I just battled this. It turns out that incrementally optimizing using the maxSegments attribute was the most efficient solution for me, in particular when you are actually running out of disk space.
    #!/bin/bash
    # n-segments I started with
    high=400
    # n-segme
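(The script is cut off above; below is a minimal sketch of the same incremental idea, assuming a single core named collection1 on localhost and a step of 10 - the URL, core name, and numbers are placeholders, not the original values.)

    #!/bin/bash
    # Lower maxSegments a little at a time so each merge pass needs far less
    # temporary disk space than one big optimize down to a single segment.
    high=400     # roughly the current segment count (placeholder)
    low=1        # final target number of segments
    step=10
    url="http://localhost:8983/solr/collection1/update"

    seg=$high
    while [ "$seg" -gt "$low" ]; do
      echo "optimizing down to $seg segments"
      curl -s "$url?optimize=true&maxSegments=$seg&waitSearcher=true"
      seg=$((seg - step))
    done
    # final pass down to the target segment count
    curl -s "$url?optimize=true&maxSegments=$low&waitSearcher=true"

Each pass only rewrites what is needed to reach the requested segment count, so peak disk usage stays far below what a one-shot optimize would need.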

Re: Frequent deletions

2015-01-11 Thread Erick Erickson
OK, why can't you give the JVM more memory, perhaps on a one-time basis to get past this problem? You've never told us how much memory you give the JVM in the first place. Best, Erick On Sun, Jan 11, 2015 at 7:54 AM, Jack Krupansky wrote: > Usually, Lucene will be optimizing (merging) segments o
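(For reference, on a Solr 4.x style start.jar deployment a one-time heap increase for the optimize could look like the following - the install path and sizes are illustrative assumptions, not values from this thread.)

    cd /opt/solr/example          # assumed location of the 4.x example/ directory
    # Temporarily run with a larger heap just for the optimize, then restart
    # with the usual settings; -Xmx31g stays below the ~32g compressed-oops limit.
    java -Xms8g -Xmx31g -jar start.jar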

Re: Frequent deletions

2015-01-11 Thread Jack Krupansky
Usually, Lucene will be optimizing (merging) segments on the fly so that you should only have a fraction of your total deletions present in the index and should never have an absolute need to do an old-fashioned full optimize. What merge policy are you using? Is Solr otherwise running fine other

Re: Frequent deletions

2015-01-11 Thread Alexandre Rafalovitch
I believe if you delete all documents in a segment, that segment as a whole goes away. A segment is created on every commit, whether you reopen the searcher or not. Do you know which documents would be deleted later (are there natural clusters)? If yes, perhaps there is a way to index them so th

Re: Frequent deletions

2015-01-11 Thread ig01
Hi, It's not an option for us; all the documents in our index have the same deletion probability. Is there any other solution to perform an optimization in order to reduce the index size? Thanks in advance.

Re: Frequent deletions

2015-01-11 Thread Michał B.
Not directly on your subject, but you could look at this patch: https://issues.apache.org/jira/browse/SOLR-6841 - it implements visualization of Solr (Lucene) segments with exact information on how many deletions are present in each segment. Looking at this one you could - of course, next time - react li

Re: Frequent deletions

2015-01-11 Thread Jürgen Wagner (DVT)
Maybe you should consider creating different generations of indexes and not keep everything in one index. If the likelihood of documents being deleted is rather high in, e.g., the first week or so, you could have one index for the high-probability of deletion documents (the fresh ones) and a second
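(If SolrCloud and the Collections API are available, one way to sketch such generations is separate collections searched through an alias; the host and collection names below are illustrative assumptions.)

    # Hypothetical layout: a "fresh" collection with high delete churn and an
    # "archive" collection for stable documents, queried together via one alias.
    curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=docs&collections=docs_fresh,docs_archive"
    # Searches go to the alias; deletes mostly hit docs_fresh, which stays small
    # and is cheap to optimize or even rebuild.
    curl "http://localhost:8983/solr/docs/select?q=*:*&rows=0"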

Re: Frequent deletions

2015-01-11 Thread ig01
Thank you all for your responses. The thing is that we have a 180G index while half of it is deleted documents. We tried to run an optimization in order to shrink the index size, but it crashes with ‘out of memory’ when the process reaches 120G. Is it possible to optimize parts of the index? Please adv
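(As an aside, the live-vs-deleted document ratio can be confirmed with the Luke request handler before deciding to optimize; the host and core name below are assumptions.)

    # Returns numDocs, maxDoc and deletedDocs for the core;
    # deletedDocs = maxDoc - numDocs, reportedly about half the index here.
    curl "http://localhost:8983/solr/collection1/admin/luke?numTerms=0&wt=json"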

RE: Frequent deletions

2015-01-06 Thread Amey Jadiye
Well, we are doing the same thing (in a way). We have to do frequent deletions en masse; at a time we are deleting around 20M+ documents. All I am doing is, after deletion, firing the below command on each of our Solr nodes and keeping some patience, as it takes quite a lot of time. curl -vvv "http://node1.so
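(The command above is truncated; a typical form of such a cleanup request is sketched below. The host, collection name, and the use of expungeDeletes are assumptions about the missing part, not the original command.)

    # Ask the node to rewrite segments that contain deletions so the space is
    # reclaimed without a full optimize; this can still take a long time.
    curl -vvv "http://node1.example.com:8983/solr/collection1/update?commit=true&expungeDeletes=true"
    # A heavier alternative is a capped optimize:
    #   .../update?optimize=true&maxSegments=10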

Re: Frequent deletions

2015-01-01 Thread Alexandre Rafalovitch
Is there a specific list of which data structures are "sparse" and "non-sparse" for Lucene and Solr (referencing the G+ post)? I imagine this is obvious to low-level hackers, but it could actually be nice to summarize it somewhere for troubleshooting. Regards, Alex. Sign up for my Solr resources

Re: Frequent deletions

2015-01-01 Thread Michael McCandless
Also see this G+ post I wrote up recently showing how the percentage of deleted documents changes over time for an "every add also deletes a previous document" stress test: https://plus.google.com/112759599082866346694/posts/MJVueTznYnD Mike McCandless http://blog.mikemccandless.com On Wed, Dec 31, 2014 at 12:21 PM,

Re: Frequent deletions

2014-12-31 Thread Erick Erickson
It's usually not necessary to optimize; as more indexing happens you should see background merges that reclaim the space, so I wouldn't worry about it unless you're seeing actual problems that have to be addressed. Here's a great visualization of the process: http://blog.mikemccandless.c