Re: optimize boosting parameters

2020-12-08 Thread Derek Poh
We monitor the response time (pingdom) of the page that uses these boosting parameters. Since the addition of these boosting parameters and an additional field to search on (on which I will create a separate thread in the mailing list), the page's average response time has increased by 1-2 seconds. Ma

Re: optimize boosting parameters

2020-12-08 Thread Erick Erickson
Before worrying about it too much, exactly _how_ much has the performance changed? I’ve just been in too many situations where there’s no objective measure of performance before and after, just someone saying “it seems slower” and had those performance changes disappear when a rigorous test is don

Re: optimize boosting parameters

2020-12-07 Thread Radu Gheorghe
Hi Derek, Ah, then my reply was completely off :) I don’t really see a better way. Maybe other than changing termfreq to field, if the numeric field has docValues? That may be faster, but I don’t know for sure. Best regards, Radu -- Sematext Cloud - Full Stack Observability - https://sematext.
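For illustration, a minimal sketch of the two boosting styles being compared here; the field names, value, and weights are assumed, not taken from Derek's actual parameters:

    # boost via termfreq wrapped in if() (the style Derek describes)
    bf=if(termfreq(supplier_type,'premium'),5,0)
    # possibly faster alternative per Radu: read a docValues numeric field directly
    bf=field(supplier_rank)

The field() function reads per-document values straight from docValues and skips the term-frequency lookup, but as Radu says the actual speedup is not guaranteed.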

Re: optimize boosting parameters

2020-12-07 Thread Derek Poh
Hi Radu Apologies for not making myself clear. I would like to know if there is a simpler or more efficient way to craft the boosting parameters based on the requirements. For example, I am using 'if', 'map' and 'termfreq' functions in the bf parameters. Is there a more efficient or simple

Re: optimize boosting parameters

2020-12-07 Thread Radu Gheorghe
Hi Derek, It’s hard to tell whether your boosts can be made better without knowing your data and what users expect of it. Which is a problem in itself. I would suggest gathering judgements, like if a user queries for X, what doc IDs do you expect to get back? Once you have enough of these judg

Re: Optimize solr 8.4.1

2020-02-26 Thread Erick Erickson
As long as you have an http connection, you can use the replication API fetchindex command to, well, fetch an index. But that copies the index but does not shard it. I guess you could fetch into a single shard collection and then use splitshard. All that said, you'll have to reindex sometime if yo
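A hedged sketch of the two commands Erick mentions, with hosts, core, and collection names assumed:

    # pull a remote core's index over HTTP via the replication handler
    curl 'http://target:8983/solr/mycore/replication?command=fetchindex&masterUrl=http://source:8983/solr/mycore/replication'
    # then split the resulting single shard (SolrCloud collection assumed)
    curl 'http://target:8983/solr/admin/collections?action=SPLITSHARD&collection=mycoll&shard=shard1'

Note that fetchindex copies the index as-is; splitting afterwards does not substitute for the reindex Erick says will eventually be needed.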

Re: Optimize solr 8.4.1

2020-02-26 Thread Dario Rigolin
Hi Massimiliano, the only way to reindex is to resend all documents to the indexer of the Cloud instance. At the moment solr doesn't have the ability to do it when the schema is changed, or to "send" indexed data to a SolrCloud from a non-cloud one. For example we have in solr a field with an only sto

Re: Optimize solr 8.4.1

2020-02-26 Thread Massimiliano Randazzo
Hi Paras, thank you for your answer. If you don't mind I would have a couple of questions. I am experiencing very long indexing times. I have 8 servers currently working on 1 instance of Solr; I thought of moving to a cloud of 4 Solr servers with 3 ZooKeeper servers to distribute the load but I

Re: Optimize solr 8.4.1

2020-02-26 Thread Paras Lehana
Hi Massimiliano, > Is it still necessary to run the Optimize command from my application when I have finished indexing? I guess you can stop worrying about optimizations and let Solr handle that implicitly. There's nothing so bad about having more segments. On Wed, 26 Feb 2020 at 16:02, Massimi

Re: Optimize question

2018-04-23 Thread Shawn Heisey
On 4/23/2018 11:13 AM, Scott M. wrote: I recently installed Solr 7.1 and configured it to work with Dovecot for full-text searching. It works great but after about 2 days of indexing, I've pressed the 'Optimize' button. At that point it had collected about 17 million documents and it was takin

Re: Optimize question

2018-04-23 Thread Erick Erickson
No, it's not "optimizing on its own". At least it better not be. As far as your index growing after optimize, that's the little "gotcha" with optimize, see: https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/ This is being addressed in the 7.4 time frame (hopeful

RE: Optimize stalls at the same point

2017-07-25 Thread Markus Jelsma
- > From:Walter Underwood > Sent: Tuesday 25th July 2017 22:39 > To: solr-user@lucene.apache.org > Subject: Re: Optimize stalls at the same point > > I’ve never been fond of elaborate GC settings. I prefer to set a few things > then let it run. I know someone wh

Re: Optimize stalls at the same point

2017-07-25 Thread David Hastings
to spare. Your max heap is over a 100 times larger than ours, > your index just around 16 times. It should work with less. > > > > As a bonus, with a smaller heap, you can have much more index data in > mapped memory. > > > > Regards, > > Markus > > > >

Re: Optimize stalls at the same point

2017-07-25 Thread Walter Underwood
an have much more index data in mapped > memory. > > Regards, > Markus > > -Original message- >> From:David Hastings >> Sent: Tuesday 25th July 2017 22:15 >> To: solr-user@lucene.apache.org >> Subject: Re: Optimize stalls at the same point >> >>

RE: Optimize stalls at the same point

2017-07-25 Thread Markus Jelsma
y 2017 22:15 > To: solr-user@lucene.apache.org > Subject: Re: Optimize stalls at the same point > > it turned out that i think it was a large GC operation, as it has since > resumed optimizing. current java options are as follows for the indexing > server (they are different fo

Re: Optimize stalls at the same point

2017-07-25 Thread David Hastings
it turned out that i think it was a large GC operation, as it has since resumed optimizing. current java options are as follows for the indexing server (they are different for the search servers) if you have any suggestions as to changes I am more than happy to hear them, honestly they have just b

Re: Optimize stalls at the same point

2017-07-25 Thread Walter Underwood
Are you sure you need a 100GB heap? The stall could be a major GC. We run with an 8GB heap. We also run with Xmx equal to Xms, growing memory to the max was really time-consuming after startup. What version of Java? What GC options? wunder Walter Underwood wun...@wunderwood.org http://observer.
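A rough sketch of the kind of modest settings Walter describes, expressed as solr.in.sh variables; the values are assumed, and applicability depends on how Solr is started and which version is in use:

    # equal min/max heap avoids growing memory after startup (SOLR_HEAP sets both -Xms and -Xmx)
    SOLR_HEAP="8g"
    # keep GC tuning simple; G1 shown as one assumed choice
    GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"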

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Erick Erickson
I agree with everyone else that this seems very unusual, but here are some additional possible options: If (and only if) you're returning "simple" (i.e. numerics and strings) you could consider the Streaming Aggregation stuff. It's built to return rows without going to disk. The restriction is tha

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Jack Krupansky
Thanks for that critical clarification. Try... 1. A different response writer to see if that impacts the clock time. 2. Selectively remove fields from the fl field list to see if some particular field has some issue. 3. If you simply return only the ID for the document, how fast/slow is that? How
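A sketch of the A/B requests Jack suggests, with host, core, and query assumed:

    # baseline: all stored fields, javabin writer
    curl 'http://localhost:8983/solr/mycore/select?q=*:*&rows=1000&wt=javabin'
    # variation 1: different response writer
    curl 'http://localhost:8983/solr/mycore/select?q=*:*&rows=1000&wt=json'
    # variation 2: return only the ID, to isolate field-fetch cost
    curl 'http://localhost:8983/solr/mycore/select?q=*:*&rows=1000&fl=id'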

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Shawn Heisey
On 2/12/2016 2:57 AM, Matteo Grolla wrote: > tell me if I'm wrong but qtime accounts for search time excluding the > fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the > results on the client on a LAN infrastructure for 300kB response). debug > explains how much of qtime

Re: optimize requests that fetch 1000 rows

2016-02-12 Thread Matteo Grolla
Hi Jack, tell me if I'm wrong but qtime accounts for search time excluding the fetch of stored fields (I have a 90ms qtime and a ~30s time to obtain the results on the client on a LAN infrastructure for 300kB response). debug explains how much of qtime is used by each search component. For me

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Again, first things first... debugQuery=true and see which Solr search components are consuming the bulk of qtime. -- Jack Krupansky On Thu, Feb 11, 2016 at 11:33 AM, Matteo Grolla wrote: > virtual hardware, 200ms is taken on the client until response is written to > disk > qtime on solr is ~90
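The request Jack is describing, with host, core, and query assumed:

    curl 'http://localhost:8983/solr/mycore/select?q=*:*&rows=1000&debugQuery=true'

The timing section of the debug output breaks qtime down per search component (query, facet, highlight, and so on), which is what the rest of this thread relies on.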

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Alessandro Benedetti
Out of curiosity, have you tried to debug that solr version to see which text arrives to the splitOnTokens method ? In latest solr that part has changed completely. Would be curious to understand what it tries to tokenise by ? and * ! Cheers On 11 February 2016 at 16:33, Matteo Grolla wrote: >

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
virtual hardware, 200ms is taken on the client until response is written to disk qtime on solr is ~90ms not great but acceptable Is it possible that the method FilenameUtils.splitOnTokens is really so heavy when requesting a lot of rows on slow hardware? 2016-02-11 17:17 GMT+01:00 Jack Krupansky

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Good to know. Hmmm... 200ms for 10 rows is not outrageously bad, but still relatively bad. Even 50ms for 10 rows would be considered barely okay. But... again it depends on query complexity - simple queries should be well under 50 ms for decent modern hardware. -- Jack Krupansky On Thu, Feb 11, 2

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Jack, response times scale with rows. The relationship doesn't seem linear, but below 400 rows times are much faster. I view query times from solr logs and they are fast. The same query with rows=1000 takes 8s; with rows=10 it takes 0.2s. 2016-02-11 16:22 GMT+01:00 Jack Krupansky : > Are querie

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Are queries scaling linearly - does a query for 100 rows take 1/10th the time (1 sec vs. 10 sec or 3 sec vs. 30 sec)? Does the app need/expect exactly 1,000 documents for the query or is that just what this particular query happened to return? What does the query look like? Is it complex or use

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Responses have always been slow, but previously the time was dominated by faceting. After a few optimizations this is my bottleneck. My suggestion has been to properly implement paging and reduce rows; unfortunately this is not possible, at least not soon. 2016-02-11 16:18 GMT+01:00 Jack Krupansky : > Is t

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Jack Krupansky
Is this a scenario that was working fine and suddenly deteriorated, or has it always been slow? -- Jack Krupansky On Thu, Feb 11, 2016 at 4:33 AM, Matteo Grolla wrote: > Hi, > I'm trying to optimize a solr application. > The bottleneck are queries that request 1000 rows to solr. > Unfortun

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
[image: Immagine incorporata 1] 2016-02-11 16:05 GMT+01:00 Matteo Grolla : > I see a lot of time spent in splitOnTokens > > which is called by (last part of stack trace) > > BinaryResponseWriter$Resolver.writeResultsBody() > ... > solr.search.ReturnsField.wantsField() > commons.io.FileNameUtils.w

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
I see a lot of time spent in splitOnTokens which is called by (last part of stack trace) BinaryResponseWriter$Resolver.writeResultsBody() ... solr.search.ReturnFields.wantsField() commons.io.FilenameUtils.wildcardMatch() commons.io.FilenameUtils.splitOnTokens() 2016-02-11 15:42 GMT+01:00 Matte

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 9:42 AM, Matteo Grolla wrote: > Hi Yonik, > after the first query I find 1000 docs in the document cache. > I'm using curl to send the request and requesting javabin format to mimic > the application. > gc activity is low > I managed to load the entire 50GB index in th

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Yonik, after the first query I find 1000 docs in the document cache. I'm using curl to send the request and requesting javabin format to mimic the application. GC activity is low. I managed to load the entire 50GB index in the filesystem cache; after that queries don't cause disk activity an

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Yonik Seeley
On Thu, Feb 11, 2016 at 7:45 AM, Matteo Grolla wrote: > Thanks Toke, yes, they are long times, and solr qtime (to execute the > query) is a fraction of a second. > The response in javabin format is around 300k. OK, That tells us a lot. And if you actually tested so that all the docs would be in t

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Alessandro Benedetti
Hi Matteo, as an addition to Upayavira observation, how is the memory assigned for that Solr Instance ? How much memory is assigned to Solr and how much left for the OS ? Is this a VM on top of a physical machine ? So it is the real physical memory used, or swapping could happen frequently ? Is th

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Thanks Toke, yes, they are long times, and solr qtime (to execute the query) is a fraction of a second. The response in javabin format is around 300k. Currently I can't limit the rows requested or the fields requested, those are fixed for me. 2016-02-11 13:14 GMT+01:00 Toke Eskildsen : > On Thu,

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Toke Eskildsen
On Thu, 2016-02-11 at 11:53 +0100, Matteo Grolla wrote: > I'm working with solr 4.0, sorting on score (default). > I tried setting the document cache size to 2048, so all docs of a single > request fit (2 requests fit actually) > If I execute a query the first time it takes 24s > I reexecu

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Matteo Grolla
Hi Upayavira, I'm working with solr 4.0, sorting on score (default). I tried setting the document cache size to 2048, so all docs of a single request fit (2 requests fit actually) If I execute a query the first time it takes 24s I reexecute it, with all docs in the documentCache and it tak
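A sketch of the solrconfig.xml documentCache entry Matteo describes; the size comes from his message, while the class is assumed and autowarmCount is 0 because the documentCache is not autowarmable anyway:

    <documentCache class="solr.LRUCache"
                   size="2048"
                   initialSize="2048"
                   autowarmCount="0"/>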

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Upayavira
On Thu, Feb 11, 2016, at 09:33 AM, Matteo Grolla wrote: > Hi, > I'm trying to optimize a solr application. > The bottleneck are queries that request 1000 rows to solr. > Unfortunately the application can't be modified at the moment, can you > suggest me what could be done on the solr side to

Re: optimize requests that fetch 1000 rows

2016-02-11 Thread Binoy Dalal
If you're fetching large text fields, consider highlighting on them and just returning the snippets. I faced such a problem some time ago and highlighting sped things up nearly 10x for us. On Thu, 11 Feb 2016, 15:03 Matteo Grolla wrote: > Hi, > I'm trying to optimize a solr application. > T
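A hedged sketch of the approach Binoy describes, with field names and snippet sizes assumed:

    # return snippets of the large text field instead of the whole stored value
    curl 'http://localhost:8983/solr/mycore/select?q=body:solr&fl=id&hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=150'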

Re: optimize cache-hit-ratio of filter- and query-result-cache

2015-12-01 Thread Erick Erickson
1.1> Absolutely. The filterCache is simply a map. The key is the fq clause, so fq=field1:(1 OR 2 OR 3) is different than fq=field1:(3 OR 2 OR 1). 2.1> not sure. But don't get very hung up on queryResultCache. It's useful pretty much for paging and the hit ratio is often very low as it only gets u
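To make the point concrete: these two requests express one logical filter but produce two filterCache entries, so normalizing clause order client-side improves the hit ratio (field and values assumed):

    curl 'http://localhost:8983/solr/mycore/select' --data-urlencode 'q=*:*' --data-urlencode 'fq=field1:(1 OR 2 OR 3)'
    curl 'http://localhost:8983/solr/mycore/select' --data-urlencode 'q=*:*' --data-urlencode 'fq=field1:(3 OR 2 OR 1)'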

Re: optimize cache-hit-ratio of filter- and query-result-cache

2015-12-01 Thread Johannes Siegert
Thanks. The statements on http://wiki.apache.org/solr/SolrCaching#showItems are not explicit enough for my question.

Re: optimize cache-hit-ratio of filter- and query-result-cache

2015-11-30 Thread Mikhail Khludnev
On Mon, Nov 30, 2015 at 12:46 PM, Johannes Siegert < johannes.sieg...@marktjagd.de> wrote: > Hi, > > some of my solr indices have a low cache-hit-ratio. > > 1 Does sorting the parts of a single filter-query have impact on > filter-cache- and query-result-cache-hit-ratio? > 1.1 Example: fq=field1:(

Re: optimize status

2015-07-02 Thread Summer Shire
Upayavira: I am using solr 4.7 and yes I am using TieredMergePolicy. Erick: All my boxes have SSDs and there isn't a big disparity between qTime and response time. The performance hit on my end is because of the fragmented index files causing more disk seeks, as you mentioned. And I tried reques

Re: optimize status

2015-07-01 Thread Shawn Heisey
On 6/30/2015 6:23 AM, Erick Erickson wrote: > I've actually seen this happen right in front of my eyes "in the > field". However, that was a very high-performance environment. My > assumption was that fragmented index files were causing more disk > seeks especially for the first-pass query response

RE: optimize status

2015-07-01 Thread Reitzel, Charles
ly to match mongodb (indexing speed is not its strong point) ... :-) -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Tuesday, June 30, 2015 2:46 PM To: solr-user@lucene.apache.org Subject: Re: optimize status On Tue, Jun 30, 2015, at 04:42 PM, Shawn Heisey wrote: >

Re: optimize status

2015-06-30 Thread Upayavira
On Tue, Jun 30, 2015, at 04:42 PM, Shawn Heisey wrote: > On 6/29/2015 2:48 PM, Reitzel, Charles wrote: > > I take your point about shards and segments being different things. I > > understand that the hash ranges per segment are not kept in ZK. I guess I > > wish they were. > > > > In this r

Re: optimize status

2015-06-30 Thread Shawn Heisey
On 6/29/2015 2:48 PM, Reitzel, Charles wrote: > I take your point about shards and segments being different things. I > understand that the hash ranges per segment are not kept in ZK. I guess I > wish they were. > > In this regard, I liked Mongodb, uses a 2-level sharding scheme. Each shard

Re: optimize status

2015-06-30 Thread Erick Erickson
I've actually seen this happen right in front of my eyes "in the field". However, that was a very high-performance environment. My assumption was that fragmented index files were causing more disk seeks especially for the first-pass query response in distributed mode. So, if the problem is similar,

Re: optimize status

2015-06-29 Thread Upayavira
We need to work out why your performance is bad without optimise. What version of Solr are you using? Can you confirm that your config is using the TieredMergePolicy? Upayavira On Jun 30, 2015, at 04:48 AM, Summer Shire wrote: > Hi Upayavira and Erick, > > There are two things we are talking a

Re: optimize status

2015-06-29 Thread Summer Shire
Hi Upayavira and Erick, There are two things we are talking about here. First: Why am I optimizing? If I don't, our SEARCH (NOT INDEXING) performance is 100% worse. The problem lies in the number of total segments. We have to have max segments 1 or 2. I have done intensive performance related

RE: optimize status

2015-06-29 Thread Reitzel, Charles
I see what you mean. Many thanks for the details. -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, June 29, 2015 6:36 PM To: solr-user@lucene.apache.org Subject: Re: optimize status Reitzel, Charles wrote: > Question, Toke: in your "i

Re: optimize status

2015-06-29 Thread Toke Eskildsen
Reitzel, Charles wrote: > Question, Toke: in your "immutable" cases, don't the benefits of > optimizing come mostly from eliminating deleted records? Not for us. We have about 1 deleted document for every 1,000 or 10,000 standard documents. > Is there any material difference in heap, CPU, etc. b

RE: optimize status

2015-06-29 Thread Reitzel, Charles
say 2.5?) be a good best practice? -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, June 29, 2015 3:56 PM To: solr-user@lucene.apache.org Subject: Re: optimize status Reitzel, Charles wrote: > Is there really a good reason to consolidate down t

RE: optimize status

2015-06-29 Thread Reitzel, Charles
:15 PM To: solr-user@lucene.apache.org Subject: RE: optimize status " Is there really a good reason to consolidate down to a single segment?" Archiving (as one example). Come July 1, the collection for log entries/transactions in June will never be changed, so optimizing is actually a good thin

Re: optimize status

2015-06-29 Thread Upayavira
For the sake of history, somewhere around Solr/Lucene 3.2 a new "MergePolicy" was introduced. The old one merged simply based upon age, or "index generation", meaning the older the segment, the less likely it would get merged, hence needing optimize to clear out deletes from your older segments. T

Re: optimize status

2015-06-29 Thread Toke Eskildsen
Reitzel, Charles wrote: > Is there really a good reason to consolidate down to a single segment? In the scenario spawning this thread it does not seem to be the best choice. Speaking more broadly there are Solr setups out there that deals with immutable data, often tied to a point in time, e.g

RE: optimize status

2015-06-29 Thread Garth Grimm
ed properly, and I don't think the value used for routing has anything to do with what segment they happen to be stored into. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Monday, June 29, 2015 11:38 AM To: solr-user@lucene.apache.org Subject: R

Re: optimize status

2015-06-29 Thread Steven White
here > ...Or am I all wet (again)? > > -Original Message- > From: Walter Underwood [mailto:wun...@wunderwood.org] > Sent: Monday, June 29, 2015 10:39 AM > To: solr-user@lucene.apache.org > Subject: Re: optimize status > > "Optimize" is a manual full mer

RE: optimize status

2015-06-29 Thread Reitzel, Charles
-Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Monday, June 29, 2015 10:39 AM To: solr-user@lucene.apache.org Subject: Re: optimize status "Optimize" is a manual full merge. Solr automatically merges segments as needed. This also expunges deleted

Re: optimize status

2015-06-29 Thread Walter Underwood
“Optimize” is a manual full merge. Solr automatically merges segments as needed. This also expunges deleted documents. We really need to rename “optimize” to “force merge”. Is there a Jira for that? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun
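For reference, the manual full merge wunder describes is triggered through the update handler (host and core name assumed):

    curl 'http://localhost:8983/solr/mycore/update?optimize=true'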

Re: optimize status

2015-06-29 Thread Erick Erickson
Steven: Yes, but First, here's Mike McCandless's excellent blog on segment merging: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html I think the third animation is the TieredMergePolicy. In short, yes an optimize will reclaim disk space. But as you update, this is

Re: optimize status

2015-06-29 Thread Steven White
Hi Upayavira, This is news to me that we should not optimize an index. What about disk space saving: isn't optimization needed to reclaim disk space, or does Solr somehow do that? Where can I read more about this? I'm on Solr 5.1.0 (may switch to 5.2.1) Thanks Steve On Mon, Jun 29, 2015 at 4:16 AM,

Re: optimize status

2015-06-29 Thread Upayavira
I'm afraid I don't understand. You're saying that optimising is causing performance issues? Simple solution: DO NOT OPTIMIZE! Optimisation is very badly named. What it does is squashes all segments in your index into one segment, removing all deleted documents. It is good to get rid of deletes -

Re: optimize status

2015-06-29 Thread Summer Shire
Have to because of performance issues. Just want to know if there is a way to tap into the status. > On Jun 28, 2015, at 11:37 PM, Upayavira wrote: > > Bigger question, why are you optimizing? Since 3.6 or so, it generally > hasn't been required; it can even be a bad thing. > > Upayavira > >> On Su

Re: optimize status

2015-06-28 Thread Upayavira
Bigger question, why are you optimizing? Since 3.6 or so, it generally hasn't been required; it can even be a bad thing. Upayavira On Sun, Jun 28, 2015, at 09:37 PM, Summer Shire wrote: > Hi All, > > I have two indexers (Independent processes ) writing to a common solr > core. > If One indexer process

Re: Optimize SolrCloud without downtime

2015-03-31 Thread Erick Erickson
I really don't have a good explanation here, those are the default values and the folks who set them up no doubt chose them with some care. Afraid I'll have to defer to people who actually know the code... Erick On Mon, Mar 30, 2015 at 11:59 PM, Pavel Hladik wrote: > When we indexing I see the d

Re: Optimize SolrCloud without downtime

2015-03-31 Thread Pavel Hladik
When we are indexing I see the deleted docs changing a bit. I was surprised when a developer reindexed the 120M index: we had around 110M deleted docs and this number was not falling. As you wrote, the typical behavior should be merging deleted docs down to 10-20% of the whole index? So it should be after two w

Re: Optimize SolrCloud without downtime

2015-03-30 Thread Erick Erickson
Hmmm, are you indexing during the time you see the deleted docs not changing? Because this is very strange. Theoretically, if you reindex everything, that should result in segments that have _no_ live docs in them and they should really disappear ASAP. One way to work around this if we determine t

Re: Optimize SolrCloud without downtime

2015-03-30 Thread Pavel Hladik
Hi, thanks for the reply. We have a lot of deleted docs because we have to reindex all records from time to time, changing some important parameters. When we do an update, it means a create and a delete. Our deleted docs do not disappear through segment merging. I see our deleted docs stay at almost the same number

Re: Optimize SolrCloud without downtime

2015-03-25 Thread Erick Erickson
bq: It does NOT optimize multiple replicas or shards in parallel. This behavior was changed in 4.10 though, see: https://issues.apache.org/jira/browse/SOLR-6264 So with 5.0 Pavel is seeing the result of that JIRA I bet. I have to agree with Shawn, the optimization step should proceed invisibly

Re: Optimize SolrCloud without downtime

2015-03-25 Thread Shawn Heisey
On 3/25/2015 9:08 AM, pavelhladik wrote: > Our data are changing frequently so that's why so many deletedDocs. > Optimized core takes around 50GB on disk, we are now almost on 100GB and I'm > looking for the best solution for how to optimize this huge core without downtime. I > know optimization working in

Re: Optimize SolrCloud without downtime

2015-03-25 Thread Erick Erickson
That's a high number of deleted documents as a percentage of your index! Or at least I find those numbers surprising. When segments are merged in the background during normal indexing, quite a bit of weight is given to segments that have a high percentage of deleted docs. I usually see at most 10-2
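One workaround sometimes used in this situation, sketched with host and core assumed: an expunge-deletes commit, which merges away segments dominated by deleted docs without forcing everything into one segment:

    curl 'http://localhost:8983/solr/mycore/update?commit=true&expungeDeletes=true'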

Re: Optimize during indexing

2014-11-21 Thread Bryan Bende
When I've run an optimize with Solr 4.8.1 (by clicking optimize from the collection overview in the admin ui) it goes replica by replica, so it is never doing more than one shard or replica at the same time. It also significantly slows down operations that hit the replica being optimized. I've see

Re: Optimize during indexing

2014-11-21 Thread Erick Erickson
bq: if I can optimize one shard at a time Not sure. Try putting &distrib=false on the URL, but I don't know for sure whether that'd work or not. If this works at all, it'll work on one _replica_ at a time, not shard. But why would you want to? Each optimization is local and runs in the background
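The experiment Erick suggests would look roughly like this, sent to one specific core; the host and core name are assumed, and as he says it is not certain to work:

    curl 'http://host1:8983/solr/mycoll_shard1_replica1/update?optimize=true&distrib=false'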

Re: Optimize during indexing

2014-11-21 Thread Yago Riveiro
It’s the "Deleted Docs” metric in the statistic core. I now that eventually the merges will expunge this deletes but I will run out of space soon and I want to know the _real_ space that I have. Actually I have space enough (about 3.5x the size of the index) to do the optimize.  Other

Re: Optimize during indexing

2014-11-21 Thread Erick Erickson
Yes, should be no problem. Although this should be happening automatically, the percentage of documents in a segment weighs quite heavily when the decision is made to merge segments in the background. You say you have "millions of deletes". Is this the difference between numDocs and maxDoc on the

Re: optimize and .nfsXXXX files

2014-08-18 Thread Michael McCandless
Soft commit (i.e. opening a new IndexReader in Lucene and closing the old one) should make those go away? The .nfsXXXX files are created when a file is deleted but a local process (in this case, the current Lucene IndexReader) still has the file open. Mike McCandless http://blog.mikemccandless.
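A sketch of the commit Mike describes, which opens a new searcher so the old IndexReader releases its file handles (host and core assumed):

    curl 'http://localhost:8983/solr/mycore/update?commit=true&softCommit=true'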

Re: Optimize Index in solr 4.6

2014-02-12 Thread Shawn Heisey
On 2/6/2014 4:00 AM, Shawn Heisey wrote: I would not recommend it, but if you know for sure that your infrastructure can handle it, then you should be able to optimize them all at once by sending parallel optimize requests with distrib=false directly to the Solr cores that hold the shard replicas

Re: Optimize Index in solr 4.6

2014-02-06 Thread Shawn Heisey
On 2/5/2014 11:20 PM, Sesha Sendhil Subramanian wrote: > I am running solr cloud with 10 shards. I do a batch indexing once everyday > and once indexing is done I call optimize. > > I see that optimize happens on each shard one at a time and not in > parallel. Is it possible for the optimize to ha

Re: Optimize and replication: some questions battery.

2014-02-06 Thread Luis Cappa Banda
Hi Toke! Thanks for answering. That's it: I talk about index corruption just to prevent it, not because I have already noticed it. During some tests in the past I checked that a mergeFactor of 2 improves search speed more than a little bit compared with common merge factors such as 10, for example. Of cour

Re: Optimize and replication: some questions battery.

2014-02-06 Thread Toke Eskildsen
On Thu, 2014-02-06 at 10:22 +0100, Luis Cappa Banda wrote: > I knew some performance tips to improve search and I configured a very > low merge factor (2) to boost search > operations instead of indexation ones. That would give you a small search speed increase and a huge penalty on indexing speed

Re: Optimize and replication: some questions battery.

2014-02-06 Thread Luis Cappa Banda
Hi Chris, Thank you very much for your response! It was very instructive. I knew some performance tips to improve search and I configured a very low merge factor (2) to boost search operations instead of indexation ones. I haven't got a deep knowledge of internal Lucene behavior in this case, but

Re: Optimize and replication: some questions battery.

2014-02-05 Thread Chris Hostetter
: I've got a scenario where I index very frequently on master servers and : replicate to slave servers with one minute polling. Master indexes are : growing fast and I would like to optimize indexes to improve search : queries. However... For a scenario where your index is changing that rapidly,

Re: Optimize

2014-01-17 Thread Otis Gospodnetic
If true, I think it is a bug. I think some people rely on optimize not being dumb about this. Otis Solr & ElasticSearch Support http://sematext.com/ On Jan 17, 2014 2:17 PM, "William Bell" wrote: > If I optimize and the core is already optimized, shouldn't it return > immediately? At least that

Re: "optimize" index : impact on performance

2013-08-05 Thread Chris Hostetter
: Subject: "optimize" index : impact on performance : References: <1375381044900-4082026.p...@n3.nabble.com> : In-Reply-To: <1375381044900-4082026.p...@n3.nabble.com> https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing li

Re: "optimize" index : impact on performance [Republished]

2013-08-05 Thread Anca Kopetz
Hi, We already did some benchmarks during optimize and we haven't noticed a big impact on overall performance of search. The benchmarks' results were almost the same with vs. without running optimization. We have enough free RAM for the two OS disk caches during optimize (15 GB represents the

Re: "optimize" index : impact on performance

2013-08-02 Thread Shawn Heisey
On 8/2/2013 8:13 AM, Anca Kopetz wrote: Then we optimized the index to 1 segment / 0 deleted docs and we got +40% of QPS compared to the previous test. Therefore we thought of optimizing the index every two hours, as our index is evolving due to frequent commits (every 30 minutes) and thus the p

Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Michael McCandless
Unfortunately I really don't know ;) Every time I set forth to figure things like this out I seem to learn some new way... Maybe someone else knows? Mike McCandless http://blog.mikemccandless.com On Thu, Sep 22, 2011 at 2:15 PM, Shawn Heisey wrote: > Michael, > > What is the best central plac

Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Shawn Heisey
Michael, What is the best central place on an rpm-based distro (CentOS 6 in my case) to raise the vmem limit for specific user(s), assuming it's not already correct? I'm using /etc/security/limits.conf to raise the open file limit for the user that runs Solr: ncindex hard nofile
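A sketch of the limits.conf entries in question, keeping the user name from Shawn's snippet; the values are assumed, and "as" is the address-space (vmem) item:

    # /etc/security/limits.conf
    ncindex  hard  nofile  65536
    ncindex  soft  nofile  65536
    ncindex  hard  as      unlimited
    ncindex  soft  as      unlimited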

Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Michael McCandless
OK, excellent. Thanks for bringing closure, Mike McCandless http://blog.mikemccandless.com On Thu, Sep 22, 2011 at 9:00 AM, Ralf Matulat wrote: > Dear Mike, > thanks for your your reply. > Just a couple of minutes we found a solution or - to be honest - where we > went wrong. > Our failure was

Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Ralf Matulat
Dear Mike, thanks for your reply. Just a couple of minutes ago we found a solution or - to be honest - where we went wrong. Our failure was the use of ulimit. We missed that ulimit sets the vmem for each shell separately. So we set 'ulimit -v unlimited' on a shell, thinking that we've done the

Re: Optimize fails with OutOfMemory Exception - sun.nio.ch.FileChannelImpl.map involved

2011-09-22 Thread Michael McCandless
Are you sure you are using a 64 bit JVM? Are you sure you really changed your vmem limit to unlimited? That should have resolved the OOME from mmap. Or: can you run "cat /proc/sys/vm/max_map_count"? This is a limit on the total number of maps in a single process, that Linux imposes. But the de
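The check Mike suggests, plus one assumed way to raise the limit:

    cat /proc/sys/vm/max_map_count      # Linux default is 65530
    sysctl -w vm.max_map_count=262144   # value assumed; make it persistent in /etc/sysctl.conf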

Re: Optimize concern in Solr 3.2

2011-09-02 Thread Pawan Darira
Thanks for the guidance, but it could not work out. Although I am reading the link you provided, could it be due to the write.lock file being created in the "/index/" directory? Please suggest - pawan On Fri, Sep 2, 2011 at 6:34 PM, Michael Ryan wrote: > > I have recently upgraded from Solr 1.4 to S

RE: Optimize concern in Solr 3.2

2011-09-02 Thread Michael Ryan
> I have recently upgraded from Solr 1.4 to Solr 3.2. In Solr 1.4 only 3 > files (one .cfs & two segment files) were made in *index/* directory > (after > doing optimize). > > Now, in Solr 3.2, the optimize seems not to be working. My final number of > files in *index/* directory is 7-8 in numb

Re: Optimize requires 50% more disk space when there are exactly 20 segments

2011-08-24 Thread Lance Norskog
Which Solr version do you have? In 3.x and trunk, Tiered and BalancedSegment are there for exactly this reason. In Solr 1.4, your only trick is to do a partial optimize with maxSegments. This lets you say "optimize until there are 15 segments, then stop". Do this with smaller and smaller numbers.
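A sketch of the stepwise partial optimize Lance describes, with host, core, and step values assumed:

    curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=15'
    curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=10'
    curl 'http://localhost:8983/solr/mycore/update?optimize=true&maxSegments=5'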

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Shawn Heisey
On 6/21/2011 9:09 AM, Robert Muir wrote: the problem is that before https://issues.apache.org/jira/browse/SOLR-2567, Solr invoked the TieredMergePolicy "setters" *before* it tried to apply these 'global' mergeFactor etc params. So, even if you set them explicitly inside the <mergePolicy> element, they would then get

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Robert Muir
the problem is that before https://issues.apache.org/jira/browse/SOLR-2567, Solr invoked the TieredMergePolicy "setters" *before* it tried to apply these 'global' mergeFactor etc params. So, even if you set them explicitly inside the <mergePolicy> element, they would then get clobbered by these 'global' params / defau

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Michael McCandless
On Tue, Jun 21, 2011 at 9:42 AM, Shawn Heisey wrote: > On 6/20/2011 12:31 PM, Michael McCandless wrote: >> >> For back-compat, mergeFactor maps to both of these, but it's better to >> set them directly eg: >> >> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"> >> <int name="maxMergeAtOnce">10</int> >> <int name="segmentsPerTier">20</int> >> </mergePolicy> >> >> (and then remove your mergeFactor setting u

Re: Optimize taking two steps and extra disk space

2011-06-21 Thread Shawn Heisey
On 6/20/2011 12:31 PM, Michael McCandless wrote: For back-compat, mergeFactor maps to both of these, but it's better to set them directly eg: <mergePolicy class="org.apache.lucene.index.TieredMergePolicy"> <int name="maxMergeAtOnce">10</int> <int name="segmentsPerTier">20</int> </mergePolicy> (and then remove your mergeFactor setting under indexDefaults) When I did this and ran a reindex, it merged once it rea
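Laid out as it would appear in solrconfig.xml (the element markup is reconstructed from the tag-stripped preview above; placement under indexDefaults per the thread):

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">20</int>
    </mergePolicy>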
