Re: Frequent OOM - (Unknown source in logs).
On 12/28/2012 10:34 PM, Otis Gospodnetic wrote:
> Hi, I'm not sure what that autoCommit with 0 values does. Does it effectively disable autocommits? I hope so, else this may be a problem.

Otis,

I have 0 in my config for autocommit on my production 3.5.0 servers, and have had since 1.4.0. I suggested to this user on irc (#solr) that they try those values, with the idea that commits during indexing might be consuming additional memory. Setting the autocommit values to zero appears to disable autocommit.

I've been unable to figure out what shreejay's problem might be. I'm not running into this problem. Making an assumption here about gender, apologies if it's the wrong one: he's giving a lot more memory (12GB) to Solr than I am (8GB). His shards (36-40GB) are larger than mine (17GB), but I have all six of my large shards on the same dev server / JVM, so in effect I have larger indexes. The one big difference that I can see is that he's using SolrCloud on 4.0, and I'm on recent 4.1 snapshots without SolrCloud. I have not asked many questions about query patterns, which might lead to big memory usage in FieldCache.

Thanks,
Shawn
Re: ZooKeeper ensemble behind load balancer
I would suggest asking this on the zookeeper user list. And let us know here what you find out, I'd be interested.

Note, zookeeper, as I understand it, uses its own protocol, so to some reasonable extent it probably depends on your load balancer. Also, as I understand it, zookeeper maintains active connections to solr hosts, which is not a common scenario for load balancers.

Upayavira

On Fri, Dec 28, 2012, at 04:39 PM, Marcin Rzewucki wrote:
> Hi,
>
> Does Solr need connection to all of hosts in ZK ensemble or only to one
> of
> them at a time ? I wonder if it is possible to use load balancer for ZK
> ensemble and use only one address as zkHost for Solr ? Having load
> balancer
> makes it easier to change ZK hosts while still using same address by Solr
> (no need to restart Solr or change its configuration).
>
> Thanks in advance.
> Regards.
Re: Frequent OOM - (Unknown source in logs).
The code (4.x) suggests that an autoCommit value of 0, a negative value, or no autoCommit setting in the config at all disables autoCommit, and that the time-based and document-count-based commit triggers are independent:

  protected UpdateHandlerInfo loadUpdatehandlerInfo() {
    return new UpdateHandlerInfo(get("updateHandler/@class", null),
        getInt("updateHandler/autoCommit/maxDocs", -1),
        getInt("updateHandler/autoCommit/maxTime", -1),
        getBool("updateHandler/autoCommit/openSearcher", true),
        getInt("updateHandler/commitIntervalLowerBound", -1),
        getInt("updateHandler/autoSoftCommit/maxDocs", -1),
        getInt("updateHandler/autoSoftCommit/maxTime", -1));
  }
  ...
  private void _scheduleCommitWithinIfNeeded(long commitWithin) {
    long ctime = (commitWithin > 0) ? commitWithin : timeUpperBound;
    if (ctime > 0) {
      _scheduleCommitWithin(ctime);
    }
  }

  private void _scheduleCommitWithin(long commitMaxTime) {
    if (commitMaxTime <= 0) return;
  ...
  public void addedDocument(int commitWithin) {
    // maxDocs-triggered autoCommit. Use == instead of > so we only trigger once on the way up
    if (docsUpperBound > 0) {
  ...
  public String toString() {
    if (timeUpperBound > 0 || docsUpperBound > 0) {
      return (timeUpperBound > 0 ? ("if uncommited for " + timeUpperBound + "ms; ") : "")
          + (docsUpperBound > 0 ? ("if " + docsUpperBound + " uncommited docs ") : "");
    } else {
      return "disabled";
    }
  }

(The same code is used for autoSoftCommit as well.)
...
  public NamedList getStatistics() {
    NamedList lst = new SimpleOrderedMap();
    lst.add("commits", commitCommands.get());
    if (commitTracker.getDocsUpperBound() > 0) {
      lst.add("autocommit maxDocs", commitTracker.getDocsUpperBound());
    }
    if (commitTracker.getTimeUpperBound() > 0) {
      lst.add("autocommit maxTime", "" + commitTracker.getTimeUpperBound() + "ms");
    }
    lst.add("autocommits", commitTracker.getCommitCount());
    if (softCommitTracker.getDocsUpperBound() > 0) {
      lst.add("soft autocommit maxDocs", softCommitTracker.getDocsUpperBound());
    }
    if (softCommitTracker.getTimeUpperBound() > 0) {
      lst.add("soft autocommit maxTime", "" + softCommitTracker.getTimeUpperBound() + "ms");
    }
  ...

But I would note that there is no formal JavaDoc contract for this behavior, so it is not guaranteed and could change in the future.

-- Jack Krupansky

-----Original Message----- From: Otis Gospodnetic
Sent: Saturday, December 29, 2012 12:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Frequent OOM - (Unknown source in logs).

Hi,

I'm not sure what that autoCommit with 0 values does. Does it effectively disable autocommits? I hope so, else this may be a problem.

How large are your Solr caches? What sort of fields do you filter and facet on? How big is your index in terms of # of docs?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Dec 28, 2012 at 12:50 PM, shreejay wrote:

Hi Otis,

Following is the setup:

6 individual Solr servers (VMs) running on Jetty. 3 shards, each with a leader and a replica.
*Solr Version*: Solr 4.0 (with a patch from SOLR-2592).
*OS*: CentOS release 5.8 (Final)
*Java*: java version "1.6.0_32" Java(TM) SE Runtime Environment (build 1.6.0_32-b05) Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)
*Memory*: 4 servers have 32 GB, 2 have 30 GB.
*Disk space*: 500 GB on each server.
*Queries*: Usual select queries with up to 6 filters; facets on around 8 fields (returning only the top 20).
*Java options while starting the server:*
JAVA_OPTIONS="-Xms15360m -Xmx15360m -DSTOP.PORT=1234 -DSTOP.KEY= -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/ABC/LOGFOLDER -XX:-TraceClassUnloading -Dbootstrap_confdir=./solr/collection123/conf -Dcollection.configName=123conf -DzkHost=ZooKeeper001:,ZooKeeper002:,SGAZZooKeeper003: -DnumShards=3 -jar start.jar"
LOG_FILE="/ABC/LOGFOLDER/solrlogfile.log"

I run a *commit* using a curl command every 30 mins via a cron job:
curl --silent http://11.111.111.111:1234/solr/collection123/update/?commit=true&openSearcher=false

In my SolrConfig file I have these *commit settings*:
updateHandler class="solr.DirectUpdateHandler2"> 0 0 0 false false ${solr.data.dir:}

Please let me know if you would like more information.

I am not indexing any documents right now, and I again got an OOM around an hour back on one of the nodes. Let's call it Node1. The node is in "recovery" right now and keeps erroring with this message:

SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Server at http://NODE2:8983/solr/collection1 returned non ok status:500, message:Server Error

Although it's still showing as "recovering", it is serving queries according to the log file. The other instance in this shard became the leader and is up and running.
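The disable-on-zero behavior Jack traces through the 4.x code, and that the 0-valued config above relies on, can be illustrated with a small self-contained sketch. Note this is a simplified stand-in for Solr's CommitTracker written for this thread, not the real class; the class and method names here are illustrative only.

```java
// Simplified, hypothetical sketch of the CommitTracker logic quoted above
// (NOT the actual Solr class): a maxTime/maxDocs bound of 0 or a negative
// value leaves that commit trigger disabled.
public class CommitTrackerSketch {
    final long timeUpperBound; // autoCommit maxTime (ms)
    final long docsUpperBound; // autoCommit maxDocs

    CommitTrackerSketch(long maxTimeMs, long maxDocs) {
        this.timeUpperBound = maxTimeMs;
        this.docsUpperBound = maxDocs;
    }

    // Mirrors _scheduleCommitWithin(): a non-positive bound schedules nothing.
    boolean wouldScheduleCommit(long commitWithin) {
        long ctime = (commitWithin > 0) ? commitWithin : timeUpperBound;
        return ctime > 0;
    }

    // Mirrors the quoted toString(): both bounds non-positive => "disabled".
    public String toString() {
        if (timeUpperBound > 0 || docsUpperBound > 0) {
            return (timeUpperBound > 0 ? "if uncommited for " + timeUpperBound + "ms; " : "")
                 + (docsUpperBound > 0 ? "if " + docsUpperBound + " uncommited docs " : "");
        }
        return "disabled";
    }

    public static void main(String[] args) {
        System.out.println(new CommitTrackerSketch(0, 0));      // the all-zero config above
        System.out.println(new CommitTrackerSketch(15000, -1)); // time-based trigger only
    }
}
```

Running this prints "disabled" for the all-zero configuration, which matches what shreejay observes: with 0/0 in the config, only the cron-driven explicit commits happen.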
Re: rogue values in schema browser histogram
It's a bit confusing, but it's entirely normal for trie fields to have terms in your index, visible when you do low-level term walking, that you didn't put there. I think of them as metadata for navigational purposes. No JIRA that I know of; the UI is reporting terms actually in your index, albeit ones that arguably shouldn't be shown. I don't know if there's a way to filter these out or not...

Best
Erick

On Fri, Dec 28, 2012 at 5:17 PM, jmlucjav wrote:
> Hi,
>
> I have an index where schema browser histogram reports some terms that I
> never indexed. When you run a query to get those terms you get of course
> none. I optimized the index and same issue. The field is a TrieIntField.
>
> I think I might have seen some post about this (or a similar) issue but did
> not find any in jira, anyone can direct me to some ticket?
>
> I am in solr4.0
> regards
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/rogue-values-in-schema-browser-histogram-tp4029510.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Viewing the Solr MoinMoin wiki offline
Hi,

You can easily crawl it with wget to get a local copy.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Dec 29, 2012 4:54 PM, "d_k" wrote:
> Hello,
>
> I'm setting up Solr inside an intranet without internet access and
> I was wondering if there is a way to obtain the data dump of the Solr
> Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
>
> I understand MoinMoin has an export feature one can use
> (http://moinmo.in/MoinDump and
> http://moinmo.in/HelpOnMoinCommand/ExportDump) but I'm afraid it needs
> to be executed from within the MoinMoin server.
>
> Is there a way to obtain the result of that command?
> Is there another way to view the solr wiki offline?
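For anyone finding this in the archive, a wget invocation along these lines is what Otis is suggesting. The flags are standard GNU wget options and the --wait is just politeness toward the wiki; shown here as a dry run that echoes the command rather than actually crawling:

```shell
# Build a wget command to mirror the Solr MoinMoin wiki for offline viewing.
#   --mirror          recursive, timestamped re-crawl
#   --convert-links   rewrite intra-wiki links so the local copy browses offline
#   --page-requisites also fetch the CSS/images each page needs
WIKI_URL="http://wiki.apache.org/solr/"
CMD="wget --mirror --convert-links --page-requisites --no-parent --wait=1 $WIKI_URL"
echo "$CMD"   # dry run; run $CMD directly to actually crawl
```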
Re: Frequent OOM - (Unknown source in logs).
Otis,

As of now I have disabled caches, and we are hardly running any queries at this point. I filter mostly on string fields, two int fields, 2 dates (one is a dynamic date field) and one dynamic string field. Same goes for faceting, except I do not use facets on the dynamic field. In terms of documents, my index is around 5.5 million.

I am not trying to rule out the possibility of queries causing this, but I think it's highly unlikely. The cluster is being used only by a select few, and I have noticed this behaviour generally during indexing (which happens mostly during the night). The autocommit with 0 values seems to be working: I am constantly monitoring the logs and I can see commits happening only at the 30 and 60 min intervals. These commits are run by a cron job.

Shawn, your assumption is right! Thanks for the tips on SolrConfig changes. They have been working well most of the time. I am assuming these issues come up after a couple of million documents are indexed. During the initial indexing (up to 3-4 million docs) everything seems to be fine.

I am going to try the recent nightly build apache-solr-4.1-2012-12-28_12-29-23 now. Let's hope using 4.1 fixes these issues. I have already started indexing my documents on a 3.6.2 Solr box as a backup. I am also going to reduce the JVM heap size and experiment between 8 - 10 GB, since I think some of the ZK connection issues were happening due to longer GC pauses.

Thanks Jack for verifying it in the code.

--Shreejay

--
View this message in context: http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361p4029657.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Viewing the Solr MoinMoin wiki offline
Should that be set up as a public service then (like the Wikipedia dump)? Because I need one too, and I don't think DDoSing the wiki with crawlers is a good idea. And I bet there will be some 'challenges' during scraping.

Regards,
Alex.

P.s. In fact, it would make an interesting example to have an offline copy with a Solr index, etc.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> Hi,
>
> You can easily crawl it with wget to get a local copy.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 29, 2012 4:54 PM, "d_k" wrote:
>
> > Hello,
> >
> > I'm setting up Solr inside an intranet without an internet access and
> > I was wondering if there is a way to obtain the data dump of the Solr
> > Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
> >
> > I understand MoinMoin has an export feature one can use
> > (http://moinmo.in/MoinDump and
> > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it needs
> > to be executed from within the MoinMoin server.
> >
> > Is there a way to obtain the result of that command?
> > Is there another way to view the solr wiki offline?
Re: Viewing the Solr MoinMoin wiki offline
I'd take it to Infra, although I think demand for this is so low... Otis Solr & ElasticSearch Support http://sematext.com/ On Dec 29, 2012 8:14 PM, "Alexandre Rafalovitch" wrote: > Should that be setup as a public service then (like Wikipedia dump)? > Because I need one too and I don't think it is a good idea for DDOSing Wiki > with crawlers. And I bet, there will be some 'challenges' during scraping. > > Regards, > Alex. > P.s. In fact, it would make an interesting example to have an offline copy > with Solr index, etc. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > > > Hi, > > > > You can easily crawl it with wget to get a local copy. > > > > Otis > > Solr & ElasticSearch Support > > http://sematext.com/ > > On Dec 29, 2012 4:54 PM, "d_k" wrote: > > > > > Hello, > > > > > > I'm setting up Solr inside an intranet without an internet access and > > > I was wondering if there is a way to obtain the data dump of the Solr > > > Wiki (http://wiki.apache.org/solr/) for offline viewing and searching. > > > > > > I understand MoinMoin has an export feature one can use > > > (http://moinmo.in/MoinDump and > > > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it needs > > > to be executed from within the MoinMoin server. > > > > > > Is there a way to obtain the result of that command? > > > Is there another way to view the solr wiki offline? > > > > > >
Re: Viewing the Solr MoinMoin wiki offline
Sorry,

What's Infra? A mailing list? Demand is probably low for Solr, but may be sufficient across all of Apache's individual projects. I guess one way to check is to see in the Apache logs whether there are a lot of scrapers running (by user agent).

Anyway, for Solr specifically, an acceptable substitute could be the manual version from Lucid Imagination: http://lucidworks.lucidimagination.com/display/home/PDF+Versions

Regards,
Alex.

P.s. I am getting a feeling that Lucid (and other commercial company) people are not allowed to mention their products on this list.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Dec 30, 2012 at 12:17 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> I'd take it to Infra, although I think demand for this is so low...
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 29, 2012 8:14 PM, "Alexandre Rafalovitch" wrote:
>
> > Should that be setup as a public service then (like Wikipedia dump)?
> > Because I need one too and I don't think it is a good idea for DDOSing Wiki
> > with crawlers. And I bet, there will be some 'challenges' during scraping.
> >
> > Regards,
> > Alex.
> > P.s. In fact, it would make an interesting example to have an offline copy
> > with Solr index, etc.
> >
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >
> >
> > On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > You can easily crawl it with wget to get a local copy.
> > >
> > > Otis
> > > Solr & ElasticSearch Support
> > > http://sematext.com/
> > > On Dec 29, 2012 4:54 PM, "d_k" wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm setting up Solr inside an intranet without an internet access and
> > > > I was wondering if there is a way to obtain the data dump of the Solr
> > > > Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
> > > >
> > > > I understand MoinMoin has an export feature one can use
> > > > (http://moinmo.in/MoinDump and
> > > > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it needs
> > > > to be executed from within the MoinMoin server.
> > > >
> > > > Is there a way to obtain the result of that command?
> > > > Is there another way to view the solr wiki offline?
Re: ZooKeeper ensemble behind load balancer
A ZooKeeper ensemble should be fairly reliable on its own: a large enough number of machines (3+, typically 5, 7 or 9) to maintain a quorum. So adding a load balancer on top will just add a hop and decrease performance, and also add a failure point to the system.

That being said, there needs to be a way to provide Solr with a way to refresh its configuration without a restart. Solr takes a list of ZK hosts on startup and, if I am correct, uses one of them unless it fails, or round-robins.

Why do your zkhosts need to change a lot?

On Sat, Dec 29, 2012 at 10:58 AM, Upayavira wrote:
> I would suggest asking this on the zookeeper user list.
>
> And let us know here what you find out, I'd be interested.
>
> Note, zookeeper, as I understand it, uses its own protocol, so to some
> reasonable extent it probablmy depends on yr load balancer. Also, as I
> understand it, zookeeper maintains active connections to solr hosts,
> which is not a common scenario for load balances as I understand it.
>
> Upayavira
>
> On Fri, Dec 28, 2012, at 04:39 PM, Marcin Rzewucki wrote:
> > Hi,
> >
> > Does Solr need connection to all of hosts in ZK ensemble or only to one
> > of
> > them at a time ? I wonder if it is possible to use load balancer for ZK
> > ensemble and use only one address as zkHost for Solr ? Having load
> > balancer
> > makes it easier to change ZK hosts while still using same address by Solr
> > (no need to restart Solr or change its configuration).
> >
> > Thanks in advance.
> > Regards.

--
Anirudha P. Jadhav
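The failover behavior described above can be sketched without any Solr or ZooKeeper dependency. This hypothetical helper (not SolrJ code) just parses the same comma-separated zkHost string Solr takes on startup and advances to the next host when the current one fails, which is essentially why the client itself makes an external load balancer redundant:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a client holding the full ensemble list can fail
// over by itself, so a load balancer in front of ZooKeeper buys little.
public class ZkHostList {
    private final List<String> hosts;
    private int current = 0;

    ZkHostList(String zkHost) {
        // zkHost uses the same comma-separated form Solr takes on startup,
        // e.g. "zk1:2181,zk2:2181,zk3:2181"
        this.hosts = Arrays.asList(zkHost.split(","));
    }

    String currentHost() {
        return hosts.get(current).trim();
    }

    // Called when the current host stops responding: move to the next one,
    // wrapping around the list.
    String failOver() {
        current = (current + 1) % hosts.size();
        return currentHost();
    }
}
```

Changing the ensemble membership would still mean handing every client a new zkHost string, which is the configuration-refresh gap noted above.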
Re: Viewing the Solr MoinMoin wiki offline
Hi, Sorry, by infra I meant ASF infrastructure people. There's a mailing list and a JIRA project for infra stuff. Otis Solr & ElasticSearch Support http://sematext.com/ On Dec 29, 2012 8:45 PM, "Alexandre Rafalovitch" wrote: > Sorry, > > What's Infra? A mailing list? Demand is probably low for Solr, but may be > sufficient for all Apache's individual projects. I guess one way to check > is too see in Apache logs if there is a lot of scrapers running (by user > agents). > > Anyway, for Solr specifically, an acceptable substitute could be the manual > version from Lucid Imagination: > http://lucidworks.lucidimagination.com/display/home/PDF+Versions > > Regards, >Alex. > P.s. I am getting a feeling that Lucid (and other commercial company) > people are not allowed to mention their products on this list. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Sun, Dec 30, 2012 at 12:17 PM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > > > I'd take it to Infra, although I think demand for this is so low... > > > > Otis > > Solr & ElasticSearch Support > > http://sematext.com/ > > On Dec 29, 2012 8:14 PM, "Alexandre Rafalovitch" > > wrote: > > > > > Should that be setup as a public service then (like Wikipedia dump)? > > > Because I need one too and I don't think it is a good idea for DDOSing > > Wiki > > > with crawlers. And I bet, there will be some 'challenges' during > > scraping. > > > > > > Regards, > > > Alex. > > > P.s. In fact, it would make an interesting example to have an offline > > copy > > > with Solr index, etc. > > > > > > Personal blog: http://blog.outerthoughts.com/ > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > > > - Time is the quality of nature that keeps events from happening all at > > > once. 
Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > > > > > > > On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic < > > > otis.gospodne...@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > You can easily crawl it with wget to get a local copy. > > > > > > > > Otis > > > > Solr & ElasticSearch Support > > > > http://sematext.com/ > > > > On Dec 29, 2012 4:54 PM, "d_k" wrote: > > > > > > > > > Hello, > > > > > > > > > > I'm setting up Solr inside an intranet without an internet access > and > > > > > I was wondering if there is a way to obtain the data dump of the > Solr > > > > > Wiki (http://wiki.apache.org/solr/) for offline viewing and > > searching. > > > > > > > > > > I understand MoinMoin has an export feature one can use > > > > > (http://moinmo.in/MoinDump and > > > > > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it > > needs > > > > > to be executed from within the MoinMoin server. > > > > > > > > > > Is there a way to obtain the result of that command? > > > > > Is there another way to view the solr wiki offline? > > > > > > > > > > > > > > >