Re: Frequent OOM - (Unknown source in logs).
On 12/28/2012 10:34 PM, Otis Gospodnetic wrote:
> Hi, I'm not sure what that autoCommit with 0 values does. Does it effectively disable autocommits? I hope so, else this may be a problem.

Otis,

I have 0 in my config for autocommit on my production 3.5.0 servers, and have had since 1.4.0. I suggested to this user on irc (#solr) that they try those values, with the idea that commits during indexing might be consuming additional memory. Setting the autocommit values to zero appears to disable autocommit.

I've been unable to figure out what shreejay's problem might be. I'm not running into this problem. Making an assumption here about gender, apologies if it's the wrong one: he's giving a lot more memory (12GB) to Solr than I am (8GB). His shards (36-40GB) are larger than mine (17GB), but I have all six of my large shards on the same dev server / JVM, so in effect I have larger indexes. The one big difference that I can see is that he's using SolrCloud on 4.0, and I'm on recent 4.1 snapshots without SolrCloud. I have not asked many questions about query patterns, which might lead to big memory usage in FieldCache.

Thanks,
Shawn
Re: ZooKeeper ensemble behind load balancer
I would suggest asking this on the zookeeper user list. And let us know here what you find out, I'd be interested.

Note, zookeeper, as I understand it, uses its own protocol, so to some reasonable extent it probably depends on your load balancer. Also, as I understand it, zookeeper maintains active connections to solr hosts, which is not a common scenario for load balancers.

Upayavira

On Fri, Dec 28, 2012, at 04:39 PM, Marcin Rzewucki wrote:
> Hi,
>
> Does Solr need connection to all of hosts in ZK ensemble or only to one
> of
> them at a time ? I wonder if it is possible to use load balancer for ZK
> ensemble and use only one address as zkHost for Solr ? Having load
> balancer
> makes it easier to change ZK hosts while still using same address by Solr
> (no need to restart Solr or change its configuration).
>
> Thanks in advance.
> Regards.
Re: Frequent OOM - (Unknown source in logs).
The code (4.x) suggests that an autoCommit value of 0, a negative value, or no autoCommit setting in the config at all disables autoCommit, and that the time-based and document-count-based commit triggers are independent:

  protected UpdateHandlerInfo loadUpdatehandlerInfo() {
    return new UpdateHandlerInfo(get("updateHandler/@class", null),
        getInt("updateHandler/autoCommit/maxDocs", -1),
        getInt("updateHandler/autoCommit/maxTime", -1),
        getBool("updateHandler/autoCommit/openSearcher", true),
        getInt("updateHandler/commitIntervalLowerBound", -1),
        getInt("updateHandler/autoSoftCommit/maxDocs", -1),
        getInt("updateHandler/autoSoftCommit/maxTime", -1));
  }
  ...
  private void _scheduleCommitWithinIfNeeded(long commitWithin) {
    long ctime = (commitWithin > 0) ? commitWithin : timeUpperBound;
    if (ctime > 0) {
      _scheduleCommitWithin(ctime);
    }
  }

  private void _scheduleCommitWithin(long commitMaxTime) {
    if (commitMaxTime <= 0) return;
  ...
  public void addedDocument(int commitWithin) {
    // maxDocs-triggered autoCommit. Use == instead of > so we only trigger once on the way up
    if (docsUpperBound > 0) {
  ...
  public String toString() {
    if (timeUpperBound > 0 || docsUpperBound > 0) {
      return (timeUpperBound > 0 ? ("if uncommited for " + timeUpperBound + "ms; ") : "")
          + (docsUpperBound > 0 ? ("if " + docsUpperBound + " uncommited docs ") : "");
    } else {
      return "disabled";
    }
  }

(The same code is used for autoSoftCommit as well.)
...
  public NamedList getStatistics() {
    NamedList lst = new SimpleOrderedMap();
    lst.add("commits", commitCommands.get());
    if (commitTracker.getDocsUpperBound() > 0) {
      lst.add("autocommit maxDocs", commitTracker.getDocsUpperBound());
    }
    if (commitTracker.getTimeUpperBound() > 0) {
      lst.add("autocommit maxTime", "" + commitTracker.getTimeUpperBound() + "ms");
    }
    lst.add("autocommits", commitTracker.getCommitCount());
    if (softCommitTracker.getDocsUpperBound() > 0) {
      lst.add("soft autocommit maxDocs", softCommitTracker.getDocsUpperBound());
    }
    if (softCommitTracker.getTimeUpperBound() > 0) {
      lst.add("soft autocommit maxTime", "" + softCommitTracker.getTimeUpperBound() + "ms");
    }
  ...

But I would note that there is no formal JavaDoc contract for this behavior, so it is not guaranteed and could change in the future.

-- Jack Krupansky

-----Original Message----- From: Otis Gospodnetic
Sent: Saturday, December 29, 2012 12:34 AM
To: solr-user@lucene.apache.org
Subject: Re: Frequent OOM - (Unknown source in logs).

Hi,

I'm not sure what that autoCommit with 0 values does. Does it effectively disable autocommits? I hope so, else this may be a problem.

How large are your Solr caches? What sort of fields do you filter and facet on? How big is your index in terms of # of docs?

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Fri, Dec 28, 2012 at 12:50 PM, shreejay wrote:

Hi Otis,

Following is the setup:

6 individual Solr servers (VMs) running on Jetty. 3 shards, each with a leader and a replica.
*Solr Version*: Solr 4.0 (with a patch from SOLR-2592).
*OS*: CentOS release 5.8 (Final)
*Java*: java version "1.6.0_32" Java(TM) SE Runtime Environment (build 1.6.0_32-b05) Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)
*Memory*: 4 servers have 32 GB, 2 have 30 GB.
*Disk space*: 500 GB on each server.
*Queries*: Usual select queries with up to 6 filters; facets on around 8 fields (returning only the top 20).
*Java options while starting the server:*
JAVA_OPTIONS="-Xms15360m -Xmx15360m -DSTOP.PORT=1234 -DSTOP.KEY= -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCompressedOops -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/ABC/LOGFOLDER -XX:-TraceClassUnloading -Dbootstrap_confdir=./solr/collection123/conf -Dcollection.configName=123conf -DzkHost=ZooKeeper001:,ZooKeeper002:,SGAZZooKeeper003: -DnumShards=3 -jar start.jar"
LOG_FILE="/ABC/LOGFOLDER/solrlogfile.log"

I run a *commit* using a curl command every 30 mins via a cron job:
curl --silent http://11.111.111.111:1234/solr/collection123/update/?commit=true&openSearcher=false

In my SolrConfig file I have these *commit settings*:
updateHandler class="solr.DirectUpdateHandler2"> 0 0 0 false false ${solr.data.dir:}

Please let me know if you would like more information.

I am not indexing any documents right now, and I again got an OOM around an hour back on one of the nodes. Let's call it Node1. The node is in "recovery" right now and keeps erroring with this message:

SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Server at http://NODE2:8983/solr/collection1 returned non ok status:500, message:Server Error

Although it's still showing as "recovering", it is serving queries according to the log file. The other instance in this shard became the leader and is up and running.
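The disable-on-zero behavior Jack traces through the 4.x code, and that the 0-valued config above relies on, can be illustrated with a small self-contained sketch. Note this is a simplified stand-in for Solr's CommitTracker written for this thread, not the real class; the class and method names here are illustrative only.

```java
// Simplified, hypothetical sketch of the CommitTracker logic quoted above
// (NOT the actual Solr class): a maxTime/maxDocs bound of 0 or a negative
// value leaves that commit trigger disabled.
public class CommitTrackerSketch {
    final long timeUpperBound; // autoCommit maxTime (ms)
    final long docsUpperBound; // autoCommit maxDocs

    CommitTrackerSketch(long maxTimeMs, long maxDocs) {
        this.timeUpperBound = maxTimeMs;
        this.docsUpperBound = maxDocs;
    }

    // Mirrors _scheduleCommitWithin(): a non-positive bound schedules nothing.
    boolean wouldScheduleCommit(long commitWithin) {
        long ctime = (commitWithin > 0) ? commitWithin : timeUpperBound;
        return ctime > 0;
    }

    // Mirrors the quoted toString(): both bounds non-positive => "disabled".
    public String toString() {
        if (timeUpperBound > 0 || docsUpperBound > 0) {
            return (timeUpperBound > 0 ? "if uncommited for " + timeUpperBound + "ms; " : "")
                 + (docsUpperBound > 0 ? "if " + docsUpperBound + " uncommited docs " : "");
        }
        return "disabled";
    }

    public static void main(String[] args) {
        System.out.println(new CommitTrackerSketch(0, 0));      // the all-zero config above
        System.out.println(new CommitTrackerSketch(15000, -1)); // time-based trigger only
    }
}
```

Running this prints "disabled" for the all-zero configuration, which matches what shreejay observes: with 0/0 in the config, only the cron-driven explicit commits happen.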
Re: rogue values in schema browser histogram
It's a bit confusing, but it's entirely normal for trie fields to have terms in your index, visible when you do low-level term walking, that you didn't put there. I think of them as metadata for navigational purposes. No JIRA that I know of; the UI is reporting terms actually in your index, albeit ones that arguably shouldn't be shown. I don't know if there's a way to filter these out or not...

Best
Erick

On Fri, Dec 28, 2012 at 5:17 PM, jmlucjav wrote:
> Hi,
>
> I have an index where schema browser histogram reports some terms that I
> never indexed. When you run a query to get those terms you get of course
> none. I optimized the index and same issue. The field is a TrieIntField.
>
> I think I might have seen some post about this (or a similar) issue but did
> not find any in jira, anyone can direct me to some ticket?
>
> I am in solr4.0
> regards
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/rogue-values-in-schema-browser-histogram-tp4029510.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Viewing the Solr MoinMoin wiki offline
Hi,

You can easily crawl it with wget to get a local copy.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Dec 29, 2012 4:54 PM, "d_k" wrote:
> Hello,
>
> I'm setting up Solr inside an intranet without internet access and
> I was wondering if there is a way to obtain the data dump of the Solr
> Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
>
> I understand MoinMoin has an export feature one can use
> (http://moinmo.in/MoinDump and
> http://moinmo.in/HelpOnMoinCommand/ExportDump) but I'm afraid it needs
> to be executed from within the MoinMoin server.
>
> Is there a way to obtain the result of that command?
> Is there another way to view the solr wiki offline?
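For anyone finding this in the archive, a wget invocation along these lines is what Otis is suggesting. The flags are standard GNU wget options and the --wait is just politeness toward the wiki; shown here as a dry run that echoes the command rather than actually crawling:

```shell
# Build a wget command to mirror the Solr MoinMoin wiki for offline viewing.
#   --mirror          recursive, timestamped re-crawl
#   --convert-links   rewrite intra-wiki links so the local copy browses offline
#   --page-requisites also fetch the CSS/images each page needs
WIKI_URL="http://wiki.apache.org/solr/"
CMD="wget --mirror --convert-links --page-requisites --no-parent --wait=1 $WIKI_URL"
echo "$CMD"   # dry run; run $CMD directly to actually crawl
```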
Re: Frequent OOM - (Unknown source in logs).
Otis,

As of now I have disabled caches, and we are hardly running any queries at this point. I filter mostly on string fields, two int fields, 2 dates (one is a dynamic date field) and one dynamic string field. Same goes for faceting, except I do not use facets on the dynamic field. In terms of documents, my index is around 5.5 million.

I am not trying to rule out the possibility of queries causing this, but I think it's highly unlikely. The cluster is being used only by a select few, and I have noticed this behaviour generally during indexing (which happens mostly during the night). The autocommit with 0 values seems to be working: I am constantly monitoring the logs and I can see commits happening only at the 30 and 60 min intervals. These commits are run by a cron job.

Shawn, your assumption is right! Thanks for the tips on SolrConfig changes. They have been working well most of the time. I am assuming these issues come up after a couple of million documents are indexed. During the initial indexing (up to 3-4 million docs) everything seems to be fine.

I am going to try the recent nightly build apache-solr-4.1-2012-12-28_12-29-23 now. Let's hope using 4.1 fixes these issues. I have already started indexing my documents on a 3.6.2 Solr box as a backup. I am also going to reduce the JVM heap size and experiment between 8 - 10 GB, since I think some of the ZK connection issues were happening due to longer GC pauses.

Thanks Jack for verifying it in the code.

--Shreejay

--
View this message in context: http://lucene.472066.n3.nabble.com/Frequent-OOM-Unknown-source-in-logs-tp4029361p4029657.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Viewing the Solr MoinMoin wiki offline
Should that be set up as a public service then (like the Wikipedia dump)? Because I need one too, and I don't think DDoSing the wiki with crawlers is a good idea. And I bet there will be some 'challenges' during scraping.

Regards,
Alex.

P.s. In fact, it would make an interesting example to have an offline copy with a Solr index, etc.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> Hi,
>
> You can easily crawl it with wget to get a local copy.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 29, 2012 4:54 PM, "d_k" wrote:
>
> > Hello,
> >
> > I'm setting up Solr inside an intranet without an internet access and
> > I was wondering if there is a way to obtain the data dump of the Solr
> > Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
> >
> > I understand MoinMoin has an export feature one can use
> > (http://moinmo.in/MoinDump and
> > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it needs
> > to be executed from within the MoinMoin server.
> >
> > Is there a way to obtain the result of that command?
> > Is there another way to view the solr wiki offline?
Re: Viewing the Solr MoinMoin wiki offline
I'd take it to Infra, although I think demand for this is so low... Otis Solr & ElasticSearch Support http://sematext.com/ On Dec 29, 2012 8:14 PM, "Alexandre Rafalovitch" wrote: > Should that be setup as a public service then (like Wikipedia dump)? > Because I need one too and I don't think it is a good idea for DDOSing Wiki > with crawlers. And I bet, there will be some 'challenges' during scraping. > > Regards, > Alex. > P.s. In fact, it would make an interesting example to have an offline copy > with Solr index, etc. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > > > Hi, > > > > You can easily crawl it with wget to get a local copy. > > > > Otis > > Solr & ElasticSearch Support > > http://sematext.com/ > > On Dec 29, 2012 4:54 PM, "d_k" wrote: > > > > > Hello, > > > > > > I'm setting up Solr inside an intranet without an internet access and > > > I was wondering if there is a way to obtain the data dump of the Solr > > > Wiki (http://wiki.apache.org/solr/) for offline viewing and searching. > > > > > > I understand MoinMoin has an export feature one can use > > > (http://moinmo.in/MoinDump and > > > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it needs > > > to be executed from within the MoinMoin server. > > > > > > Is there a way to obtain the result of that command? > > > Is there another way to view the solr wiki offline? > > > > > >
Re: Viewing the Solr MoinMoin wiki offline
Sorry,

What's Infra? A mailing list? Demand is probably low for Solr, but may be sufficient across all of Apache's individual projects. I guess one way to check is to see in the Apache logs whether there are a lot of scrapers running (by user agent).

Anyway, for Solr specifically, an acceptable substitute could be the manual version from Lucid Imagination: http://lucidworks.lucidimagination.com/display/home/PDF+Versions

Regards,
Alex.

P.s. I am getting a feeling that Lucid (and other commercial company) people are not allowed to mention their products on this list.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Sun, Dec 30, 2012 at 12:17 PM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> I'd take it to Infra, although I think demand for this is so low...
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Dec 29, 2012 8:14 PM, "Alexandre Rafalovitch" wrote:
>
> > Should that be setup as a public service then (like Wikipedia dump)?
> > Because I need one too and I don't think it is a good idea for DDOSing Wiki
> > with crawlers. And I bet, there will be some 'challenges' during scraping.
> >
> > Regards,
> > Alex.
> > P.s. In fact, it would make an interesting example to have an offline copy
> > with Solr index, etc.
> >
> > Personal blog: http://blog.outerthoughts.com/
> > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > - Time is the quality of nature that keeps events from happening all at
> > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >
> >
> > On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic <
> > otis.gospodne...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > You can easily crawl it with wget to get a local copy.
> > >
> > > Otis
> > > Solr & ElasticSearch Support
> > > http://sematext.com/
> > > On Dec 29, 2012 4:54 PM, "d_k" wrote:
> > >
> > > > Hello,
> > > >
> > > > I'm setting up Solr inside an intranet without an internet access and
> > > > I was wondering if there is a way to obtain the data dump of the Solr
> > > > Wiki (http://wiki.apache.org/solr/) for offline viewing and searching.
> > > >
> > > > I understand MoinMoin has an export feature one can use
> > > > (http://moinmo.in/MoinDump and
> > > > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it needs
> > > > to be executed from within the MoinMoin server.
> > > >
> > > > Is there a way to obtain the result of that command?
> > > > Is there another way to view the solr wiki offline?
Re: ZooKeeper ensemble behind load balancer
A ZooKeeper ensemble should be fairly reliable on its own: a large enough number of machines (3+, typically 5, 7 or 9) to maintain a quorum. So adding a load balancer on top will just add a hop and decrease performance, and also add a failure point to the system.

That being said, there needs to be a way to provide Solr with a way to refresh its configuration without a restart. Solr takes a list of ZK hosts on startup and, if I am correct, uses one of them unless it fails, or round-robins.

Why do your zkhosts need to change a lot?

On Sat, Dec 29, 2012 at 10:58 AM, Upayavira wrote:
> I would suggest asking this on the zookeeper user list.
>
> And let us know here what you find out, I'd be interested.
>
> Note, zookeeper, as I understand it, uses its own protocol, so to some
> reasonable extent it probablmy depends on yr load balancer. Also, as I
> understand it, zookeeper maintains active connections to solr hosts,
> which is not a common scenario for load balances as I understand it.
>
> Upayavira
>
> On Fri, Dec 28, 2012, at 04:39 PM, Marcin Rzewucki wrote:
> > Hi,
> >
> > Does Solr need connection to all of hosts in ZK ensemble or only to one
> > of
> > them at a time ? I wonder if it is possible to use load balancer for ZK
> > ensemble and use only one address as zkHost for Solr ? Having load
> > balancer
> > makes it easier to change ZK hosts while still using same address by Solr
> > (no need to restart Solr or change its configuration).
> >
> > Thanks in advance.
> > Regards.

--
Anirudha P. Jadhav
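The failover behavior described above can be sketched without any Solr or ZooKeeper dependency. This hypothetical helper (not SolrJ code) just parses the same comma-separated zkHost string Solr takes on startup and advances to the next host when the current one fails, which is essentially why the client itself makes an external load balancer redundant:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a client holding the full ensemble list can fail
// over by itself, so a load balancer in front of ZooKeeper buys little.
public class ZkHostList {
    private final List<String> hosts;
    private int current = 0;

    ZkHostList(String zkHost) {
        // zkHost uses the same comma-separated form Solr takes on startup,
        // e.g. "zk1:2181,zk2:2181,zk3:2181"
        this.hosts = Arrays.asList(zkHost.split(","));
    }

    String currentHost() {
        return hosts.get(current).trim();
    }

    // Called when the current host stops responding: move to the next one,
    // wrapping around the list.
    String failOver() {
        current = (current + 1) % hosts.size();
        return currentHost();
    }
}
```

Changing the ensemble membership would still mean handing every client a new zkHost string, which is the configuration-refresh gap noted above.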
Re: Viewing the Solr MoinMoin wiki offline
Hi, Sorry, by infra I meant ASF infrastructure people. There's a mailing list and a JIRA project for infra stuff. Otis Solr & ElasticSearch Support http://sematext.com/ On Dec 29, 2012 8:45 PM, "Alexandre Rafalovitch" wrote: > Sorry, > > What's Infra? A mailing list? Demand is probably low for Solr, but may be > sufficient for all Apache's individual projects. I guess one way to check > is too see in Apache logs if there is a lot of scrapers running (by user > agents). > > Anyway, for Solr specifically, an acceptable substitute could be the manual > version from Lucid Imagination: > http://lucidworks.lucidimagination.com/display/home/PDF+Versions > > Regards, >Alex. > P.s. I am getting a feeling that Lucid (and other commercial company) > people are not allowed to mention their products on this list. > > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all at > once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) > > > On Sun, Dec 30, 2012 at 12:17 PM, Otis Gospodnetic < > otis.gospodne...@gmail.com> wrote: > > > I'd take it to Infra, although I think demand for this is so low... > > > > Otis > > Solr & ElasticSearch Support > > http://sematext.com/ > > On Dec 29, 2012 8:14 PM, "Alexandre Rafalovitch" > > wrote: > > > > > Should that be setup as a public service then (like Wikipedia dump)? > > > Because I need one too and I don't think it is a good idea for DDOSing > > Wiki > > > with crawlers. And I bet, there will be some 'challenges' during > > scraping. > > > > > > Regards, > > > Alex. > > > P.s. In fact, it would make an interesting example to have an offline > > copy > > > with Solr index, etc. > > > > > > Personal blog: http://blog.outerthoughts.com/ > > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > > > - Time is the quality of nature that keeps events from happening all at > > > once. 
Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > > > > > > > On Sun, Dec 30, 2012 at 9:15 AM, Otis Gospodnetic < > > > otis.gospodne...@gmail.com> wrote: > > > > > > > Hi, > > > > > > > > You can easily crawl it with wget to get a local copy. > > > > > > > > Otis > > > > Solr & ElasticSearch Support > > > > http://sematext.com/ > > > > On Dec 29, 2012 4:54 PM, "d_k" wrote: > > > > > > > > > Hello, > > > > > > > > > > I'm setting up Solr inside an intranet without an internet access > and > > > > > I was wondering if there is a way to obtain the data dump of the > Solr > > > > > Wiki (http://wiki.apache.org/solr/) for offline viewing and > > searching. > > > > > > > > > > I understand MoinMoin has an export feature one can use > > > > > (http://moinmo.in/MoinDump and > > > > > http://moinmo.in/HelpOnMoinCommand/ExportDump) but i'm afraid it > > needs > > > > > to be executed from within the MoinMoin server. > > > > > > > > > > Is there a way to obtain the result of that command? > > > > > Is there another way to view the solr wiki offline? > > > > > > > > > > > > > > >