With virtual hosting you can give CPU and memory quotas to your
different VMs. This allows you to control the Nutch vs. The World
problem. Unfortunately, you cannot allocate the disk channel. With two
I/O-bound apps, this is a problem.

On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin <e...@makethembite.com> wrote:
> Excellent information. Thank you. Solr is acting just fine, then. I can
> connect to it with no issues, it indexes fine, and there didn't seem to be
> any complication with it. Now I can rule it out and go about solving what
> you pointed out, and I agree, is a Java/Nutch issue.
>
> Nutch is a crawler I use to feed URLs into Solr for indexing. Nutch is open
> source and can be found on apache.org.
>
> Thanks for your time.
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Sunday, October 31, 2010 4:33 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr in virtual host as opposed to /lib
>
> What servlet container are you putting your Solr in? Jetty? Tomcat?
> Something else?  Are you fronting it with Apache on top of that? (I think
> maybe you are; otherwise I'm not sure how the phrase 'virtual host'
> applies.)
>
> In general, Solr of course doesn't care what directory it's in on disk, so
> long as the process running Solr has the necessary read/write permissions to
> the necessary directories (and if it doesn't, you'd usually find out right
> away with an error message).  And clients of Solr don't care what directory
> it's in on disk either; they only care that they can get to it by connecting
> to a certain port at a certain hostname. In general, if they can't get to it
> on a certain port at a certain hostname, that's something you'd discover
> right away, not something that would be intermittent.  But I'm not familiar
> with Nutch. You may want to try connecting yourself, manually, to the port
> you have Solr running on (the hostname/port you have told Nutch to find Solr
> on?), and just make sure it is connectable.
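A minimal Python sketch of that manual check, assuming a plain TCP connect is enough to tell whether Solr's port is listening, plus a check of the read/write permissions mentioned above. The helper names are hypothetical; "localhost" and port 8983 (the default of the Jetty bundled in Solr's example directory) are assumptions, and /home/mootlaw/lib/solr is taken from the path listing later in the thread, so substitute the host/port Nutch is configured with and your actual Solr data directory.

```python
import os
import socket


def solr_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the hostname and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def dir_is_writable(path):
    """Return True if path is a directory this process can read, write and enter."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Hypothetical values: replace with the host/port Nutch posts to
    # and the directory Solr actually writes its index into.
    print("port open:", solr_port_reachable("localhost", 8983))
    print("dir writable:", dir_is_writable("/home/mootlaw/lib/solr"))
```

Run it as the same user that runs Jetty/Solr, since the permission check reflects the current process's user.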
>
> I can't think of any reason that what directory you have Solr in could cause
> CPU utilization issues. I think it's got nothing to do with that.
>
> I am not familiar with Nutch; if it's Nutch that's taking 100% of your CPU,
> you might want to find some Nutch experts to ask. Perhaps there's a Nutch
> listserv?  I am also not familiar with Hadoop; you mention just in passing
> that you're using Hadoop too. Maybe that's an added complication, I don't
> know.
>
> One obvious reason Nutch could be taking 100% CPU would be simply that
> you've asked it to do a lot of work quickly, and it's trying to.
>
> One reason I have seen Solr take 100% of CPU and become unresponsive is when
> the Solr process gets caught up in terrible Java garbage collection. If
> that's what's happening, then giving the Solr JVM a higher maximum heap size
> can sometimes help (although, confusingly, I've seen people suggest that if
> you give the Solr JVM too MUCH heap it can also result in long GC pauses),
> and if you have a multi-core/multi-CPU machine, I've found the JVM argument
> -XX:+UseConcMarkSweepGC to be very helpful.
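Both suggestions are JVM startup flags. A minimal sketch that assembles such a command line, assuming Solr is started from its example/ directory with Jetty's start.jar (as in the Solr 1.4 distribution). The function name and the 2g heap are hypothetical, and -verbose:gc is added here only so that long collections show up in the console. Note that CMS was deprecated in JDK 9 and removed in JDK 14, so on current JVMs -XX:+UseConcMarkSweepGC no longer has any effect.

```python
import shlex


def solr_jetty_command(max_heap="2g", use_cms=True, gc_logging=True):
    """Return an argv list for starting Solr's bundled Jetty via start.jar.

    max_heap is a hypothetical example value; size it for your own machine.
    """
    cmd = ["java", f"-Xmx{max_heap}"]  # maximum heap for the Solr JVM
    if use_cms:
        # Concurrent Mark Sweep collector, suggested for multi-core machines.
        cmd.append("-XX:+UseConcMarkSweepGC")
    if gc_logging:
        # Print one line per collection so long GC pauses become visible.
        cmd.append("-verbose:gc")
    # Assumes the current directory is Solr's example/ directory.
    cmd += ["-jar", "start.jar"]
    return cmd


if __name__ == "__main__":
    print(shlex.join(solr_jetty_command()))
```

With the defaults this prints java -Xmx2g -XX:+UseConcMarkSweepGC -verbose:gc -jar start.jar, which would be run from inside example/.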
>
> Other than that, it sounds to me like you've got a Nutch/Hadoop issue, not a
> Solr issue.
> ________________________________________
> From: Eric Martin [e...@makethembite.com]
> Sent: Sunday, October 31, 2010 7:16 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr in virtual host as opposed to /lib
>
> Hi,
>
> Thank you. This is more than idle curiosity. I am trying to debug an issue I
> am having with my installation, and this is one step in verifying that I
> have a setup that does not consume excessive resources. I am trying to
> debunk my internal myth that having Solr and Nutch in a virtual host would be
> causing these issues. Here is the main issue that involves Nutch/Solr and
> Drupal:
>
> /home/mootlaw/lib/solr
> /home/mootlaw/lib/nutch
> /home/mootlaw/www/<Drupal site>
>
> I'm running a 1333 FSB dual-socket Xeon 5500 Series @ 2.4 GHz, Enterprise
> Linux x86_64 OS, 12 GB RAM. My Solr and Nutch are running. I am using
> Jetty for my Solr. My server is not rooted.
>
> Nutch is using 100% of my CPUs. I see this in the CPU utilization view in WHM:
>
> /usr/bin/java -Xmx1000m -Dhadoop.log.dir=/home/mootlaw/lib/nutch/logs
> -Dhadoop.log.file=hadoop.log
> -Djava.library.path=/home/mootlaw/lib/nutch/lib/native/Linux-amd64-64
> -classpath
> /home/mootlaw/lib/nutch/conf:
> /usr/lib/tools.jar:
> /home/mootlaw/lib/nutch/build:
> /home/mootlaw/lib/nutch/build/test/classes:
> /home/mootlaw/lib/nutch/build/nutch-1.2.job:
> /home/mootlaw/lib/nutch/nutch-*.job:
> /home/mootlaw/lib/nutch/lib/apache-solr-core-1.4.0.jar:
> /home/mootlaw/lib/nutch/lib/apache-solr-solrj-1.4.0.jar:
> /home/mootlaw/lib/nutch/lib/commons-beanutils-1.8.0.jar:
> /home/mootlaw/lib/nutch/lib/commons-cli-1.2.jar:
> /home/mootlaw/lib/nutch/lib/commons-codec-1.3.jar:
> /home/mootlaw/lib/nutch/lib/commons-collections-3.2.1.jar:
> /home/mootlaw/lib/nutch/lib/commons-el-1.0.jar:
> /home/mootlaw/lib/nutch/lib/commons-httpclient-3.1.jar:
> /home/mootlaw/lib/nutch/lib/commons-io-1.4.jar:
> /home/mootlaw/lib/nutch/lib/commons-lang-2.1.jar:
> /home/mootlaw/lib/nutch/lib/commons-logging-1.0.4.jar:
> /home/mootlaw/lib/nutch/lib/commons-logging-api-1.0.4.jar:
> /home/mootlaw/lib/nutch/lib/commons-net-1.4.1.jar:
> /home/mootlaw/lib/nutch/lib/core-3.1.1.jar:
> /home/mootlaw/lib/nutch/lib/geronimo-stax-api_1.0_spec-1.0.1.jar:
> /home/mootlaw/lib/nutch/lib/hadoop-0.20.2-core.jar:
> /home/mootlaw/lib/nutch/lib/hadoop-0.20.2-tools.jar:
> /home/mootlaw/lib/nutch/lib/hsqldb-1.8.0.10.jar:
> /home/mootlaw/lib/nutch/lib/icu4j-4_0_1.jar:
> /home/mootlaw/lib/nutch/lib/jakarta-oro-2.0.8.jar:
> /home/mootlaw/lib/nutch/lib/jasper-compiler-5.5.12.jar:
> /home/mootlaw/lib/nutch/lib/jasper-runtime-5.5.12.jar:
> /home/mootlaw/lib/nutch/lib/jcl-over-slf4j-1.5.5.jar:
> /home/mootlaw/lib/nutch/lib/jets3t-0.6.1.jar:
> /home/mootlaw/lib/nutch/lib/jetty-6.1.14.jar:
> /home/mootlaw/lib/nutch/lib/jetty-util-6.1.14.jar:
> /home/mootlaw/lib/nutch/lib/junit-3.8.1.jar:
> /home/mootlaw/lib/nutch/lib/kfs-0.2.2.jar:
> /home/mootlaw/lib/nutch/lib/log4j-1.2.15.jar:
> /home/mootlaw/lib/nutch/lib/lucene-core-3.0.1.jar:
> /home/mootlaw/lib/nutch/lib/lucene-misc-3.0.1.jar:
> /home/mootlaw/lib/nutch/lib/oro-2.0.8.jar:
> /home/mootlaw/lib/nutch/lib/resolver.jar:
> /home/mootlaw/lib/nutch/lib/serializer.jar:
> /home/mootlaw/lib/nutch/lib/servlet-api-2.5-6.1.14.jar:
> /home/mootlaw/lib/nutch/lib/slf4j-api-1.5.5.jar:
> /home/mootlaw/lib/nutch/lib/slf4j-log4j12-1.4.3.jar:
> /home/mootlaw/lib/nutch/lib/taglibs-i18n.jar:
> /home/mootlaw/lib/nutch/lib/tika-core-0.7.jar:
> /home/mootlaw/lib/nutch/lib/wstx-asl-3.2.7.jar:
> /home/mootlaw/lib/nutch/lib/xercesImpl.jar:
> /home/mootlaw/lib/nutch/lib/xml-apis.jar:
> /home/mootlaw/lib/nutch/lib/xmlenc-0.52.jar:
> /home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-2.1.jar:
> /home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-api-2.1.jar
> org.apache.nutch.fetcher.Fetcher
> /home/mootlaw/lib/nutch/crawl/segments/20101031144443 -threads 50
>
> My PIDs cannot be traced and my memory usage is at 5%.
>
> My Hadoop logs show:
>
> 2010-10-31 15:44:11,040 INFO  fetcher.Fetcher - fetching
> http://caselaw.findlaw.com/us-5th-circuit/1454354.html
> 2010-10-31 15:44:11,294 INFO  fetcher.Fetcher - fetching
> http://www.dallastxcriminaldefenseattorney.com/atom.xml
> 2010-10-31 15:44:11,337 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=48, fetchQueues.totalSize=2499
> 2010-10-31 15:44:12,339 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:13,341 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:14,344 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:15,346 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:16,349 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:16,568 INFO  fetcher.Fetcher - fetching
> http://caselaw.findlaw.com/il-court-of-appeals/1542438.html
> 2010-10-31 15:44:17,308 INFO  fetcher.Fetcher - fetching
> http://lcweb2.loc.gov/const/const.html
> 2010-10-31 15:44:17,352 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=49, fetchQueues.totalSize=2499
> 2010-10-31 15:44:18,354 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=49, fetchQueues.totalSize=2500
> 2010-10-31 15:44:19,356 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=49, fetchQueues.totalSize=2500
> 2010-10-31 15:44:20,358 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=49, fetchQueues.totalSize=2500
> 2010-10-31 15:44:21,360 INFO  fetcher.Fetcher - -activeThreads=50,
> spinWaiting=49, fetchQueues.totalSize=2500
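To see what the status lines above say in aggregate, a short Python sketch that extracts the periodic Fetcher fields from hadoop.log text. The regular expression assumes the exact "-activeThreads=N, spinWaiting=N, fetchQueues.totalSize=N" format shown in the excerpt and tolerates the line breaks added by email wrapping; the function names are hypothetical. As I understand Nutch 1.x's Fetcher, spinWaiting counts threads that found no fetch queue ready (for example, because of per-host politeness delays) and are sleeping before retrying, so 48 to 50 of 50 threads waiting while about 2,500 URLs sit in the queues would suggest the queued URLs are concentrated on a few hosts. Treat that reading as a hypothesis to verify, not a diagnosis.

```python
import re

# Matches the periodic status line Nutch's Fetcher writes to hadoop.log.
# \s* also accepts the line breaks that email wrapping inserted.
STATUS_RE = re.compile(
    r"-activeThreads=(\d+),\s*spinWaiting=(\d+),\s*fetchQueues\.totalSize=(\d+)"
)


def fetcher_samples(log_text):
    """Return a list of (active, spin_waiting, queue_size) tuples."""
    return [tuple(int(g) for g in m.groups()) for m in STATUS_RE.finditer(log_text)]


def summarize(log_text):
    """Aggregate the samples; returns None if no status lines were found."""
    samples = fetcher_samples(log_text)
    if not samples:
        return None
    active = sum(s[0] for s in samples)
    waiting = sum(s[1] for s in samples)
    return {
        "samples": len(samples),
        # Share of thread-samples that were idle waiting for a ready queue.
        "waiting_fraction": waiting / active if active else 0.0,
        "max_queue": max(s[2] for s in samples),
    }


if __name__ == "__main__":
    # Three status lines copied from the log excerpt in this thread.
    excerpt = """\
2010-10-31 15:44:11,337 INFO  fetcher.Fetcher - -activeThreads=50,
spinWaiting=48, fetchQueues.totalSize=2499
2010-10-31 15:44:12,339 INFO  fetcher.Fetcher - -activeThreads=50,
spinWaiting=50, fetchQueues.totalSize=2500
2010-10-31 15:44:17,352 INFO  fetcher.Fetcher - -activeThreads=50,
spinWaiting=49, fetchQueues.totalSize=2499
"""
    print(summarize(excerpt))
```

For these three lines, 147 of 150 thread-samples are spin-waiting and the largest queue size is 2500.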
> Can anyone help me out? Did I miss something? Should I be using Tomcat? One
> interesting part of this is that when I try to change the Nutch settings post
> url and urls by score to 1, they stay at 10 no matter what I do.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, October 31, 2010 4:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr in virtual host as opposed to /lib
>
> Can you expand on your question? Are you having a problem? Is this idle
> curiosity?
>
> Because I have no idea how to respond when there is so little information.
>
> Best
> Erick
>
> On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin <e...@makethembite.com> wrote:
>
>> Is there an issue running Solr in /home/lib as opposed to running it
>> somewhere outside of the virtual hosts like /lib?
>>
>> Eric
>>
>>
>
>



-- 
Lance Norskog
goks...@gmail.com
