With virtual hosting you can give CPU and memory quotas to your different VMs, which lets you contain the "Nutch vs. the world" problem. Unfortunately, you cannot allocate the disk channel the same way, and with two I/O-bound apps that is a problem.
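(For what it's worth, one way to see whether the two processes are actually fighting over the disk rather than the CPU is to sample per-process CPU and disk I/O while a crawl is running. The snippet below is only a rough sketch: it assumes Python with the psutil package installed, and the PIDs are placeholders to replace with the real Nutch and Jetty/Solr process IDs.)

    # Rough sketch: sample CPU and cumulative disk I/O for the Nutch and
    # Solr (Jetty) processes to see whether they are contending for the disk.
    # Assumes `pip install psutil`; the PIDs below are placeholders.
    import time
    import psutil

    pids = {"nutch": 12345, "solr": 12346}   # hypothetical PIDs, replace with yours
    procs = {name: psutil.Process(pid) for name, pid in pids.items()}

    for _ in range(10):                       # ten one-second samples
        for name, proc in procs.items():
            cpu = proc.cpu_percent(interval=None)   # percent since the last call
            io = proc.io_counters()                 # cumulative bytes read/written
            print("%s cpu=%.1f%% read=%d write=%d"
                  % (name, cpu, io.read_bytes, io.write_bytes))
        time.sleep(1)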
On Sun, Oct 31, 2010 at 4:38 PM, Eric Martin <e...@makethembite.com> wrote:

> Excellent information. Thank you. Solr is acting just fine then. I can connect to it with no issues, it indexes fine, and there didn't seem to be any complication with it. Now I can rule it out and go about solving what you pointed out (and I agree) to be a Java/Nutch issue.
>
> Nutch is a crawler I use to feed URLs into Solr for indexing. Nutch is open source and found on apache.org.
>
> Thanks for your time.
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
> Sent: Sunday, October 31, 2010 4:33 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr in virtual host as opposed to /lib
>
> What servlet container are you putting your Solr in? Jetty? Tomcat? Something else? Are you fronting it with Apache on top of that? (I think maybe you are, otherwise I'm not sure how the phrase 'virtual host' applies.)
>
> In general, Solr of course doesn't care what directory it's in on disk, so long as the process running Solr has the necessary read/write permissions to the necessary directories (and if it doesn't, you'd usually find out right away with an error message). And clients to Solr don't care what directory it's in on disk either; they only care that they can reach it by connecting to a certain port at a certain hostname. In general, if they can't get to it on a certain port at a certain hostname, that's something you'd discover right away, not something that would be intermittent. But I'm not familiar with Nutch; you may want to try connecting to the port you have Solr running on (the hostname/port you have told Nutch to find Solr on?) yourself, manually, and just make sure it is reachable.
>
> I can't think of any reason that the directory you have Solr in could cause CPU utilization issues. I think it's got nothing to do with that.
>
> I am not familiar with Nutch; if it's Nutch that's taking 100% of your CPU, you might want to find some Nutch experts to ask. Perhaps there's a Nutch listserv? I am also not familiar with Hadoop; you mention just in passing that you're using Hadoop too, so maybe that's an added complication, I don't know.
>
> One obvious reason Nutch could be taking 100% CPU would be simply that you've asked it to do a lot of work quickly, and it's trying to.
>
> One reason I have seen Solr take 100% of CPU and become unresponsive is when the Solr process gets caught up in terrible Java garbage collection. If that's what's happening, then giving the Solr JVM a higher maximum heap size can sometimes help (although, confusingly, I've seen people suggest that giving the Solr JVM too MUCH heap can also result in long GC pauses), and if you have a multi-core/multi-CPU machine, I've found the JVM argument -XX:+UseConcMarkSweepGC to be very helpful.
>
> Other than that, it sounds to me like you've got a Nutch/Hadoop issue, not a Solr issue.
>
> ________________________________________
> From: Eric Martin [e...@makethembite.com]
> Sent: Sunday, October 31, 2010 7:16 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Solr in virtual host as opposed to /lib
>
> Hi,
>
> Thank you. This is more than idle curiosity. I am trying to debug an issue I am having with my installation, and this is one step in verifying that my setup is not what is consuming the resources. I am trying to debunk my internal myth that having Solr and Nutch in a virtual host could be causing these issues.
> Here is the main issue that involves Nutch/Solr and Drupal:
>
> /home/mootlaw/lib/solr
> /home/mootlaw/lib/nutch
> /home/mootlaw/www/<Drupal site>
>
> I'm running a dual-socket Xeon 5500 series @ 2.4 GHz (1333 FSB), Enterprise Linux x86_64, and 12 GB of RAM. Solr and Nutch are both running. I am using Jetty for Solr. My server is not rooted.
>
> Nutch is using 100% of my CPUs. I see this in the CPU utilization in my WHM:
>
> /usr/bin/java -Xmx1000m -Dhadoop.log.dir=/home/mootlaw/lib/nutch/logs -Dhadoop.log.file=hadoop.log -Djava.library.path=/home/mootlaw/lib/nutch/lib/native/Linux-amd64-64 -classpath /home/mootlaw/lib/nutch/conf:/usr/lib/tools.jar:/home/mootlaw/lib/nutch/build:/home/mootlaw/lib/nutch/build/test/classes:/home/mootlaw/lib/nutch/build/nutch-1.2.job:/home/mootlaw/lib/nutch/nutch-*.job:/home/mootlaw/lib/nutch/lib/apache-solr-core-1.4.0.jar:/home/mootlaw/lib/nutch/lib/apache-solr-solrj-1.4.0.jar:/home/mootlaw/lib/nutch/lib/commons-beanutils-1.8.0.jar:/home/mootlaw/lib/nutch/lib/commons-cli-1.2.jar:/home/mootlaw/lib/nutch/lib/commons-codec-1.3.jar:/home/mootlaw/lib/nutch/lib/commons-collections-3.2.1.jar:/home/mootlaw/lib/nutch/lib/commons-el-1.0.jar:/home/mootlaw/lib/nutch/lib/commons-httpclient-3.1.jar:/home/mootlaw/lib/nutch/lib/commons-io-1.4.jar:/home/mootlaw/lib/nutch/lib/commons-lang-2.1.jar:/home/mootlaw/lib/nutch/lib/commons-logging-1.0.4.jar:/home/mootlaw/lib/nutch/lib/commons-logging-api-1.0.4.jar:/home/mootlaw/lib/nutch/lib/commons-net-1.4.1.jar:/home/mootlaw/lib/nutch/lib/core-3.1.1.jar:/home/mootlaw/lib/nutch/lib/geronimo-stax-api_1.0_spec-1.0.1.jar:/home/mootlaw/lib/nutch/lib/hadoop-0.20.2-core.jar:/home/mootlaw/lib/nutch/lib/hadoop-0.20.2-tools.jar:/home/mootlaw/lib/nutch/lib/hsqldb-1.8.0.10.jar:/home/mootlaw/lib/nutch/lib/icu4j-4_0_1.jar:/home/mootlaw/lib/nutch/lib/jakarta-oro-2.0.8.jar:/home/mootlaw/lib/nutch/lib/jasper-compiler-5.5.12.jar:/home/mootlaw/lib/nutch/lib/jasper-runtime-5.5.12.jar:/home/mootlaw/lib/nutch/lib/jcl-over-slf4j-1.5.5.jar:/home/mootlaw/lib/nutch/lib/jets3t-0.6.1.jar:/home/mootlaw/lib/nutch/lib/jetty-6.1.14.jar:/home/mootlaw/lib/nutch/lib/jetty-util-6.1.14.jar:/home/mootlaw/lib/nutch/lib/junit-3.8.1.jar:/home/mootlaw/lib/nutch/lib/kfs-0.2.2.jar:/home/mootlaw/lib/nutch/lib/log4j-1.2.15.jar:/home/mootlaw/lib/nutch/lib/lucene-core-3.0.1.jar:/home/mootlaw/lib/nutch/lib/lucene-misc-3.0.1.jar:/home/mootlaw/lib/nutch/lib/oro-2.0.8.jar:/home/mootlaw/lib/nutch/lib/resolver.jar:/home/mootlaw/lib/nutch/lib/serializer.jar:/home/mootlaw/lib/nutch/lib/servlet-api-2.5-6.1.14.jar:/home/mootlaw/lib/nutch/lib/slf4j-api-1.5.5.jar:/home/mootlaw/lib/nutch/lib/slf4j-log4j12-1.4.3.jar:/home/mootlaw/lib/nutch/lib/taglibs-i18n.jar:/home/mootlaw/lib/nutch/lib/tika-core-0.7.jar:/home/mootlaw/lib/nutch/lib/wstx-asl-3.2.7.jar:/home/mootlaw/lib/nutch/lib/xercesImpl.jar:/home/mootlaw/lib/nutch/lib/xml-apis.jar:/home/mootlaw/lib/nutch/lib/xmlenc-0.52.jar:/home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-2.1.jar:/home/mootlaw/lib/nutch/lib/jsp-2.1/jsp-api-2.1.jar org.apache.nutch.fetcher.Fetcher /home/mootlaw/lib/nutch/crawl/segments/20101031144443 -threads 50
>
> My PIDs cannot be traced and my memory usage is at 5%.
>
> My Hadoop logs show:
>
> 2010-10-31 15:44:11,040 INFO fetcher.Fetcher - fetching http://caselaw.findlaw.com/us-5th-circuit/1454354.html
> 2010-10-31 15:44:11,294 INFO fetcher.Fetcher - fetching http://www.dallastxcriminaldefenseattorney.com/atom.xml
> 2010-10-31 15:44:11,337 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=48, fetchQueues.totalSize=2499
> 2010-10-31 15:44:12,339 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:13,341 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:14,344 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:15,346 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:16,349 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=50, fetchQueues.totalSize=2500
> 2010-10-31 15:44:16,568 INFO fetcher.Fetcher - fetching http://caselaw.findlaw.com/il-court-of-appeals/1542438.html
> 2010-10-31 15:44:17,308 INFO fetcher.Fetcher - fetching http://lcweb2.loc.gov/const/const.html
> 2010-10-31 15:44:17,352 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2499
> 2010-10-31 15:44:18,354 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
> 2010-10-31 15:44:19,356 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
> 2010-10-31 15:44:20,358 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
> 2010-10-31 15:44:21,360 INFO fetcher.Fetcher - -activeThreads=50, spinWaiting=49, fetchQueues.totalSize=2500
>
> Can anyone help me out? Did I miss something? Should I be using Tomcat? One interesting part of this is that when I try to change the Nutch settings "post url" and "urls by score" to 1, they stay at 10 no matter what I do.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Sunday, October 31, 2010 4:12 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr in virtual host as opposed to /lib
>
> Can you expand on your question? Are you having a problem? Is this idle curiosity?
>
> Because I have no idea how to respond when there is so little information.
>
> Best,
> Erick
>
> On Sun, Oct 31, 2010 at 5:32 PM, Eric Martin <e...@makethembite.com> wrote:
>
>> Is there an issue running Solr in /home/lib as opposed to running it somewhere outside of the virtual hosts, like /lib?
>>
>> Eric

--
Lance Norskog
goks...@gmail.com
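(As a concrete version of the manual check Jonathan suggests above, i.e. connecting to the host and port Nutch has been told Solr is on, the sketch below simply asks Solr whether it answers at all. It assumes a default Solr 1.4 example setup under Jetty on localhost:8983 with the stock /solr/admin/ping handler; adjust the URL to match your configuration.)

    # Minimal reachability check for Solr, as suggested in the thread.
    # Assumes the default Jetty example setup (localhost:8983) and the stock
    # /solr/admin/ping handler; change SOLR_URL to whatever Nutch is pointed at.
    import sys
    import urllib.request

    SOLR_URL = "http://localhost:8983/solr/admin/ping"   # assumed default, adjust as needed

    try:
        with urllib.request.urlopen(SOLR_URL, timeout=5) as resp:
            print("Solr answered with HTTP %d" % resp.status)
    except Exception as exc:
        print("Could not reach Solr at %s: %s" % (SOLR_URL, exc))
        sys.exit(1)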