Re: nutch and solr

2012-02-27 Thread alessio crisantemi
Now everything works! But I have another problem if I use a connector with my Solr-Nutch setup. This is the error: Grave: java.lang.RuntimeException: org.apache.lucene.index.CorruptIndexException: Unknown format version: -11 at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.cor
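For context: Lucene raises "Unknown format version" when an index was written by a different Lucene release than the one trying to read it, e.g. a Nutch build shipping a newer Lucene than the Solr 1.3 instance reading the index. A hedged way out, assuming the index can simply be rebuilt and the default Solr 1.3 example layout (paths are illustrative):

  # stop Solr, then remove the incompatible index
  rm -rf apache-solr-1.3.0/example/solr/data/index
  # restart Solr and re-index from the existing crawl data
  bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*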

Re: nutch and solr

2012-02-25 Thread alessio crisantemi
This is the problem! Because there is a url in my root! I'll write out my step-by-step configuration of Nutch (I use Cygwin because I work on Windows): *1. Extract the Nutch package* *2. Configure Solr* (*Copy the provided Nutch schema from directory apache-nutch-1.0/conf to directory apache-solr-1.3
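A minimal sketch of that schema-copy step, assuming the default Solr 1.3 example core layout (the exact directory names are assumptions based on the truncated text above):

  # overwrite the example schema with the one Nutch ships
  cp apache-nutch-1.0/conf/schema.xml apache-solr-1.3.0/example/solr/conf/schema.xml

Solr then has to be restarted so it picks up the new schema.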

Re: nutch and solr

2012-02-24 Thread tamanjit.bin...@yahoo.co.in
The empty path message is because Nutch is unable to find a url in the url location that you provide. Kindly ensure there is a url there.
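A minimal sketch of a valid seed setup, assuming the conventional urls/ seed directory (names are illustrative):

  # create the seed directory and put at least one url in it
  mkdir -p urls
  echo "http://www.example.com/" > urls/seed.txt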

Re: nutch and solr

2012-02-22 Thread alessio crisantemi
Thanks for your reply, but it doesn't work. The same message: can't convert empty path, and also: cannot find class org.apache.nutch.crawl.injector .. On 22 February 2012 at 06:14, tamanjit.bin...@yahoo.co.in <tamanjit.bin...@yahoo.co.in> wrote: > Try this command. > > bin/nutch crawl
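One note on the class error (an inference, not confirmed in the thread): Java class names are case-sensitive and the Nutch class is org.apache.nutch.crawl.Injector, so anything that resolves to the lowercase injector will fail. The inject step itself is normally run as (paths are illustrative):

  # seed the crawldb from the urls directory
  bin/nutch inject crawl/crawldb urls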

Re: nutch and solr

2012-02-21 Thread tamanjit.bin...@yahoo.co.in
Try this command:

  bin/nutch crawl urls/<folder>/<url-file>.txt -dir crawl/<folder> -threads 10 -depth 2 -topN 1000

Your folder structure will look like this:

  urls
    -- <folder>
       -- <url-file>.txt
  crawl
    -- <folder>

The <folder> name will be for different domains. So for each domain

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Charan, Thanks for the clarifications. The link I have been referring to (http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/) does not say anything about using the crawl command. Do I have to run it after the last step mentioned? Thanks, Abi On Thu, Feb 10, 2011 at 12:58 AM, charan kumar

Re: Nutch and Solr search on the fly

2011-02-09 Thread charan kumar
Hi Abhishek, depth is a param of the crawl command, not the fetch command. If you are using a custom script calling the individual stages of a Nutch crawl, then depth N means running that script N times. You can put a loop in the script. Thanks, Charan On Wed, Feb 9, 2011 at 6:26 AM, .: Abhishek :.
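A minimal sketch of such a loop, using the standard Nutch 1.x stage commands (the crawl/ layout and the depth of 3 are illustrative):

  # one generate/fetch/parse/updatedb round per level of depth
  for i in 1 2 3; do
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    segment=`ls -d crawl/segments/* | tail -1`
    bin/nutch fetch $segment
    bin/nutch parse $segment
    bin/nutch updatedb crawl/crawldb $segment
  done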

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Erick, Thanks a bunch for the response. Could be a chance... but all I am wondering is where to specify the depth in the whole process described at http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/? I tried specifying it during the fetcher phase but it was just ignored :( Thanks

Re: Nutch and Solr search on the fly

2011-02-09 Thread Markus Jelsma
Are you using the depth parameter with the crawl command or are you using the separate generate, fetch etc. commands? What's $ nutch readdb -stats returning? On Wednesday 09 February 2011 15:06:40 .: Abhishek :. wrote: > Hi Markus, > > I am sorry for not being clear, I meant to say that... >
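For reference, the stats call looks like this, assuming a crawl/ working directory:

  # print crawldb statistics
  bin/nutch readdb crawl/crawldb -stats

It prints the total number of URLs in the crawldb together with a per-status breakdown (fetched, unfetched, etc.), which shows whether the crawl ever got past the seed level.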

Re: Nutch and Solr search on the fly

2011-02-09 Thread Erick Erickson
WARNING: I don't do Nutch much, but could it be that your crawl depth is 1? See: http://wiki.apache.org/nutch/NutchTutorial and search for "depth". Best, Erick On Wed, Feb 9, 2011 at 9:06 AM, .: Abhishek :. wrote: > Hi Markus, > > I am sorry for not be

Re: Nutch and Solr search on the fly

2011-02-09 Thread .: Abhishek :.
Hi Markus, I am sorry for not being clear. I meant to say that... suppose a url, namely www.somehost.com/gifts/greetingcard.html (which in turn contains links to a.html, b.html, c.html, d.html), is injected into the seed.txt; after the whole process I was expecting a bunch of other pages which c

Re: Nutch and Solr search on the fly

2011-02-09 Thread Markus Jelsma
The parsed data is only sent to the Solr index if you tell a segment to be indexed (solrindex). If you did this only once after injecting and then ran the consequent fetch, parse, update, index sequence, then you, of course, only see those URLs. If you don't index a segment after it's been parsed,
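A hedged sketch of indexing everything fetched so far, using the Nutch 1.x positional form of solrindex (the Solr URL and directory layout are assumptions):

  # rebuild the linkdb from all segments, then index them all
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments
  bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*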

Re: [Nutch] and Solr integration

2011-01-03 Thread Adam Estrada
BLEH! This is entirely possible to do in a single step AS LONG AS YOU GET THE SYNTAX CORRECT ;-) http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/ bin/nutch crawl urls -dir crawl -threads
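The command is truncated above; a hedged reconstruction of the one-step form, assuming a Nutch release whose crawl command accepts a Solr URL (the option name and the threads/depth/topN values are assumptions, not a quote from the post):

  # crawl and index to Solr in one invocation
  bin/nutch crawl urls -dir crawl -threads 10 -depth 3 -topN 50 -solr http://localhost:8983/solr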

Re: [Nutch] and Solr integration

2011-01-03 Thread Adam Estrada
All, I realize that the documentation says that you crawl first and then add to Solr, but I spent several hours running the same command through Cygwin with -solrindex http://localhost:8983/solr on the command line (e.g. bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://l

Re: [Nutch] and Solr integration

2010-12-20 Thread Adam Estrada
bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr I've run that command before and it worked... that's why I asked. Grab Nutch from trunk, run bin/nutch, and see that it is in fact an option. It looks like Hadoop is the culprit now and I am at

Re: [Nutch] and Solr integration

2010-12-20 Thread Anurag
Why are you using solrindex in the argument? It is used when we need to index the crawled data into Solr. For more, read http://wiki.apache.org/nutch/NutchTutorial . Also, for Nutch-Solr integration this is a very useful blog: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ I integrated nutch an
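A hedged sketch of the two-step flow described here (crawl first, then index), with the paths and Solr URL as assumptions:

  # step 1: crawl without touching Solr
  bin/nutch crawl urls -dir crawl -threads 10 -depth 3 -topN 50
  # step 2: push the crawled data into Solr
  bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*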