Re: DIH From various File system locations

2011-01-25 Thread Adam Estrada
I take that back... I am currently using version 1.2; make sure that the latest versions of Tika and PDFBox are in the contrib folder. 1.3 is structured a bit differently and it doesn't look like there is a contrib directory. Maybe one of the Nutch contributors can comment on this? Adam On Tue

Re: DIH From various File system locations

2011-01-25 Thread Adam Estrada
There are a few tutorials out there. 1. http://wiki.apache.org/nutch/RunningNutchAndSolr (not the most practical) 2. http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ (similar to 1.) 3. Build the latest from branch http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/ and read this

Re: DIH From various File system locations

2011-01-25 Thread pankaj bhatt
Thanks Adam, It seems like Nutch would solve most of my concerns. It would be great if you could share resources for Nutch with us. / Pankaj Bhatt. On Tue, Jan 25, 2011 at 7:21 PM, Estrada Groups < estrada.adam.gro...@gmail.com> wrote: > I would just use Nutch and specify the -solr param on t

Re: DIH From various File system locations

2011-01-25 Thread Estrada Groups
I would just use Nutch and specify the -solr param on the command line. That will add the extracted content to your instance of Solr. Adam Sent from my iPhone On Jan 25, 2011, at 5:29 AM, pankaj bhatt wrote: > Hi All, > I need to index the documents presents in my file system at various
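
For readers following the suggestion above, here is a rough sketch of what passing the -solr param on the Nutch command line might look like, wrapped in Python so the argument list is explicit. The "crawl ... -solr ... -depth ... -topN ..." shape is assumed from the Nutch 1.x tutorials and may differ between versions; the "urls" seed directory, the Solr URL, the depth and topN values, and the helper name build_nutch_crawl_command are all hypothetical and not taken from the thread.

```python
import shlex


def build_nutch_crawl_command(url_dir, solr_url, depth=3, top_n=50,
                              nutch_bin="bin/nutch"):
    """Build the argument list for a one-shot Nutch crawl that indexes into Solr.

    Assumption: the installed Nutch accepts "crawl <urlDir> -solr <solrUrl>",
    as shown in the Nutch 1.x tutorials. Check your version's usage output.
    """
    return [
        nutch_bin, "crawl", url_dir,
        "-solr", solr_url,       # Solr instance that receives the extracted content
        "-depth", str(depth),    # number of crawl rounds (hypothetical value)
        "-topN", str(top_n),     # max pages fetched per round (hypothetical value)
    ]


if __name__ == "__main__":
    # Hypothetical seed directory and Solr URL; print the command for review
    # instead of running it.
    cmd = build_nutch_crawl_command("urls", "http://localhost:8983/solr/")
    print(shlex.join(cmd))
```

The sketch only prints the command so it can be checked against the usage text of the Nutch version in use before anything is run.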

DIH From various File system locations

2011-01-25 Thread pankaj bhatt
Hi All, I need to index the documents present in my file system at various locations (e.g. C:\docs, d:\docs). Is there any way I can specify this in my DIH configuration? Here is my configuration: