I take that back... I am currently using version 1.2. Make sure
that the latest versions of Tika and PDFBox are in the contrib folder.
1.3 is structured a bit differently and it doesn't look like there is
a contrib directory. Maybe one of the Nutch contributors can comment
on this?
Adam
On Tue
There are a few tutorials out there.
1. http://wiki.apache.org/nutch/RunningNutchAndSolr (not the most practical)
2. http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ (similar to 1.)
3. Build the latest from branch
http://svn.apache.org/repos/asf/nutch/branches/branch-1.3/ and read
this
Thanks Adam. It seems like Nutch would solve most of my concerns.
It would be great if you could share resources for Nutch with us.
/ Pankaj Bhatt.
On Tue, Jan 25, 2011 at 7:21 PM, Estrada Groups <
estrada.adam.gro...@gmail.com> wrote:
I would just use Nutch and specify the -solr param on the command line. That
will add the extracted content to your instance of Solr.
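
As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles such a command line. Only the -solr param comes from this thread; the `bin/nutch` launcher path, the `crawl` subcommand, the `urls` seed directory, the `-depth` and `-topN` values, and the Solr URL are assumptions for the example and may differ in your Nutch version and setup. The helper name `build_nutch_crawl_command` is hypothetical.

```python
import shlex


def build_nutch_crawl_command(url_dir, solr_url, nutch_bin="bin/nutch",
                              depth=3, top_n=50):
    """Build the argument list for a Nutch crawl that sends content to Solr.

    All defaults here are illustrative assumptions, not values from the thread.
    """
    return [
        nutch_bin, "crawl", url_dir,
        "-solr", solr_url,       # the param mentioned in the message above
        "-depth", str(depth),    # assumed crawl depth
        "-topN", str(top_n),     # assumed per-round page limit
    ]


if __name__ == "__main__":
    cmd = build_nutch_crawl_command("urls", "http://localhost:8983/solr/")
    # Print a shell-safe version of the command for inspection before running it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first makes it easy to check the arguments against the usage text of your own Nutch build before actually running anything.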
Adam
On Jan 25, 2011, at 5:29 AM, pankaj bhatt wrote:
Hi All,
I need to index the documents present in my file system at various
locations (e.g. C:\docs, D:\docs).
Is there any way I can specify this in my DIH
configuration?
Here is my configuration:-