That helps a lot. Thanks, Jorge Luis. Good point about different fields:
I'll just put the h1 and h2 text (to whatever depth I want) into separate
fields, and we can sort out the weighting, and whether we want it at all,
later with edismax. The blogs on adding plugins for that sort of thing look
straightforward.
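
For what it's worth, here is a rough sketch of how the weighting could be applied at query time once the headings are in their own fields. The field names (h1, h2, content) and the boost values are placeholders I made up, not anything Nutch or Solr defines. defType and qf are the standard edismax request parameters.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, field_boosts):
    """Build Solr request parameters for an edismax query.

    field_boosts maps a field name to its boost, e.g. {"h1": 5}.
    The field names are hypothetical; use whatever the schema defines.
    """
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"q": user_query, "defType": "edismax", "qf": qf}


# Assumed boosts: headings count for more than body text.
params = build_edismax_params("nutch tika", {"h1": 5, "h2": 3, "content": 1})

# URL-encoded form, ready to append to a select request.
query_string = urlencode(params)
```

Since the boosts live in the request rather than in the index, they can be tuned later without recrawling.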
On Mon, J
Hi Dan:
Agreed, this question is more Nutch related than Solr ;)
Nutch doesn't send any data to the /update/extract request handler; all the
text and metadata extraction happens on the Nutch side rather than relying on
the ExtractingRequestHandler provided by Solr. Underneath, Nutch uses Tika, the same
te