Hi,

I don't know Nutch, but I will answer as if you were using Solr Cell:
http://wiki.apache.org/solr/ExtractingRequestHandler

When a PDF file is sent to the ExtractingRequestHandler, several pieces of
metadata are extracted from it and assigned to fields. I usually enable the
catch-all dynamic field * to capture all of the metadata and see the associated
field names and values. Afterwards I select the useful ones (and define them in
schema.xml, as you did for author) and forward the remaining ones to an ignored
dynamic field. The wiki page has all the information needed to manipulate the
metadata generated by extraction.
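
As a rough sketch of the two steps described above, a schema.xml fragment could
look like the following. The type names "string" and "ignored" and the
"ignored_" prefix are assumptions based on the stock example schema, so adjust
them to whatever your own schema defines.

```xml
<!-- Sketch only: type names and the ignored_ prefix are assumed from the
     stock example schema and may differ in your setup. -->

<!-- Step 1 (temporary): a catch-all dynamic field, so that every extracted
     metadata field is stored and shows up in query results. -->
<dynamicField name="*" type="string" indexed="true" stored="true" multiValued="true"/>

<!-- Step 2: once the useful field names are known, define them explicitly. -->
<field name="author" type="text_general" indexed="true" stored="true"/>

<!-- Everything else goes to a field type that is neither indexed nor stored. -->
<fieldType name="ignored" class="solr.StrField" indexed="false" stored="false" multiValued="true"/>
<dynamicField name="ignored_*" type="ignored" multiValued="true"/>
```

In step 2 the catch-all * field would be removed again. Unknown metadata then
has to be routed to ignored_* on the extraction side; as far as I know this is
what the uprefix request parameter (for example uprefix=ignored_) is for.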

Hope this helps.



On Wednesday, June 4, 2014 4:51 AM, Bayu Widyasanyata <bwidyasany...@gmail.com> 
wrote:



Hi Ahmet,

I was just referring to Solr's schema.xml, which contains this field
definition, for example the "author" field.
I was also referring to the result of a Solr query that I ran through the Solr
Admin page, which did not return the author field.
CMIIW.

Thanks.




On Wed, Jun 4, 2014 at 5:19 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:

>Hi Bayu,
>
>I think this is a Nutch question, no?
>
>Ahmet
>
>
>
>
>On Wednesday, June 4, 2014 1:13 AM, Bayu Widyasanyata 
><bwidyasany...@gmail.com> wrote:
>Hi,
>
>I'm sorry if this is a frequently asked question.
>
>In Solr's default schema.xml file, an "author" field is defined as
>follows:
>    <field name="author" type="text_general" stored="true" indexed="true"/>
>
>But this field does not seem to be parsed (by Nutch) or indexed (by Solr).
>My query always returns a null result for the "author" field, even though
>some documents (PDF) do have author content.
>
>How can I display it?
>What should I have prepared during fetch & parsing that I missed?
>Are there any documents/links on this issue?
>
>Thanks in advance.
>
>--
>wassalam,
>[bayu]
>
>


-- 
wassalam,
[bayu]
