Take a look at Solr Cell:
http://wiki.apache.org/solr/ExtractingRequestHandler
Include a dynamicField with a "*" pattern and you will see the wide variety
of metadata that is available for PDF and other rich document formats.
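
If you want to look at that metadata before changing the schema, a minimal sketch along these lines may help. It only builds a request URL for the /update/extract handler described on the wiki page and, optionally, posts a file to it. The base URL, document id, and file name are assumptions for illustration, as are the helper names build_extract_url and post_file; extractOnly=true asks Solr Cell to return the extracted text and metadata instead of indexing it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(solr_base, doc_id, extract_only=False):
    """Build a Solr Cell (/update/extract) URL for a single document.

    solr_base and doc_id are placeholders; adjust them to your setup.
    """
    params = [
        ("literal.id", doc_id),  # supply the unique key as a literal field
        ("commit", "true"),      # commit right away so results are visible
    ]
    if extract_only:
        # Return extracted content and metadata without indexing anything.
        params.append(("extractOnly", "true"))
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def post_file(url, path, content_type="application/pdf"):
    """POST the raw file bytes to Solr and return the response body."""
    with open(path, "rb") as fh:
        body = fh.read()
    req = Request(url, data=body, headers={"Content-Type": content_type})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical Solr location and file; requires a running Solr instance.
    url = build_extract_url("http://localhost:8983/solr", "doc1",
                            extract_only=True)
    print(post_file(url, "sample.pdf"))
```

Running it with extract_only=True against a test PDF should list the metadata names Tika found for that file, which you can then compare with what the "*" dynamicField ends up storing.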
-- Jack Krupansky
-----Original Message-----
From: Luis
Sent: Thursday, March 14, 2013 3:30 PM
To: solr-user@lucene.apache.org
Subject: Solr indexing binary files
Hi, I am new to Solr and I am extracting metadata from binary files through
URLs stored in my database. I would like to know what fields are available
for indexing from PDFs (the ones that would be specified as in column="").
For example, how would I extract something like file size, format, or file
type?
I would also like to know how to create custom fields in Solr. How are
the metadata and text content mapped into the Solr schema? Would I have
to declare that in solrconfig.xml or do some more tweaking somewhere
else? If someone has a code snippet that could show me, it would be greatly
appreciated.
Thank you in advance.