Take a look at Solr Cell:

http://wiki.apache.org/solr/ExtractingRequestHandler

Include a dynamicField with a "*" pattern in your schema and you will see the wide variety of metadata that is available for PDFs and other rich document formats.
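As a rough sketch, you can post a PDF to the /update/extract handler with extractOnly=true to see which metadata Tika returns before you decide what to index. This assumes Python 3 and a core at http://localhost:8983/solr/collection1; the URL, the "text" target field and the helper names extract_url and post_pdf are illustrative only. The uprefix=attr_ parameter sends any extracted field not defined in the schema to attr_<name>, which a matching dynamicField (for example name="attr_*") can store. The fmap.content parameter maps the extracted body text onto a schema field of your choice.

```python
import urllib.parse
import urllib.request

# Assumed core URL: adjust host, port and core name for your install.
SOLR_URL = "http://localhost:8983/solr/collection1"


def extract_url(doc_id, extract_only=False):
    """Build a Solr Cell (/update/extract) request URL."""
    params = {
        "literal.id": doc_id,    # value for the schema's unique key field
        "uprefix": "attr_",      # prefix for extracted fields not in the schema
        "fmap.content": "text",  # assumed target field for the body text
    }
    if extract_only:
        # Return extracted text and metadata without indexing anything.
        params["extractOnly"] = "true"
    else:
        # Index the document and make it visible to searches right away.
        params["commit"] = "true"
    return SOLR_URL + "/update/extract?" + urllib.parse.urlencode(params)


def post_pdf(path, doc_id, extract_only=False):
    """Send a local PDF to Solr Cell and return the raw response body."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        extract_url(doc_id, extract_only),
        data=data,
        headers={"Content-Type": "application/pdf"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Against a running server, post_pdf("report.pdf", "doc1", extract_only=True) should return the extracted content plus a metadata list. Look there for size- and type-related entries (names such as Content-Type or stream_size are typical, though the exact set depends on the Tika version). Without extract_only, the same call indexes the file and commits.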

-- Jack Krupansky

-----Original Message-----
From: Luis
Sent: Thursday, March 14, 2013 3:30 PM
To: solr-user@lucene.apache.org
Subject: Solr indexing binary files

Hi, I am new to Solr and I am extracting metadata from binary files through
URLs stored in my database. I would like to know which fields are available
for indexing from PDFs (the names I would reference in column="").
For example, how would I extract something like the file size, format or file
type?

I would also like to know how to create custom fields in Solr. How are the
metadata and text content mapped into the Solr schema? Would I have to
declare that in solrconfig.xml, or do some more tweaking somewhere else?
If someone has a code snippet that could show me, it would be greatly
appreciated.

Thank you in advance.




--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-indexing-binary-files-tp4047470.html
Sent from the Solr - User mailing list archive at Nabble.com.
