Re: indexing pdf documents

2008-05-14 Thread Brian Carmalt
Hello Cam, The wiki for RichDocuments explains how you can add metadata to the RDUpdater. http://wiki.apache.org/solr/UpdateRichDocuments I have used the patch to index docs and their metadata, but it was not exactly what we needed. Brian. On Wednesday, 14.05.2008, at 12:38 +0300, wrote

Re: indexing pdf documents

2008-05-14 Thread Cam Bazz
Hello Elizabeth, Yes, I have PDF files, and metadata about them already extracted, so I need something like: someone content of my pdf file. It seems that the updaterichdocument patch can only accept PDFs in raw form, so it is not possible to feed metadata. Have you found a solution other th

Re: indexing pdf documents

2008-05-13 Thread Bess Sadler
C.B., are you saying you have metadata about your PDF files (e.g., title, author, etc.) separate from the PDF file itself, or are you saying you want to extract that information from the PDF file? The first of these is pretty easy, the second of these can be difficult or impossible, dependin

Re: indexing pdf documents

2008-05-13 Thread Cam Bazz
Yes, I have seen the documentation on RichDocumentRequestHandler at the http://wiki.apache.org/solr/UpdateRichDocuments page. However, from what I understand, this just feeds documents to Solr. How can I construct something like: document_id, document_name, document_text and feed it in? (i.e. my doc
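
For readers following the thread: if the PDF text and metadata are already extracted, one option that needs no patch is to build a standard Solr XML add message by hand. The sketch below is only an illustration under assumptions. The field names document_id, document_name and document_text come from the message above and would have to be defined in your own schema.xml; the sample values and the helper name build_add_xml are hypothetical. It only builds the XML string, which could then be POSTed to your Solr update handler.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Build a Solr <add> message from an iterable of {field_name: value} dicts.

    Hypothetical helper: field names must match fields defined in schema.xml.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            # ElementTree escapes &, < and > in the text when serializing.
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Sample values are made up for illustration.
xml_payload = build_add_xml([
    {
        "document_id": "1",
        "document_name": "report.pdf",
        "document_text": "text extracted from the PDF",
    },
])
print(xml_payload)
```

Building the message with an XML library rather than string concatenation matters here because extracted PDF text often contains characters such as & and < that must be escaped.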

Re: indexing pdf documents

2008-05-12 Thread Chris Harris
Solr does not have this support built in, but there's a patch for it: https://issues.apache.org/jira/browse/SOLR-284 On Mon, May 12, 2008 at 2:02 PM, Cam Bazz <[EMAIL PROTECTED]> wrote: > Hello, > > Before making a little program to extract the txt from my pdfs and feed it > into solr with xml,