Thanks, I know that page and have read it. Sending additional meta-tags with the curl call is no problem. I only thought there might be a way to use the XML approach with PDF files as well. I'll go the "curl" way for those files.
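
For reference, a minimal sketch of the "curl" way, in which the CMS meta-tags travel as `literal.<fieldname>` request parameters alongside the PDF. The host and port, the `/update/extract` handler path (as used in the examples on the ExtractingRequestHandler wiki page), the document id `4`, the `title` value and the file name `report.pdf` are all assumptions for illustration, and every field must exist in your schema. The helper names `build_extract_url` and `build_curl_command` are hypothetical.

```python
import shlex
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, metadata, commit=True):
    """Build a Solr Cell URL that passes metadata as literal.<field> parameters."""
    params = [("literal.id", doc_id)]
    for field, value in metadata.items():
        # Each CMS meta-tag becomes one literal.<fieldname> parameter.
        params.append((f"literal.{field}", value))
    if commit:
        params.append(("commit", "true"))
    return f"{base_url}/update/extract?{urlencode(params)}"


def build_curl_command(url, pdf_path):
    """Return the curl argument list that uploads the PDF as multipart form data."""
    # "myfile" is an arbitrary form field name chosen for this sketch.
    return ["curl", url, "-F", f"myfile=@{pdf_path}"]


if __name__ == "__main__":
    # Assumed local Solr instance and example metadata.
    url = build_extract_url(
        "http://localhost:8983/solr",
        "4",
        {"title": "this is the fourth title"},
    )
    print(shlex.join(build_curl_command(url, "report.pdf")))
```

Building the query string with `urlencode` keeps spaces and special characters in the meta-tag values from breaking the URL, which matters once real titles from the CMS are passed through.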
--
Best regards

Markus Rietzler - <rietzler_software/>
Rechenzentrum der Finanzverwaltung NRW
0211/4572-2130

> -----Original Message-----
> From: Grant Ingersoll [mailto:gsing...@apache.org]
> Sent: Tuesday, 27 October 2009 11:43
> To: solr-user@lucene.apache.org
> Subject: Re: solr cell/tika: pdf import with xml metatags
>
>
> On Oct 27, 2009, at 6:36 AM, <markus.rietz...@rzf.fin-nrw.de>
> <markus.rietz...@rzf.fin-nrw.de> wrote:
>
> > hi,
> >
> > we want to use SOLR as our intranet search engine.
> > i downloaded the nightly build of solr 1.4. pdf extraction is done via
> > Solr Cell/Tika. i can send the pdf via curl
> > to solr.
> >
> > we do have a large set of meta-tags for all our intranet documents,
> > including PDF, PPT etc. to import html
> > files from our CMS i have access to all of these meta tags and create
> > an xml document which i send to SOLR,
> >
> > eg.
> >
> > <?xml version='1.0' encoding='UTF-8'?>
> > <add>
> >   <doc>
> >     <field name="id">1</field>
> >     <field name="title">this is the title</field>
> >   </doc>
> >   <doc>
> >     <field name="id">2</field>
> >     <field name="title">this is another title</field>
> >   </doc>
> >   <doc>
> >     <field name="id">3</field>
> >     <field name="title">this is the third title</field>
> >   </doc>
> > </add>
> >
> > this works fine with html files where i can grab all the meta tags,
> > including "body".
> >
> > so my question is, can i use this xml-document to send a pdf file
> > also?
>
> I'm not sure what you mean here, can you clarify? PDF and other
> "rich" documents can't be sent by XML.
>
> > ok, one way would be to use
> > the extracthandler with extract only and put the data in the "body"-
> > field.
>
> I guess all I can point you at right now is the wiki:
> http://wiki.apache.org/solr/ExtractingRequestHandler
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> using Solr/Lucene:
> http://www.lucidimagination.com/search
>