Hi Sascha,

Thanks for your reply. Our approach is similar to the one described in the JIRA issue, except that all our metadata is in the XML file rather than in a database. I am therefore using a custom XmlUpdateRequestHandler to parse the XML, and then calling Tika from within the XML loader to parse the content. So far this seems to work. When, and in which Solr version, do you expect the JIRA issue to be resolved?
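To make the loader approach concrete, here is a minimal sketch of the loop, not the actual handler code. It assumes the standard Solr `<add><doc><field name="...">` layout, a field called `URL` that points at the document, and an output field called `content`. `BatchXmlLoader` and `ContentExtractor` are hypothetical names. In a real handler the extractor would wrap a Tika call, for example Tika's `Tika.parseToString(java.net.URL)` facade method, and each resulting map would be turned into a Solr document instead of being returned in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

/** Sketch: walk an <add><doc>... batch and enrich each doc with extracted content. */
class BatchXmlLoader {

    /** Hypothetical hook; a real loader would wrap a Tika parser here. */
    interface ContentExtractor {
        String extract(String url) throws Exception;
    }

    static List<Map<String, String>> load(String xml, String urlField,
                                          String contentField,
                                          ContentExtractor extractor) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Map<String, String>> docs = new ArrayList<>();
        Map<String, String> current = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String tag = r.getLocalName();
                if (tag.equals("doc")) {
                    // Start collecting metadata for a new document.
                    current = new LinkedHashMap<>();
                } else if (tag.equals("field") && current != null) {
                    // getElementText() consumes the field's text and its end tag.
                    String name = r.getAttributeValue(null, "name");
                    current.put(name, r.getElementText().trim());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && r.getLocalName().equals("doc")) {
                // All metadata is known; fetch and parse the linked content, if any.
                String url = current.get(urlField);
                if (url != null) {
                    current.put(contentField, extractor.extract(url));
                }
                docs.add(current);
                current = null;
            }
        }
        r.close();
        return docs;
    }
}
```

Streaming the batch with StAX keeps memory use flat even when the XML lists many documents, because each document is handed to the extractor as soon as its closing `</doc>` tag is read.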
On Mon, Nov 16, 2009 at 5:02 PM, Sascha Szott <sz...@zib.de> wrote:

> Hi,
>
> the problem you've described -- an integration of DataImportHandler (to
> traverse the XML file and get the document URLs) and Solr Cell (to extract
> content afterwards) -- is already addressed in issue SOLR-1358
> (https://issues.apache.org/jira/browse/SOLR-1358).
>
> Best,
> Sascha
>
>
> Kerwin wrote:
>
>> Hi,
>>
>> I am new to this forum and would like to know whether the function described
>> below has been developed or already exists in Solr. If it does not exist, is it
>> a good idea, and can I contribute it?
>>
>> We need to index multiple documents in different formats, so we use Solr
>> with Tika (Solr Cell).
>>
>> Question:
>> Can you index both metadata and content for multiple documents iteratively
>> in Solr?
>> For example, I have an XML file with metadata and links to the documents'
>> content. There are many documents in this XML, and I would like to index
>> them all without firing multiple URLs.
>>
>> Example of XML:
>> <add>
>>   <doc>
>>     <field name="id">34122</field>
>>     <field name="author">Michael</field>
>>     <field name="size">3MB</field>
>>     <field name="URL">URL of the document</field>
>>   </doc>
>>   <doc>.....</doc>
>>   ... (up to document N)
>> </add>
>>
>> I need to index all these documents by sending this XML in a single
>> URL. The collection of documents to be indexed could be on a file system.
>>
>> I have altered the Solr code to do this, but is there an already
>> existing feature?