You might be able to do something with the XSL transformation support in the DataImportHandler (DIH). It might also be easier to just write a SolrJ program that parses the XML and constructs a SolrInputDocument to send to Solr. It's really pretty straightforward.
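As a rough sketch of the parsing half of the SolrJ route, something like the following pulls doc_id, subject and abstract out of a RECORD file using only the JDK's DOM parser. The class name RecordFields and the Solr field names id, subject and abstract are assumptions made for illustration; they would have to match whatever is defined in your schema.xml. The SolrJ half is only described in comments, since it needs the SolrJ jar and a running Solr.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts a few elements from the custom RECORD XML.
class RecordFields {

    // Text of the first element with this tag name, or null if missing or empty.
    static String firstText(Document dom, String tag) {
        NodeList nodes = dom.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return null;
        }
        String text = nodes.item(0).getTextContent().trim();
        return text.isEmpty() ? null : text;
    }

    // Maps XML element names to (assumed) Solr field names and collects the values.
    static Map<String, String> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xml)));

        // {XML element, Solr field}; the Solr field names are assumptions.
        String[][] mapping = {
            {"doc_id", "id"},
            {"subject", "subject"},
            {"abstract", "abstract"}
        };

        Map<String, String> fields = new LinkedHashMap<>();
        for (String[] m : mapping) {
            String value = firstText(dom, m[0]);
            if (value != null) {
                fields.put(m[1], value);
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<RECORD>"
                + "<doc_id>1</doc_id>"
                + "<abstract>This is an abstract.</abstract>"
                + "<subject>Text Subject</subject>"
                + "<availability />"
                + "</RECORD>";

        Map<String, String> fields = extract(xml);
        System.out.println(fields);

        // With SolrJ on the classpath, each entry would be copied into a
        // SolrInputDocument via addField(name, value), and the document would
        // then be added and committed through whichever SolrJ server/client
        // class your Solr version provides.
    }
}
```

Empty elements such as availability are simply skipped, so only fields that actually carry a value would end up in the document sent to Solr.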
Best
Erick

On Sun, Feb 26, 2012 at 11:31 PM, Anupam Bhattacharya <anupam...@gmail.com> wrote:
> Hi,
>
> I am using ManifoldCF to crawl data from a Documentum repository. I am able
> to successfully read the metadata/properties for the defined document types
> in Documentum using the out-of-the-box Documentum Connector in ManifoldCF.
> Unfortunately, there is one XML file also present which consists of a
> custom XML structure which I need to read and fetch the element values and
> add it for indexing in Lucene through SOLR.
>
> Is there any mechanism to index any XML structure document in SOLR?
>
> I checked the SOLR CELL framework, which supports the structure below:
>
> <add>
>   <doc>
>     <field name="id">9885A004</field>
>     <field name="name">Canon PowerShot SD500</field>
>     <field name="category">camera</field>
>     <field name="features">3x optical zoom</field>
>     <field name="features">aluminum case</field>
>     <field name="weight">6.4</field>
>     <field name="price">329.95</field>
>   </doc>
>   <doc>
>     <field name="id">9885A003</field>
>     <field name="name">Canon PowerShot SD504</field>
>     <field name="category">camera1</field>
>     <field name="features">3x optical zoom1</field>
>     <field name="features">aluminum case1</field>
>     <field name="weight">6.41</field>
>     <field name="price">329.956</field>
>   </doc>
> </add>
>
> & my custom XML structure is of the following format, from which I need to
> read the *subject* & *abstract* fields for indexing. I checked the TIKA
> project but I couldn't find any useful stuff.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <RECORD>
>   <doc_id>1</doc_id>
>   <abstract>This is an abstract.</abstract>
>   <subject>Text Subject</subject>
>   <availability />
>   <indexing>
>     <index_group></index_group>
>     <keyterms></keyterms>
>     <keyterms></keyterms>
>   </indexing>
>   <publication_date></publication_date>
>   <physical_storage />
>   <log_entry />
>   <legal_category />
>   <legal_category_notes />
>   <citation_only></citation_only>
>   <citation_only_desc />
>   <export_control />
>   <export_control_desc />
> </RECORD>
>
> Appreciate any help on this.
>
> Regards
> Anupam