Shalin Shekhar Mangar wrote:
Hi Daniel,

Maybe if you can give us a sample of what your XML looks like, we can suggest
how to use SOLR-469 (Data Import Handler) to index it. Most of the use-cases
we have encountered so far are solvable using the XPathEntityProcessor in
DataImportHandler without using XSLT; for details, look at
http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476

I think that even if SOLR-469 could meet my needs, I'd still prefer the XSLT approach. It's going to be a bit of configuration either way, and I'd rather it live in an XSLT stylesheet than in solrconfig.xml. In addition, I haven't yet decided whether to apply any patches to the version we will deploy. If I go down the route of the XSLT transform patch and later have to back it out, moving the transform to the XML source would be a negligible amount of work. Going from the DataImportHandler to not using it at all, by contrast, would be quite a bit of work.

Because both the Solr instance and the XML source are in-house, I could apply the XSLT at the source instead of in Solr. However, different teams control the XML source and Solr, so doing it on the backend would require a bit more office coordination.

The data is a FileMaker XML export (DTD fmresultset), and it looks roughly like this:
<fmresultset>
  <resultset>
    <field name="ID"><data>125</data></field>
    <field name="organization"><data>Ford Foundation</data></field>
    ...
    <relatedset table="Employees">
      <record>
        <field name="ID"><data>Y5-A</data></field>
        <field name="Name"><data>John Smith</data></field>
      </record>
      <record>
        <field name="ID"><data>Y5-B</data></field>
        <field name="Name"><data>Jane Doe</data></field>
      </record>
    </relatedset>
  </resultset>
</fmresultset>

I'm taking the product of the resultset and the relatedset (one Solr document per related record), using the two IDs concatenated as the unique identifier, like so:

<doc>
<field name="ID">125Y5-A</field>
<field name="organization">Ford Foundation</field>
<field name="Name">John Smith</field>
</doc>
<doc>
<field name="ID">125Y5-B</field>
<field name="organization">Ford Foundation</field>
<field name="Name">Jane Doe</field>
</doc>
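To spell the mapping out, here is a minimal Python sketch of the same flattening. It is an illustration of the logic, not the XSLT stylesheet itself. It assumes the simplified layout in the sample above: no XML namespace, and the parent fields sitting directly under `<resultset>`. A real fmresultset export may declare a default namespace and wrap fields in `<record>` elements, so the paths would need adjusting. The function names `flatten_fmresultset` and `to_solr_add_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def flatten_fmresultset(xml_text):
    """Return one dict of Solr fields per related record (the "product").

    Assumes the simplified, namespace-free layout from the sample above.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for resultset in root.iter("resultset"):
        # Parent-level fields: the <field> elements directly under <resultset>.
        parent = {f.get("name"): f.findtext("data", default="")
                  for f in resultset.findall("field")}
        for record in resultset.findall("relatedset/record"):
            child = {f.get("name"): f.findtext("data", default="")
                     for f in record.findall("field")}
            doc = dict(parent)
            doc.update(child)  # child fields (e.g. Name) are added to the parent's
            # Unique key: parent ID concatenated with child ID, e.g. "125" + "Y5-A".
            doc["ID"] = parent.get("ID", "") + child.get("ID", "")
            docs.append(doc)
    return docs


def to_solr_add_xml(docs):
    """Serialize the flattened docs as a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    for d in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in d.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")
```

Run against the sample, this should yield two documents with IDs `125Y5-A` and `125Y5-B`, each carrying `organization` from the parent. The sketch keeps the join in one place; an XSLT stylesheet would express the same thing as a template that iterates over each `relatedset/record` and pulls in the parent fields.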

I can do the transform pretty simply with XSLT. I suppose it is possible to get the DataImportHandler to do this, but I'm not yet convinced that it's easier.

Daniel
