Shalin Shekhar Mangar wrote:
Hi Daniel,
Maybe if you can give us a sample of what your XML looks like, we can suggest
how to use SOLR-469 (Data Import Handler) to index it. Most of the use-cases
we have encountered so far are solvable using the XPathEntityProcessor in
DataImportHandler without using XSLT. For details, look at
http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
Even if it is possible to use SOLR-469 for my needs, I'd still prefer
the XSLT approach. It's going to take a bit of configuration either way,
and I'd rather that configuration be an XSLT stylesheet than
solrconfig.xml. In addition, I haven't yet decided whether I want to
apply any patches to the version that we will deploy. If I do go down
the route of the XSLT transform patch and end up having to back it out,
moving the transform to the XML source would take a negligible amount of
work, whereas going from using the DataImportHandler to not using it at
all would be quite a bit of work.
Because both the Solr instance and the XML source are in house, I have
the ability to apply the XSLT at the source instead of at Solr.
However, different teams control the XML source and Solr, so it would
require a bit more office coordination to do it on the backend.
The data is a FileMaker XML export (DTD fmresultset) and it looks
roughly like this:
<fmresultset>
<resultset>
<field name="ID"><data>125</data></field>
<field name="organization"><data>Ford Foundation</data></field>
...
<relatedset table="Employees">
<record>
<field name="ID"><data>Y5-A</data></field>
<field name="Name"><data>John Smith</data></field>
</record>
<record>
<field name="ID"><data>Y5-B</data></field>
<field name="Name"><data>Jane Doe</data></field>
</record>
</relatedset>
</resultset>
</fmresultset>
I'm taking the cross product of the resultset and the relatedset: one
document per related record, with both IDs concatenated as the unique
identifier, like so:
<doc>
<field name="ID">125Y5-A</field>
<field name="organization">Ford Foundation</field>
<field name="Name">John Smith</field>
</doc>
<doc>
<field name="ID">125Y5-B</field>
<field name="organization">Ford Foundation</field>
<field name="Name">Jane Doe</field>
</doc>
I can do the transform pretty simply with XSLT. I suppose it is
possible to get the DataImportHandler to do this, but I'm not yet
convinced that it's easier.
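For reference, here is a minimal sketch of the kind of stylesheet I have in
mind. It assumes the export has no XML namespace and that the relatedset
sits inside the resultset, as in the rough sample above. The <add> wrapper
is what Solr's XML update handler expects around the <doc> elements. This is
untested against a real export, and if the actual file declares a default
namespace, the match patterns would need a prefix bound to it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Emit one <doc> per related record, wrapped in Solr's <add> element. -->
  <xsl:template match="/fmresultset">
    <add>
      <xsl:apply-templates select="resultset/relatedset/record"/>
    </add>
  </xsl:template>

  <xsl:template match="relatedset/record">
    <!-- The enclosing resultset holds the parent fields (ID, organization, ...). -->
    <xsl:variable name="parent" select="ancestor::resultset[1]"/>
    <doc>
      <!-- Unique key: parent ID followed by the related record's ID. -->
      <field name="ID">
        <xsl:value-of select="concat($parent/field[@name='ID']/data,
                                     field[@name='ID']/data)"/>
      </field>
      <!-- Copy the remaining parent fields, then the remaining related fields. -->
      <xsl:for-each select="$parent/field[@name != 'ID'] | field[@name != 'ID']">
        <field name="{@name}"><xsl:value-of select="data"/></field>
      </xsl:for-each>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Note that a resultset with no related records produces no documents with
this sketch, since each <doc> is driven by a relatedset record.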
Daniel