I'm in the same situation as you, Daniel.  The DataImportHandler is pretty
awesome, but I'd also prefer that it had the power of XSLT.  Its XPath
support doesn't suffice for me, and I can't do very basic things like
concatenating one value with another, even a constant.  It's too bad XSLT
has no mode that avoids building the whole file in memory to do the
transform.  I've been looking into this and have turned up nothing.  It
would be neat if there were a StAX to multi-document adapter, at which
point XSLT could be applied to smaller, bounded-size documents instead of
the entire data stream.  I haven't found anything like this, so it would
need to be built.  For now, my documents aren't too big to transform with
XSLT in memory.
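
To make the adapter idea concrete, here is a rough, untested-in-production sketch of what I have in mind, using only the StAX and XSLT APIs that ship with the JDK.  The class name ChunkedXslt and the method transformEach are made up for illustration; nothing like this exists in Solr or the DataImportHandler as far as I know.  It streams the input, copies each element with a given name into its own small document, and runs the stylesheet on that fragment only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: streams a large XML input with StAX and applies an
// XSLT stylesheet to one small sub-document at a time.
final class ChunkedXslt {

    // Returns one transformed string per element named elementName.
    static List<String> transformEach(Reader input, String elementName, String xslt)
            throws Exception {
        // Compile the stylesheet once and reuse it for every fragment.
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        XMLEventReader events = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        List<String> results = new ArrayList<>();

        while (events.hasNext()) {
            XMLEvent event = events.nextEvent();
            if (!event.isStartElement()
                    || !event.asStartElement().getName().getLocalPart().equals(elementName)) {
                continue; // skip everything outside the elements we split on
            }

            // Copy this element's subtree into its own in-memory document.
            StringWriter fragment = new StringWriter();
            XMLEventWriter writer = outputFactory.createXMLEventWriter(fragment);
            int depth = 0;
            do {
                if (event.isStartElement()) depth++;
                if (event.isEndElement()) depth--;
                writer.add(event);
                if (depth > 0) event = events.nextEvent();
            } while (depth > 0);
            writer.flush();
            writer.close();

            // Only the small fragment is ever built as a tree by the XSLT engine.
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(fragment.toString())),
                    new StreamResult(out));
            results.add(out.toString());
        }
        return results;
    }
}
```

With a stylesheet whose template matches "/record", this would let something like concat('emp-', field[@name='ID']/data) run per record without loading the whole export.  The obvious limitation is that each fragment loses its parent context, so a transform like Daniel's, which needs the organization's ID alongside each related record, would have to split on a larger enclosing element instead.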

~ David


Daniel Papasian wrote:
> 
> Shalin Shekhar Mangar wrote:
>> Hi Daniel,
>> 
>> Maybe if you can give us a sample of what your XML looks like, we can
>> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
>> the use cases we have encountered so far are solvable using the
>> XPathEntityProcessor in DataImportHandler without using XSLT. For
>> details, look at
>> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> 
> I think even if it is possible to use SOLR-469 for my needs, I'd still 
> prefer the XSLT approach, because it's going to be a bit of 
> configuration either way, and I'd rather it be an XSLT stylesheet than 
> solrconfig.xml.  In addition, I haven't yet decided whether I want to
> apply any patches to the version that we will deploy.  If I do go down
> the route of the XSLT transform patch and end up having to back it out,
> moving the transform to the XML source would take a negligible amount
> of work, whereas going from using the DataImportHandler to not using it
> at all would be quite a bit of work.
> 
> Because both the Solr instance and the XML source are in-house, I have
> the ability to apply the XSLT at the source instead of at Solr.
> However, different teams control the XML source and Solr, so it would
> require a bit more office coordination to do it on the backend.
> 
> The data is a filemaker XML export (DTD fmresultset) and it looks 
> roughly like this:
> <fmresultset>
>    <resultset>
>      <field name="ID"><data>125</data></field>
>      <field name="organization"><data>Ford Foundation</data></field>
>      ...
>      <relatedset table="Employees">
>        <record>
>          <field name="ID"><data>Y5-A</data></field>
>          <field name="Name"><data>John Smith</data></field>
>        </record>
>        <record>
>          <field name="ID"><data>Y5-B</data></field>
>          <field name="Name"><data>Jane Doe</data></field>
>        </record>
>      </relatedset>
>    </resultset>
> </fmresultset>
> 
> I'm taking the product of the resultset and the relatedset, using both 
> IDs concatenated as a unique identifier, like so:
> 
> <doc>
> <field name="ID">125Y5-A</field>
> <field name="organization">Ford Foundation</field>
> <field name="Name">John Smith</field>
> </doc>
> <doc>
> <field name="ID">125Y5-B</field>
> <field name="organization">Ford Foundation</field>
> <field name="Name">Jane Doe</field>
> </doc>
> 
> I can do the transform pretty simply with XSLT.  I suppose it is 
> possible to get the DataImportHandler to do this, but I'm not yet 
> convinced that it's easier.
> 
> Daniel
> 
> 

-- 
View this message in context: 
http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
Sent from the Solr - User mailing list archive at Nabble.com.
