We are planning to incorporate both of your requests in the next patch.
The implementation will be as follows. Specify the XSL file location
on the entity:
<entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
....
</entity>
The entity will then be processed after the XSL transformation has been
applied. If your XSL transformation produces a valid 'add' document, you
do not even need to declare fields. Otherwise you will need to declare
all the fields and their xpaths, as for any other XML source.
<entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl"
useSolrAddXml="true"/>
With useSolrAddXml="true" it will assume that the schema is the same as
that of the add XML and index the documents accordingly.
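
To make the XSLT step concrete, here is a minimal sketch of applying a
stylesheet with the standard Java APIs (javax.xml.transform). It is not the
DataImportHandler implementation. The class name, the sample record and the
sample stylesheet are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of DataImportHandler.
final class XsltPreprocessSketch {

    // Made-up input record, for illustration only.
    static final String SAMPLE_XML =
        "<record><first-name>John</first-name><last-name>Smith</last-name></record>";

    // Made-up stylesheet that emits a Solr 'add' document with one concatenated field.
    static final String SAMPLE_XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/record\">"
      + "<add><doc><field name=\"name\">"
      + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
      + "</field></doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the transformed text.
    static String transform(String xml, String xsl) throws Exception {
        // Uses whichever XSLT processor the standard factory lookup selects.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSL));
    }
}
```

TransformerFactory.newInstance() honours the
javax.xml.transform.TransformerFactory system property, so a specific
processor can be selected without changing the code.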
Another feature will be a TemplateTransformer, which takes a
template as follows:
<entity name="e" transformer="TemplateTransformer" ....>
<field column="field1_2" template="${e.field1} ${e.field2}"/>
</entity>
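
As a rough illustration of the idea, placeholder substitution of this kind
could be sketched in Java as below. This is not the planned
TemplateTransformer code. The class and method names are hypothetical, and
replacing an unknown placeholder with an empty string is an assumption of
this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual TemplateTransformer.
final class TemplateSketch {

    // Matches ${name} and captures the text between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${name} in the template with the matching value.
    // Assumption: a name with no value resolves to an empty string.
    static String resolve(String template, Map<String, Object> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Object value = values.get(m.group(1));
            String text = (value == null) ? "" : value.toString();
            // quoteReplacement keeps '$' and '\' in values from being misread.
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> values = new HashMap<>();
        values.put("e.field1", "John");
        values.put("e.field2", "Smith");
        // prints "John Smith"
        System.out.println(resolve("${e.field1} ${e.field2}", values));
    }
}
```
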
Please let us know what you think about this.
And keep giving us these great use-cases so that we can make the tool better.
--Noble
On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org
<[EMAIL PROTECTED]> wrote:
>
> Thanks Shalin.
>
> The particular XSLT processor used is not relevant; it's a spec. Just use
> the standard Java APIs. If I want a particular processor, then I can get
> that to happen by using a system property and/or you could offer a
> configuration input for the standard factory class implementation for a
> processor of my choice.
>
> ~ David
>
>
>
>
> Shalin Shekhar Mangar wrote:
> >
> > Hi David,
> > Actually you can concatenate values, however you'll have to write a bit of
> > code. You can write this in JavaScript (if you're using Java 6) or in Java.
> >
> > Basically, you need to write a Transformer to do it. Look at
> > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
> >
> > For example, let's say you get fields first-name and last-name in the XML.
> > But in the schema.xml you have a field called "name" in which you need to
> > concatenate the values of first-name and last-name (with a space in
> > between). Create a Java class:
> >
> > import java.util.Map;
> >
> > public class ConcatenateTransformer {
> >   public Object transformRow(Map<String, Object> row) {
> >     // row.get returns Object, so cast before concatenating
> >     String firstName = (String) row.get("first-name");
> >     String lastName = (String) row.get("last-name");
> >     row.put("name", firstName + " " + lastName);
> >     return row;
> > } }
> >
> > Add this class to Solr's classpath by putting its jar in solr/WEB-INF/lib.
> >
> > The data-config.xml should look like this:
> > <entity name="myEntity" processor="XPathEntityProcessor"
> >         url="http://myurl/example.xml"
> >         transformer="com.yourpackage.ConcatenateTransformer">
> >   <field column="first-name" xpath="/record/first-name" />
> >   <field column="last-name" xpath="/record/last-name" />
> >   <field column="name" />
> > </entity>
> >
> > This will call the ConcatenateTransformer.transformRow method for each row,
> > and you can concatenate any field with any field (or constant). Note that
> > the Solr document will keep only those fields which are in the schema.xml;
> > the rest are thrown away.
> >
> > If you don't want to write this in Java, you can use JavaScript with the
> > built-in ScriptTransformer. For an example look at
> > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> >
> > However, I'm beginning to realize that XSLT is a common need. Let me see
> > how best we can accommodate it in DataImportHandler. Which XSLT processor
> > would you prefer?
> >
> > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
> > <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> I'm in the same situation as you Daniel. The DataImportHandler is pretty
> >> awesome but I'd also prefer it had the power of XSLT. The XPath support in
> >> it doesn't suffice for me. And I can't do very basic things like
> >> concatenating one value with another, even a constant. It's too bad there
> >> isn't a mode that XSLT can be put into so that it doesn't build the whole
> >> file in memory to do the transform. I've been looking into this and have
> >> turned up nothing. It would be neat if there were a StAX to multi-document
> >> adapter, at which point XSLT could be applied to the smaller fixed-size
> >> documents instead of the entire data stream. I haven't found anything like
> >> this so it'd need to be built. For now my documents aren't too big to XSLT
> >> in-memory.
> >>
> >> ~ David
> >>
> >>
> >> Daniel Papasian wrote:
> >> >
> >> > Shalin Shekhar Mangar wrote:
> >> >> Hi Daniel,
> >> >>
> >> >> Maybe if you can give us a sample of what your XML looks like, we can
> >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> >> >> the use-cases we have encountered so far are solvable using the
> >> >> XPathEntityProcessor in DataImportHandler without using XSLT. For
> >> >> details look at
> >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> >> >
> >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
> >> > prefer the XSLT approach, because it's going to be a bit of
> >> > configuration either way, and I'd rather it be an XSLT stylesheet than
> >> > solrconfig.xml. In addition, I haven't yet decided whether I want to
> >> > apply any patches to the version that we will deploy, but if I do go
> >> > down the route of the XSLT transform patch, if I end up having to back
> >> > it out the amount of work that it would be for me to do the transform
> >> > at the XML source would be negligible, whereas it would be quite a bit
> >> > of work ahead of me to go from using the DataImportHandler to not using
> >> > it at all.
> >> >
> >> > Because both the solr instance and the XML source are in house, I have
> >> > the ability to apply the XSLT at the source instead of at solr.
> >> > However, there are different teams of people that control the XML
> >> > source and solr, so it would require a bit more office coordination to
> >> > do it on the backend.
> >> >
> >> > The data is a filemaker XML export (DTD fmresultset) and it looks
> >> > roughly like this:
> >> > <fmresultset>
> >> > <resultset>
> >> > <field name="ID"><data>125</data></field>
> >> > <field name="organization"><data>Ford Foundation</data></field>
> >> > ...
> >> > <relatedset table="Employees">
> >> > <record>
> >> > <field name="ID"><data>Y5-A</data></field>
> >> > <field name="Name"><data>John Smith</data></field>
> >> > </record>
> >> > <record>
> >> > <field name="ID"><data>Y5-B</data></field>
> >> > <field name="Name"><data>Jane Doe</data></field>
> >> > </record>
> >> > </relatedset>
> >> > </resultset>
> >> > </fmresultset>
> >> >
> >> > I'm taking the product of the resultset and the relatedset, using both
> >> > IDs concatenated as a unique identifier, like so:
> >> >
> >> > <doc>
> >> > <field name="ID">125Y5-A</field>
> >> > <field name="organization">Ford Foundation</field>
> >> > <field name="Name">John Smith</field>
> >> > </doc>
> >> > <doc>
> >> > <field name="ID">125Y5-B</field>
> >> > <field name="organization">Ford Foundation</field>
> >> > <field name="Name">Jane Doe</field>
> >> > </doc>
> >> >
> >> > I can do the transform pretty simply with XSLT. I suppose it is
> >> > possible to get the DataImportHandler to do this, but I'm not yet
> >> > convinced that it's easier.
> >> >
> >> > Daniel
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
> >
>