We are planning to incorporate both of your requests in the next patch.
The implementation will be as follows. Specify the XSL file location
on the entity:
<entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
....
</entity>
Processing is then done after the XSL transformation. If your XSL
transformation produces a valid 'add' document, no field definitions
are necessary. Otherwise you will need to declare all the fields
and their XPaths, as for any other XML source.

<entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl"
useSolrAddXml="true"/>

With useSolrAddXml="true" it will assume that the schema is the same as
that of the add XML and index the documents accordingly.
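
To make the XSL step concrete, here is a minimal sketch using only the
standard javax.xml.transform API. It is not the DataImportHandler code; the
class name, the stylesheet, and the <records>/<record> input format are
assumptions made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltToAddXml {
    // Hypothetical stylesheet: maps each <record> to a Solr <doc> and
    // concatenates first-name and last-name into a single "name" field.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/records\">"
      + "<add><xsl:apply-templates select=\"record\"/></add>"
      + "</xsl:template>"
      + "<xsl:template match=\"record\"><doc><field name=\"name\">"
      + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
      + "</field></doc></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML and returns the result as text.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }
}
```

TransformerFactory.newInstance() honors the javax.xml.transform.TransformerFactory
system property, so a different XSLT processor can be selected without code
changes.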

Another feature will be a TemplateTransformer, which takes a
template as follows:

<entity name="e" transformer="TemplateTransformer" ....>
<field column="field1_2"  template="${e.field1} ${e.field2}"/>
</entity>
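
As a rough idea of how such a template could be resolved, here is a minimal
sketch in plain Java. It is not the planned TemplateTransformer
implementation; the class name, the resolve method, and the flat map keyed by
"entity.field" are assumptions for illustration only.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateSketch {
    // Matches placeholders of the form ${name} and captures the name.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${key} in the template with the value stored under
    // that key; unknown keys resolve to an empty string (an assumption).
    static String resolve(String template, Map<String, Object> vars) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            String text = (value == null) ? "" : value.toString();
            // quoteReplacement keeps '$' and '\' in values literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(text));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

With e.field1 = "John" and e.field2 = "Smith", resolving the template
"${e.field1} ${e.field2}" would give the value stored in field1_2.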

Please let us know what you think about this.

And keep giving us these great use-cases so that we can make the tool better.
--Noble



On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org
<[EMAIL PROTECTED]> wrote:
>
>  Thanks Shalin.
>
>  The particular XSLT processor used is not relevant; it's a spec.  Just use
>  the standard Java APIs.  If I want a particular processor, then I can get
>  that to happen by using a system property and/or you could offer a
>  configuration input for the standard factory class implementation for a
>  processor of my choice.
>
>  ~ David
>
>
>
>
>  Shalin Shekhar Mangar wrote:
>  >
>  > Hi David,
>  > Actually you can concatenate values, however you'll have to write a bit of
>  > code. You can write this in javascript (if you're using Java 6) or in
>  > Java.
>  >
>  > Basically, you need to write a Transformer to do it. Look at
>  > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
>  >
>  > For example, let's say you get fields first-name and last-name in the XML.
>  > But in the schema.xml you have a field called "name" in which you need to
>  > concatenate the values of first-name and last-name (with a space in
>  > between). Create a Java class:
>  >
>  > import java.util.Map;
>  > public class ConcatenateTransformer {
>  >   // DataImportHandler calls this once for each row.
>  >   public Object transformRow(Map<String, Object> row) {
>  >     String firstName = (String) row.get("first-name");
>  >     String lastName = (String) row.get("last-name");
>  >     row.put("name", firstName + " " + lastName);
>  >     return row;
>  >   }
>  > }
>  >
>  > Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
>  >
>  > The data-config.xml should look like this:
>  > <entity name="myEntity" processor="XPathEntityProcessor"
>  >         url="http://myurl/example.xml"
>  >         transformer="com.yourpackage.ConcatenateTransformer">
>  >   <field column="first-name" xpath="/record/first-name" />
>  >   <field column="last-name" xpath="/record/last-name" />
>  >   <field column="name" />
>  > </entity>
>  >
>  > This will call ConcatenateTransformer.transformRow method for each row and
>  > you can concatenate any field with any field (or constant). Note that solr
>  > document will keep only those fields which are in the schema.xml, the rest
>  > are thrown away.
>  >
>  > If you don't want to write this in Java, you can use JavaScript via the
>  > built-in ScriptTransformer. For an example, look at
>  > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
>  >
>  > However, I'm beginning to realize that XSLT is a common need. Let me see
>  > how best we can accommodate it in DataImportHandler. Which XSLT processor
>  > would you prefer?
>  >
>  > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
>  > <[EMAIL PROTECTED]>
>  > wrote:
>  >
>  >>
>  >> I'm in the same situation as you, Daniel.  The DataImportHandler is pretty
>  >> awesome but I'd also prefer it had the power of XSLT.  The XPath support
>  >> in it doesn't suffice for me.  And I can't do very basic things like
>  >> concatenate one value with another, even a constant.  It's too bad there
>  >> isn't a mode that XSLT can be put into so that it doesn't build the whole
>  >> file in memory to do the transform.  I've been looking into this and have
>  >> turned up nothing.  It would be neat if there were a StAX to
>  >> multi-document adapter, at which point XSLT could be applied to the
>  >> smaller fixed-size documents instead of the entire data stream.  I
>  >> haven't found anything like this so it'd need to be built.  For now my
>  >> documents aren't too big to transform with XSLT in memory.
>  >>
>  >> ~ David
>  >>
>  >>
>  >> Daniel Papasian wrote:
>  >> >
>  >> > Shalin Shekhar Mangar wrote:
>  >> >> Hi Daniel,
>  >> >>
>  >> >> Maybe if you can give us a sample of what your XML looks like, we can
>  >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
>  >> >> the use-cases we have encountered so far are solvable using the
>  >> >> XPathEntityProcessor in DataImportHandler without using XSLT. For
>  >> >> details, look at
>  >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>  >> >
>  >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
>  >> > prefer the XSLT approach, because it's going to be a bit of
>  >> > configuration either way, and I'd rather it be an XSLT stylesheet than
>  >> > solrconfig.xml.  In addition, I haven't yet decided whether I want to
>  >> > apply any patches to the version that we will deploy, but if I do go
>  >> > down the route of the XSLT transform patch, if I end up having to back
>  >> > it out the amount of work that it would be for me to do the transform
>  >> > at the XML source would be negligible, where it would be quite a bit of
>  >> > work ahead of me to go from using the DataImportHandler to not using it
>  >> > at all.
>  >> >
>  >> > Because both the solr instance and the XML source are in house, I have
>  >> > the ability to apply the XSLT at the source instead of at solr.
>  >> > However, there are different teams of people that control the XML
>  >> > source and solr, so it would require a bit more office coordination to
>  >> > do it on the backend.
>  >> >
>  >> > The data is a filemaker XML export (DTD fmresultset) and it looks
>  >> > roughly like this:
>  >> > <fmresultset>
>  >> >    <resultset>
>  >> >      <field name="ID"><data>125</data></field>
>  >> >      <field name="organization"><data>Ford Foundation</data></field>
>  >> >      ...
>  >> >      <relatedset table="Employees">
>  >> >        <record>
>  >> >          <field name="ID"><data>Y5-A</data></field>
>  >> >          <field name="Name"><data>John Smith</data></field>
>  >> >        </record>
>  >> >        <record>
>  >> >          <field name="ID"><data>Y5-B</data></field>
>  >> >          <field name="Name"><data>Jane Doe</data></field>
>  >> >        </record>
>  >> >      </relatedset>
>  >> >    </resultset>
>  >> > </fmresultset>
>  >> >
>  >> > I'm taking the product of the resultset and the relatedset, using both
>  >> > IDs concatenated as a unique identifier, like so:
>  >> >
>  >> > <doc>
>  >> > <field name="ID">125Y5-A</field>
>  >> > <field name="organization">Ford Foundation</field>
>  >> > <field name="Name">John Smith</field>
>  >> > </doc>
>  >> > <doc>
>  >> > <field name="ID">125Y5-B</field>
>  >> > <field name="organization">Ford Foundation</field>
>  >> > <field name="Name">Jane Doe</field>
>  >> > </doc>
>  >> >
>  >> > I can do the transform pretty simply with XSLT.  I suppose it is
>  >> > possible to get the DataImportHandler to do this, but I'm not yet
>  >> > convinced that it's easier.
>  >> >
>  >> > Daniel
>  >> >
>  >> >
>  >>
>  >> --
>  >> View this message in context:
>  >> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
>  >> Sent from the Solr - User mailing list archive at Nabble.com.
>  >>
>  >>
>  >
>  >
>  > --
>  > Regards,
>  > Shalin Shekhar Mangar.
>  >
>  >
>
>  --
>  View this message in context:
>  http://www.nabble.com/XSLT-transform-before-update--tp16738227p16796900.html
>
>
>  Sent from the Solr - User mailing list archive at Nabble.com.
>
>
