We are planning to incorporate both your requests in the next patch. The
implementation is going to be as follows.

Mention the XSL file location as follows:

<entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
....
</entity>

So the processing will be done after the XSL transformation. If your XSL
transformation produces a valid 'add' document, not even the fields are
necessary. Otherwise you will need to write all the fields and their xpaths
like any other XML.

<entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl" useSolrAddXml="true"/>

So it will assume that the schema is the same as that of the add XML and do
the needful.

Another feature is going to be a TemplateTransformer, which takes in a
template as follows:

<entity name="e" transformer="TemplateTransformer" ....>
  <field column="field1_2" template="${e.field1} ${e.field2}"/>
</entity>

Please let us know what you think about this. And keep giving us these great
use-cases so that we can make the tool better.

--Noble

On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org <[EMAIL PROTECTED]> wrote:
>
> Thanks Shalin.
>
> The particular XSLT processor used is not relevant; it's a spec. Just use
> the standard Java APIs. If I want a particular processor, then I can get
> that to happen by using a system property and/or you could offer a
> configuration input for the standard factory class implementation for a
> processor of my choice.
>
> ~ David
>
>
> Shalin Shekhar Mangar wrote:
> >
> > Hi David,
> > Actually you can concatenate values, however you'll have to write a bit of
> > code. You can write this in JavaScript (if you're using Java 6) or in
> > Java.
> >
> > Basically, you need to write a Transformer to do it. Look at
> > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
> >
> > For example, let's say you get fields first-name and last-name in the XML.
> > But in the schema.xml you have a field called "name" in which you need to
> > concatenate the values of first-name and last-name (with a space in
> > between).
> > Create a Java class:
> >
> > package com.yourpackage;
> >
> > import java.util.Map;
> >
> > public class ConcatenateTransformer {
> >   // Called once for each row
> >   public Object transformRow(Map<String, Object> row) {
> >     // Map values are Objects, so cast before concatenating
> >     String firstName = (String) row.get("first-name");
> >     String lastName = (String) row.get("last-name");
> >     row.put("name", firstName + " " + lastName);
> >     return row;
> >   }
> > }
> >
> > Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
> >
> > The data-config.xml should look like this:
> > <entity name="myEntity" processor="XPathEntityProcessor"
> >         url="http://myurl/example.xml"
> >         transformer="com.yourpackage.ConcatenateTransformer">
> >   <field column="first-name" xpath="/record/first-name" />
> >   <field column="last-name" xpath="/record/last-name" />
> >   <field column="name" />
> > </entity>
> >
> > This will call the ConcatenateTransformer.transformRow method for each row
> > and you can concatenate any field with any field (or constant). Note that
> > the solr document will keep only those fields which are in the schema.xml,
> > the rest are thrown away.
> >
> > If you don't want to write this in Java, you can use JavaScript by using
> > the built-in ScriptTransformer, for an example look at
> > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> >
> > However, I'm beginning to realize that XSLT is a common need, let me see
> > how best we can accommodate it in DataImportHandler. Which XSLT processor
> > will you prefer?
> >
> > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
> > <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> I'm in the same situation as you Daniel. The DataImportHandler is pretty
> >> awesome but I'd also prefer it had the power of XSLT. The XPath support
> >> in it doesn't suffice for me. And I can't do very basic things like
> >> concatenate one value with another, say a constant even. It's too bad
> >> there isn't a mode that XSLT can be put into so that it doesn't build the
> >> whole file in memory to do the transform.
> >> I've been looking into this and have turned up nothing. It would be neat
> >> if there was a StAX to multi-document adapter, at which point XSLT could
> >> be applied to the smaller fixed-size documents instead of the entire data
> >> stream. I haven't found anything like this so it'd need to be built. For
> >> now my documents aren't too big to XSLT in-memory.
> >>
> >> ~ David
> >>
> >>
> >> Daniel Papasian wrote:
> >> >
> >> > Shalin Shekhar Mangar wrote:
> >> >> Hi Daniel,
> >> >>
> >> >> Maybe if you can give us a sample of what your XML looks like, we can
> >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> >> >> the use-cases we have encountered so far are solvable using the
> >> >> XPathEntityProcessor in DataImportHandler without using XSLT, for
> >> >> details look at
> >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> >> >
> >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
> >> > prefer the XSLT approach, because it's going to be a bit of
> >> > configuration either way, and I'd rather it be an XSLT stylesheet than
> >> > solrconfig.xml. In addition, I haven't yet decided whether I want to
> >> > apply any patches to the version that we will deploy, but if I do go
> >> > down the route of the XSLT transform patch, if I end up having to back
> >> > it out the amount of work that it would be for me to do the transform
> >> > at the XML source would be negligible, where it would be quite a bit of
> >> > work ahead of me to go from using the DataImportHandler to not using it
> >> > at all.
> >> >
> >> > Because both the solr instance and the XML source are in house, I have
> >> > the ability to apply the XSLT at the source instead of at solr.
> >> > However, there are different teams of people that control the XML
> >> > source and solr, so it would require a bit more office coordination to
> >> > do it on the backend.
> >> >
> >> > The data is a filemaker XML export (DTD fmresultset) and it looks
> >> > roughly like this:
> >> > <fmresultset>
> >> > <resultset>
> >> > <field name="ID"><data>125</data></field>
> >> > <field name="organization"><data>Ford Foundation</data></field>
> >> > ...
> >> > <relatedset table="Employees">
> >> > <record>
> >> > <field name="ID"><data>Y5-A</data></field>
> >> > <field name="Name"><data>John Smith</data></field>
> >> > </record>
> >> > <record>
> >> > <field name="ID"><data>Y5-B</data></field>
> >> > <field name="Name"><data>Jane Doe</data></field>
> >> > </record>
> >> > </relatedset>
> >> > </fmresultset>
> >> >
> >> > I'm taking the product of the resultset and the relatedset, using both
> >> > IDs concatenated as a unique identifier, like so:
> >> >
> >> > <doc>
> >> > <field name="ID">125Y5-A</field>
> >> > <field name="organization">Ford Foundation</field>
> >> > <field name="Name">John Smith</field>
> >> > </doc>
> >> > <doc>
> >> > <field name="ID">125Y5-B</field>
> >> > <field name="organization">Ford Foundation</field>
> >> > <field name="Name">Jane Doe</field>
> >> > </doc>
> >> >
> >> > I can do the transform pretty simply with XSLT. I suppose it is
> >> > possible to get the DataImportHandler to do this, but I'm not yet
> >> > convinced that it's easier.
> >> >
> >> > Daniel
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
> --
> View this message in context:
> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16796900.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
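
For anyone following the thread, here is a rough sketch of the "just use the
standard Java APIs" route applied to Daniel's sample. It is only an
illustration under stated assumptions, not how DataImportHandler implements
(or will implement) the xslt attribute. The class name FmResultSetToSolrAdd,
the stylesheet, and the <add> wrapper are made up for this example, and the
sample input assumes that </resultset> closes after the relatedset, which the
"roughly like this" listing above does not show. TransformerFactory.newInstance()
uses the normal JAXP lookup, so a different processor can be selected with the
javax.xml.transform.TransformerFactory system property.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; not part of Solr or DataImportHandler.
class FmResultSetToSolrAdd {

    // Assumed stylesheet: emits one <doc> per related record and
    // concatenates the parent ID with the record ID.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<add>"
      + "<xsl:for-each select='fmresultset/resultset/relatedset/record'>"
      + "<doc>"
      + "<field name='ID'><xsl:value-of select=\"concat(../../field[@name='ID']/data, field[@name='ID']/data)\"/></field>"
      + "<field name='organization'><xsl:value-of select=\"../../field[@name='organization']/data\"/></field>"
      + "<field name='Name'><xsl:value-of select=\"field[@name='Name']/data\"/></field>"
      + "</doc>"
      + "</xsl:for-each>"
      + "</add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Trimmed, well-formed variant of the sample from the thread
    // (placement of </resultset> is an assumption).
    static final String SAMPLE =
        "<fmresultset><resultset>"
      + "<field name='ID'><data>125</data></field>"
      + "<field name='organization'><data>Ford Foundation</data></field>"
      + "<relatedset table='Employees'>"
      + "<record><field name='ID'><data>Y5-A</data></field>"
      + "<field name='Name'><data>John Smith</data></field></record>"
      + "<record><field name='ID'><data>Y5-B</data></field>"
      + "<field name='Name'><data>Jane Doe</data></field></record>"
      + "</relatedset>"
      + "</resultset></fmresultset>";

    // Applies the stylesheet to the XML using only javax.xml.transform.
    // The whole input is read and transformed in memory.
    static String transform(String xslt, String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Prints the generated add document for the sample input.
        System.out.println(transform(XSLT, SAMPLE));
    }
}
```

As David points out, this style of transform holds the whole document in
memory, so it is only reasonable while the exports stay small.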