Hi,
There is a new patch that implements these features. I will
update the wiki with the documentation.
I guess we do not need to be too worried about memory consumption.
A few MB of memory should be fine (unless you are using a file that
is tens of MB in size). If possible, consider using XPathEntityProcessor:
it uses StAX and is quite efficient.
Thanks for your support.
--Noble
On Mon, Apr 21, 2008 at 5:57 PM, David Smiley @MITRE.org
<[EMAIL PROTECTED]> wrote:
>
> Cool. So you're saying that this XSLT file will operate on the entire XML
> document that was fetched from the URL and just pass the result on to Solr?
> Thanks for supporting this. The XML files coming from my data source
> are big, but not big enough to risk an out-of-memory error. And I've found
> XSLT to perform fast for me. I like your proposed TemplateTransformer
> too... I'm tempted to use that in place of XSLT. Great job, Paul.
>
> It'd be neat to have an XSLT transformer for your framework that operates on
> a single entity (that would address the memory usage problem). I know your
> entities are HashMap-based rather than XML, however.
>
> ~ David
>
>
>
>
> Noble Paul നോബിള് नोब्ळ् wrote:
> >
> > We are planning to incorporate both your requests in the next patch.
> > The implementation is going to be as follows. Mention the XSL file
> > location like this:
> > <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl">
> > ....
> > </entity>
> > The processing will then be done after the XSL transformation. You will
> > need to declare all the fields and their xpaths, as for any other XML
> > source, unless the XSL transformation produces a valid 'add' document.
> > In that case not even the fields are necessary:
> >
> > <entity processor="XPathEntityProcessor" xslt="file:/c:/my-own.xsl"
> >         useSolrAddXml="true"/>
> >
> > It will then assume that the schema is the same as that of the add XML
> > and handle the rest.
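As a rough illustration of what treating the XSLT output as Solr add XML could involve, here is a minimal Java sketch that streams an `<add>` document with StAX and hands each `<doc>` to a callback as a row (field name to list of values). `AddXmlRows` and `read` are hypothetical names; this is not the patch's code, and it ignores attributes such as boosts.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch: stream a Solr <add> document and emit one row per <doc>. */
public class AddXmlRows {

    public static void read(String addXml, Consumer<Map<String, List<String>>> onRow)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(addXml));
        Map<String, List<String>> row = null; // only the current <doc> is held in memory
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("doc".equals(r.getLocalName())) {
                        row = new LinkedHashMap<>();
                    } else if ("field".equals(r.getLocalName()) && row != null) {
                        String name = r.getAttributeValue(null, "name");
                        String value = r.getElementText(); // leaves the reader on </field>
                        // A repeated field name becomes an additional value.
                        row.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "doc".equals(r.getLocalName()) && row != null) {
                    onRow.accept(row); // hand the finished row to the caller
                    row = null;
                }
            }
        } finally {
            r.close();
        }
    }
}
```

Holding only the current `<doc>` keeps memory bounded by the largest document, in the same spirit as the StAX-based XPathEntityProcessor.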
> >
> > Another feature is going to be a TemplateTransformer, which takes a
> > template as follows:
> >
> > <entity name="e" transformer="TemplateTransformer" ....>
> > <field column="field1_2" template="${e.field1} ${e.field2}"/>
> > </entity>
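The `${e.field1} ${e.field2}` template can be illustrated with a small Java sketch that substitutes `${entity.column}` variables from a row map. `TemplateSketch`, its methods, and the choice to resolve unknown variables to an empty string are assumptions for illustration, not the TemplateTransformer's actual code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${entity.column} substitution, in the spirit of TemplateTransformer. */
public class TemplateSketch {

    // Matches ${...} and captures the variable name between the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces ${entityName.column} with row.get(column); unknown variables become "". */
    public static String resolve(String template, String entityName, Map<String, Object> row) {
        String prefix = entityName + ".";
        Matcher m = VARIABLE.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            Object value = name.startsWith(prefix) ? row.get(name.substring(prefix.length())) : null;
            // quoteReplacement stops '$' or '\' in field values being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value == null ? "" : value.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Stores the resolved template under the target column, like field1_2 in the config. */
    public static Map<String, Object> transformRow(Map<String, Object> row, String column,
                                                   String template, String entityName) {
        row.put(column, resolve(template, entityName, row));
        return row;
    }
}
```

Quoting the replacement matters because field values such as prices can legitimately contain `$`.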
> >
> > Please let us know what you think about this.
> >
> > And keep giving us these great use-cases so that we can make the tool
> > better.
> > --Noble
> >
> >
> >
> > On Mon, Apr 21, 2008 at 12:07 AM, David Smiley @MITRE.org
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> Thanks Shalin.
> >>
> >> The particular XSLT processor used is not relevant; XSLT is a spec. Just
> >> use the standard Java APIs. If I want a particular processor, I can make
> >> that happen with a system property, and/or you could offer a
> >> configuration input naming the standard factory implementation class
> >> for a processor of my choice.
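Selecting a processor through the standard Java APIs could look like the following sketch. `XsltFactories` and its `configuredClass` parameter are hypothetical stand-ins for the configuration input suggested here; `TransformerFactory.newInstance()` and `TransformerFactory.newInstance(String, ClassLoader)` are standard JAXP methods.

```java
import javax.xml.transform.TransformerFactory;

/** Hypothetical helper: choose an XSLT processor using only standard JAXP calls. */
public class XsltFactories {

    /**
     * @param configuredClass fully qualified TransformerFactory implementation class,
     *                        or null/blank to use the standard JAXP lookup
     */
    public static TransformerFactory create(String configuredClass) {
        if (configuredClass == null || configuredClass.trim().isEmpty()) {
            // Standard JAXP lookup: the javax.xml.transform.TransformerFactory system
            // property is checked first; the JDK's built-in processor is the final fallback.
            return TransformerFactory.newInstance();
        }
        // Load the named implementation explicitly, e.g. one shipped in solr/WEB-INF/lib.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        return TransformerFactory.newInstance(configuredClass.trim(), loader);
    }
}
```

With no configured class, starting the JVM with `-Djavax.xml.transform.TransformerFactory=<class>` switches processors without touching any other configuration.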
> >>
> >> ~ David
> >>
> >>
> >>
> >>
> >> Shalin Shekhar Mangar wrote:
> >> >
> >> > Hi David,
> >> > Actually, you can concatenate values; however, you'll have to write a
> >> > bit of code. You can write it in JavaScript (if you're using Java 6) or
> >> > in Java.
> >> >
> >> > Basically, you need to write a Transformer to do it. Look at
> >> > http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
> >> >
> >> > For example, let's say you get the fields first-name and last-name in the
> >> > XML, but in schema.xml you have a field called "name" in which you need
> >> > to concatenate the values of first-name and last-name (with a space in
> >> > between). Create a Java class:
> >> >
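> >> > package com.yourpackage;
> >> >
> >> > import java.util.Map;
> >> >
> >> > public class ConcatenateTransformer {
> >> >   // Called by DataImportHandler for each row; the map holds column -> value.
> >> >   public Object transformRow(Map<String, Object> row) {
> >> >     String firstName = (String) row.get("first-name");
> >> >     String lastName = (String) row.get("last-name");
> >> >     row.put("name", firstName + " " + lastName);
> >> >     return row;
> >> >   }
> >> > }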
> >> >
> >> > Add this class to Solr's classpath by putting its jar in solr/WEB-INF/lib.
> >> >
> >> > The data-config.xml should look like this:
> >> > <entity name="myEntity" processor="XPathEntityProcessor"
> >> >         url="http://myurl/example.xml"
> >> >         transformer="com.yourpackage.ConcatenateTransformer">
> >> >   <field column="first-name" xpath="/record/first-name" />
> >> >   <field column="last-name" xpath="/record/last-name" />
> >> >   <field column="name" />
> >> > </entity>
> >> >
> >> > This will call the ConcatenateTransformer.transformRow method for each
> >> > row, and you can concatenate any field with any other field (or a
> >> > constant). Note that the Solr document will keep only those fields that
> >> > are in schema.xml; the rest are thrown away.
> >> >
> >> > If you don't want to write this in Java, you can use JavaScript via the
> >> > built-in ScriptTransformer; for an example, look at
> >> > http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> >> >
> >> > However, I'm beginning to realize that XSLT is a common need; let me see
> >> > how best we can accommodate it in DataImportHandler. Which XSLT processor
> >> > would you prefer?
> >> >
> >> > On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
> >> > <[EMAIL PROTECTED]> wrote:
> >> >
> >> >>
> >> >> I'm in the same situation as you, Daniel. The DataImportHandler is pretty
> >> >> awesome, but I'd also prefer it had the power of XSLT. The XPath support
> >> >> in it doesn't suffice for me, and I can't do very basic things like
> >> >> concatenating one value with another, or even with a constant. It's too
> >> >> bad there isn't a mode XSLT can be put into that avoids building the
> >> >> whole file in memory to do the transform. I've been looking into this and
> >> >> have turned up nothing. It would be neat if there were a StAX-to-
> >> >> multi-document adapter, at which point XSLT could be applied to the
> >> >> smaller fixed-size documents instead of the entire data stream. I haven't
> >> >> found anything like this, so it'd need to be built. For now my documents
> >> >> aren't too big to XSLT in memory.
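The StAX-to-multi-document adapter described here can be approximated with the JDK alone: a StAX reader walks the stream, each record element is copied into its own small DOM, and the stylesheet runs on that DOM only. This is a sketch under assumptions: `RecordWiseXslt` and the `recordName` parameter are hypothetical, and it relies on the JDK's identity transformer consuming only the current element's subtree when given a `StAXSource` positioned at a `START_ELEMENT`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical adapter: XSLT is applied to one record element at a time. */
public class RecordWiseXslt {

    public static List<String> transformEach(String xml, String recordName, String xsl)
            throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer copier = tf.newTransformer(); // identity copy of one subtree
        Transformer xslt = tf.newTransformer(new StreamSource(new StringReader(xsl)));
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> results = new ArrayList<>();
        try {
            int event = reader.getEventType();
            while (true) {
                if (event == XMLStreamConstants.START_ELEMENT
                        && recordName.equals(reader.getLocalName())) {
                    // Copy just this element's subtree into a small DOM.
                    DOMResult record = new DOMResult();
                    copier.transform(new StAXSource(reader), record);
                    // Run the stylesheet on the small document only.
                    StringWriter out = new StringWriter();
                    xslt.transform(new DOMSource(record.getNode()), new StreamResult(out));
                    results.add(out.toString());
                    // The copy advanced the reader; re-check the current event instead
                    // of skipping ahead, since the next record may start right here.
                    event = reader.getEventType();
                    continue;
                }
                if (!reader.hasNext()) {
                    break;
                }
                event = reader.next();
            }
        } finally {
            reader.close();
        }
        return results;
    }
}
```

Memory use then scales with the largest record rather than the whole file, though a stylesheet that must see several records at once (for example, to group them) cannot work this way.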
> >> >>
> >> >> ~ David
> >> >>
> >> >>
> >> >> Daniel Papasian wrote:
> >> >> >
> >> >> > Shalin Shekhar Mangar wrote:
> >> >> >> Hi Daniel,
> >> >> >>
> >> >> >> Maybe if you can give us a sample of what your XML looks like, we can
> >> >> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
> >> >> >> the use-cases we have encountered so far are solvable using the
> >> >> >> XPathEntityProcessor in DataImportHandler without XSLT; for details,
> >> >> >> look at
> >> >> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
> >> >> >
> >> >> > I think even if it is possible to use SOLR-469 for my needs, I'd still
> >> >> > prefer the XSLT approach, because it's going to be a bit of
> >> >> > configuration either way, and I'd rather it be an XSLT stylesheet than
> >> >> > solrconfig.xml. In addition, I haven't yet decided whether I want to
> >> >> > apply any patches to the version that we will deploy. If I do go down
> >> >> > the route of the XSLT transform patch and end up having to back it out,
> >> >> > moving the transform to the XML source would be negligible work,
> >> >> > whereas going from using the DataImportHandler to not using it at all
> >> >> > would be quite a bit of work.
> >> >> >
> >> >> > Because both the Solr instance and the XML source are in-house, I have
> >> >> > the ability to apply the XSLT at the source instead of at Solr.
> >> >> > However, different teams of people control the XML source and Solr,
> >> >> > so doing it on the backend would require a bit more office
> >> >> > coordination.
> >> >> >
> >> >> > The data is a FileMaker XML export (DTD fmresultset), and it looks
> >> >> > roughly like this:
> >> >> > <fmresultset>
> >> >> >   <resultset>
> >> >> >     <field name="ID"><data>125</data></field>
> >> >> >     <field name="organization"><data>Ford Foundation</data></field>
> >> >> >     ...
> >> >> >     <relatedset table="Employees">
> >> >> >       <record>
> >> >> >         <field name="ID"><data>Y5-A</data></field>
> >> >> >         <field name="Name"><data>John Smith</data></field>
> >> >> >       </record>
> >> >> >       <record>
> >> >> >         <field name="ID"><data>Y5-B</data></field>
> >> >> >         <field name="Name"><data>Jane Doe</data></field>
> >> >> >       </record>
> >> >> >     </relatedset>
> >> >> >   </resultset>
> >> >> > </fmresultset>
> >> >> >
> >> >> > I'm taking the product of the resultset and the relatedset, using both
> >> >> > IDs concatenated as a unique identifier, like so:
> >> >> >
> >> >> > <doc>
> >> >> > <field name="ID">125Y5-A</field>
> >> >> > <field name="organization">Ford Foundation</field>
> >> >> > <field name="Name">John Smith</field>
> >> >> > </doc>
> >> >> > <doc>
> >> >> > <field name="ID">125Y5-B</field>
> >> >> > <field name="organization">Ford Foundation</field>
> >> >> > <field name="Name">Jane Doe</field>
> >> >> > </doc>
> >> >> >
> >> >> > I can do the transform pretty simply with XSLT. I suppose it is
> >> >> > possible to get the DataImportHandler to do this, but I'm not yet
> >> >> > convinced that it's easier.
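For reference, the transform described here can be written as a short XSLT 1.0 stylesheet, shown embedded in Java so it runs with the JDK's built-in JAXP processor. The stylesheet, the `FmResultSetToAdd` class name, and the `<add>` wrapper (Solr's XML update format) are illustrative assumptions; the paths assume the sample's `<resultset>` element encloses the `<field>` and `<relatedset>` elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: one Solr <doc> per related record, keyed by parent ID + record ID. */
public class FmResultSetToAdd {

    // XSLT 1.0 stylesheet; element and field names follow the fmresultset sample.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'>"
        + "<add>"
        + "<xsl:for-each select='fmresultset/resultset/relatedset/record'>"
        + "<doc>"
        // Parent ID followed by the related record's ID, e.g. 125 + Y5-A.
        + "<field name='ID'><xsl:value-of select=\"concat(../../field[@name='ID']/data, field[@name='ID']/data)\"/></field>"
        // ../.. climbs from the record to the enclosing resultset.
        + "<field name='organization'><xsl:value-of select=\"../../field[@name='organization']/data\"/></field>"
        + "<field name='Name'><xsl:value-of select=\"field[@name='Name']/data\"/></field>"
        + "</doc>"
        + "</xsl:for-each>"
        + "</add>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String toAddXml(String fmXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(fmXml)), new StreamResult(out));
        return out.toString();
    }
}
```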
> >> >> >
> >> >> > Daniel
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/XSLT-transform-before-update--tp16738227p16764009.html
> >> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Regards,
> >> > Shalin Shekhar Mangar.
> >> >
> >> >
> >>
> >>
> >>
> >
> >
>
>
>
--
--Noble Paul