Re: XSLT transform before update?

2008-04-20 Thread David Smiley @MITRE.org

Thanks Shalin.

Which XSLT processor is used shouldn't matter; XSLT is a spec.  Just use
the standard Java APIs (JAXP).  If I want a particular processor, I can
select it with a system property, and/or you could offer a configuration
option naming the standard factory class implementation for the processor
of my choice.
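
As a rough illustration of the "standard Java APIs" point, here is a minimal JAXP sketch. The class name JaxpXsltExample and the sample record are made up for this example, and the Saxon class name in the comment is only one possible value for the system property; nothing here is taken from the DataImportHandler code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class JaxpXsltExample {
    /** Applies the stylesheet to the input with whichever processor JAXP selects. */
    public static String transform(String xslt, String xml) {
        try {
            // newInstance() consults the javax.xml.transform.TransformerFactory
            // system property, so a processor can be chosen at launch time, e.g.
            // -Djavax.xml.transform.TransformerFactory=net.sf.saxon.TransformerFactoryImpl
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stylesheet: joins first-name and last-name with a space.
        String xslt =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/record'>"
            + "<xsl:value-of select=\"concat(first-name, ' ', last-name)\"/>"
            + "</xsl:template></xsl:stylesheet>";
        String xml =
            "<record><first-name>John</first-name><last-name>Smith</last-name></record>";
        System.out.println(transform(xslt, xml)); // prints: John Smith
    }
}
```

No processor-specific class appears in the code itself, so swapping processors would only be a matter of the classpath and the system property.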

~ David


Shalin Shekhar Mangar wrote:
> 
> Hi David,
> Actually, you can concatenate values, but you'll have to write a bit of
> code, either in JavaScript (if you're using Java 6) or in Java.
> 
> Basically, you need to write a Transformer to do it. Look at
> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d7605a990fb03c4ff461b3736496a9
> 
> For example, let's say you get fields first-name and last-name in the XML.
> But in the schema.xml you have a field called "name" in which you need to
> concatenate the values of first-name and last-name (with a space in
> between). Create a Java class:
> 
> import java.util.Map;
> public class ConcatenateTransformer {
>   public Object transformRow(Map<String, Object> row) {
>     String firstName = (String) row.get("first-name");
>     String lastName = (String) row.get("last-name");
>     row.put("name", firstName + " " + lastName);
>     return row;
>   }
> }
> 
> Add this class to solr's classpath by putting its jar in solr/WEB-INF/lib
> 
> The data-config.xml should look like this:
> <entity ... url="http://myurl/example.xml"
>         transformer="com.yourpackage.ConcatenateTransformer">
>   <field column="first-name" xpath="/record/first-name" />
>   <field column="last-name" xpath="/record/last-name" />
> </entity>
> 
> This will call the ConcatenateTransformer.transformRow method for each row,
> and you can concatenate any field with any field (or constant). Note that
> the Solr document keeps only those fields which are in the schema.xml; the
> rest are thrown away.
> 
> If you don't want to write this in Java, you can use JavaScript with the
> built-in ScriptTransformer. For an example, look at
> http://wiki.apache.org/solr/DataImportHandler#head-27fcc2794bd71f7d727104ffc6b99e194bdb6ff9
> 
> However, I'm beginning to realize that XSLT is a common need. Let me see
> how best we can accommodate it in DataImportHandler. Which XSLT processor
> would you prefer?
> 
> On Sat, Apr 19, 2008 at 12:13 AM, David Smiley @MITRE.org
> <[EMAIL PROTECTED]>
> wrote:
> 
>>
>> I'm in the same situation as you, Daniel.  The DataImportHandler is pretty
>> awesome, but I'd also prefer it had the power of XSLT.  The XPath support
>> in it doesn't suffice for me, and I can't do very basic things like
>> concatenating one value with another, even a constant.  It's too bad there
>> isn't a mode XSLT can be put into that avoids building the whole file in
>> memory to do the transform.  I've been looking into this and have turned
>> up nothing.  It would be neat if there were a StAX to multi-document
>> adapter, at which point XSLT could be applied to the smaller fixed-size
>> documents instead of the entire data stream.  I haven't found anything
>> like this, so it'd need to be built.  For now my documents aren't too big
>> to XSLT in-memory.
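
One way the StAX to multi-document adapter idea could look is sketched below. This is only an illustration under assumptions: the class name StaxRecordSplitter, the method transformEach, and the sample element names are invented here, and it is not an existing Solr or JDK facility. It walks the input with the StAX event API, copies each record element into its own small buffer, and runs the stylesheet on that buffer, so only one record is held in memory at a time.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StaxRecordSplitter {
    /** Runs the stylesheet once per element named recordName; returns each result. */
    public static List<String> transformEach(Reader xml, String recordName, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(xml);
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            List<String> results = new ArrayList<>();
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()
                        || !event.asStartElement().getName().getLocalPart().equals(recordName)) {
                    continue; // skip everything outside a record element
                }
                // Copy this record, up to its matching end tag, into a small buffer.
                StringWriter fragment = new StringWriter();
                XMLEventWriter writer = outputFactory.createXMLEventWriter(fragment);
                int depth = 0;
                do {
                    if (event.isStartElement()) {
                        depth++;
                    } else if (event.isEndElement()) {
                        depth--;
                    }
                    writer.add(event);
                    if (depth > 0) {
                        event = reader.nextEvent();
                    }
                } while (depth > 0);
                writer.flush();
                writer.close();
                // Transform just this record as if it were a standalone document.
                StringWriter out = new StringWriter();
                transformer.transform(new StreamSource(new StringReader(fragment.toString())),
                                      new StreamResult(out));
                results.add(out.toString());
            }
            return results;
        } catch (XMLStreamException | TransformerException e) {
            throw new IllegalStateException("Splitting or transforming failed", e);
        }
    }
}
```

The sketch collects the results in a list for simplicity; for a large feed, each result would more sensibly be handed to a consumer as it is produced. A stylesheet used this way has to match the record element as its document root (for example match='/record'), since each record is transformed on its own.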
>>
>> ~ David
>>
>>
>> Daniel Papasian wrote:
>> >
>> > Shalin Shekhar Mangar wrote:
>> >> Hi Daniel,
>> >>
>> >> Maybe if you can give us a sample of what your XML looks like, we can
>> >> suggest how to use SOLR-469 (Data Import Handler) to index it. Most of
>> >> the use cases we have encountered so far are solvable using the
>> >> XPathEntityProcessor in DataImportHandler without using XSLT. For
>> >> details, look at
>> >> http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476
>> >
>> > I think even if it is possible to use SOLR-469 for my needs, I'd still
>> > prefer the XSLT approach, because it's going to be a bit of
>> > configuration either way, and I'd rather it be an XSLT stylesheet than
>> > solrconfig.xml.  In addition, I haven't yet decided whether I want to
>> > apply any patches to the version that we will deploy, but if I go down
>> > the route of the XSLT transform patch and end up having to back it out,
>> > the work to do the transform at the XML source instead would be
>> > negligible, whereas going from using the DataImportHandler to not using
>> > it at all would be quite a bit of work.
>> >
>> > Because both the solr instance and the XML source are in house, I have
>> > the ability to apply the XSLT at the source instead of at solr.
>> > However, there are different teams of people that control the XML
>> > source and solr, so it would require a bit more office coordination to
>> > do it on the backend.
>> >
>> > The data is a filemaker XML export (DTD fmresultset) and it looks
>> > roughly like this:
>> > 
>> >
>> >  125
>> >  Ford Foundation
>> >  ...
>> >  
>> >
>> >  Y5-A
>> >  John Smith
>> >
>> >
>> >  Y5-B
>> >  Jane Doe
>> >
>> >  
>> > 
>> >
>> > I'm taking the product of the resultset and the rel

DataField parsing error using BinaryResponseParser for solrj

2008-04-20 Thread Eason . Lee
The error comes from Solr while parsing the date field.
It works fine with XMLResponseParser.

Apr 22, 2008 11:02:13 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.text.ParseException: Unparseable date: "1995-02-16T00:00:00Z"
at org.apache.solr.schema.DateField.toObject(DateField.java:173)
at org.apache.solr.schema.DateField.toObject(DateField.java:83)
at org.apache.solr.request.BinaryResponseWriter$Resolver.getDoc(BinaryResponseWriter.java:137)
at org.apache.solr.request.BinaryResponseWriter$Resolver.writeDocList(BinaryResponseWriter.java:115)
at org.apache.solr.request.BinaryResponseWriter$Resolver.resolve(BinaryResponseWriter.java:84)
at org.apache.solr.common.util.NamedListCodec.writeVal(NamedListCodec.java:128)
at org.apache.solr.common.util.NamedListCodec.writeNamedList(NamedListCodec.java:118)
at org.apache.solr.common.util.NamedListCodec.marshal(NamedListCodec.java:77)
at org.apache.solr.request.BinaryResponseWriter.write(BinaryResponseWriter.java:44)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.text.ParseException: Unparseable date: "1995-02-16T00:00:00Z"
at java.text.DateFormat.parse(DateFormat.java:337)
at org.apache.solr.schema.DateField.toObject(DateField.java:170)
... 21 more
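
For anyone wanting to reproduce the "Unparseable date" message outside Solr, the sketch below shows how java.text.DateFormat.parse raises it when the pattern and the value disagree. The class name DateParseDemo and both patterns are assumptions chosen for illustration; they are not taken from Solr's DateField source, and this is not a statement about the actual cause of the failure above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DateParseDemo {
    /** Returns the parsed date, or null if the value does not match the pattern. */
    public static Date tryParse(String pattern, String value) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return format.parse(value);
        } catch (ParseException e) {
            // The message has the form: Unparseable date: "<value>"
            System.out.println(e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String value = "1995-02-16T00:00:00Z";
        // This pattern matches the value exactly, so parsing succeeds.
        System.out.println(tryParse("yyyy-MM-dd'T'HH:mm:ss'Z'", value));
        // This pattern expects milliseconds, which the value lacks, so parsing fails.
        System.out.println(tryParse("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", value));
    }
}
```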


Re: DataField parsing error using BinaryResponseParser for solrj

2008-04-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is not a problem with the BinaryResponseWriter itself. It is caused
by the bug tracked at https://issues.apache.org/jira/browse/SOLR-470,
which we need to fix now.
--Noble

On Mon, Apr 21, 2008 at 9:16 AM, Eason. Lee <[EMAIL PROTECTED]> wrote:
> Error comes from solr while parsing the datefield
>  It is ok with XMLResponseParser
>



-- 
--Noble Paul


Re: DataField parsing error using BinaryResponseParser for solrj

2008-04-20 Thread Eason . Lee
Thanks

2008/4/21, Noble Paul നോബിള്‍ नोब्ळ् <[EMAIL PROTECTED]>:
>
> It is not a problem with the BinaryResponseWriter itself. It is caused
> by the bug https://issues.apache.org/jira/browse/SOLR-470
> we need to fix it now.
> --Noble
>
> On Mon, Apr 21, 2008 at 9:16 AM, Eason. Lee <[EMAIL PROTECTED]> wrote:
> > Error comes from solr while parsing the datefield
> >  It is ok with XMLResponseParser
> >
>
>
>
> --
> --Noble Paul
>