Oh, and I'm sure that I'm using Java 6 because the properties on the Solr admin web page show:
java.runtime.version = 1.6.0_26-b03-384-10M3425

On Tue, Sep 13, 2011 at 4:15 PM, Pulkit Singhal <pulkitsing...@gmail.com> wrote:

> This solution doesn't seem to be working for me.
>
> I am using Solr trunk and I have the same question as Bernd with a small
> twist: the field that should NOT be empty happens to be a derived field
> called price; see the config below:
>
> <entity ...
>     transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,
>     script:skipRow">
>
>   <field column="description"
>       xpath="/rss/channel/item/description"
>   />
>
>   <field column="price"
>       regex=".*\$(\d*.\d*)"
>       sourceColName="description"
>   />
>   ...
> </entity>
>
> I have also changed the sample script to check the price field instead of
> the link field that was used as an example earlier in this thread:
>
> <script>
> <![CDATA[
>   function skipRow(row) {
>     var price = row.get( 'price' );
>     if ( price == null || price == '' ) {
>       row.put( '$skipRow', 'true' );
>     }
>     return row;
>   }
> ]]>
> </script>
>
> Does anyone have any thoughts on what I'm missing?
> Thanks!
> - Pulkit
>
> On Mon, Jan 10, 2011 at 3:06 AM, Bernd Fehling
> <bernd.fehl...@uni-bielefeld.de> wrote:
>
>> Hi Gora,
>>
>> thanks a lot, very nice solution, works perfectly.
>> I will dig more into ScriptTransformer, seems to be very powerful.
>>
>> Regards,
>> Bernd
>>
>> On 08.01.2011 14:38, Gora Mohanty wrote:
>> > On Fri, Jan 7, 2011 at 12:30 PM, Bernd Fehling
>> > <bernd.fehl...@uni-bielefeld.de> wrote:
>> >> Hello list,
>> >>
>> >> is it possible to load only selected documents with XPathEntityProcessor?
>> >> While loading docs I want to drop/skip/ignore documents with missing URL.
>> >>
>> >> Example:
>> >> <documents>
>> >>   <document>
>> >>     <title>first title</title>
>> >>     <id>identifier_01</id>
>> >>     <link>http://www.foo.com/path/bar.html</link>
>> >>   </document>
>> >>   <document>
>> >>     <title>second title</title>
>> >>     <id>identifier_02</id>
>> >>     <link></link>
>> >>   </document>
>> >> </documents>
>> >>
>> >> The first document should be loaded, the second document should be ignored
>> >> because it has an empty link (should also work for missing link field).
>> > [...]
>> >
>> > You can use a ScriptTransformer, along with $skipRow/$skipDoc.
>> > E.g., something like this for your data import configuration file:
>> >
>> > <dataConfig>
>> >   <script><![CDATA[
>> >     function skipRow(row) {
>> >       var link = row.get( 'link' );
>> >       if( link == null || link == '' ) {
>> >         row.put( '$skipRow', 'true' );
>> >       }
>> >       return row;
>> >     }
>> >   ]]></script>
>> >   <dataSource type="FileDataSource" />
>> >   <document>
>> >     <entity name="f" processor="FileListEntityProcessor"
>> >         baseDir="/home/gora/test" fileName=".*xml" newerThan="'NOW-3DAYS'"
>> >         recursive="true" rootEntity="false" dataSource="null">
>> >       <entity name="top" processor="XPathEntityProcessor"
>> >           forEach="/documents/document" url="${f.fileAbsolutePath}"
>> >           transformer="script:skipRow">
>> >         <field column="link" xpath="/documents/document/link"/>
>> >         <field column="title" xpath="/documents/document/title"/>
>> >         <field column="id" xpath="/documents/document/id"/>
>> >       </entity>
>> >     </entity>
>> >   </document>
>> > </dataConfig>
>> >
>> > Regards,
>> > Gora
>> >
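
For anyone who wants to exercise the skipRow logic outside Solr, here is a minimal standalone sketch. It is not how the DataImportHandler runs the script: makeRow and applyPriceRegex are hypothetical stand-ins written for this example, the sample descriptions are made up, and it assumes that the price column is simply left unset when the regex does not match. It also uses plain JavaScript strings, whereas inside Solr the script receives Java objects, so this only checks the JavaScript logic and the regex, not how Solr passes values to them.

```javascript
// Hypothetical stand-in for the DIH row: the script only calls get/put.
function makeRow(fields) {
  var data = {};
  for (var key in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      data[key] = fields[key];
    }
  }
  return {
    // Return null for a missing column, mimicking java.util.Map#get.
    get: function (name) {
      return Object.prototype.hasOwnProperty.call(data, name) ? data[name] : null;
    },
    put: function (name, value) {
      data[name] = value;
    }
  };
}

// Rough imitation of the regex/sourceColName pair from the config.
// Assumption: when the pattern does not match, 'price' stays unset.
function applyPriceRegex(row) {
  var description = row.get('description');
  var match = description === null ? null : /.*\$(\d*.\d*)/.exec(description);
  if (match !== null) {
    row.put('price', match[1]);
  }
  return row;
}

// The script from the thread, with the same logic.
function skipRow(row) {
  var price = row.get('price');
  if (price == null || price == '') {
    row.put('$skipRow', 'true');
  }
  return row;
}

// Made-up sample rows: one description with a price, one without.
var priced = skipRow(applyPriceRegex(makeRow({ description: 'Widget for $19.99 only' })));
var unpriced = skipRow(applyPriceRegex(makeRow({ description: 'Widget, price on request' })));

console.log('priced:', priced.get('price'), priced.get('$skipRow'));
console.log('unpriced:', unpriced.get('price'), unpriced.get('$skipRow'));
```

Under these assumptions the row with a price keeps it and is not flagged, while the row without one gets $skipRow set. If the same inputs behave differently inside Solr, the difference is more likely in the transformer chain or the script engine than in the script body itself.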