Hello,

1) The documented explanation of skipDoc and skipRow is not enough for me to discern the difference between them:

$skipDoc : Skip the current document. Do not add it to Solr. The value can be String true/false
$skipRow : Skip the current row. The document will be added with rows from other entities. The value can be String true/false

Can someone please elaborate and help me out with an example?
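
To make the question concrete, here is how I currently read those two descriptions, as a minimal sketch only. The JDBC data source, the item and feature tables, their columns, and the two script function names are all hypothetical, and I have not been able to confirm that DIH really behaves this way. My assumption is that $skipDoc set on a row drops the whole Solr document, while $skipRow set on a child row drops only that row and the document is still added with whatever the other rows contribute.

```xml
<dataConfig>
  <!-- Hypothetical JDBC source: driver, url, tables and columns are placeholders. -->
  <dataSource type="JdbcDataSource" driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:file:/tmp/example" />
  <script>
    <![CDATA[
      // Parent rows: my assumption is that this drops the entire document.
      function skipDiscontinued(row) {
        if (String(row.get('status')) == 'discontinued') {
          row.put('$skipDoc', 'true');
        }
        return row;
      }
      // Child rows: my assumption is that this drops only the current row,
      // and the document is still added with the remaining feature rows.
      function skipEmptyFeature(row) {
        var description = row.get('description');
        if (description == null || String(description) == '') {
          row.put('$skipRow', 'true');
        }
        return row;
      }
    ]]>
  </script>
  <document>
    <entity name="item" query="select id, name, status from item"
            transformer="script:skipDiscontinued">
      <entity name="feature"
              query="select description from feature where item_id='${item.id}'"
              transformer="script:skipEmptyFeature">
        <field column="description" name="features" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

If that reading is right, a discontinued item would never reach the index, while an item with one empty feature row would still be indexed with its other features. Please correct me if I have this backwards.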

2) I am working off the Solr trunk (4.x) and nothing I do seems to make the import for a given row/doc get skipped. As proof I've added these tests to my data import xml and all the rows are still getting indexed!!! If anyone sees something wrong with my config please tell me. Make sure to take note of the blatant use of

row.put( '$skipDoc', 'true' );

and

<field column="$skipDoc" template="true"/>

Yet stuff still gets imported, this is beyond me. Need a fresh pair of eyes :)

<dataConfig>
  <dataSource type="URLDataSource" />
  <script>
    <![CDATA[
      function skipRow(row) {
        row.put( '$skipDoc', 'true' );
        return row;
      }
    ]]>
  </script>
  <document>
    <entity name="amazon"
            pk="link"
            url="http://www.amazon.com/gp/rss/new-releases/apparel/1040660/ref=zg_bsnr_1040660_rsslink"
            processor="XPathEntityProcessor"
            forEach="/rss/channel | /rss/channel/item"
            transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow,TemplateTransformer">
      <field column="description" xpath="/rss/channel/item/description" />
      <field column="price" regex=".*\$(\d*.\d*)" sourceColName="description" />
      <field column="$skipDoc" template="true"/>
      <field column="link" xpath="/rss/channel/item/link" />
    </entity>
  </document>
</dataConfig>

Thanks!
- Pulkit