Hello,

1) The documented explanations of $skipDoc and $skipRow are not enough
for me to discern the difference between them:
    $skipDoc : Skip the current document. Do not add it to Solr. The
    value can be String true/false.
    $skipRow : Skip the current row. The document will be added with rows
    from other entities. The value can be String true/false.
Can someone please elaborate and help me out with an example?
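
For reference, here is how I currently read those two descriptions, as an untested sketch that may well be wrong. The item/feature tables, their columns, the JDBC settings, and the function names dropDiscontinued and dropEmptyFeature are all made up for illustration. My assumption is that $skipDoc set on a parent row keeps the whole document out of Solr, while $skipRow set on a child row drops only that row:

```xml
<dataConfig>
    <!-- Hypothetical JDBC source: driver, url, tables and columns are placeholders. -->
    <dataSource type="JdbcDataSource" driver="org.hsqldb.jdbcDriver"
                url="jdbc:hsqldb:/tmp/example/ex" user="sa" />
    <script>
        <![CDATA[
        // Assumption: $skipDoc on the parent row means the whole document,
        // child rows included, is never added to Solr.
        function dropDiscontinued(row) {
            if ('' + row.get('discontinued') == 'true') {
                row.put('$skipDoc', 'true');
            }
            return row;
        }
        // Assumption: $skipRow on a child row drops only that row; the parent
        // document is still added with the remaining feature rows.
        function dropEmptyFeature(row) {
            var text = row.get('description');
            if (text == null || ('' + text).length == 0) {
                row.put('$skipRow', 'true');
            }
            return row;
        }
        ]]>
    </script>
    <document>
        <entity name="item"
                query="select id, name, discontinued from item"
                transformer="script:dropDiscontinued">
            <entity name="feature"
                    query="select description from feature where item_id='${item.id}'"
                    transformer="script:dropEmptyFeature" />
        </entity>
    </document>
</dataConfig>
```

If that reading is right, the two flags should only behave differently when there are nested entities, but I would appreciate confirmation.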

2) I am working off the Solr trunk (4.x), and nothing I do seems to
make the import skip a given row/doc.
To test this, I added the settings below to my data import XML, and all
the rows are still getting indexed!
If anyone sees something wrong with my config, please tell me.
Note that $skipDoc is set unconditionally in two places:
row.put( '$skipDoc', 'true' ); in the script transformer, and
<field column="$skipDoc" template="true"/> in the entity.
Yet everything still gets imported, and I can't see why. I need a fresh
pair of eyes :)

<dataConfig>
    <dataSource type="URLDataSource" />
    <script>
        <![CDATA[
        function skipRow(row) {
            row.put( '$skipDoc', 'true' );
            return row;
        }
        ]]>
    </script>
    <document>
        <entity name="amazon"
                pk="link"
                url="http://www.amazon.com/gp/rss/new-releases/apparel/1040660/ref=zg_bsnr_1040660_rsslink"
                processor="XPathEntityProcessor"
                forEach="/rss/channel | /rss/channel/item"
                transformer="RegexTransformer,HTMLStripTransformer,DateFormatTransformer,script:skipRow,TemplateTransformer">
            <field column="description"
                   xpath="/rss/channel/item/description"
                   />
            <field column="price"
                   regex=".*\$(\d*\.\d*)"
                   sourceColName="description"
                   />
            <field column="$skipDoc" template="true"/>
            <field column="link" xpath="/rss/channel/item/link" />
        </entity>
    </document>
</dataConfig>


Thanks!
- Pulkit
