Hi All

I am facing a problem with XPathEntityProcessor.

Objective:
When I index the Resource XML file using the DIH XPathEntityProcessor, there
should be 2 Solr documents:
01) Link with id 1000 and 2 tags, ABC and DEF
02) Link with id 2000 and 3 tags, GHI, JKL and MNO

Solr Version: 4.10.2

Problem:
I am not able to index the <TAG> data properly: depending on the config, the
tags either end up on every document or are missing entirely.

Expected Output:
{
"id": "1000",
"field_name": "val1",
"ABC": "ABC_VALUE",
"DEF": "DEF_VALUE"
},
{
"id": "2000",
"field_name": "val2",
"GHI": "GHI_VALUE",
"JKL": "JKL_VALUE",
"MNO": "MNO_VALUE"
}
========================================================================================================

Resource XML:

<RESOURCE>
    <LINK ID="1000">
        <FIELD>val1</FIELD>
        <TAG>
            <TAG_CODE>ABC</TAG_CODE>
            <TAG_VALUE>ABC_VALUE</TAG_VALUE>
        </TAG>
        <TAG>
            <TAG_CODE>DEF</TAG_CODE>
            <TAG_VALUE>DEF_VALUE</TAG_VALUE>
        </TAG>
    </LINK>
    <LINK ID="2000">
        <FIELD>val2</FIELD>
        <TAG>
            <TAG_CODE>GHI</TAG_CODE>
            <TAG_VALUE>GHI_VALUE</TAG_VALUE>
        </TAG>
        <TAG>
            <TAG_CODE>JKL</TAG_CODE>
            <TAG_VALUE>JKL_VALUE</TAG_VALUE>
        </TAG>
        <TAG>
            <TAG_CODE>MNO</TAG_CODE>
            <TAG_VALUE>MNO_VALUE</TAG_VALUE>
        </TAG>
    </LINK>
</RESOURCE>
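
To make the intended mapping concrete, here is a minimal sketch of it outside Solr, in plain Python rather than DIH. The function name `build_docs` and the inlined copy of the XML are only for illustration; the point is that each LINK should pick up only its own TAG children.

```python
import xml.etree.ElementTree as ET

# Inlined copy of the Resource XML so the sketch is self-contained.
RESOURCE_XML = """
<RESOURCE>
  <LINK ID="1000">
    <FIELD>val1</FIELD>
    <TAG><TAG_CODE>ABC</TAG_CODE><TAG_VALUE>ABC_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>DEF</TAG_CODE><TAG_VALUE>DEF_VALUE</TAG_VALUE></TAG>
  </LINK>
  <LINK ID="2000">
    <FIELD>val2</FIELD>
    <TAG><TAG_CODE>GHI</TAG_CODE><TAG_VALUE>GHI_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>JKL</TAG_CODE><TAG_VALUE>JKL_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>MNO</TAG_CODE><TAG_VALUE>MNO_VALUE</TAG_VALUE></TAG>
  </LINK>
</RESOURCE>
"""


def build_docs(xml_text):
    """Return one dict per LINK, using each TAG_CODE as a field name."""
    root = ET.fromstring(xml_text)
    docs = []
    for link in root.findall("LINK"):
        doc = {"id": link.get("ID"), "field_name": link.findtext("FIELD")}
        # Only the TAG elements nested under this LINK are visited.
        for tag in link.findall("TAG"):
            doc[tag.findtext("TAG_CODE")] = tag.findtext("TAG_VALUE")
        docs.append(doc)
    return docs


if __name__ == "__main__":
    for doc in build_docs(RESOURCE_XML):
        print(doc)
```

Running it prints two dicts, one per LINK, which is the shape I expect the Solr documents to have.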

========================================================================================================

DataConfig XML (TRY 1):
<dataConfig>
    <script><![CDATA[
        // Turn each TAG_CODE/TAG_VALUE pair into a field named after the code.
        function f1(row) {
            var code = row.get("TAG_CODE");
            var val = row.get("TAG_VALUE");
            row.put(code, val);
            row.remove("TAG_CODE");
            row.remove("TAG_VALUE");
            return row;
        }
    ]]></script>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="testdata" url="http://host:port/uri"
                processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
            <field column="id" xpath="/RESOURCE/LINK/@ID" />
            <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
            <entity name="testdata" url="http://host:port/uri"
                    processor="XPathEntityProcessor"
                    forEach="/RESOURCE/LINK/TAG" transformer="script:f1">
                <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
                <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
            </entity>
        </entity>
    </document>
</dataConfig>

Output:
{
"id": "1000",
"field_name": "val1",
"ABC": "ABC_VALUE",
"DEF": "DEF_VALUE",
"GHI": "GHI_VALUE",
"JKL": "JKL_VALUE",
"MNO": "MNO_VALUE"
},
{
"id": "2000",
"field_name": "val2",
"ABC": "ABC_VALUE",
"DEF": "DEF_VALUE",
"GHI": "GHI_VALUE",
"JKL": "JKL_VALUE",
"MNO": "MNO_VALUE"
}
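
This output looks consistent with the inner entity matching every TAG in the file for each LINK, rather than only the TAGs under the current LINK. That is an assumption about what DIH does here, not something I have confirmed. The Python sketch below (outside Solr; `build_docs_unscoped` is a made-up name) only mimics that assumed behaviour and produces the same merged result:

```python
import xml.etree.ElementTree as ET

# Inlined copy of the Resource XML so the sketch is self-contained.
RESOURCE_XML = """
<RESOURCE>
  <LINK ID="1000">
    <FIELD>val1</FIELD>
    <TAG><TAG_CODE>ABC</TAG_CODE><TAG_VALUE>ABC_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>DEF</TAG_CODE><TAG_VALUE>DEF_VALUE</TAG_VALUE></TAG>
  </LINK>
  <LINK ID="2000">
    <FIELD>val2</FIELD>
    <TAG><TAG_CODE>GHI</TAG_CODE><TAG_VALUE>GHI_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>JKL</TAG_CODE><TAG_VALUE>JKL_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>MNO</TAG_CODE><TAG_VALUE>MNO_VALUE</TAG_VALUE></TAG>
  </LINK>
</RESOURCE>
"""


def build_docs_unscoped(xml_text):
    """Mimic an inner loop that is not scoped to the current LINK."""
    root = ET.fromstring(xml_text)
    docs = []
    for link in root.findall("LINK"):
        doc = {"id": link.get("ID"), "field_name": link.findtext("FIELD")}
        # The inner loop starts from the root again, so it walks every TAG
        # in the file regardless of which LINK is being processed.
        for tag in root.findall("LINK/TAG"):
            doc[tag.findtext("TAG_CODE")] = tag.findtext("TAG_VALUE")
        docs.append(doc)
    return docs


if __name__ == "__main__":
    for doc in build_docs_unscoped(RESOURCE_XML):
        print(doc)
```

Both dicts end up with all five tag fields, matching what TRY 1 indexes.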

========================================================================================================

DataConfig XML (TRY 2):
<dataConfig>
    <script><![CDATA[
        // Turn each TAG_CODE/TAG_VALUE pair into a field named after the code.
        function f1(row) {
            var code = row.get("TAG_CODE");
            var val = row.get("TAG_VALUE");
            row.put(code, val);
            row.remove("TAG_CODE");
            row.remove("TAG_VALUE");
            return row;
        }
    ]]></script>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="testdata" url="http://host:port/uri"
                processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
            <field column="id" xpath="/RESOURCE/LINK/@ID" />
            <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
            <entity name="testdata" url="http://host:port/uri"
                    processor="XPathEntityProcessor"
                    forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
                <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
                <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
            </entity>
        </entity>
    </document>
</dataConfig>

Output:
{
"id": "1000",
"field_name": "val1"
},
{
"id": "2000",
"field_name": "val2"
}

========================================================================================================

DataConfig XML (TRY 3):
<dataConfig>
    <script><![CDATA[
        // Turn each TAG_CODE/TAG_VALUE pair into a field named after the code.
        function f1(row) {
            var code = row.get("TAG_CODE");
            var val = row.get("TAG_VALUE");
            row.put(code, val);
            row.remove("TAG_CODE");
            row.remove("TAG_VALUE");
            return row;
        }
    ]]></script>
    <dataSource type="URLDataSource" />
    <document>
        <entity name="testdata" url="http://host:port/uri"
                processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
            <field column="id" xpath="/RESOURCE/LINK/@ID" />
            <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
            <entity name="testdata" url="http://host:port/uri"
                    processor="XPathEntityProcessor"
                    forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
                <field column="TAG_CODE"
                       xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE" />
                <field column="TAG_VALUE"
                       xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE" />
            </entity>
        </entity>
    </document>
</dataConfig>

Output:
{
"id": "1000",
"field_name": "val1"
},
{
"id": "2000",
"field_name": "val2"
}


-- 
Thanx & Regards
Umang Agrawal

