Hi all,

I am facing a problem with XPathEntityProcessor.

Objective:
When I index the Resource XML file using the DIH XPathEntityProcessor, there should be 2 Solr documents:
  01) The LINK with id 1000, with 2 tags: ABC and DEF
  02) The LINK with id 2000, with 3 tags: GHI, JKL and MNO

Solr Version: 4.10.2

Problem: I am not able to index the <TAG/> data properly.

Expected Output:
  {
    "id": "1000",
    "field_name": "val1",
    "ABC": "ABC_VALUE",
    "DEF": "DEF_VALUE"
  },
  {
    "id": "2000",
    "field_name": "val2",
    "GHI": "GHI_VALUE",
    "JKL": "JKL_VALUE",
    "MNO": "MNO_VALUE"
  }

========================================================================

Resource XML:
  <RESOURCE>
    <LINK ID="1000">
      <FIELD>val1</FIELD>
      <TAG>
        <TAG_CODE>ABC</TAG_CODE>
        <TAG_VALUE>ABC_VALUE</TAG_VALUE>
      </TAG>
      <TAG>
        <TAG_CODE>DEF</TAG_CODE>
        <TAG_VALUE>DEF_VALUE</TAG_VALUE>
      </TAG>
    </LINK>
    <LINK ID="2000">
      <FIELD>val2</FIELD>
      <TAG>
        <TAG_CODE>GHI</TAG_CODE>
        <TAG_VALUE>GHI_VALUE</TAG_VALUE>
      </TAG>
      <TAG>
        <TAG_CODE>JKL</TAG_CODE>
        <TAG_VALUE>JKL_VALUE</TAG_VALUE>
      </TAG>
      <TAG>
        <TAG_CODE>MNO</TAG_CODE>
        <TAG_VALUE>MNO_VALUE</TAG_VALUE>
      </TAG>
    </LINK>
  </RESOURCE>

========================================================================

DataConfig XML (TRY 1):
  <dataConfig>
    <script><![CDATA[
      function f1(row) {
        var code = row.get("TAG_CODE");
        var val = row.get("TAG_VALUE");
        row.put(code, val);
        row.remove("TAG_CODE");
        row.remove("TAG_VALUE");
        return row;
      }
    ]]></script>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="testdata" url="http://host:port/uri"
              processor="XPathEntityProcessor"
              forEach="/RESOURCE/LINK">
        <field column="id" xpath="/RESOURCE/LINK/@ID" />
        <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
        <entity name="testdata" url="http://host:port/uri"
                processor="XPathEntityProcessor"
                forEach="/RESOURCE/LINK/TAG"
                transformer="script:f1">
          <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
          <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
        </entity>
      </entity>
    </document>
  </dataConfig>

Output (every document gets the tags of all LINKs):
  {
    "id": "1000",
    "field_name": "val1",
    "ABC": "ABC_VALUE",
    "DEF": "DEF_VALUE",
    "GHI": "GHI_VALUE",
    "JKL": "JKL_VALUE",
    "MNO": "MNO_VALUE"
  },
  {
    "id": "2000",
    "field_name": "val2",
    "ABC": "ABC_VALUE",
    "DEF": "DEF_VALUE",
    "GHI": "GHI_VALUE",
    "JKL": "JKL_VALUE",
    "MNO": "MNO_VALUE"
  }

========================================================================

DataConfig XML (TRY 2):
Same as TRY 1, except that the nested entity's forEach filters on the parent id.
  <dataConfig>
    <script><![CDATA[
      function f1(row) {
        var code = row.get("TAG_CODE");
        var val = row.get("TAG_VALUE");
        row.put(code, val);
        row.remove("TAG_CODE");
        row.remove("TAG_VALUE");
        return row;
      }
    ]]></script>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="testdata" url="http://host:port/uri"
              processor="XPathEntityProcessor"
              forEach="/RESOURCE/LINK">
        <field column="id" xpath="/RESOURCE/LINK/@ID" />
        <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
        <entity name="testdata" url="http://host:port/uri"
                processor="XPathEntityProcessor"
                forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG"
                transformer="script:f1">
          <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
          <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
        </entity>
      </entity>
    </document>
  </dataConfig>

Output (no tags are indexed at all):
  {
    "id": "1000",
    "field_name": "val1"
  },
  {
    "id": "2000",
    "field_name": "val2"
  }

========================================================================

DataConfig XML (TRY 3):
Same as TRY 2, except that the nested field xpaths also filter on the parent id.
  <dataConfig>
    <script><![CDATA[
      function f1(row) {
        var code = row.get("TAG_CODE");
        var val = row.get("TAG_VALUE");
        row.put(code, val);
        row.remove("TAG_CODE");
        row.remove("TAG_VALUE");
        return row;
      }
    ]]></script>
    <dataSource type="URLDataSource" />
    <document>
      <entity name="testdata" url="http://host:port/uri"
              processor="XPathEntityProcessor"
              forEach="/RESOURCE/LINK">
        <field column="id" xpath="/RESOURCE/LINK/@ID" />
        <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
        <entity name="testdata" url="http://host:port/uri"
                processor="XPathEntityProcessor"
                forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG"
                transformer="script:f1">
          <field column="TAG_CODE" xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE" />
          <field column="TAG_VALUE" xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE" />
        </entity>
      </entity>
    </document>
  </dataConfig>

Output (again, no tags are indexed):
  {
    "id": "1000",
    "field_name": "val1"
  },
  {
    "id": "2000",
    "field_name": "val2"
  }

--
Thanks & Regards
Umang Agrawal
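
P.S. To make the intended grouping concrete, here is a minimal Python sketch, run outside Solr and DIH, of the transformation I am after. It only illustrates the desired result and is not a DIH solution; the function name links_to_docs is made up for this example, and the XML is the sample Resource XML from this mail.

```python
import xml.etree.ElementTree as ET

# Sample Resource XML, copied from the mail.
RESOURCE_XML = """
<RESOURCE>
  <LINK ID="1000">
    <FIELD>val1</FIELD>
    <TAG><TAG_CODE>ABC</TAG_CODE><TAG_VALUE>ABC_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>DEF</TAG_CODE><TAG_VALUE>DEF_VALUE</TAG_VALUE></TAG>
  </LINK>
  <LINK ID="2000">
    <FIELD>val2</FIELD>
    <TAG><TAG_CODE>GHI</TAG_CODE><TAG_VALUE>GHI_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>JKL</TAG_CODE><TAG_VALUE>JKL_VALUE</TAG_VALUE></TAG>
    <TAG><TAG_CODE>MNO</TAG_CODE><TAG_VALUE>MNO_VALUE</TAG_VALUE></TAG>
  </LINK>
</RESOURCE>
"""


def links_to_docs(xml_text):
    """Hypothetical helper: build one flat dict per <LINK>,
    using only the <TAG> children of that same <LINK>."""
    root = ET.fromstring(xml_text)
    docs = []
    for link in root.findall("LINK"):
        doc = {"id": link.get("ID"), "field_name": link.findtext("FIELD")}
        # Each TAG becomes a field named after TAG_CODE, holding TAG_VALUE.
        for tag in link.findall("TAG"):
            doc[tag.findtext("TAG_CODE")] = tag.findtext("TAG_VALUE")
        docs.append(doc)
    return docs


if __name__ == "__main__":
    for doc in links_to_docs(RESOURCE_XML):
        print(doc)
```

Here each LINK picks up only its own TAG children, which is the per-document behaviour I expected from the nested entity but do not get in any of the three tries.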