Both versions look painful in that they will retrieve the URL content
multiple times. The first version is definitely wrong. The second version
is probably wrong because the inner and outer entities have the same
name. I would try giving the inner entity a different name and seeing if
the issue resolves itself.
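
A minimal, untested sketch of that rename, assuming everything else stays
exactly as in TRY 2. The inner entity name "tags" is hypothetical; the idea
is only that ${testdata.id} can then refer to nothing but the outer
entity's row:

```xml
<!-- Untested sketch: same as TRY 2 except that the inner entity is renamed.
     "tags" is a hypothetical name; any name other than "testdata" would do. -->
<entity name="testdata" url="http://host:port/uri"
        processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
  <field column="id" xpath="/RESOURCE/LINK/@ID" />
  <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
  <entity name="tags" url="http://host:port/uri"
          processor="XPathEntityProcessor"
          forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
    <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
    <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
  </entity>
</entity>
```

Even if this works, the inner entity still fetches the URL once per outer
row, so the repeated retrieval remains.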

But, realistically, I would probably pre-process that document with XSLT
instead to flatten the structure. Solr apparently (I did not test) supports
that both in DIH and in the update handler:
https://wiki.apache.org/solr/XsltUpdateRequestHandler . You could use XSLT
to transform your XML directly into a Solr XML update document and not even
need DIH.
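
A rough sketch of what such a stylesheet could look like, assuming the
Resource XML layout from the original message and the standard Solr
<add><doc><field name="..."> update format. It has not been run against
Solr, and it assumes each TAG_CODE is a valid field name in your schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Wrap all documents in a single Solr <add> element. -->
  <xsl:template match="/RESOURCE">
    <add>
      <xsl:apply-templates select="LINK"/>
    </add>
  </xsl:template>

  <!-- Each LINK becomes one Solr document. -->
  <xsl:template match="LINK">
    <doc>
      <field name="id"><xsl:value-of select="@ID"/></field>
      <field name="field_name"><xsl:value-of select="FIELD"/></field>
      <!-- Each TAG of this LINK becomes a field named after its TAG_CODE. -->
      <xsl:for-each select="TAG">
        <field name="{normalize-space(TAG_CODE)}">
          <xsl:value-of select="TAG_VALUE"/>
        </field>
      </xsl:for-each>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

Because the TAG loop runs inside the template for a single LINK, the tags
of link 1000 cannot leak into link 2000, and the source is read only once.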

Regards,
   Alex.


----
Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/

On 8 September 2015 at 09:04, Umang Agrawal <umang.i...@gmail.com> wrote:

> Hi All
>
> I am facing a problem with XPathEntityProcessor .
>
> Objective:
> When I index Resource XML file using DIH XPathEntityProcessor then there
> should be 2 solr documents
> 01) Link where id is 1000 with 2 tags ABC and DEF
> 02) Link where id is 2000 with 3 tags GHI, JKL and MNO
>
> Solr Version: 4.10.2
>
> Problem:
> I am not able to index <TAG/> data properly.
>
> Expected Output:
> {
> "id": "1000",
> "field_name": "val1",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE"
> },
> {
> "id": "2000",
> "field_name": "val2",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> }
>
> ========================================================================================================
>
> Resource XML:
>
> <RESOURCE>
> <LINK ID="1000">
> <FIELD>val1</FIELD>
> <TAG>
> <TAG_CODE>ABC</TAG_CODE>
> <TAG_VALUE>ABC_VALUE</TAG_VALUE>
> </TAG>
> <TAG>
> <TAG_CODE>DEF</TAG_CODE>
> <TAG_VALUE>DEF_VALUE</TAG_VALUE>
> </TAG>
> </LINK>
> <LINK ID="2000">
> <FIELD>val2</FIELD>
> <TAG>
> <TAG_CODE>GHI</TAG_CODE>
> <TAG_VALUE>GHI_VALUE</TAG_VALUE>
> </TAG>
> <TAG>
> <TAG_CODE>JKL</TAG_CODE>
> <TAG_VALUE>JKL_VALUE</TAG_VALUE>
> </TAG>
> <TAG>
> <TAG_CODE>MNO</TAG_CODE>
> <TAG_VALUE>MNO_VALUE</TAG_VALUE>
> </TAG>
> </LINK>
> </RESOURCE>
>
>
> ========================================================================================================
>
> DataConfig XML (TRY 1):
> <dataConfig>
> <script><![CDATA[
> function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
>     ]]></script>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> <field column="id" xpath="/RESOURCE/LINK/@ID" />
>             <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
> <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK/TAG" transformer="script:f1">
> <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
> <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
> </entity>
>         </entity>
>     </document>
> </dataConfig>
>
> Output:
> {
> "id": "1000",
> "field_name": "val1",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> },
> {
> "id": "2000",
> "field_name": "val2",
> "ABC": "ABC_VALUE",
> "DEF": "DEF_VALUE",
> "GHI": "GHI_VALUE",
> "JKL": "JKL_VALUE",
> "MNO": "MNO_VALUE"
> }
>
>
> ========================================================================================================
>
> DataConfig XML (TRY 2):
> <dataConfig>
> <script><![CDATA[
> function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
>     ]]></script>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> <field column="id" xpath="/RESOURCE/LINK/@ID" />
>             <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
> <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
> <field column="TAG_CODE" xpath="/RESOURCE/LINK/TAG/TAG_CODE" />
> <field column="TAG_VALUE" xpath="/RESOURCE/LINK/TAG/TAG_VALUE" />
> </entity>
>         </entity>
>     </document>
> </dataConfig>
>
> Output:
> {
> "id": "1000",
> "field_name": "val1"
> },
> {
> "id": "2000",
> "field_name": "val2"
> }
>
>
> ========================================================================================================
>
> DataConfig XML (TRY 3):
> <dataConfig>
> <script><![CDATA[
> function f1(row) {
> var code = row.get("TAG_CODE");
> var val = row.get("TAG_VALUE");
> row.put(code, val);
> row.remove("TAG_CODE");
> row.remove("TAG_VALUE");
> return row;
> }
>     ]]></script>
>     <dataSource type="URLDataSource" />
>     <document>
>         <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor" forEach="/RESOURCE/LINK">
> <field column="id" xpath="/RESOURCE/LINK/@ID" />
>             <field column="field_name" xpath="/RESOURCE/LINK/FIELD" />
> <entity name="testdata" url="http://host:port/uri"
>                 processor="XPathEntityProcessor"
> forEach="/RESOURCE/LINK[@ID=${testdata.id}]/TAG" transformer="script:f1">
> <field column="TAG_CODE" xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_CODE" />
> <field column="TAG_VALUE" xpath="/RESOURCE/LINK[@ID=${testdata.id}]/TAG/TAG_VALUE" />
> </entity>
>         </entity>
>     </document>
> </dataConfig>
>
> Output:
> {
> "id": "1000",
> "field_name": "val1"
> },
> {
> "id": "2000",
> "field_name": "val2"
> }
>
>
> --
> Thanx & Regards
> Umang Agrawal
>
>
>
