My uniqueKey in schema.xml is id. I've tried adding pk="id" to the store entity but it makes no difference.
The result is the same if I set rootEntity="false" on the store entity. However, I added debug and verbose output to the DataImportHandler and noticed a slight change in how the nested queries are executed.

Below is the output with rootEntity="true":

<response>
  <lst name="responseHeader">...</lst>
  <lst name="initArgs">...</lst>
  <str name="command">full-import</str>
  <str name="mode">debug</str>
  <arr name="documents"/>
  <lst name="verbose-output">
    <lst name="entity:store">
      <lst name="document#1">
        <str name="query">../../../data/StoresTest.xml</str>
        <str name="time-taken">0:0:0.1</str>
        <str>----------- row #1-------------</str>
        <str name="id">0102</str>
        <str name="$forEach">/Stores/Store</str>
        <str>---------------------------------------------</str>
        <lst name="entity:storearticle">
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="time-taken">0:0:0.1</str>
          <str name="time-taken">0:0:0.1</str>
          <str>----------- row #1-------------</str>
          <arr name="store_articles_txt">
            <str>18004</str>
          </arr>
          <str name="$forEach">/StoreArticles</str>
          <str>---------------------------------------------</str>
          <lst name="transformer:LogTransformer">
            <str>---------------------------------------------</str>
            <arr name="store_articles_txt">
              <str>18004</str>
            </arr>
            <str name="$forEach">/StoreArticles</str>
            <str>---------------------------------------------</str>
          </lst>
        </lst>
      </lst>
      <lst name="document#2">
        <str>----------- row #1-------------</str>
        <str name="id">0104</str>
        <str name="$forEach">/Stores/Store</str>
        <str>---------------------------------------------</str>
        <lst name="entity:storearticle">
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="time-taken">0:0:0.0</str>
          <str name="time-taken">0:0:0.0</str>
          <str name="time-taken">0:0:0.0</str>
          <str name="time-taken">0:0:0.0</str>
          <str>----------- row #1-------------</str>
          <arr name="store_articles_txt">
            <str>18004</str>
          </arr>
          <str name="$forEach">/StoreArticles</str>
          <str>---------------------------------------------</str>
          <lst name="transformer:LogTransformer">
            <str>---------------------------------------------</str>
            <arr name="store_articles_txt">
              <str>18004</str>
            </arr>
            <str name="$forEach">/StoreArticles</str>
            <str>---------------------------------------------</str>
          </lst>
        </lst>
      </lst>
      <lst name="document#3"/>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages">...</lst>
  <str name="WARNING">...</str>
</response>

And with rootEntity="false":

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">40</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">import-test-articles-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="mode">debug</str>
  <arr name="documents"/>
  <lst name="verbose-output">
    <lst name="entity:store">
      <str name="query">../../../data/StoresTest.xml</str>
      <str name="query">../../../data/StoresTest.xml</str>
      <str name="time-taken">0:0:0.10</str>
      <str name="time-taken">0:0:0.10</str>
      <str>----------- row #1-------------</str>
      <str name="id">0102</str>
      <str name="$forEach">/Stores/Store</str>
      <str>---------------------------------------------</str>
      <lst name="entity:storearticle">
        <lst name="document#1">
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="time-taken">0:0:0.0</str>
          <str>----------- row #1-------------</str>
          <arr name="store_articles_txt">
            <str>18004</str>
          </arr>
          <str name="$forEach">/StoreArticles</str>
          <str>---------------------------------------------</str>
          <lst name="transformer:LogTransformer">
            <str>---------------------------------------------</str>
            <arr name="store_articles_txt">
              <str>18004</str>
            </arr>
            <str name="$forEach">/StoreArticles</str>
            <str>---------------------------------------------</str>
          </lst>
        </lst>
        <lst name="document#2"/>
      </lst>
      <str>----------- row #2-------------</str>
      <str name="id">0104</str>
      <str name="$forEach">/Stores/Store</str>
      <str>---------------------------------------------</str>
      <lst name="entity:storearticle">
        <lst name="document#2">
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="query">../../../data/StoreArticlesTest.xml</str>
          <str name="time-taken">0:0:0.0</str>
          <str name="time-taken">0:0:0.0</str>
          <str>----------- row #1-------------</str>
          <arr name="store_articles_txt">
            <str>18004</str>
          </arr>
          <str name="$forEach">/StoreArticles</str>
          <str>---------------------------------------------</str>
          <lst name="transformer:LogTransformer">
            <str>---------------------------------------------</str>
            <arr name="store_articles_txt">
              <str>18004</str>
            </arr>
            <str name="$forEach">/StoreArticles</str>
            <str>---------------------------------------------</str>
          </lst>
        </lst>
        <lst name="document#3"/>
      </lst>
    </lst>
  </lst>
  <str name="status">idle</str>
  <str name="importResponse">Configuration Re-loaded sucessfully</str>
  <lst name="statusMessages">...</lst>
  <str name="WARNING">...</str>
</response>

I'm not very familiar with the verbose output, but it seems that with rootEntity="true", one query is made to retrieve the stores, and then two and four queries are made to the nested storearticle entity (for the first and second store, respectively). With rootEntity="false", two queries are made to retrieve the stores, and then one and two queries are made to the nested storearticle entity. It seems odd that in both cases multiple queries are produced for the second store, but maybe that's expected?

Anyway, although the queries differ, the result is the same.

/Tobias

2012/7/22 Ahmet Arslan <iori...@yahoo.com>

> > I'm trying to index a set of stores and their articles. I have two
> > XML-files, one that contains the data of the stores and one that contains
> > articles for each store.
> > I'm using DIH with XPathEntityProcessor to process the file containing
> > the stores, and using a nested entity I try to get all articles that
> > belong to the specific store. The problem I encounter is that every
> > store gets the same articles.
> >
> > For testing purposes I've stripped down the XML files to only include
> > ids. The store file (StoresTest.xml) looks like this:
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <Stores><Store><Id>0102</Id></Store><Store><Id>0104</Id></Store></Stores>
> >
> > The Store-Articles relations file (StoreArticlesTest.xml) looks like this:
> >
> > <?xml version="1.0" encoding="utf-8"?>
> > <StoreArticles>
> > <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
> > <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
> > </StoreArticles>
> >
> > And my dih-config file looks like this:
> >
> > <dataConfig>
> >   <dataSource type="FileDataSource" encoding="UTF-8" />
> >   <document>
> >     <entity name="store"
> >             processor="XPathEntityProcessor"
> >             stream="true"
> >             forEach="/Stores/Store"
> >             url="../../../data/StoresTest.xml"
> >             transformer="TemplateTransformer">
> >       <field column="id" xpath="/Stores/Store/Id" />
> >       <entity name="storearticle"
> >               processor="XPathEntityProcessor"
> >               stream="true"
> >               forEach="/StoreArticles"
> >               url="../../../data/StoreArticlesTest.xml"
> >               transformer="LogTransformer"
> >               logTemplate="Processing ${store.id}" logLevel="info"
> >               rootEntity="true">
> >         <field column="store_articles_txt"
> >                xpath="/StoreArticles/Store[@StoreId='${store.id}']/ArticleId" />
> >       </entity>
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > The result I get in Solr is this:
> >
> > <response>
> >   <lst name="responseHeader">...</lst>
> >   <result name="response" numFound="2" start="0">
> >     <doc>
> >       <str name="id">0102</str>
> >       <arr name="store_articles_txt">
> >         <str>18004</str>
> >       </arr>
> >     </doc>
> >     <doc>
> >       <str name="id">0104</str>
> >       <arr name="store_articles_txt">
> >         <str>18004</str>
> >       </arr>
> >     </doc>
> >   </result>
> > </response>
> >
> > As you see, both stores get the article for the first store. I would
> > have expected the second store to have two articles: 17004 and 10004.
> >
> > In the log messages printed using LogTransformer I see that each
> > store.id is processed, but somehow it only picks up the articles for
> > the first store.
> >
> > Any ideas?
>
> What happens when you set <entity name="store" rootEntity="false" ?
> What is your uniqueKey in schema.xml?
>
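
As a cross-check of the expected result described in the quoted message, here is a minimal Python sketch that performs the same per-store lookup outside Solr. It is not part of the DIH setup and says nothing about how XPathEntityProcessor evaluates the expression; the function name articles_by_store is hypothetical, and the XML strings are simply the contents of the two test files inlined for illustration.

```python
import xml.etree.ElementTree as ET

# Contents of StoresTest.xml and StoreArticlesTest.xml, inlined for illustration.
STORES_XML = (
    "<Stores><Store><Id>0102</Id></Store><Store><Id>0104</Id></Store></Stores>"
)
STORE_ARTICLES_XML = (
    "<StoreArticles>"
    '<Store StoreId="0102"><ArticleId>18004</ArticleId></Store>'
    '<Store StoreId="0104"><ArticleId>17004</ArticleId>'
    "<ArticleId>10004</ArticleId></Store>"
    "</StoreArticles>"
)


def articles_by_store(stores_xml, store_articles_xml):
    """Map each store id to the article ids listed for it (hypothetical helper)."""
    stores = ET.fromstring(stores_xml)
    relations = ET.fromstring(store_articles_xml)
    result = {}
    for store in stores.findall("Store"):
        store_id = store.findtext("Id")
        # Same idea as the xpath in the config: select the Store element whose
        # StoreId attribute matches the current store, then its ArticleId children.
        path = f"Store[@StoreId='{store_id}']/ArticleId"
        result[store_id] = [article.text for article in relations.findall(path)]
    return result


expected = articles_by_store(STORES_XML, STORE_ARTICLES_XML)
print(expected)
```

With the test data above, store 0102 maps to one article and store 0104 to two, which is the result the import was expected to produce.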