OK, I found the problem by digging into the source code. Whether it is a bug
or "works as designed" I don't know, but the cause is the point at which the
variable ${store.id} is translated.

The translation is made in the method initXpathReader() with these lines:

>           String xpath = field.get(XPATH);
>           xpath = context.replaceTokens(xpath);
>           xpathReader.addField(field.get(DataImporter.COLUMN),
>                   xpath,
>                   Boolean.parseBoolean(field.get(DataImporter.MULTI_VALUED)),
>                   flags);
>         }


The line xpath = context.replaceTokens(xpath); translates the variable
to its actual value. initXpathReader() is called from the init() method, but
it is called only once for each entity definition:

>       if (xpathReader == null)
>           initXpathReader();


This means that the first time initXpathReader() is called, ${store.id} is
translated to 0102 (the id of the first store). When the next store id is
encountered, the xpathReader is already initialized, so initXpathReader() is
not called again and the xpath expression is never updated with the new
store id.

A number of other things happen in initXpathReader(), so I'm not sure it is
safe to simply remove the null check. However, looking at the
SqlEntityProcessor, the variables in the query string are translated in the
nextRow() method rather than in init(). So I think that either the null
check should be removed or the translation of the xpath expression should be
moved so that it is performed each time.
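
To illustrate the difference, here is a minimal, self-contained sketch. It
does not use any Solr classes: TokenResolutionDemo, ResolveOnce,
ResolveEachTime and this replaceTokens() helper are hypothetical stand-ins
that I made up for the example. It only mimics the pattern described above
(resolve once behind a null check versus resolve on every call) and is not
a patch for XPathEntityProcessor.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;

public class TokenResolutionDemo {

    static final String XPATH_TEMPLATE =
            "/StoreArticles/Store[@StoreId='${store.id}']/ArticleId";

    // Hypothetical stand-in for context.replaceTokens(): plain ${name} substitution.
    static String replaceTokens(String template, Map<String, String> vars) {
        String result = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Mimics the behaviour described above: resolved once, then cached.
    static class ResolveOnce {
        private String resolvedXpath; // plays the role of xpathReader

        String xpathFor(Map<String, String> vars) {
            if (resolvedXpath == null) {
                resolvedXpath = replaceTokens(XPATH_TEMPLATE, vars);
            }
            return resolvedXpath; // stale for every parent row after the first
        }
    }

    // Mimics the suggested change: resolve again for every parent row.
    static class ResolveEachTime {
        String xpathFor(Map<String, String> vars) {
            return replaceTokens(XPATH_TEMPLATE, vars);
        }
    }

    public static void main(String[] args) {
        ResolveOnce once = new ResolveOnce();
        ResolveEachTime each = new ResolveEachTime();
        for (String id : Arrays.asList("0102", "0104")) {
            Map<String, String> vars = Collections.singletonMap("store.id", id);
            System.out.println("store " + id);
            System.out.println("  once: " + once.xpathFor(vars));
            System.out.println("  each: " + each.xpathFor(vars));
        }
    }
}
```

For store 0104 the "once" variant still returns the expression containing
0102, which matches what I see in the index, while the "each" variant picks
up 0104. In the real processor the reader would have to be rebuilt as well,
not just the string, which is why I'm unsure about the cost of doing so.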

/Tobias

2012/7/22 Tobias Berg <tobias.h...@gmail.com>

> The articleId field is the only field in the correlation file so I just
> need to get that one working.
>
> I tried putting the condition in the forEach section. If I hardcode a
> value, like 0104, it works, but it doesn't work with the variable. I haven't
> looked at the source code yet, but maybe forEach doesn't support variables?
> That could be a nice patch :)
>
> I thought about $skipDoc but can't figure out how I want to use it, since
> I want to add the field, it's just that it picks the wrong value. Do you
> have something in mind in how to use it for my use-case?
>
> I'll take a look at the source code to see if it can be a bug.
>
> /Tobias
>
> 2012/7/22 Alexandre Rafalovitch <arafa...@gmail.com>
>
>> I am still struggling with nested DIH myself, but I notice that your
>> correlation condition is on the field level (@StoreId='${store.id}').
>> Were you planning to repeat it for each field definition?
>>
>> Have you tried putting it instead in the forEach section?
>>
>> Alternatively, maybe you need to use $skipDoc as in the Wikipedia
>> import example?
>>
>> Regards,
>>    Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
>> book)
>>
>>
>> On Sat, Jul 21, 2012 at 1:34 PM, Tobias Berg <tobias.h...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > I'm trying to index a set of stores and their articles. I have two
>> > XML files, one that contains the data of the stores and one that
>> > contains the articles for each store. I'm using DIH with
>> > XPathEntityProcessor to process the file containing the stores, and
>> > using a nested entity I try to get all articles that belong to the
>> > specific store. The problem I encounter is that every store gets the
>> > same articles.
>> >
>> > For testing purposes I've stripped down the XML files to include only
>> > ids. The store file (StoresTest.xml) looks like this:
>> >
>> > <?xml version="1.0" encoding="utf-8"?>
>> > <Stores><Store><Id>0102</Id></Store><Store><Id>0104</Id></Store></Stores>
>> >
>> > The Store-Articles relations file (StoreArticlesTest.xml) looks like this:
>> >
>> > <?xml version="1.0" encoding="utf-8"?>
>> > <StoreArticles>
>> >   <Store StoreId="0102"><ArticleId>18004</ArticleId></Store>
>> >   <Store StoreId="0104"><ArticleId>17004</ArticleId><ArticleId>10004</ArticleId></Store>
>> > </StoreArticles>
>> >
>> > And my dih-config file looks like this:
>> >
>> > <dataConfig>
>> >   <dataSource type="FileDataSource" encoding="UTF-8" />
>> >   <document>
>> >     <entity name="store"
>> >             processor="XPathEntityProcessor"
>> >             stream="true"
>> >             forEach="/Stores/Store"
>> >             url="../../../data/StoresTest.xml"
>> >             transformer="TemplateTransformer">
>> >       <field column="id" xpath="/Stores/Store/Id" />
>> >       <entity name="storearticle"
>> >               processor="XPathEntityProcessor"
>> >               stream="true"
>> >               forEach="/StoreArticles"
>> >               url="../../../data/StoreArticlesTest.xml"
>> >               transformer="LogTransformer"
>> >               logTemplate="Processing ${store.id}" logLevel="info"
>> >               rootEntity="true">
>> >         <field column="store_articles_txt"
>> >                xpath="/StoreArticles/Store[@StoreId='${store.id}']/ArticleId" />
>> >       </entity>
>> >     </entity>
>> >   </document>
>> > </dataConfig>
>> >
>> > The result I get in Solr is this:
>> >
>> > <response>
>> > <lst name="responseHeader">...</lst>
>> > <result name="response" numFound="2" start="0">
>> > <doc>
>> > <str name="id">0102</str>
>> > <arr name="store_articles_txt">
>> > <str>18004</str>
>> > </arr>
>> > </doc>
>> > <doc>
>> > <str name="id">0104</str>
>> > <arr name="store_articles_txt">
>> > <str>18004</str>
>> > </arr>
>> > </doc>
>> > </result>
>> > </response>
>> >
>> > As you can see, both stores get the article for the first store. I
>> > would have expected the second store to have two articles: 17004 and
>> > 10004.
>> >
>> > In the log messages printed using LogTransformer I see that each
>> > store.id is processed, but somehow it only picks up the articles for
>> > the first store.
>> >
>> > Any ideas?
>> >
>> > /Tobias Berg
>>
>
>
