I have an XML file that I would like to index. Its structure is similar to 
this:

<data>
  <user id="[id-num]">
    <message date="[date]">[message text]</message>
    ...
  </user>
  ...
</data>

I would like each document in the index to correspond to one message in the 
XML file, with the user's [id-num] value stored as a field in each of that 
user's documents. I think this means that I have to define an entity for 
message that looks like this:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="message"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/data/user/message/"
            url="message-data.xml">
      <field column="date" xpath="/data/user/message/@date"
             dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss"/>
      <field column="text" xpath="/data/user/message" />
    </entity>
  </document>
</dataConfig>

but I don't know where to put the field definition for the user id. It would 
look like this:

<field column="id" xpath="/data/user/@id" />

I can't put it within the message entity, because the entity is defined with 
forEach="/data/user/message/" and the id field's xpath value is outside of the 
entity's scope. Putting the id field definition there causes a null pointer 
exception. I don't think I want to create a "user" entity with the "message" 
entity nested inside it, or is there a way to do that and still have the 
index documents correspond to messages from the file? Is there an attribute, 
or an attribute value, that I haven't run across in my searching that would 
let me do what I need?
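
To make the goal concrete, here is a minimal Python sketch of the mapping I am 
after: one output document per message, each carrying its parent user's id. 
This only illustrates the desired result and is not a DataImportHandler 
configuration. The sample data and the function name flatten_messages are 
made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data that follows the structure described above.
SAMPLE = """<data>
  <user id="42">
    <message date="2010-01-15T09:30:00">first message</message>
    <message date="2010-01-16T10:00:00">second message</message>
  </user>
  <user id="7">
    <message date="2010-02-01T12:00:00">another message</message>
  </user>
</data>"""


def flatten_messages(xml_text):
    """Return one dict per <message>, copying the enclosing user's id into it."""
    root = ET.fromstring(xml_text)
    docs = []
    for user in root.findall("user"):
        user_id = user.get("id")  # attribute on the parent element
        for message in user.findall("message"):
            docs.append({
                "id": user_id,
                "date": message.get("date"),
                "text": message.text,
            })
    return docs


docs = flatten_messages(SAMPLE)
for doc in docs:
    print(doc)
```

So three messages should give three index documents, the first two with the 
same user id.
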
Thanks,
Mike

