Nesting an XPathEntityProcessor inside another XPathEntityProcessor
is possible only if a field in the XML is a filename/URL.
What is the purpose of nesting like this?
Is it because you have multiple addresses? The possible solutions are
discussed elsewhere in this thread.
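
To illustrate the filename/URL case, here is a minimal sketch of a nested
config. It assumes the outer XML carries the path of a second file in an
attribute. The file path, the element names, and the addressFile
attribute are all hypothetical and are not taken from the config quoted
below.

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Outer entity: reads a hypothetical index file in which each
         record names a separate address file in an attribute -->
    <entity name="record"
            processor="XPathEntityProcessor"
            forEach="/records/record"
            url="/path/to/index.xml">
      <field column="ID" xpath="/records/record/@id" />
      <field column="addressFile" xpath="/records/record/@addressFile" />

      <!-- Inner entity: opens the file named by the outer entity's
           addressFile column (hypothetical column name) -->
      <entity name="record_adr"
              processor="XPathEntityProcessor"
              forEach="/addresses/address"
              url="${record.addressFile}">
        <field column="address_street" xpath="/addresses/address/@street" />
      </entity>
    </entity>
  </document>
</dataConfig>
```

The inner entity's url is resolved from a column produced by its parent,
which is what makes the nesting meaningful. Pointing both entities at the
same file, as in the quoted config, just re-reads that file for every
record.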

On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
> Hello,
>
> I am also a newbie and was wanting to do almost the exact same thing.
> I was planning on doing the equivalent of:-
>
> <dataConfig>
>    <dataSource type="FileDataSource" encoding="UTF-8" />
>    <document>
>      <entity name ="f" processor="FileListEntityProcessor"
>              baseDir="***"
>              fileName=".*xml"
>              rootEntity="false"
>              dataSource="null" >
>         <entity
>           name="record"
>           processor="XPathEntityProcessor"
>           stream="false"
>           rootEntity="false"            ***changed***
>           forEach="/record"
>           url="${f.fileAbsolutePath}">
>                 <field column="ID" xpath="/record/@id" commonField="true"/>   ***changed***
>                 <!-- Address  -->
>                  <entity
>                     name="record_adr"
>                     processor="XPathEntityProcessor"
>                     stream="false"
>                     forEach="/record/address"
>                     url="${f.fileAbsolutePath}">
>                          <field column="address_street" xpath="/record/address/@street" />
>                          <field column="address_state"  xpath="/record/address//@state" />
>                          <field column="address_type"   xpath="/record/address//@type" />
>                </entity>
>            </entity>
>      </entity>
>    </document>
> </dataConfig>
>
> ID is no longer unique within Solr; there would be multiple "documents"
> with a given ID, one for each address. You can then search on ID and get
> the three addresses, and you can also search on an address more sensibly.
>
> I have not been able to try this yet as other issues are still to be
> dealt with.
>
> Comments?????
>
>>Hi
>>I may be completely off on this, being new to SOLR, but I am not sure
>>how to index related groups of fields in a document and preserve
>>their 'grouping'.  I would appreciate any help on this.  Detailed
>>description of the problem below.
>>
>>I am trying to index an entity that can have multiple occurrences in
>>the same document - e.g. Address.  The address could be Shipping,
>>Home, Office etc.   Each address element has multiple values in it
>>like street, state etc.    Thus each address element is a group with
>>the state and street in one address element being related to each other.
>>
>>It looks like this in my source xml
>>
>><record>
>>    <coreInfo id="123" , .../>
>>    <address street="XYZ1" state="CA" ...type="home" />
>>    <address street="XYZ2" state="CA" ... type="Office"/>
>>    <address street="XYZ3" state="CA" ....type="Other"/>
>></record>
>>
>>I have setup my DIH to treat these as entities as below
>>
>><dataConfig>
>>    <dataSource type="FileDataSource" encoding="UTF-8" />
>>    <document>
>>      <entity name ="f" processor="FileListEntityProcessor"
>>              baseDir="***"
>>              fileName=".*xml"
>>              rootEntity="false"
>>              dataSource="null" >
>>         <entity
>>            name="record"
>>          processor="XPathEntityProcessor"
>>          stream="false"
>>          forEach="/record"
>>            url="${f.fileAbsolutePath}">
>>                 <field column="ID" xpath="/record/@id" />
>>
>>                 <!-- Address  -->
>>                  <entity
>>                      name="record_adr"
>>                    processor="XPathEntityProcessor"
>>                    stream="false"
>>                    forEach="/record/address"
>>                            url="${f.fileAbsolutePath}">
>>                          <field column="address_street" xpath="/record/address/@street" />
>>                          <field column="address_state"  xpath="/record/address//@state" />
>>                          <field column="address_type"   xpath="/record/address//@type" />
>>               </entity>
>>            </entity>
>>      </entity>
>>    </document>
>></dataConfig>
>>
>>
>>The problem is as follows.  DIH seems to treat these as entities but
>>solr seems to flatten them out on indexing to fields in a document
>>(losing the entity part).
>>
>>So when I search for an ID, in the response all the street fields
>>are bunched together, followed by all the state fields, type fields, etc.
>>Thus I can't associate which street address corresponds to which
>>address type in the response.
>>
>>What seems harder is this: say I need to query on street = "XYZ1" and
>>type = "Office".  This should NOT return a document, since the street for
>>the office address is "XYZ2" and not "XYZ1".  However, when I query for
>>address_street:"XYZ1" and address_type:"Office" I get back this document.
>>
>>The problem seems to be that while DIH allows 'entities' within a
>>document  the SOLR schema does not preserve them - it 'flattens' all
>>of them out as indices for the document.
>>
>>I could work around the problem by creating SOLR fields like
>>"home_address_street" and "office_address_street" and do some xpath
>>mapping.  However I don't want to do it as we can have multiple
>>'other' addresses.  Also I have other fields whose type is not easily
>>distinguished like address.
>>
>>As I mentioned, being new to SOLR I might have completely goofed on
>>the way to set it up. I would much appreciate any direction on it. I am
>>using SOLR 1.3.
>>
>>Regards,
>>Guna
>
> --
>
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>



-- 
--Noble Paul
