Nesting an XPathEntityProcessor inside another XPathEntityProcessor is possible only if a field in the XML is a filename/URL. What is the purpose of nesting like this? Is it because you have multiple addresses? The possible solutions are discussed elsewhere in this thread.
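To make the multiple-address problem from the quoted message below concrete, here is a minimal Python sketch. It does not use Solr or DIH at all; it only simulates the difference between one flattened document per record (multivalued fields) and one document per address, as Fergus proposes. The function names (`flattened_doc`, `per_address_docs`, `matches_flat`, `matches_per_address`) are hypothetical, and the sample record assumes the id sits on `<record>` and drops the elided attributes.

```python
import xml.etree.ElementTree as ET

# Sample record modelled on the one quoted below. Assumption: the id is
# placed directly on <record>, and the elided attributes are left out.
SAMPLE = """
<record id="123">
  <address street="XYZ1" state="CA" type="home"/>
  <address street="XYZ2" state="CA" type="Office"/>
  <address street="XYZ3" state="CA" type="Other"/>
</record>
"""


def flattened_doc(record):
    """One document per record: every attribute becomes a multivalued field."""
    addresses = record.findall("address")
    return {
        "ID": record.get("id"),
        "address_street": [a.get("street") for a in addresses],
        "address_state": [a.get("state") for a in addresses],
        "address_type": [a.get("type") for a in addresses],
    }


def per_address_docs(record):
    """One document per address: street, state and type stay together."""
    return [
        {
            "ID": record.get("id"),
            "address_street": a.get("street"),
            "address_state": a.get("state"),
            "address_type": a.get("type"),
        }
        for a in record.findall("address")
    ]


def matches_flat(doc, street, addr_type):
    # Each clause is tested against its own value list independently,
    # so values from different addresses can satisfy the query together.
    return street in doc["address_street"] and addr_type in doc["address_type"]


def matches_per_address(docs, street, addr_type):
    # Both clauses must hold within the same address document.
    return [
        d for d in docs
        if d["address_street"] == street and d["address_type"] == addr_type
    ]


record = ET.fromstring(SAMPLE.strip())
flat = flattened_doc(record)
docs = per_address_docs(record)

# Flattened: street XYZ1 (home) and type Office (XYZ2) match together.
print(matches_flat(flat, "XYZ1", "Office"))
# Per-address: no single address has both values.
print(matches_per_address(docs, "XYZ1", "Office"))
```

The trade-off is the one noted in the quoted message: with one document per address, ID is no longer unique, so a search on ID returns one hit per address.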
On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
> Hello,
>
> I am also a newbie and was wanting to do almost the exact same thing.
> I was planning on doing the equivalent of:-
>
> <dataConfig>
>   <dataSource type="FileDataSource" encoding="UTF-8" />
>   <document>
>     <entity name="f" processor="FileListEntityProcessor"
>             baseDir="***"
>             fileName=".*xml"
>             rootEntity="false"
>             dataSource="null" >
>       <entity
>             name="record"
>             processor="XPathEntityProcessor"
>             stream="false"
>             rootEntity="false" ***changed***
>             forEach="/record"
>             url="${f.fileAbsolutePath}">
>         <field column="ID" xpath="/record/@id" commonField="true"/> ***changed***
>         <!-- Address -->
>         <entity
>             name="record_adr"
>             processor="XPathEntityProcessor"
>             stream="false"
>             forEach="/record/address"
>             url="${f.fileAbsolutePath}">
>           <field column="address_street" xpath="/record/address/@street" />
>           <field column="address_state" xpath="/record/address//@state" />
>           <field column="address_type" xpath="/record/address//@type" />
>         </entity>
>       </entity>
>     </entity>
>   </document>
> </dataConfig>
>
> ID is no longer unique within Solr. There would be multiple "documents"
> with a given ID, one for each address. You can then search on ID and get
> the three addresses; you can also search on an address more sensibly.
>
> I have not been able to try this yet as other issues are still to be
> dealt with.
>
> Comments?
>
>>Hi
>>I may be completely off on this, being new to SOLR, but I am not sure
>>how to index related groups of fields in a document and preserve
>>their 'grouping'. I would appreciate any help on this. Detailed
>>description of the problem below.
>>
>>I am trying to index an entity that can have multiple occurrences in
>>the same document - e.g. Address. The address could be Shipping,
>>Home, Office etc. Each address element has multiple values in it
>>like street, state etc.
>>Thus each address element is a group, with the state and street in one
>>address element being related to each other.
>>
>>It looks like this in my source xml
>>
>><record>
>>  <coreInfo id="123" , .../>
>>  <address street="XYZ1" state="CA" ...type="home" />
>>  <address street="XYZ2" state="CA" ... type="Office"/>
>>  <address street="XYZ3" state="CA" ....type="Other"/>
>></record>
>>
>>I have set up my DIH to treat these as entities as below
>>
>><dataConfig>
>>  <dataSource type="FileDataSource" encoding="UTF-8" />
>>  <document>
>>    <entity name="f" processor="FileListEntityProcessor"
>>            baseDir="***"
>>            fileName=".*xml"
>>            rootEntity="false"
>>            dataSource="null" >
>>      <entity
>>            name="record"
>>            processor="XPathEntityProcessor"
>>            stream="false"
>>            forEach="/record"
>>            url="${f.fileAbsolutePath}">
>>        <field column="ID" xpath="/record/@id" />
>>
>>        <!-- Address -->
>>        <entity
>>            name="record_adr"
>>            processor="XPathEntityProcessor"
>>            stream="false"
>>            forEach="/record/address"
>>            url="${f.fileAbsolutePath}">
>>          <field column="address_street" xpath="/record/address/@street" />
>>          <field column="address_state" xpath="/record/address//@state" />
>>          <field column="address_type" xpath="/record/address//@type" />
>>        </entity>
>>      </entity>
>>    </entity>
>>  </document>
>></dataConfig>
>>
>>
>>The problem is as follows. DIH seems to treat these as entities but
>>solr seems to flatten them out on indexing to fields in a document
>>(losing the entity part).
>>
>>So when I search for an ID, in the response all the street fields
>>are bunched together, followed by all the state fields, type etc.
>>Thus I can't associate which street address corresponds to which
>>address type in the response.
>>
>>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>>type="Office". This should NOT return a document since the street for
>>the office address is "XYZ2" and not "XYZ1".
>>However when I query for address_street:"XYZ1" and
>>address_type:"Office" I get back this document.
>>
>>The problem seems to be that while DIH allows 'entities' within a
>>document the SOLR schema does not preserve them - it 'flattens' all
>>of them out as indices for the document.
>>
>>I could work around the problem by creating SOLR fields like
>>"home_address_street" and "office_address_street" and do some xpath
>>mapping. However I don't want to do it as we can have multiple
>>'other' addresses. Also I have other fields whose type is not as easily
>>distinguished as address.
>>
>>As I mentioned, being new to SOLR I might have completely goofed on a
>>way to set it up - much appreciate any direction on it. I am using
>>SOLR 1.3
>>
>>Regards,
>>Guna
>
> --
>
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>

--
--Noble Paul