On 04/12/2012, Spadez <james_will...@hotmail.com> wrote: > Hi. > > I am having a bit of trouble figuruing out the DIH for SQL files. I have > asked around a few different places but havent got any replies so I was > hoping you could help me. > > *I have a database schema like this:* > > CREATE TABLE company ( > id SERIAL PRIMARY KEY, > name varchar(60) NOT NULL > ); > > CREATE TABLE country ( > id SERIAL PRIMARY KEY, > name varchar(255) NOT NULL > ); > > CREATE TABLE location ( > id SERIAL PRIMARY KEY, > name varchar(255) NOT NULL, > coordinate varchar(255) NOT NULL, > location_id integer NOT NULL REFERENCES country (id) > ); > > CREATE TABLE source ( > id SERIAL PRIMARY KEY, > name varchar(60) NOT NULL > ); > > CREATE TABLE item ( > id SERIAL PRIMARY KEY, > title varchar(60) NOT NULL, > description varchar(900) NOT NULL, > company_id integer NOT NULL REFERENCES company (id), > date timestamp NOT NULL, > source_id integer NOT NULL REFERENCES source (id), > link varchar(255) NOT NULL, > location_id integer NOT NULL REFERENCES location (id) > ); > > *My what I want to put into my schema is this information (named as they > are > in my schema):* > > id > title > description > date > source > link > location_name > location_coordinates
It is not entirely clear: (a) How the tables are related. One can guess that item is related to source through source_id, and to location through location_id (b) Which of the Solr fields are to be derived from which table. In particular, what are the tables, company and country, used for. > *I made my DIH like this:* The way to deal with related tables is by using nested entities, e.g., some of your fields can be populated by <dataConfig> <dataSource name="app-ds" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia" password="secret" /> <document> <entity dataSource="app-ds" name="item" query="SELECT id, title, location_id from item"> <entity dataSource="app-ds" name="location" query="SELECT name, coordinate from location where location_id=${item.location_id}"> <field column="id" name="id" /> <field column="title" name="title" /> <field column="name" name="location_name" /> <field column="coordinate" name="location_coordinates" /> </entity> </entity> </document> </dataConfig> You can add more second-level entities, and/or fields as needed. The ${item.location_id} refers to the location_id in the select from the top- level entity. > My main questions relate to the entity datastore query and also what to do > for field columns when it is a linked table. For example the word "name" > isnt unique since it appears in several different tables. You could change the name in the select, e.g., for the top-level entity have: query="select name as top_level_name" and for the inner entity have: query="select name as second_level_name" Regards, Gora