On 04/12/2012, Spadez <james_will...@hotmail.com> wrote:
> Hi.
>
> I am having a bit of trouble figuruing out the DIH for SQL files. I have
> asked around a few different places but havent got any replies so I was
> hoping you could help me.
>
> *I have a database schema like this:*
>
> CREATE TABLE company (
>     id SERIAL PRIMARY KEY,
>     name varchar(60) NOT NULL
> );
>
> CREATE TABLE country (
>     id SERIAL PRIMARY KEY,
>     name varchar(255) NOT NULL
> );
>
> CREATE TABLE location (
>     id SERIAL PRIMARY KEY,
>     name varchar(255) NOT NULL,
>     coordinate varchar(255) NOT NULL,
>     location_id integer NOT NULL REFERENCES country (id)
> );
>
> CREATE TABLE source (
>     id SERIAL PRIMARY KEY,
>     name varchar(60) NOT NULL
> );
>
> CREATE TABLE item (
>     id SERIAL PRIMARY KEY,
>     title varchar(60) NOT NULL,
>     description varchar(900) NOT NULL,
>     company_id integer NOT NULL REFERENCES company (id),
>     date timestamp NOT NULL,
>     source_id integer NOT NULL REFERENCES source (id),
>     link varchar(255) NOT NULL,
>     location_id integer NOT NULL REFERENCES location (id)
> );
>
> *My what I want to put into my schema is this information (named as they
> are
> in my schema):*
>
> id
> title
> description
> date
> source
> link
> location_name
> location_coordinates

It is not entirely clear:
(a) How the tables are related. One can guess that item is
     related to source through source_id, and to location through
     location_id
(b) Which of the Solr fields are to be derived from which table.
     In particular, what are the tables, company and country, used
     for.

> *I made my DIH like this:*

The way to deal with related tables is by using nested entities, e.g.,
some of your fields can be populated by

<dataConfig>
 <dataSource name="app-ds" driver="org.postgresql.Driver"
 url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia"
 password="secret" />
        <document>
           <entity dataSource="app-ds" name="item" query="SELECT id,
 title, location_id from item">
              <entity dataSource="app-ds" name="location" query="SELECT name,
 coordinate from location where location_id=${item.location_id}">
                        <field column="id" name="id" />
                        <field column="title" name="title" />
                        <field column="name" name="location_name" />
                        <field column="coordinate" name="location_coordinates" 
/>
                </entity>
            </entity>
        </document>
</dataConfig>

You can add more second-level entities, and/or fields as needed. The
${item.location_id} refers to the location_id in the select from the top-
level entity.

> My main questions relate to the entity datastore query and also what to do
> for field columns when it is a linked table. For example the word "name"
> isnt unique since it appears in several different tables.

You could change the name in the select, e.g.,
for the top-level entity have:
  query="select name as top_level_name"
and for the inner entity have:
  query="select name as second_level_name"

Regards,
Gora

Reply via email to