RE: dataimport db-data-config.xml

Davis, Daniel (NIH/NLM) [C] Fri, 29 Apr 2016 08:25:45 -0700

Kishor,

Data Import Handler doesn't know how to randomly access rows from the CSV to 
"JOIN" them to rows from the MySQL table at indexing time.
However, both MySQL and Solr know how to JOIN rows/documents from multiple 
tables/collections/cores.


Data Import Handler could read the CSV first, and query MySQL within that, but 
I don't think that's a great architecture because it depends on the business 
requirements in a rather brittle way (more on this below).

So, I see three basic architectures:

Use MySQL to do the JOIN:
----------------------------------
- Your indexing isn't just DIH, but a script that first imports the CSV into a 
MySQL table, validating that each id in the CSV is found in the MySQL table.
- Your DIH then has either an <entity> for one SQL query that contains an 
<entity> for the other SQL query, or a single <entity> with a JOIN query (or a 
query on a MySQL view).
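
A rough sketch of the single-JOIN-query flavor, in case it helps. Everything 
named here is an assumption on my part: the table names (mytable, csv_import), 
the column csv_data1, the JDBC URL and the driver class. It also assumes the 
script has already loaded the CSV into csv_import (e.g. with MySQL's LOAD DATA 
INFILE) before DIH runs.

```xml
<dataConfig>
  <!-- Connection details are placeholders; adjust driver/url to your setup. -->
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/XXX"
              user="XXX" password="XXX" />

  <document>
    <!-- One Solr document per row of the JOIN result.
         csv_import is the (hypothetical) table the CSV was loaded into. -->
    <entity name="user" dataSource="db"
            query="SELECT t.id, t.name, c.csv_data1
                   FROM mytable t
                   JOIN csv_import c ON c.id = t.id">
      <field column="id" name="id" />
      <field column="name" name="name" />
      <field column="csv_data1" name="csv_data1" />
    </entity>
  </document>
</dataConfig>
```

An inner JOIN drops MySQL rows that have no CSV row; a LEFT JOIN would keep 
them with an empty csv_data1, if that is what the business needs.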

This is ideal if:
- Your resources (including you) are more familiar with RDBMS technology than 
Solr.
- You have no business requirement to return rows from just the MySQL table or 
just the CSV as search results.
- The data is small enough that the processing time to import into MySQL each 
time you index is acceptable.

Use Solr to do the JOIN:
------------------------------
- Index all the rows from the CSV as documents within Solr.
- Index all the rows from the MySQL table as documents within Solr.
- Use JOIN queries to query them together.
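
For the last step, the join is just a query parser, so it can be as small as 
one filter. A minimal sketch, assuming both kinds of documents live in the same 
core, that a hypothetical "source" field records where each document came from, 
and that each CSV document has its own unique id and carries the MySQL id in a 
hypothetical "mysql_id" field. The handler name /joined is made up too.

```xml
<!-- solrconfig.xml fragment; handler name and field names are assumptions. -->
<requestHandler name="/joined" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Keep only documents whose id equals the mysql_id of some
         CSV-derived document (source:csv). -->
    <str name="fq">{!join from=mysql_id to=id}source:csv</str>
  </lst>
</requestHandler>
```

Keep in mind that a Solr join only selects the documents on the "to" side; it 
does not copy fields from the CSV documents into the MySQL documents the way a 
SQL JOIN would.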

This is ideal if:
- You don't control the MySQL database, and have no way at all to add a table 
to it.
- You have a business requirement to return results from the MySQL table, the 
CSV, or both.
- You want Solr JOIN queries on your Solr resume ;)   Not a terribly good 
reason, I guess.


Use Data Import Handler to do the JOIN:
---------------------------------------------------
If you absolutely want to join the data using Data Import Handler, then:
- Have DIH loop through the CSV *first*, and then make queries based on the id 
into the MySQL table.
- In this case, the <entity> for the MySQL query will appear within the 
<entity> for the CSV row, which will appear within an <entity> for the CSV file 
within the filesystem.
- The <entity> for the CSV row would be the primary document entity.
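
A sketch of that nesting, under a pile of assumptions: the directory 
/data/import, the table mytable, the column names, and a very simple 
two-column CSV ("id,csv data") with a header line starting with "id," and no 
quoted commas. The regex split is naive and would need replacing for real CSV.

```xml
<dataConfig>
  <dataSource name="db" type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/XXX"
              user="XXX" password="XXX" />
  <dataSource name="files" type="FileDataSource" encoding="UTF-8" />

  <document>
    <!-- Outer entity: the CSV file(s) on the filesystem.
         rootEntity="false" so that its child makes the documents. -->
    <entity name="csvFile" processor="FileListEntityProcessor"
            baseDir="/data/import" fileName=".*\.csv"
            rootEntity="false" dataSource="null">

      <!-- Primary document entity: one document per CSV line.
           LineEntityProcessor puts each line in the "rawLine" column. -->
      <entity name="csvRow" processor="LineEntityProcessor"
              url="${csvFile.fileAbsolutePath}" dataSource="files"
              skipLineRegex="^id,.*"
              transformer="RegexTransformer">
        <!-- Naive split of the line into two columns. -->
        <field column="rawLine" regex="^([^,]*),(.*)$"
               groupNames="id,csv_data1" />

        <!-- Inner entity: look up the matching MySQL row by id. -->
        <entity name="dbRow" dataSource="db"
                query="SELECT name FROM mytable WHERE id = '${csvRow.id}'">
          <field column="name" name="name" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Note that this runs one SQL query per CSV line, which is part of why I'd only 
reach for it when the conditions below hold.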

This is only appropriate if:
- There is no business requirement to search for results directly from the 
MySQL table on its own.
- Your business requirements suggest one result for each row from the CSV, 
rather than from the MySQL table or either way.
- The CSV contains every id in the MySQL table, or the entries within the MySQL 
table that don't have anything from the CSV shouldn't appear in the results 
anyway.


-----Original Message-----
From: kishor [mailto:krajus...@gmail.com] 
Sent: Friday, April 29, 2016 4:58 AM
To: solr-user@lucene.apache.org
Subject: dataimport db-data-config.xml

I want to import data from a MySQL table and a CSV file at the same time, 
because some data are in MySQL tables and some are in a CSV file. I want to 
match a specific id from the MySQL table in the CSV file, then add the data to 
Solr.

What I think I want to do:

<dataConfig>
        <dataSource name ="db1" driver="org.postgresql.Driver"
                    url="jdbc:postgresql://0.0.0.0:5432/XXX"
                    user="XXX" password="root" />

        <document>
                <entity name="user"  dataSource="db1"
                        query='select id, name from table' >
                        <field column="id" name="id" />
                        <field column="name" name="name" />
               </entity>


                <entity name="user2"  dataSource="csv-datasource" >
                        <field column="csv data1" name="csv data1" />
               </entity>

Is this possible in Solr?

Please suggest how to import data from a CSV file and a MySQL table at the 
same time.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-db-data-config-xml-tp4270673p4273614.html
Sent from the Solr - User mailing list archive at Nabble.com.
