Hello Erick

Thanks for your response.

The main problem we have, is that the data is denormalized. And this
increases the document to index exponentially.

Let's say we have the following two tables.

----------------------      ----------------------
Table: Patient                   Table:Image
----------------------      ----------------------
Field: IDPatient                 Field: IDImage
Field: Name                      Field: IDPatient
                                     Field: Name  


Then we create a view to denormalize the data.

create or replace view solrview as select * from patient p left join image i
on p.idpatient = i.idpatient

If we select the view we will get the following rows

Patient_IDPatient | Patient_Name | Image_IDImage | Image_IDPatient |
Image_Name
           1                    Peter                   1                    
1                   Dog
           2                    Peter                   2                    
1                   Cat
           3                    Peter                   3                    
1                   Ant
           4                    Josh                    4                    
1                   Deer
           5                    Kent                    5                    
2                   Dog
           6                    Martin                  6                    
2                   Dog
           7                    David                   
           8                    Eric                     


This is the configuration we use in the data-config.xml

<entity name="solrEntiry" pk="solrview_PK" 
    query=" SELECT * FROM solrview" 
    deltaQuery = "SELECT PATIENT_IDPATIENT IMAGE_IDIMAGE FROM solrview 
                       WHERE PATIENT_IAMODIFICATIONDATE >
'${dataimporter.last_index_time}
                       UNION 
                       SELECT PATIENT_IDPATIENT, IMAGE_IDIMAGE FROM solrview 
                       WHERE IMAGE_IAMODIFICATIONDATE >
'${dataimporter.last_index_time}
    deltaImportQuery=" SELECT * FROM solrview 
                               WHERE PATIENT_IDPATIENT =
'${dataimporter.delta.PATIENT_IDPATIENT}'  
                               AND IMAGE_IDIMAGE =
'${dataimporter.delta.IMAGE_IDIMAGE}'"
                               transformer="ClobTransformer">


If a user is now updating the Patient_Name "Peter" to "Frank" then there are
already 3 rows affected.

So even if we arrive to index 6k / second there is still a problem that 80k
can be affected with only a small change like changing the patient_name
field.


What I now try to do now is reducing the fields that will be updated. 
For example if there are only changes on the patient table I will only need
to reindex the fields from the table patient. 

Writing our own SolrJ program isn't an option because changes to the
database can come from different services. It would be to error prone if we
would integrate this SolrJ programm everywhere where data gets written.

The reason to post this issue is to see if somebody already got this
situation and have a better "best practise" :-)


Best Regards Sandro











--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Near-Realtime-with-denormalized-Data-tp4022072p4022323.html
Sent from the Solr - User mailing list archive at Nabble.com.

Reply via email to