Hi all,
this is my first post, as I am new to Solr (I have some Lucene
experience).
I am trying to load data from an existing datamart into Solr using the
DataImportHandler, but the import is too slow because of the special
structure of the datamart I have to use.
Root Cause:
This datamart uses a row-based approach (pivot) to present its data:
each attribute of a data set is stored in its own row. This was done to
allow adding more attributes to the data set without having to change
the table structure.
Impact:
To use the DataImportHandler, I have to pivot the data back into one
row per data set. Unfortunately, this results in more queries, and in
less performant ones. Moreover, there are sometimes multiple rows for a
single attribute, which require separate queries, or trickier
subselects that probably don't speed things up.
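To illustrate the pivot I mean, here is a minimal Python sketch (not my actual DataImportHandler configuration). It assumes the rows come back from a single query sorted by data set id, and the column layout (dataset_id, attribute, value), the function name pivot_rows and the sample values are all made up for the example. Attributes that occur on several rows are collected into a list instead of needing a separate query.

```python
from itertools import groupby
from operator import itemgetter


def pivot_rows(rows):
    """Collapse (dataset_id, attribute, value) rows into one dict per data set.

    Assumes the rows are already sorted by dataset_id (e.g. via ORDER BY),
    because groupby only merges adjacent rows with the same key.
    """
    docs = []
    for dataset_id, group in groupby(rows, key=itemgetter(0)):
        doc = {"id": dataset_id}
        for _, attribute, value in group:
            # An attribute may appear on several rows; keep all of its values.
            doc.setdefault(attribute, []).append(value)
        docs.append(doc)
    return docs


# Hypothetical sample rows in the row-based (pivot) layout.
rows = [
    (1, "title", "First"),
    (1, "tag", "a"),
    (1, "tag", "b"),
    (2, "title", "Second"),
]
docs = pivot_rows(rows)
```

With one sorted query the number of DB requests no longer grows with the number of data sets, but whether the datamart can deliver such a sorted result efficiently is something I have not checked.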
Here is an example of the relation between DB requests, row fetches and
actual number of documents created:
<lst name="statusMessages">
<str name="Total Requests made to DataSource">3737</str>
<str name="Total Rows Fetched">5380</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2009-07-22 18:19:06</str>
<str name="">
Indexing completed. Added/Updated: 934 documents. Deleted 0 documents.
</str>
<str name="Committed">2009-07-22 18:22:29</str>
<str name="Optimized">2009-07-22 18:22:29</str>
<str name="Time taken ">0:3:22.484</str>
</lst>
(Full index creation.)
There are about half a million data sets in total. At the rate shown
above, that would require about 30 hours for indexing? My feeling is
that there are far too many row fetches per data set.
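For reference, this is how the estimate follows from the status output above, as a small Python calculation. The figure of 500,000 data sets is my rough total, and the extrapolation assumes the indexing rate stays constant, which may not hold for a full run.

```python
# Figures taken from the DataImportHandler status output above.
docs_indexed = 934
requests = 3737
rows_fetched = 5380
elapsed_seconds = 3 * 60 + 22.484  # "Time taken" 0:3:22.484

# Rough total number of data sets in the datamart (approximate).
total_data_sets = 500_000

# Linear extrapolation, assuming a constant indexing rate.
seconds_per_doc = elapsed_seconds / docs_indexed
estimated_hours = total_data_sets * seconds_per_doc / 3600

# DB overhead per indexed document.
requests_per_doc = requests / docs_indexed
rows_per_doc = rows_fetched / docs_indexed

print(f"estimated full import: {estimated_hours:.1f} h")
print(f"requests per document: {requests_per_doc:.2f}")
print(f"rows fetched per document: {rows_per_doc:.2f}")
```

So each document currently costs about four requests to the data source and nearly six fetched rows.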
I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat 6 using
around 680MB RAM, Java 6. I haven't changed the Lucene configuration
(merge factor 10, RAM buffer size 32 MB).
Possible solutions?
A) Write my own DataImportHandler?
B) Write my own "MultiRowTransformer" that accepts several rows as input
argument (not sure this is a valid option)?
C) Approach the DB developers to add a flat table with one data set per row?
D) ...?
If someone would like to share their experiences, that would be great!
Thanks a lot!
Chantal
--
Chantal Ackermann