Alternatively, you can write your own EntityProcessor and just override nextRow(). I guess you can still use the JdbcDataSource.
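
To make the nextRow() idea a bit more concrete, here is a rough sketch of the folding logic only. It is plain Java with no Solr dependency, so it is not a drop-in EntityProcessor. The class name PivotingRowReader and the column names ID, ATTR_NAME and ATTR_VALUE are made up for illustration, and it assumes the rows arrive ordered by ID. It follows the nextRow() contract as I understand it: one Map per call, null when there is nothing left.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper: folds attribute rows (one row per attribute) into
 * one map per data set. Assumes the input is ordered by the ID column and
 * that no attribute is itself named "id".
 */
class PivotingRowReader {
    private final Iterator<Map<String, Object>> rows;
    // Row that was read ahead and belongs to the next data set.
    private Map<String, Object> pending;

    PivotingRowReader(Iterator<Map<String, Object>> rows) {
        this.rows = rows;
    }

    /** Returns the next pivoted data set, or null when the input is exhausted. */
    Map<String, Object> nextRow() {
        Map<String, Object> first =
                pending != null ? pending : (rows.hasNext() ? rows.next() : null);
        pending = null;
        if (first == null) {
            return null;
        }
        Object id = first.get("ID");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        addAttribute(doc, first);
        while (rows.hasNext()) {
            Map<String, Object> row = rows.next();
            if (!id.equals(row.get("ID"))) {
                pending = row; // first row of the next data set, keep it for later
                break;
            }
            addAttribute(doc, row);
        }
        return doc;
    }

    /** Stores the attribute; repeated attribute names are collected into a List. */
    @SuppressWarnings("unchecked")
    private static void addAttribute(Map<String, Object> doc, Map<String, Object> row) {
        String name = String.valueOf(row.get("ATTR_NAME"));
        Object value = row.get("ATTR_VALUE");
        Object existing = doc.get(name);
        if (existing == null) {
            doc.put(name, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            List<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            doc.put(name, values);
        }
    }
}
```

With something like this, a single query ordered by ID could feed all attribute rows through one pass, instead of one sub-query per attribute per data set. Multiple rows for the same attribute end up as a list, which would map to a multi-valued field.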

On Wed, Jul 22, 2009 at 10:05 PM, Chantal Ackermann<chantal.ackerm...@btelligent.de> wrote:
> Hi all,
>
> this is my first post, as I am new to SOLR (some Lucene exp).
>
> I am trying to load data from an existing datamart into SOLR using the
> DataImportHandler but in my opinion it is too slow due to the special
> structure of the datamart I have to use.
>
> Root Cause:
> This datamart uses a row based approach (pivot) to present its data. It was
> so done to allow adding more attributes to the data set without having to
> change the table structure.
>
> Impact:
> To use the DataImportHandler, i have to pivot the data to create again one
> row per data set. Unfortunately, this results in more and less performant
> queries. Moreover, there are sometimes multiple rows for a single attribute,
> that require separate queries - or more tricky subselects that probably
> don't speed things up.
>
> Here is an example of the relation between DB requests, row fetches and
> actual number of documents created:
>
> <lst name="statusMessages">
> <str name="Total Requests made to DataSource">3737</str>
> <str name="Total Rows Fetched">5380</str>
> <str name="Total Documents Skipped">0</str>
> <str name="Full Dump Started">2009-07-22 18:19:06</str>
> -
> <str name="">
> Indexing completed. Added/Updated: 934 documents. Deleted 0 documents.
> </str>
> <str name="Committed">2009-07-22 18:22:29</str>
> <str name="Optimized">2009-07-22 18:22:29</str>
> <str name="Time taken ">0:3:22.484</str>
> </lst>
>
> (Full index creation.)
> There are about half a million data sets, in total. That would require about
> 30h for indexing? My feeling is that there are far too many row fetches per
> data set.
>
> I am testing it on a smaller machine (2GB, Windows :-( ), Tomcat6 using
> around 680MB RAM, Java6. I haven't changed the Lucene configuration (merge
> factor 10, ram buffer size 32).
>
> Possible solutions?
> A) Write my own DataImportHandler?
> B) Write my own "MultiRowTransformer" that accepts several rows as input
> argument (not sure this is a valid option)?
> C) Approach the DB developers to add a flat table with one data set per row?
> D) ...?
>
> If someone would like to share their experiences, that would be great!
>
> Thanks a lot!
> Chantal
>
> --
> Chantal Ackermann
>

--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com