Hi
I have looked and cannot find a clear answer to this on
the Interwebs.
I have an index with, say, 10 fields.
I load that index directly from Oracle via data-config.xml
(the DataImportHandler) using JDBC. I can load 10 million rows
very quickly. This direct way of loading from Oracle straight
into Solr is fantastic - really efficient, and it saves writing
loads of import/export code (e.g. via a CSV file).
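For context, the setup looks roughly like this (table, field
and connection details invented for the example):

  <dataConfig>
    <!-- Oracle connection over JDBC; URL and credentials are placeholders -->
    <dataSource driver="oracle.jdbc.OracleDriver"
                url="jdbc:oracle:thin:@dbhost:1521/ORCL"
                user="solr" password="secret"/>
    <document>
      <!-- a single query streams the whole main table into Solr -->
      <entity name="main" query="SELECT id, title FROM main_table">
        <field column="ID" name="id"/>
        <field column="TITLE" name="title"/>
      </entity>
    </document>
  </dataConfig>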
Two of those 10 fields (both multiValued) come from a separate
child table, which holds anything from 1 to 10 rows for each
row in the main table.
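In schema.xml the two fields are declared along these lines
(field names invented):

  <field name="tag"    type="string" indexed="true" stored="true"
         multiValued="true"/>
  <field name="author" type="string" indexed="true" stored="true"
         multiValued="true"/>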
I can use a nested entity to extract the child rows for each of
the 10m rows in the main table - but then Solr issues 10m
separate SQL queries, and the load time goes from a few minutes
to several days.
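The nested entity looks roughly like this - as I understand it,
the ${main.ID} binding is what forces one child query per parent
row (names invented again):

  <entity name="main" query="SELECT id, title FROM main_table">
    <field column="ID" name="id"/>
    <field column="TITLE" name="title"/>
    <!-- executed once per parent row: 10m rows => 10m queries -->
    <entity name="child"
            query="SELECT tag FROM child_table WHERE main_id = '${main.ID}'">
      <field column="TAG" name="tag"/>
    </entity>
  </entity>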
On smaller tables - just a few thousand rows - a second nested
entity with a JDBC call works fine, but not on very large
tables.
Could I load the data in two steps:
1) load the main 10m rows;
2) add the data from a second SQL query into the two multiValued
   fields of each existing document (i.e. an UPDATE rather than
   an INSERT).
I don't know what syntax/option might achieve that - something
like the sketch below, perhaps. There is incremental (delta)
loading, but I think that replaces whole documents rather than
updating individual fields. Or maybe it does do both?
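For step 2 I was imagining something like Solr's atomic-update
syntax, i.e. posting documents that touch only the two child
fields of each existing id - assuming atomic updates are usable
at this scale (as far as I know they also require the fields to
be stored):

  <add>
    <doc>
      <field name="id">12345</field>
      <!-- "add" appends values to a multiValued field
           without replacing the rest of the document -->
      <field name="tag" update="add">first-child-value</field>
      <field name="tag" update="add">second-child-value</field>
    </doc>
  </add>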
Any other techniques that would be fast/efficient?
Help!
--
Cheers
Jules.