Mariano,

On 3/19/18 11:50 AM, LOPEZ-CORTES Mariano-ext wrote:
> Hello
> 
> We have a Solr index with 3 nodes, 1 shard and 2 replicas.
> 
> Our goal is to index 42 million rows. Indexing time is important.
> The data source is an Oracle database.
> 
> Our indexing strategy is:
> 
> *         Reading from Oracle into a big CSV file.
> 
> *         Reading from 4 files (the big file split into chunks) and injecting via
> ConcurrentUpdateSolrClient
> 
> Is this the optimal way of injecting such a mass of data into Solr?
> 
> For reference, the estimated time for our solution is 6 hours.

How big are the CSV files? If most of the time is taken performing the
various SELECT operations, then it's probably a good strategy.
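If you stay with that approach, the chunk-loading side with
ConcurrentUpdateSolrClient could look roughly like the sketch below. The
collection URL, queue size, thread count, and the naive comma splitting are
placeholders to adapt to your setup, not tested values:

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CsvChunkIndexer {

    public static void main(String[] args) throws Exception {
        // Queue size and thread count are tuning knobs, not recommendations.
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycollection")
                     .withQueueSize(10000)
                     .withThreadCount(4)
                     .build();
             BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {

            String header = reader.readLine();          // assume the first line holds column names
            String[] fields = header.split(",");

            String line;
            while ((line = reader.readLine()) != null) {
                String[] values = line.split(",", -1);  // naive split; real CSV needs a proper parser
                SolrInputDocument doc = new SolrInputDocument();
                for (int i = 0; i < fields.length && i < values.length; i++) {
                    doc.addField(fields[i], values[i]);
                }
                client.add(doc);                        // queued and sent by background threads
            }

            client.blockUntilFinished();                // wait for queued updates to drain
            client.commit();
        }
    }
}

ConcurrentUpdateSolrClient buffers documents in a queue and sends them from
background threads, so call blockUntilFinished() before the final commit.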

However, you may find that using the disk as a buffer slows everything
down because disk-writes can be very slow.

Why not perform your SELECT(s) and write directly to Solr using one of
the APIs (either a language-specific API, or through the HTTP API)?
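A minimal sketch of that, assuming SolrJ and a plain JDBC connection (the
connection string, query, collection URL, and field names below are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OracleToSolr {

    public static void main(String[] args) throws Exception {
        try (ConcurrentUpdateSolrClient client =
                 new ConcurrentUpdateSolrClient.Builder("http://localhost:8983/solr/mycollection")
                     .withQueueSize(10000)
                     .withThreadCount(4)
                     .build();
             Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
             Statement stmt = conn.createStatement()) {

            stmt.setFetchSize(5000);                    // stream rows instead of loading them all
            try (ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM my_table")) {
                while (rs.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getString("id"));
                    doc.addField("title", rs.getString("title"));
                    doc.addField("body", rs.getString("body"));
                    client.add(doc);                    // queued; sent by background threads
                }
            }

            client.blockUntilFinished();
            client.commit();
        }
    }
}

Running several such readers in parallel, each over a different ID range,
would give you roughly the same parallelism as the four CSV chunks without
the intermediate files on disk.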

Hope that helps,
-chris