Mariano,
On 3/19/18 11:50 AM, LOPEZ-CORTES Mariano-ext wrote:
> Hello
>
> We have a Solr index with 3 nodes, 1 shard and 2 replicas.
>
> Our goal is to index 42 million rows. Indexing time is important.
> The data source is an Oracle database.
>
> Our indexing strategy is:
>
> * Reading from Oracle into a big CSV file.
>
> * Reading from 4 files (the big file chunked) and injecting via
>   ConcurrentUpdateSolrClient.
>
> Is this the optimal way of injecting such a mass of data into Solr?
>
> For information, the estimated time for our solution is 6h.

How big are the CSV files? If most of the time is taken performing the
various SELECT operations, then it's probably a good strategy. However,
you may find that using the disk as a buffer slows everything down,
because disk writes can be very slow.

Why not perform your SELECT(s) and write directly to Solr using one of
the APIs (either a language-specific API, or the HTTP API)?

Hope that helps,

-chris
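P.S. In case it's useful, here is a rough, untested sketch of that
direct approach using SolrJ's ConcurrentUpdateSolrClient. The JDBC URL,
collection name, table and column names, and the queue/thread sizes are
only placeholders, so adjust them to your environment and schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class OracleToSolr {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; adapt to your environment.
        String jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL";
        String solrUrl = "http://localhost:8983/solr/mycollection";

        try (ConcurrentUpdateSolrClient solr =
                     new ConcurrentUpdateSolrClient.Builder(solrUrl)
                             .withQueueSize(10000)
                             .withThreadCount(4)
                             .build();
             Connection db = DriverManager.getConnection(jdbcUrl, "user", "password");
             Statement stmt = db.createStatement()) {

            // A large fetch size keeps Oracle round-trips to a minimum.
            stmt.setFetchSize(10000);

            try (ResultSet rs = stmt.executeQuery(
                    "SELECT id, title, body FROM my_table")) {
                while (rs.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getString("id"));
                    doc.addField("title", rs.getString("title"));
                    doc.addField("body", rs.getString("body"));
                    // Queued locally; background threads send batches to Solr.
                    solr.add(doc);
                }
            }

            // Wait for the queue to drain, then make the documents visible.
            solr.blockUntilFinished();
            solr.commit();
        }
    }
}

Because the client queues documents and flushes them from background
threads, the SELECT and the indexing overlap, and you skip the
intermediate CSV on disk entirely.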