Hello, Joseph. This rate looks good to me, although if the node is idling and has plenty of free RAM, you can split the file with Unix tools and submit the partitions for import in parallel, e.g. as sketched below. The hanging connection does seem like a bug.
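Something along these lines, assuming GNU coreutils (split -n l/N) and POSTing each chunk's body directly instead of stream.file; the file names and the choice of 4 parts are just for illustration:

  head -n 1 results.tsv > header.tsv               # set the header row aside
  tail -n +2 results.tsv | split -n l/4 - part_    # 4 line-aligned chunks (GNU split)

  for p in part_aa part_ab part_ac part_ad; do
    cat header.tsv "$p" > "$p.tsv"                 # each chunk needs the header back
    curl -s 'http://localhost:8983/solr/example/update?separator=%09&escape=%5c' \
         -H 'Content-Type: text/csv;charset=utf-8' \
         --data-binary "@$p.tsv" &                 # fire the imports concurrently
  done
  wait                                             # block until all uploads finish

  curl 'http://localhost:8983/solr/example/update?commit=true'   # single commit at the end

Each concurrent request gets its own indexing thread on the Solr side, and committing once at the end avoids paying for intermediate commits.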
On Thu, Jan 2, 2020 at 10:09 PM Joseph Lorenzini <jalo...@gmail.com> wrote:
> Hi all,
>
> I have a TSV file that contains 1.2 million rows. I want to bulk import
> this file into Solr where each row becomes a Solr document. The TSV has
> 24 columns. I am using the streaming API like so:
>
> curl -v '
> http://localhost:8983/solr/example/update?stream.file=/opt/solr/results.tsv&separator=%09&escape=%5c&stream.contentType=text/csv;charset=utf-8&commit=true
> '
>
> The ingestion rate is 167,000 rows a minute, and it takes about 7.5
> minutes to complete. I have a few questions.
>
> - Is there a way to increase the performance of the ingestion rate? I am
> open to doing something other than a bulk import of a TSV, up to and
> including writing a small program. I am just not sure what that would
> look like at a high level.
> - If the file is a TSV, I noticed that Solr never closes the HTTP
> connection with a 200 OK after all the documents are uploaded. The
> connection seems to be held open indefinitely. If, however, I upload the
> same file as a CSV, then Solr does close the HTTP connection. Is this a
> bug?

--
Sincerely yours
Mikhail Khludnev