On Fri, 2016-03-04 at 12:41 +0530, Aneesh Mon N wrote:
> - is there any difference in posting the data in json format vs xml?
> - do we get any performance improvement if we generate the json/xml
> files, scp to the solr server and then push via curl command
I have not tested that, but while performance testing indexing, I achieved a marked increase in performance when I used CSV. That was for very small documents, though; I do not know how well it works for large ones.

Standard sanity check: have you tried piping the result from Pentaho into /dev/null, to see whether it is Solr or the extraction part that is the heavy one?

- Toke Eskildsen, State and University Library, Denmark
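The /dev/null sanity check can also be scripted: time the extraction with its output thrown away, then compare that rate with what you see when indexing into Solr. Below is a minimal Python sketch. It assumes the extracted documents can be consumed as an iterable of dicts. `fake_extract` is a hypothetical stand-in for the Pentaho job, and each document is serialized to JSON so the baseline still pays the formatting cost that the real pipeline pays.

```python
import json
import os
import time
from typing import Dict, Iterable, Iterator, Tuple


def fake_extract(n: int) -> Iterator[Dict[str, object]]:
    """Hypothetical stand-in for the extraction step (e.g. a Pentaho job)."""
    for i in range(n):
        yield {"id": str(i), "title": f"document {i}", "price": i * 1.5}


def drain_to_devnull(docs: Iterable[Dict[str, object]]) -> Tuple[int, float]:
    """Serialize every document and discard it, so only extraction and
    serialization are timed. Solr is not involved at all."""
    count = 0
    start = time.perf_counter()
    # os.devnull is "/dev/null" on POSIX and "nul" on Windows
    with open(os.devnull, "w", encoding="utf-8") as sink:
        for doc in docs:
            sink.write(json.dumps(doc))
            sink.write("\n")
            count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    n, secs = drain_to_devnull(fake_extract(100_000))
    rate = n / secs if secs > 0 else float("inf")
    print(f"{n} docs in {secs:.2f}s ({rate:.0f} docs/s) without Solr")
```

If the rate with the output discarded is not much higher than the rate you get when indexing into Solr, the extraction side is the likely bottleneck, and switching between JSON, XML or CSV for the Solr side would not help much.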