On 04/03/2016 07:45, Toke Eskildsen wrote:
On Fri, 2016-03-04 at 12:41 +0530, Aneesh Mon N wrote:
    - is there any difference in posting the data in json format vs xml?
    - do we get any performance improvement if we generate the json/xml
    files, scp to the solr server and then push via curl command

I have not tested that, but as part of performance testing indexing, I
achieved a marked increase in performance when I used CSV. That was
for very small documents though. I do not know how well it works for
large ones.
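
For anyone wanting to try the CSV route, here is a minimal sketch of turning documents into CSV text with Python's standard library. The field names (id, title) and the helper name docs_to_csv are made up for illustration. The result would then be posted to Solr's CSV update handler; check the reference guide for your Solr version for the exact endpoint and content type.

```python
import csv
import io

def docs_to_csv(docs, fields):
    """Serialize a list of dicts to CSV text, with a header row naming the fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for doc in docs:
        # Missing fields become empty cells so every row has the same columns.
        writer.writerow({f: doc.get(f, "") for f in fields})
    return buf.getvalue()

# Hypothetical sample documents; real ones would come from the extraction step.
sample = [{"id": "1", "title": "Hello, world"}, {"id": "2", "title": "Plain"}]
csv_text = docs_to_csv(sample, ["id", "title"])
print(csv_text)
```

The csv module quotes values containing commas, which matters as soon as free-text fields are involved.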

Standard sanity check: Have you tried piping the result from Pentaho
into /dev/null, to see if it is Solr or the extraction part that is the
heavy one?

Absolutely, you need to be sure Pentaho isn't the bottleneck here. In our experience, lightweight tuned scripts will tend to be faster than a framework for this kind of task.
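
A rough way to run that check from a script is to time the extraction on its own while throwing the output away. This is only a sketch: fake_extract is a hypothetical stand-in for whatever actually produces your records, and time_extraction is an illustrative name.

```python
import os
import time

def time_extraction(records):
    """Consume an iterable of text records, discard them, return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    # Writing to the null device mimics piping into /dev/null: the records
    # are produced and serialized, but nothing is sent to Solr.
    with open(os.devnull, "w") as sink:
        for record in records:
            sink.write(record)
            count += 1
    return count, time.perf_counter() - start

def fake_extract(n):
    # Hypothetical stand-in for the real extraction step.
    for i in range(n):
        yield '{"id": "%d"}\n' % i

count, seconds = time_extraction(fake_extract(1000))
print(f"{count} records in {seconds:.4f}s")
```

If this alone takes about as long as the full indexing run, the extraction side is the one to tune, not Solr.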

Charlie

- Toke Eskildsen, State and University Library, Denmark




--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
