BTW, regarding stored field compression: are all "stored fields" of a document put into one compressed chunk, or is compression done on a per-field basis?
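The question above contrasts two layouts. Below is a toy model of the difference, using Python's `zlib` and `json` as stand-ins. It is a hypothetical sketch, not Lucene's stored-fields format, and it does not answer which layout Lucene actually uses. In this model, if all fields share one compressed block, fetching only `id` still means inflating the whole OCR text. With per-field blocks, only the requested field is inflated.

```python
import json
import zlib

# Hypothetical toy model, NOT Lucene's on-disk format: zlib and json are
# stand-ins used only to compare the two layouts asked about above.


def compress_per_document(fields):
    """Layout A: every stored field of one document in a single block."""
    return zlib.compress(json.dumps(fields).encode("utf-8"))


def compress_per_field(fields):
    """Layout B: one independently compressed block per field."""
    return {name: zlib.compress(value.encode("utf-8"))
            for name, value in fields.items()}


def read_field_per_document(block, name):
    """Fetch one field from layout A; returns (value, bytes inflated)."""
    raw = zlib.decompress(block)  # the whole document has to be inflated
    return json.loads(raw)[name], len(raw)


def read_field_per_field(blocks, name):
    """Fetch one field from layout B; returns (value, bytes inflated)."""
    raw = zlib.decompress(blocks[name])  # only the requested field is inflated
    return raw.decode("utf-8"), len(raw)


if __name__ == "__main__":
    # Hypothetical document: a short id plus a large OCR text.
    doc = {"id": "vol-0001", "ocr": "Zeit und Raum " * 100_000}
    _, cost_a = read_field_per_document(compress_per_document(doc), "id")
    _, cost_b = read_field_per_field(compress_per_field(doc), "id")
    print(f"per-document layout: {cost_a} bytes inflated to read 'id'")
    print(f"per-field layout:    {cost_b} bytes inflated to read 'id'")
```

The field names `id` and `ocr` mirror the schema described in the thread. The document contents and all function names are made up for illustration.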
Kind regards,
J. Barth

>
> Regards,
>    Alex.
> Personal website: http://www.outerthoughts.com/
> Current project: http://www.solr-start.com/ - Accelerating your Solr
> proficiency
>
>
> On Tue, Apr 29, 2014 at 3:28 PM, Jochen Barth
> <ba...@ub.uni-heidelberg.de> wrote:
>> Dear reader,
>>
>> I'm trying to use Solr for a hierarchical search:
>> metadata from the higher-level elements is copied to the lower ones,
>> and each element holds the complete OCR text that belongs to it.
>>
>> At volume level, of course, we will have the complete OCR text in one
>> <doc>, and we need to store it for highlighting.
>>
>> My Solr instance is configured like this:
>> java -Xms12000m -Xmx12000m -jar start.jar
>> [ imported with 4.7.0, performance tests with 4.8.0 ]
>>
>> The Solr index files have these sizes:
>>    0.013 GB  .tip  The index into the term dictionary
>>    0.017 GB  .nvd  Encodes length and boost factors for docs and fields
>>    0.546 GB  .tim  The term dictionary, stores term info
>>    1.332 GB  .doc  Contains the list of docs which contain each term, along with frequency
>>    4.943 GB  .pos  Stores position information about where a term occurs in the index
>>   12.743 GB  .tvd  Contains information about each document that has term vectors
>>   17.340 GB  .fdt  The stored fields for documents ("ocr")
>>
>> With the ocr field configured as non-stored, I get these performance
>> figures (see docs/s) after warmup:
>>
>> jb@serv7:~> perl solr-performance.pl zeit 6
>> http://127.0.0.1:58983/solr/collection1/select
>> ?wt=json
>> &q={%21q.op%3dAND}ocr%3A%28zeit%29
>> &fq=mashed_b%3Afalse
>> &fl=id
>> &sort=sort_name_s asc,id+asc
>> &rows=1000000
>> time: 3.96 s
>> bytes: 1.878 MB
>> 64768 docs found; got 64768 docs
>> 16353 docs/s; 0.474 MB/s
>>
>> ... and with ocr stored, even when _not_ requesting ocr via fl=..., with
>> <documentCache class="solr.LRUCache" ... /> disabled and
>> <enableLazyFieldLoading>false</enableLazyFieldLoading>
>> [ with documentCache and enableLazyFieldLoading enabled, the results are even worse ]
>>
>> ... using solr-4.7.0 and Ubuntu 12.04, OpenJDK 7 (...u51):
>> jb@serv7:~> perl solr-performance.pl zeit 6
>> http://127.0.0.1:58983/solr/collection1/select
>> ?wt=json
>> &q={%21q.op%3dAND}ocr%3A%28zeit%29
>> &fq=mashed_b%3Afalse
>> &fl=id
>> &sort=sort_name_s asc,id+asc
>> &rows=1000000
>> time: 61.58 s
>> bytes: 1.878 MB
>> 64768 docs found; got 64768 docs
>> 1052 docs/s; 0.030 MB/s
>>
>> ... using solr-4.8.0 and oracle-jdk1.7.0_55:
>> jb@serv7:~> perl solr-performance.pl zeit 6
>> http://127.0.0.1:58983/solr/collection1/select
>> ?wt=json&q={%21q.op%3dAND}ocr%3A%28zeit%29
>> &fq=mashed_b%3Afalse
>> &fl=id
>> &sort=sort_name_s asc,id+asc
>> &rows=1000000
>> time: 58.80 s
>> bytes: 1.878 MB
>> 64768 docs found; got 64768 docs
>> 1102 docs/s; 0.032 MB/s
>>
>> Is there any reason why the stored case is 16 times slower than the
>> non-stored one?
>> Is there a way to store the ocr field in a separate index, or something
>> like that?
>>
>> Kind regards,
>> J. Barth
>>
>> --
>> J. Barth * IT, Universitaetsbibliothek Heidelberg * 06221 / 54-2580
>>
>> pgp public key:
>> http://digi.ub.uni-heidelberg.de/barth%40ub.uni-heidelberg.de.asc
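The figures quoted above can be cross-checked with a little arithmetic. Every run returns the same 64768 ids (1.878 MB), so throughput is simply docs (or MB) divided by wall time, and the slowdown is the ratio of the times. The `solr-performance.pl` script itself is not shown in the thread, so this Python sketch only redoes the arithmetic on the reported numbers. The run labels are shorthand introduced here.

```python
# Wall-clock times (seconds) copied from the three runs quoted above.
# The labels are shorthand introduced here, not names from the thread.
RUNS = {
    "ocr not stored": 3.96,
    "ocr stored, solr-4.7.0 + openjdk7": 61.58,
    "ocr stored, solr-4.8.0 + oracle-jdk1.7.0_55": 58.80,
}
DOCS = 64768        # documents returned by every run
MEGABYTES = 1.878   # response size, identical in all three runs


def throughput(seconds, docs=DOCS, megabytes=MEGABYTES):
    """Return (docs per second, MB per second) for one run."""
    return docs / seconds, megabytes / seconds


def slowdown(seconds, baseline=RUNS["ocr not stored"]):
    """How many times longer a run took than the non-stored baseline."""
    return seconds / baseline


if __name__ == "__main__":
    for label, seconds in RUNS.items():
        docs_per_s, mb_per_s = throughput(seconds)
        print(f"{label:45s} {docs_per_s:8.0f} docs/s  {mb_per_s:.3f} MB/s  "
              f"x{slowdown(seconds):.1f}")
```

By wall time, the two stored runs take about 15.5x and 14.8x as long as the non-stored run. The recomputed docs/s for the non-stored run (about 16356) differs slightly from the reported 16353, presumably because the script divides by an unrounded time.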