There is compression of stored data; I don't think it makes sense to
disable it.  The default compression is LZ4, which is the "BEST_SPEED"
option offered by Lucene (the other mode being "BEST_COMPRESSION").  Back
in 2015, when the article you quoted was written, this faster option
wasn't available.  I don't see a no-compression option:
https://lucene.apache.org/core/9_3_0/core/org/apache/lucene/codecs/StoredFieldsFormat.html
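
For reference, Solr exposes these two Lucene modes through the codec
factory in solrconfig.xml; a minimal sketch (note there is no
uncompressed value to pick):

    <codecFactory class="solr.SchemaCodecFactory">
      <!-- BEST_SPEED (LZ4, the default) or BEST_COMPRESSION (DEFLATE) -->
      <str name="compressionMode">BEST_SPEED</str>
    </codecFactory>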

Make sure you're returning documents in the same order that Lucene/Solr
stores them internally.  By default, if you aren't specifying any sort
options, I believe Solr will return the documents in this order, but it's
worth double-checking.  If you specify fl=[docid], check that the results
show an increasing number for each document.
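
For example (the collection name and host below are just placeholders;
the -g flag stops curl from treating the [docid] brackets as URL
globbing):

    curl -g 'http://localhost:8983/solr/mycoll/select?q=*:*&fl=id,[docid]&rows=20'

If the documents are coming back in index order, the [docid] values in
the response should increase monotonically.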

Again, it's worth being aware that what you are doing is very far afield
from what a search engine is *for*.  So yeah... performance may not be so
great.  Solr users want top-X documents sorted by something, and/or maybe
some facets/stats summarizing fields.  Not all docs.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Sun, Mar 12, 2023 at 5:57 PM Fikavec F <fika...@yandex.ru> wrote:

>    Continuing my research on the performance of data fetching from Solr, I
> noticed a significant drop in the transfer rate as the size of the stored
> fields decreased. Below are the results of measuring the data transfer rate
> (wt=javabin) from collections of 10 gigabytes in size, but consisting of
> different numbers of documents and different stored text field sizes (RAM
> disk, one shard; the collection documents contain only "id" and "text_sn",
> a stored, unindexed field without docValues):
>
>    - 3.48 Gb/s (or 849 doc/s) - collection with 20 479 documents of
>    512 KB each (512*1024 symbols each)
>    - 2.22 Gb/s (or 17 340 doc/s) - collection with 654 043 documents of
>    16 KB each (16*1024 symbols each); for a speed of 3.48 Gb/s it should
>    be 27 187 doc/s
>    - 1.16 Gb/s (or 72 500 doc/s) - collection with 5 159 740 documents
>    of 2 KB each (2*1024 symbols each); for a speed of 3.48 Gb/s it should
>    be 217 500 doc/s
>    - 212 Mb/s (or 103 500 doc/s) - collection with 37 153 697 documents
>    of 256 bytes each (256 symbols each); for a speed of 3.48 Gb/s it
>    should be 1 699 218 doc/s
>
>    Since neither the disk nor the network is a bottleneck, and the CPU is
> also quite fast (4.5 GHz), where can I look further for the reason for such
> a drop in data transfer speed, and is there a chance to improve something
> there?
>    As far as I understand from the measurement results, per-document
> overhead costs arise somewhere when traversing/iterating through the list
> of documents passed to the javabin output writer, and since the disk is in
> RAM, these overhead costs are not related to extracting data from the disk
> itself (there may be expenses for extracting data from the disk, but they
> should not have such a big effect). I managed to find an article from 2015
> which mentions that the problem may be in stored field compression and
> provides a way to disable it:
> https://stegard.net/2015/05/performance-of-stored-field-compression-in-lucene-4-1/
> Is it still relevant? It seems that decompressing 10 GB of data as larger
> or smaller documents should not affect the speed so significantly, but if
> instances of the decompression class and some other objects are created
> for each document without reuse, this is quite possible.
> Best Regards,
