There is compression of stored data; I don't think it makes sense to disable it. The default compression is LZ4, which is the "BEST_SPEED" option offered by Lucene (the alternative being "BEST_COMPRESSION"). Back in 2015, when the article you quoted was written, this faster option wasn't available. I don't see a no-compression option: https://lucene.apache.org/core/9_3_0/core/org/apache/lucene/codecs/StoredFieldsFormat.html
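If you want to experiment with the trade-off between the two modes, the mode is chosen when the codec is set at indexing time; in Solr the same knob is exposed as the compressionMode parameter of solr.SchemaCodecFactory in solrconfig.xml. Here's a minimal Lucene sketch, assuming the 9.2-era default codec class (Lucene92Codec) and its nested Mode enum; the index path is a placeholder:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.codecs.lucene92.Lucene92Codec;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class StoredFieldsModeDemo {
      public static void main(String[] args) throws Exception {
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        // BEST_SPEED (LZ4) is already the default; BEST_COMPRESSION (DEFLATE)
        // trades indexing/fetch CPU for a smaller stored-fields file.
        iwc.setCodec(new Lucene92Codec(Lucene92Codec.Mode.BEST_SPEED));
        try (IndexWriter writer = new IndexWriter(
            FSDirectory.open(Paths.get("/tmp/demo-index")), iwc)) {
          // ... add documents here ...
        }
      }
    }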
Make sure you're returning documents in the same order that Lucene/Solr holds them internally. By default, if you aren't specifying any sort options, I believe Solr will return the documents in this order, but it's worth double-checking: if you specify fl=[docid], check that the results show an increasing number for each document (a quick SolrJ sketch of this check is at the bottom of this message).

Again, it's worth being aware that what you are doing is very far afield from what a search engine is *for*. So yeah... performance may not be so great. Solr users want top-X documents sorted by something, and/or maybe some facets/stats summarizing fields. Not all docs.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Sun, Mar 12, 2023 at 5:57 PM Fikavec F <fika...@yandex.ru> wrote:

> Continuing my research on the performance of data fetching from Solr, I
> noticed a significant drop in the transfer rate as the size of the stored
> fields decreased. Below are the results of measuring the data transfer rate
> (wt=javabin) from a collection 10 gigabytes in size, but consisting of
> different numbers of documents with different stored text field sizes (RAM
> disk, one shard; the collection documents contain only "id" and "text_sn",
> a stored, unindexed field without docValues):
>
> - 3.48 Gb/s (or 849 doc/s) - collection with 20 479 documents of 512 KB
>   each (512*1024 symbols each)
> - 2.22 Gb/s (or 17 340 doc/s) - collection with 654 043 documents of 16 KB
>   each (16*1024 symbols each); at a speed of 3.48 Gb/s it would be
>   27 187 doc/s
> - 1.16 Gb/s (or 72 500 doc/s) - collection with 5 159 740 documents of 2 KB
>   each (2*1024 symbols each); at a speed of 3.48 Gb/s it would be
>   217 500 doc/s
> - 212 Mb/s (or 103 500 doc/s) - collection with 37 153 697 documents of
>   256 bytes each (256 symbols each); at a speed of 3.48 Gb/s it would be
>   1 699 218 doc/s
>
> Since neither the disk nor the network is a bottleneck, and the CPU is
> quite fast (4.5 GHz), where can I look further for the cause of such a drop
> in data transfer speed, and is there a chance to improve something?
>
> As far as I understand from the measurement results, per-document overhead
> arises somewhere when traversing/iterating through the list of documents
> passed to the javabin output writer, and since the disk is in RAM, this
> overhead is not related to extracting data from the disk itself (there may
> be costs for extracting data from the disk, but they should not have such a
> big effect). I managed to find an article from 2015 which mentions that the
> problem may be in stored field compression and provides a way to disable it:
> https://stegard.net/2015/05/performance-of-stored-field-compression-in-lucene-4-1/
> Is it still relevant? Decompressing 10 GB of data should not be affected so
> significantly by whether the documents are larger or smaller, but if
> instances of the decompression class and some other objects are created for
> each document without reuse, it is quite possible.
>
> Best Regards,
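P.S. Here's a minimal SolrJ sketch of the fl=[docid] ordering check mentioned above; the base URL, collection name, and row count are placeholders:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.Http2SolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class DocidOrderCheck {
      public static void main(String[] args) throws Exception {
        try (Http2SolrClient client =
            new Http2SolrClient.Builder("http://localhost:8983/solr").build()) {
          SolrQuery q = new SolrQuery("*:*");   // match everything, no sort param
          q.setFields("id", "[docid]");         // [docid] = Lucene's internal doc id
          q.setRows(1000);
          QueryResponse rsp = client.query("mycollection", q);
          int prev = -1;
          for (SolrDocument doc : rsp.getResults()) {
            int docid = ((Number) doc.getFieldValue("[docid]")).intValue();
            if (docid <= prev) {
              System.out.println("Not in index order at [docid]=" + docid);
            }
            prev = docid;
          }
        }
      }
    }

If the printed warnings never fire, the results are coming back in internal index order and decompression/iteration overhead, not reordering, is the thing to profile.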