On 2/26/2019 1:34 AM, Saurabh Sharma wrote:
> Now we want to do partial updates. I went through the documentation and
> found that all the fields should be stored or docValues for partial
> updates. I have a few questions regarding this:
> 1) If I am fetching only one field in a query, what is the
> performance impact of having all fields stored? Let's say I have an "id"
> field with docValues set to true. Will Solr use stored
> fields in this case? Will it load the whole document into RAM?
I am not aware of any option to keep docValues in RAM. If you have
enough memory in your system (memory that has NOT been assigned to any
program), then the OS *might* keep some or all of your index data in
memory. That functionality, present in all modern operating systems, is
the secret to good performance.
The stored data is compressed. The docValues data is not compressed in
the same way. Uncompressing stored data uses CPU cycles. Generally, if
data must be read off disk, compressed data will be faster, because there
are fewer bytes to read. But if the data has been cached by the OS and
comes from memory, which you definitely want to happen if possible,
uncompressed will likely be faster ... and it will definitely require
less CPU.
If you have many fields but you're only fetching one, then docValues
will almost certainly be faster than stored. All of the stored fields
for one document are compressed together, so Solr has to decompress
data it won't actually use just to get at the one field you asked for.
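A toy Python model of that difference, under stated assumptions: this is not Lucene's file format, zlib stands in for whatever codec Lucene actually uses, and the documents and helper names are made up for illustration. Each document's stored fields go into one compressed blob, while each docValues field is a separate uncompressed column, so fetching "id" from the stored side means inflating the whole document.

```python
import json
import zlib

# Hypothetical documents: one small field we want, several large ones we don't.
docs = [
    {"id": f"doc{i}", "title": "t" * 200, "body": "b" * 5000, "tags": "x" * 300}
    for i in range(3)
]

# Simplified "stored fields" model (not Lucene's real format): all fields of a
# document are serialized and compressed together as one blob.
stored = [zlib.compress(json.dumps(d).encode("utf-8")) for d in docs]

# Simplified "docValues" model: one uncompressed column per field.
doc_values = {field: [d[field] for d in docs] for field in docs[0]}


def fetch_id_from_stored(doc_num):
    """Return the id plus how many bytes had to be decompressed to get it."""
    raw = zlib.decompress(stored[doc_num])  # the whole document is inflated
    return json.loads(raw)["id"], len(raw)


def fetch_id_from_doc_values(doc_num):
    """Return the id plus how many bytes of the id column were touched."""
    value = doc_values["id"][doc_num]  # only the id column is read
    return value, len(value.encode("utf-8"))


print(fetch_id_from_stored(1))
print(fetch_id_from_doc_values(1))
```

Comparing the byte counts from the two helpers shows how much extra data the stored path has to decompress to return one small field.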
I believe that if you have both stored data and docValues for a field,
Solr will use stored data for search results. I am not positive that
this is the case, but I think it's what happens.
> 2) What's the impact of large stored fields (.fdt) on query-time
> performance? Does query time even depend on the stored fields, or does
> it just depend on the index?
The size of your stored data will have no *DIRECT* impact on query
performance. Stored data is not consulted while the query itself is
executed. It is consulted when document data is retrieved to return
with the response.
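A minimal sketch of that split, assuming a toy in-memory index (the corpus, `search`, and `stored_reads` are hypothetical, and real Solr ranks matches by score rather than by document number): the query is answered from the inverted index alone, and stored data is read only for the rows actually returned.

```python
from collections import defaultdict

# Hypothetical tiny corpus: doc number -> stored fields.
stored_fields = {
    0: {"id": "a", "text": "solr stored fields"},
    1: {"id": "b", "text": "solr docvalues"},
    2: {"id": "c", "text": "lucene index"},
}

# Built once at index time: term -> set of matching doc numbers.
inverted_index = defaultdict(set)
for doc_num, fields in stored_fields.items():
    for term in fields["text"].split():
        inverted_index[term].add(doc_num)

stored_reads = []  # records which documents had their stored data read


def search(term, rows):
    # Phase 1: find matches using only the inverted index.
    matches = sorted(inverted_index.get(term, set()))
    # Phase 2: read stored data only for the documents being returned.
    results = []
    for doc_num in matches[:rows]:
        stored_reads.append(doc_num)
        results.append(stored_fields[doc_num]["id"])
    return len(matches), results


total, ids = search("solr", rows=1)
print(total, ids, stored_reads)
```

Two documents match, but only the one returned row causes a stored-data read.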
A large amount of stored data can have an indirect impact on query
performance. If there is insufficient memory available to the OS disk
cache, then reading the stored data to return results to the client will
push data that queries need out of the disk cache. If
that happens, then Solr will need to re-read that data off the disk to
do a query. Because disks are glacially slow compared to memory,
performance will be impacted.
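The eviction effect can be illustrated with a plain LRU model of the disk cache. This is a simplification: real operating systems use more elaborate replacement policies, and the page names, counts, and `run_queries` helper below are invented for illustration. With the same small cache, a workload that also reads stored data keeps evicting the index pages every query needs, so they have to be read from disk again.

```python
from collections import OrderedDict


class PageCache:
    """A minimal LRU model of the OS disk cache that counts disk reads."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.disk_reads = 0

    def read(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # cache hit: mark as most recently used
            return
        self.disk_reads += 1  # cache miss: the page must come from disk
        self.pages[page] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page


# Hypothetical page sets, not real Lucene files.
INDEX_PAGES = [f"index-{i}" for i in range(4)]    # needed by every query
STORED_PAGES = [f"stored-{i}" for i in range(6)]  # read to build responses


def run_queries(cache, passes, fetch_stored=True):
    for _ in range(passes):
        for page in INDEX_PAGES:  # query phase
            cache.read(page)
        if fetch_stored:
            for page in STORED_PAGES:  # document retrieval phase
                cache.read(page)
    return cache.disk_reads


print(run_queries(PageCache(6), 5, fetch_stored=False))
print(run_queries(PageCache(6), 5))
print(run_queries(PageCache(10), 5))
```

With these numbers, a 6-page cache costs 4 disk reads over five query-only passes but 50 when stored data is also read, while a 10-page cache that holds everything needs only the initial 10.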
Here's a page about performance problems. Most of it is about memory,
since that is usually the resource that has the biggest effect on
performance:
https://wiki.apache.org/solr/SolrPerformanceProblems
Thanks,
Shawn