On Fri, 10 Aug 2018, Jeff Lien wrote:
> This patch provides a performance improvement for the CRC16 calculations
> done in read/write workloads using the T10 Type 1/2/3 guard field. For
> example, today with sequential write workloads (one thread/CPU of IO) we
> consume 100% of the CPU because of the CRC16 computation bottleneck.
> Today's block devices are considerably faster, but the CRC16 calculation
> prevents folks from utilizing the throughput of such devices. To speed up
> this calculation and expose the block device throughput, we slice the old
> single byte for loop into a 16 byte for loop, with a larger CRC table to
> match. The result has shown 5x performance improvements on various big
> endian and little endian systems running the 4.18.0 kernel version.
You are nevertheless increasing the kernel size by 7.5 KB.
Could the small table still be preserved behind a config option for those
who need a small kernel more than a fast one?
That could look like:
static const __u16 t10_dif_crc_table[][256] = {
	{
		[...]
	},
#ifndef CONFIG_CRC16_SMALL
	{
		[...]
		[...]
	},
#endif
};
and the code to suit.
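To make "the code to suit" a bit more concrete, here is a rough userspace sketch. It is not the code from the patch. The names CRC16_SMALL, t10_dif_init_tables(), crc_bytewise() and crc_t10dif_sketch() are made up for illustration, and the tables are filled at run time only to keep the example short; the kernel would keep precomputed const tables as above. The only thing taken as given is the T10 DIF polynomial, 0x8bb7.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CONFIG_CRC16_SMALL option suggested above. */
#ifdef CRC16_SMALL
#define T10_DIF_SLICES 1
#else
#define T10_DIF_SLICES 16
#endif

#define T10_DIF_POLY 0x8bb7

/* Row k holds the CRC of a single byte followed by k zero bytes. */
static uint16_t t10_dif_crc_table[T10_DIF_SLICES][256];

/* Illustration only: build the table(s) at run time. */
void t10_dif_init_tables(void)
{
	for (int i = 0; i < 256; i++) {
		uint16_t crc = (uint16_t)(i << 8);

		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ T10_DIF_POLY)
					     : (uint16_t)(crc << 1);
		t10_dif_crc_table[0][i] = crc;
	}

	/* Each further row appends one more zero byte to the previous row. */
	for (int k = 1; k < T10_DIF_SLICES; k++) {
		for (int i = 0; i < 256; i++) {
			uint16_t prev = t10_dif_crc_table[k - 1][i];

			t10_dif_crc_table[k][i] = (uint16_t)((prev << 8) ^
				t10_dif_crc_table[0][prev >> 8]);
		}
	}
}

/* Classic one-byte-per-iteration loop; needs only row 0. */
static uint16_t crc_bytewise(uint16_t crc, const unsigned char *buf, size_t len)
{
	while (len--)
		crc = (uint16_t)(crc << 8) ^
		      t10_dif_crc_table[0][(crc >> 8) ^ *buf++];
	return crc;
}

uint16_t crc_t10dif_sketch(uint16_t crc, const unsigned char *buf, size_t len)
{
#if T10_DIF_SLICES == 16
	/* Fast path: consume 16 bytes per iteration using all 16 rows. */
	while (len >= 16) {
		uint16_t next =
			t10_dif_crc_table[15][buf[0] ^ (crc >> 8)] ^
			t10_dif_crc_table[14][buf[1] ^ (crc & 0xff)];

		for (int j = 2; j < 16; j++)
			next ^= t10_dif_crc_table[15 - j][buf[j]];
		crc = next;
		buf += 16;
		len -= 16;
	}
#endif
	/* Tail bytes, or the whole buffer when built with CRC16_SMALL. */
	return crc_bytewise(crc, buf, len);
}
```

Building with -DCRC16_SMALL drops the 15 extra rows and the 16-byte loop, so both builds should return the same CRC for the same input and differ only in table size and speed.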
Nicolas