A little ping - does anyone have any ideas about why libHDFS zero-copy
reads would fail on a single-node cluster?

On Sat, 11 Dec 2021 at 19:55, Pratyush Das <[email protected]> wrote:

> Thanks for the reply!
>
> In my case, the client process should be on the same machine as the single
> datanode because I have set up HDFS on my local machine itself. What other
> reason could there be for a zero-copy read not being possible?
>
> Some logs about how my datanode is set up -
>
> reikdas@reikdas-HP-Pavilion-x360-Convertible-14-dh1xxx:~$ hdfs dfsadmin
> -report
> Configured Capacity: 143422484480 (133.57 GB)
> Present Capacity: 65070809088 (60.60 GB)
> DFS Remaining: 61662220288 (57.43 GB)
> DFS Used: 3408588800 (3.17 GB)
> DFS Used%: 5.24%
> Replicated Blocks:
> Under replicated blocks: 0
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> Missing blocks (with replication factor 1): 0
> Low redundancy blocks with highest priority to recover: 0
> Pending deletion blocks: 0
> Erasure Coded Block Groups:
> Low redundancy block groups: 0
> Block groups with corrupt internal blocks: 0
> Missing block groups: 0
> Low redundancy blocks with highest priority to recover: 0
> Pending deletion blocks: 0
>
> -------------------------------------------------
> Live datanodes (1):
>
> Name: 127.0.0.1:9866 (localhost)
> Hostname: reikdas-HP-Pavilion-x360-Convertible-14-dh1xxx
> Decommission Status : Normal
> Configured Capacity: 143422484480 (133.57 GB)
> DFS Used: 3408588800 (3.17 GB)
> Non DFS Used: 70994866176 (66.12 GB)
> DFS Remaining: 61662220288 (57.43 GB)
> DFS Used%: 2.38%
> DFS Remaining%: 42.99%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 0
> Last contact: Sat Dec 11 19:51:03 EST 2021
> Last Block Report: Sat Dec 11 19:19:57 EST 2021
> Num of Blocks: 55
>
> Regards,
>
> On Fri, 10 Dec 2021 at 23:33, Chris Nauroth <[email protected]> wrote:
>
>> Hello Pratyush,
>>
>> Zero-copy read is only possible when the client process attempting the
>> read is located on the same machine as a DataNode that hosts a replica of
>> the HDFS block.  The implementation relies on the mmap syscall to map the
>> underlying block file directly into the client process's address space.
>> This isn't possible across machine boundaries.
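>>
>> To make the mechanism concrete, here is a rough sketch (this is not
>> libHDFS's actual implementation, and the block file path below is
>> hypothetical) of what the local fast path boils down to:
>>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <sys/mman.h>
>> #include <sys/stat.h>
>> #include <unistd.h>
>>
>> int main(void) {
>>     /* Hypothetical block file path; real block files live under
>>      * dfs.datanode.data.dir on the DataNode host. */
>>     int fd = open("/tmp/blk_example", O_RDONLY);
>>     if (fd < 0) { perror("open"); return 1; }
>>     struct stat st;
>>     if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
>>     /* Map the file read-only: the process now reads the kernel page
>>      * cache directly, with no copy into a user-space buffer. */
>>     char *addr = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
>>     if (addr == MAP_FAILED) { perror("mmap"); return 1; }
>>     printf("first byte: %c\n", addr[0]);
>>     munmap(addr, st.st_size);
>>     close(fd);
>>     return 0;
>> }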
>>
>> If the client process is not co-located with the block, then it falls
>> back to a buffer-copying read, using a ByteBufferPool as a factory for
>> producing buffers to receive copies of the data.  This path does not
>> benefit from the performance enhancements of zero-copy read, but at least
>> it still works functionally.  In practice, it would be rare that all blocks
>> of a multi-block file would be co-located with a single client, hence the
>> importance of applications scheduling work with locality in mind.
>>
>> When hadoopReadZero fails, it also sets errno.  I suspect that if you
>> were to check errno, you would find the error is EPROTONOSUPPORT.  LibHDFS
>> uses this error code when 1) zero-copy read is not possible (e.g. client is
>> not co-located with the block replica), and 2) the caller has not provided
>> any ByteBufferPool for the fallback copying implementation.
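>>
>> For example, replacing the error check in your second program with
>> something like this (a minimal sketch; it assumes the surrounding code
>> from your message and additionally needs <errno.h>) would confirm that:
>>
>> hbuffer = hadoopReadZero(readFile, opts, 100);
>> if (!hbuffer) {
>>     /* hadoopReadZero sets errno on failure; EPROTONOSUPPORT means a
>>      * zero-copy read was impossible and no fallback pool was set. */
>>     fprintf(stderr, "hadoopReadZero failed: %s (errno=%d)\n",
>>             strerror(errno), errno);
>>     exit(-1);
>> }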
>>
>> I suggest adding this to your code:
>>
>> hadoopRzOptionsSetByteBufferPool(opts, ELASTIC_BYTE_BUFFER_POOL_CLASS);
>>
>> This function is documented in the header here:
>>
>>
>> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/include/hdfs/hdfs.h#L978-L994
>>
>> With this in place, your client will fall back to a buffer copy when the
>> process is not co-located with the HDFS block.  Some of this is
>> demonstrated in the test suite for LibHDFS zero copy too:
>>
>>
>> https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests/test_libhdfs_zerocopy.c
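>>
>> Putting it together with your snippet, the option setup would look
>> something like the following (a sketch with minimal error handling;
>> ELASTIC_BYTE_BUFFER_POOL_CLASS is declared in hdfs.h):
>>
>> struct hadoopRzOptions *opts = hadoopRzOptionsAlloc();
>> if (!opts) {
>>     fprintf(stderr, "Unable to allocate zero copy options\n");
>>     exit(-1);
>> }
>> if (hadoopRzOptionsSetSkipChecksum(opts, 1)) {
>>     fprintf(stderr, "Unable to set skip checksum\n");
>>     exit(-1);
>> }
>> /* Provide a pool so hadoopReadZero can fall back to a copying read
>>  * whenever a zero-copy (mmap) read is not possible. */
>> if (hadoopRzOptionsSetByteBufferPool(opts, ELASTIC_BYTE_BUFFER_POOL_CLASS)) {
>>     fprintf(stderr, "Unable to set byte buffer pool\n");
>>     exit(-1);
>> }
>> struct hadoopRzBuffer *hbuffer = hadoopReadZero(readFile, opts, 100);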
>>
>> I see you are already configuring the client to skip checksums.  The
>> other alternative is to use Centralized Cache Management
>> <https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html>,
>> which will perform an eager checksum validation before calling mlock to pin
>> the block into physical memory explicitly at the DataNode host.
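>>
>> As a sketch (the pool name here is just an example), caching the file
>> would look like:
>>
>> hdfs cacheadmin -addPool testPool
>> hdfs cacheadmin -addDirective -path /lineitem.tbl -pool testPool
>> hdfs cacheadmin -listDirectives
>>
>> Note that your dfsadmin report above shows "Configured Cache Capacity: 0
>> (0 B)", so the DataNode currently has no cache; you would also need to
>> set dfs.datanode.max.locked.memory (and a matching "ulimit -l") before
>> caching can take effect.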
>>
>> Chris Nauroth
>>
>>
>> On Sun, Oct 24, 2021 at 1:42 PM Pratyush Das <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I can successfully load files from HDFS via the C API like -
>>>
>>> #include "hdfs.h"
>>> #include <stdio.h>
>>> #include <string.h>
>>> #include <stdlib.h>
>>> #include <stdint.h>
>>> #include <inttypes.h>
>>>
>>> int main(int argc, char **argv) {
>>>     hdfsFS fs = hdfsConnect("127.0.0.1", 9000);
>>>     const char* readPath = "/lineitem.tbl";
>>>     hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
>>>     if(!readFile) {
>>>           fprintf(stderr, "Failed to open %s for reading!\n", readPath);
>>>           exit(-1);
>>>     }
>>>     if (!hdfsFileIsOpenForRead(readFile)) {
>>>         fprintf(stderr, "hdfsFileIsOpenForRead: we just opened a file "
>>>                 "with O_RDONLY, and it did not show up as 'open for read'\n");
>>>         exit(-1);
>>>     }
>>>     int size_in_bytes = hdfsAvailable(fs, readFile);
>>>     fprintf(stderr, "hdfsAvailable: %d\n", size_in_bytes);
>>>     char *buffer;
>>>     buffer = (char*)malloc(sizeof(char)*(size_in_bytes+1));
>>>     memset(buffer, 0, size_in_bytes + 1); /* zero the whole buffer, not just sizeof(char*) */
>>>     int num_read_bytes = 0;
>>>     while (num_read_bytes < size_in_bytes) {
>>>         int rbytes = hdfsRead(fs, readFile, &buffer[num_read_bytes],
>>>                               size_in_bytes - num_read_bytes);
>>>         if (rbytes <= 0) break; /* stop on EOF or error to avoid looping forever */
>>>         num_read_bytes += rbytes;
>>>     }
>>>     printf("%s\n", buffer);
>>>     printf("Total bytes read = %d\n", num_read_bytes);
>>>     free(buffer);
>>>     hdfsCloseFile(fs, readFile);
>>>     hdfsDisconnect(fs);
>>> }
>>>
>>> and I am able to see all the contents of the file printed out
>>> successfully.
>>>
>>> But when I try to use the zero copy API like -
>>>
>>> #include "hdfs.h"
>>> #include <stdio.h>
>>> #include <string.h>
>>> #include <stdlib.h>
>>> #include <stdint.h>
>>> #include <inttypes.h>
>>>
>>> int main(int argc, char **argv) {
>>>     hdfsFS fs = hdfsConnect("127.0.0.1", 9000);
>>>     const char* readPath = "/lineitem.tbl";
>>>     hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
>>>     if(!readFile) {
>>>           fprintf(stderr, "Failed to open %s for reading!\n", readPath);
>>>           exit(-1);
>>>     }
>>>     if (!hdfsFileIsOpenForRead(readFile)) {
>>>         fprintf(stderr, "hdfsFileIsOpenForRead: we just opened a file "
>>>                 "with O_RDONLY, and it did not show up as 'open for read'\n");
>>>         exit(-1);
>>>     }
>>>     int size_in_bytes = hdfsAvailable(fs, readFile);
>>>     fprintf(stderr, "hdfsAvailable: %d\n", size_in_bytes);
>>>     struct hadoopRzOptions *opts = NULL;
>>>     opts = hadoopRzOptionsAlloc();
>>>     if (!opts) {
>>>         fprintf(stderr, "Unable to set zero copy options\n");
>>>         exit(-1);
>>>     }
>>>     if (hadoopRzOptionsSetSkipChecksum(opts, 1)) {
>>>         fprintf(stderr, "Unable to set skip checksum\n");
>>>         exit(-1);
>>>     }
>>>     /*if (hadoopRzOptionsSetByteBufferPool(opts, NULL)) {
>>>         fprintf(stderr, "Unable to set byte buffer pool\n");
>>>         exit(-1);
>>>     }*/
>>>     struct hadoopRzBuffer *hbuffer = NULL;
>>>     //hadoopRzBufferFree(readFile, hbuffer);
>>>     hbuffer = hadoopReadZero(readFile, opts, 100);
>>>     if (!hbuffer) {
>>>         fprintf(stderr, "Unable to read zero copy hdfs file\n");
>>>         exit(-1);
>>>     }
>>>     /* hadoopRzBufferGet returns a pointer into the zero-copy buffer;
>>>        no malloc'd copy is needed (the previous malloc was leaked) */
>>>     const char *buffer = hadoopRzBufferGet(hbuffer);
>>>     int num_read_bytes = hadoopRzBufferLength(hbuffer);
>>>     printf("Actual size = %d\n", size_in_bytes);
>>>     printf("Bytes read = %d\n", num_read_bytes);
>>>     //printf("%s\n", buffer);
>>>     //printf("%s\n", buffer[size_in_bytes - 1000]);
>>>     hadoopRzBufferFree(readFile, hbuffer); /* release the zero-copy buffer */
>>>     hadoopRzOptionsFree(opts);
>>>     hdfsCloseFile(fs, readFile);
>>>     hdfsDisconnect(fs);
>>> }
>>>
>>> I get the error - "Unable to read zero copy hdfs file", which means that
>>> hadoopReadZero returned NULL and nothing was read.
>>>
>>> Am I doing something incorrectly?
>>>
>>> Thank you,
>>>
>>> --
>>> Pratyush Das
>>>
>>
>
> --
> Pratyush Das
>


-- 
Pratyush Das
