For each file inside the directory $output, I cat the file and generate a
SHA-256 hash. The script takes 9 minutes to read the 105 files (556 MB in
total) and generate the digests. Is there a way to make this script faster?
Maybe by generating the digests in parallel?

count=0
for path in $output
do
    # Stream the file from HDFS and keep only the hash field of sha256sum's output
    digests[$count]=$( "$HADOOP_HOME"/bin/hdfs dfs -cat "$path" | sha256sum | awk '{ print $1 }' )
    (( count++ ))
done
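
Something like the sketch below is what I have in mind for the parallel
version. It is untested and assumes bash, GNU xargs with -P support, and
paths without whitespace; instead of filling the digests array it writes
"path digest" pairs to a placeholder file (digests.txt), and the output
order may not match the input order:

# Hash one HDFS file and print "<path> <sha256>".
hash_one() {
    "$HADOOP_HOME"/bin/hdfs dfs -cat "$1" | sha256sum | awk -v p="$1" '{ print p, $1 }'
}
export -f hash_one
export HADOOP_HOME

# Run up to 8 hashes at a time; -n 1 passes one path per invocation.
printf '%s\n' $output | xargs -n 1 -P 8 bash -c 'hash_one "$0"' > digests.txt

Would that be a reasonable approach, or is there a better way to do this
against HDFS?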


Thanks,
