[
https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349121#comment-16349121
]
Anu Engineer edited comment on HADOOP-15204 at 2/1/18 7:16 PM:
---------------------------------------------------------------
[[email protected]] I did look at getLongBytes. The issue is that it does not
provide the same code improvement as {{getTimeDuration}} and
{{setTimeDuration}}.
Let me support my assertion with some examples that you can look at.
Let us suppose that I have a case to read the standard configured size of
containers. This is a configurable parameter in Ozone.
Here is how the code would look like with get/setStorageSize.
{code:java}
setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES);
// let us suppose we want to read back this config value as MBs.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code}
now let us see how we write this code using getLongBytes..
{code:java}
// there is no symetric function called setLongBytes.. So I have to fall back
to setLong
//
// Here is my first attempt.
setLong(OZONE_CONTAINER_SIZE, 1048576);
// This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB
or what,
// So the many keys get tagged as OZONE_CONTAINER_SIZE_GB
// Now second attempt.
setLong(OZONE_CONTAINER_SIZE_GB, 1048576);
// But this is bad too, now I can not set container size in MB, since setLong
takes a
// whole number and not a fraction.
// So now third attempt -- convert all fields to bytes
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG.
{code}
Before you think this is a made up example, this is part of changes that we
tried which triggered this change.
Now let us go to back get examples –
{code:java}
// getLongBytes forces us to write code in a certain way which does not match
with
// rest of code like getTimeDuration. For example, if I have a case where I
want to read // the read the value in MB, and the ozone-default.xml is
configured to GB.
// the case that this line solves
// getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);
long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in
bytes.
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE,
defaultValueInBytes)).toMBs();{code}
Now imagine repeating this code many times all over the code base, plus, the
BYTES.fromBytes(xxx).ToMBs() is a fuction from this patch. We need some
equivalent code.
In other words, I submit that these following factors make getLongBytes a less
desirable function compared to getTimeDuration/getStorageSize.
* lack of symetric function in getLongBytes - Without a set function it
degenerates to a messy set of multiplication and division each time we have to
use a storage Unit. With this those issue are cleanly isolated to a single
place.
* Lack of a formal storage unit - lack of a formal value like to
TimeUnit/StorageUnit makes the code less readable (see the example
setStorageSize) where the context also tells you the unit that we are operating
with.
* Does not suit our usage pattern - Ozone code follows the patterns in HDFS.
Hence the default value mapping is not handled well in getLongBytes.
* Units and conversion is needed - In ozone, there are several places where we
convert these numbers. Users can specify Quota as a easy to read Storage Units,
like 5 GB or 10 TB. We have a dedicated code for handling that, we use the
storage numbers to specify how large the off-heap size should be, or as I am
showing in this example, container sizes etc.
* The getLongBytes by itself does not address the lack of StorageUnits, what
this patch does is that it introduces a class that is very similar to
{{TimeUnit}}. This makes the code more readable and easy to maintain.
was (Author: anu):
[[email protected]] I did look at getLongBytes. The issue is that it does not
provide the same code improvement as {{getTimeDuration}} and
{{setTimeDuration}}.
Let me support my assertion with some examples that you can look at.
Let us suppose that I have a case to read the standard configured size of
containers. This is a configurable parameter in Ozone.
Here is how the code would look like with get/setStorageSize.
{code:java}
setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES);
// let us suppose we want to read back this config value as MBs.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code}
now let us see how we write this code using getLongBytes..
{code:java}
// there is no symetric function called setLongBytes.. So I have to fall back
to setLong
//
// Here is my first attempt.
setLong(OZONE_CONTAINER_SIZE, 1048576);
// This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB
or what,
// So the many keys get tagged as OZONE_CONTAINER_SIZE_GB
// Now second attempt.
setLong(OZONE_CONTAINER_SIZE_GB, 1048576);
// But this is bad too, now I can not set container size in MB, since setLong
takes a
// whole number and not a fraction.
// So now third attempt -- convert all fields to bytes
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG.
{code}
Before you think this is a made up example, this is part of changes that we
tried which triggered this change.
Now let us go to back get examples --
{code:java}
// getLongBytes forces us to write code in a certain way which does not match
with
// rest of code like getTimeDuration. For example, if I have a case where I
want to read // the read the value in MB, and the ozone-default.xml is
configured to GB.
// the case that this line solves
// getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);
long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in
bytes.
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE,
defaultValueInBytes)).toMBs();{code}
Now imagine repeating this code many times all over the code base, plus, the
BYTES.fromBytes(xxx).ToMBs() is a fuction from this patch. We need some
equivalent code.
In other words, I submit that these following factors make getLongBytes a less
desirable function compared to getTimeDuration/getStorageSize.
* lack of symetric function in getLongBytes - Without a set function it
degenerates to a messy set of multiplication and division each time we have to
use a storage Unit. With this those issue are cleanly isolated to a single
place.
* Lack of a format storage unit - lack of a formal value like to
TimeUnit/StorageUnit makes the code less readable (see the example
setStorageSize) where the context also tells you the unit that we are operating
with.
* Does not suit our usage pattern - Ozone code follows the patterns in HDFS.
Hence the default value mapping is not handled well in getLongBytes.
* Units and Conversation is needed - In ozone, there are several places where
we convert these numbers. Users can specify Quota as a easy to read Storage
Units, like 5 GB or 10 TB. We have a dedicated code for handling that, we use
the storage numbers to specify how large the off-heap size should be, or as I
am showing in this example, container sizes etc.
* The getLongBytes by itself does not address the lack of StorageUnits, what
this patch does is that it introduces a class that is very similar to
{{TimeUnit}}. This makes the code more readable and easy to maintain.
> Add Configuration API for parsing storage sizes
> -----------------------------------------------
>
> Key: HADOOP-15204
> URL: https://issues.apache.org/jira/browse/HADOOP-15204
> Project: Hadoop Common
> Issue Type: Improvement
> Components: conf
> Affects Versions: 3.1.0
> Reporter: Anu Engineer
> Assignee: Anu Engineer
> Priority: Minor
> Fix For: 3.1.0
>
> Attachments: HADOOP-15204.001.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This
> JIRA proposes to add an API like {{Configuration.getStorageSize}} which will
> allow users
> to specify units like KB, MB, GB etc. This is JIRA is inspired by HDFS-8608
> and Ozone. Adding {{getTimeDuration}} support was a great improvement for
> ozone code base, this JIRA hopes to do the same thing for configs that deal
> with disk and memory usage.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]