[
https://issues.apache.org/jira/browse/HADOOP-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050819#comment-14050819
]
Arpit Agarwal edited comment on HADOOP-10281 at 7/2/14 10:46 PM:
-----------------------------------------------------------------
Hi [~chrili],
Thanks for the updated changes! I am basically +1 for the "preview" patch,
minus the {{HistoryRpcScheduler}}. Is this tested and ready to commit from your
side? If so, what do you think of just eliminating {{HistoryRpcScheduler}}?
_____
Just thinking aloud about a possible future optimization (and please don't
bother doing it in the same Jira even if it makes sense!). I think we can
eliminate the periodic decay timer and perform the decay activity very cheaply
in the context of {{getPriorityLevel}} in a lazy manner. We would need to store
a {{lastUpdatedTimestamp}} with each {{scheduleCacheRef}} entry, and also a
last updated timestamp for the {{totalCount}}. Then we could do the following:
# If the time elapsed since the {{lastUpdatedTimestamp}} for the identity's
cache entry is greater than the decay period, update {{lastUpdatedTimestamp}}
for the entry first and multiply the {{callCounts}} entry by {{decayFactor}}.
# If the time elapsed since the {{lastUpdatedTimestamp}} for the {{totalCount}}
is greater than the decay period, update both the global timestamp and the
timestamp for the corresponding {{cacheEntry}} and then multiply {{totalCount}}
by {{decayFactor}}, so that it decays at the same rate as the per-identity counts.
# In either case, if the time elapsed is greater than some multiple n of the
{{decayPeriod}}, then we can multiply the corresponding count by
{{decayFactor}}^n.
# If either of the two conditions was true, recompute {{scheduleCacheRef}}.
The other advantage is we could use a smaller {{decayPeriod}} and a larger
{{decayFactor}} without increasing timer activity, which should yield a
smoother decay curve. The only missing part would be the periodic cleanup of
unused {{identities}}.
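
A minimal Java sketch of the lazy decay idea described above, offered as an illustration rather than as code from the patch. The class and method names ({{LazyDecayCounts}}, {{recordCall}}) are hypothetical. The sketch assumes {{decayFactor}} lies between 0 and 1, that timestamps are non-negative milliseconds that never go backwards, and that the total is scaled by the same factor as the per-identity counts. It counts decay periods on fixed boundaries so that entries decayed at different times stay consistent with the total, and it returns the caller's share directly instead of maintaining a {{scheduleCacheRef}}.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: counts are decayed lazily on access instead of by a periodic timer. */
final class LazyDecayCounts {
  /** A decayed count plus the timestamp (ms) at which it was last brought up to date. */
  private static final class Entry {
    double count;
    long lastUpdatedTimestamp;

    Entry(long now) {
      this.lastUpdatedTimestamp = now;
    }
  }

  private final long decayPeriodMs;
  private final double decayFactor; // assumed to lie in (0, 1)
  private final Map<String, Entry> callCounts = new HashMap<>();
  private Entry totalCount; // created on the first recorded call

  LazyDecayCounts(long decayPeriodMs, double decayFactor) {
    this.decayPeriodMs = decayPeriodMs;
    this.decayFactor = decayFactor;
  }

  /** Multiplies the count by decayFactor^n, where n is the number of period boundaries crossed. */
  private void decay(Entry e, long now) {
    long periods = now / decayPeriodMs - e.lastUpdatedTimestamp / decayPeriodMs;
    if (periods > 0) {
      e.count *= Math.pow(decayFactor, periods);
    }
    e.lastUpdatedTimestamp = now;
  }

  /** Records one call by the identity at time now and returns its share of all decayed calls. */
  synchronized double recordCall(String identity, long now) {
    Entry e = callCounts.computeIfAbsent(identity, k -> new Entry(now));
    if (totalCount == null) {
      totalCount = new Entry(now);
    }
    decay(e, now);
    decay(totalCount, now);
    e.count += 1;
    totalCount.count += 1;
    return e.count / totalCount.count;
  }
}
```

A caller would map the returned share onto a priority level using whatever thresholds the scheduler is configured with. As noted above, entries for identities that stop calling are never removed in this sketch, so a periodic cleanup would still be needed.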
was (Author: arpitagarwal):
Hi [~chrili],
Thanks for the updated changes! I am basically +1 for the "preview" patch,
minus the {{HistoryRpcScheduler}}. Is this tested and ready to commit from your
side? If so what do you think of just eliminating {{HistoryRpcScheduler}}.
_____
Just thinking aloud about a possible future optimization (and please don't
bother doing it in the same Jira even if it makes sense!). I think we can
eliminate the periodic decay timer and perform the decay activity very cheaply
in the context of {{getPriorityLevel}}. We would need to store a
{{lastUpdatedTimestamp}} with each {{scheduleCacheRef}} entry, and also a last
updated timestamp for the totalCount. Then we could do the following:
# If the {{lastUpdatedTimestamp}} for the identity's cache entry is greater
than the decay period, update {{lastUpdatedTimestamp}} for the entry first and
multiply the {{callCounts}} entry by {{decayFactor}}.
# If the {{lastUpdatedTimestamp}} for the {{totalCount}} is greater than the
decay period, update both the global timestamp and the timestamp for the
corresponding {{cacheEntry}} and then divide {{totalCount}} by {{decayFactor}}
# If for either of the above, the time elapsed is greater than some factor n of
the {{decayPeriod}}, then we can multiply the corresponding timestamp by
{{decayFactor}}^n. This can occur after a long period of inactivity.
# If either of the two conditions was true, recompute {{scheduleCacheRef}}.
The other advantage is we could use a smaller {{decayPeriod}} and a larger
{{decayFactor}} without increasing timer activity, which should yield a
smoother decay curve. The only missing part would be the periodic cleanup of
unused {{identities}}.
> Create a scheduler, which assigns schedulables a priority level
> ---------------------------------------------------------------
>
> Key: HADOOP-10281
> URL: https://issues.apache.org/jira/browse/HADOOP-10281
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Chris Li
> Assignee: Chris Li
> Attachments: HADOOP-10281-preview.patch, HADOOP-10281.patch,
> HADOOP-10281.patch, HADOOP-10281.patch
>
>
> The Scheduler decides which sub-queue to assign a given Call. It implements a
> single method getPriorityLevel(Schedulable call) which returns an integer
> corresponding to the subqueue the FairCallQueue should place the call in.
> The HistoryRpcScheduler is one such implementation which uses the username of
> each call and determines what % of calls in recent history were made by this
> user.
> It is configured with a historyLength (how many calls to track) and a list of
> integer thresholds which determine the boundaries between priority levels.
> For instance, if the scheduler has a historyLength of 8; and priority
> thresholds of 4,2,1; and saw calls made by these users in order:
> Alice, Bob, Alice, Alice, Bob, Jerry, Alice, Alice
> * Another call by Alice would be placed in queue 3, since she has already
> made >= 4 calls
> * Another call by Bob would be placed in queue 2, since he has >= 2 but less
> than 4 calls
> * A call by Carlos would be placed in queue 0, since he has no calls in the
> history
> Also, some versions of this patch include the concept of a 'service user',
> which is a user that is always scheduled high-priority. Currently this seems
> redundant and will probably be removed in later patches, since it's not too
> useful.
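
The worked example in the issue description can be reproduced with a small Java sketch. This is an illustration under assumptions, not the {{HistoryRpcScheduler}} code from the attached patches: the names {{HistoryPrioritySketch}}, {{record}} and {{priorityFor}} are hypothetical, and the priority level is assumed to be the number of thresholds that the user's call count in the history window reaches.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: priority from per-user call counts over a fixed-length history. */
final class HistoryPrioritySketch {
  private final int historyLength;
  private final int[] thresholds;
  private final Deque<String> history = new ArrayDeque<>();
  private final Map<String, Integer> counts = new HashMap<>();

  HistoryPrioritySketch(int historyLength, int... thresholds) {
    this.historyLength = historyLength;
    this.thresholds = thresholds.clone();
  }

  /** Adds a call to the history, evicting the oldest call once the window is full. */
  void record(String user) {
    history.addLast(user);
    counts.merge(user, 1, Integer::sum);
    if (history.size() > historyLength) {
      String oldest = history.removeFirst();
      int remaining = counts.get(oldest) - 1;
      if (remaining == 0) {
        counts.remove(oldest);
      } else {
        counts.put(oldest, remaining);
      }
    }
  }

  /** Returns how many thresholds the user's count in the history reaches (0 if none). */
  int priorityFor(String user) {
    int count = counts.getOrDefault(user, 0);
    int level = 0;
    for (int t : thresholds) {
      if (count >= t) {
        level++;
      }
    }
    return level;
  }
}
```

With a history length of 8, thresholds 4, 2, 1, and the eight calls listed in the description recorded in order, {{priorityFor}} gives 3 for Alice, 2 for Bob and 0 for Carlos, matching the queues named there.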
--
This message was sent by Atlassian JIRA
(v6.2#6252)