Hi,

Would anyone like to review this pull request:
https://github.com/apache/incubator-apex-malhar/pull/279

This is based on the design discussed in previous mails. I am open to discussing it again if anyone is interested.

-Priyanka

On Mon, Apr 4, 2016 at 10:47 PM, Priyanka Gugale <[email protected]> wrote:

> Okay so I will open the pull request soon.
>
> -Priyanka
>
> On Wed, Mar 23, 2016 at 1:35 PM, Yogi Devendra <[email protected]> wrote:
>
>> This looks OK. Let us build it incrementally.
>>
>> ~ Yogi
>>
>> On 23 March 2016 at 13:24, Sandeep Deshmukh <[email protected]> wrote:
>>
>> > I would suggest that we go ahead with the design as suggested by
>> > Priyanka, where we have bandwidth set up for each operator separately.
>> > We can later extend this for bandwidth to be shared among different
>> > input operators or for the DAG as a whole.
>> >
>> > Regards,
>> > Sandeep
>> >
>> > On Wed, Mar 23, 2016 at 11:51 AM, Priyanka Gugale <[email protected]> wrote:
>> >
>> > > Right now it's not for output operators, but one can very well use
>> > > the bandwidth manager to keep track of bandwidth usage and limit the
>> > > output speed. The bigger challenge there would be that you won't be
>> > > able to process the window data sent by the upstream operator in the
>> > > same window. For that you need to do more than just bandwidth control.
>> > > So I would say the bandwidth control feature can be used as it is for
>> > > output operators as well; only we need to do more than just bandwidth
>> > > limitation for output operators.
>> > >
>> > > -Priyanka
>> > >
>> > > On Wed, Mar 23, 2016 at 11:47 AM, Priyanka Gugale <[email protected]> wrote:
>> > >
>> > > > That's a good question, Chaitanya. Right now the bandwidth control
>> > > > is at the input operator level and not the application level. So if
>> > > > you have two input operators, you need to set bandwidth on both
>> > > > separately with this design.
>> > > > Maybe it would be better to have bandwidth control at the
>> > > > application level rather than the operator level. Let me think if I
>> > > > can modify the design to do that. If you have any ideas for the
>> > > > same, please share them.
>> > > >
>> > > > -Priyanka
>> > > >
>> > > > On Wed, Mar 23, 2016 at 11:47 AM, Yogi Devendra <[email protected]> wrote:
>> > > >
>> > > >> Priyanka,
>> > > >>
>> > > >> From the design description it is not clear how it will be used to
>> > > >> control output bandwidth (points #2, 3, 4 mentioned by Sandeep).
>> > > >>
>> > > >> ~ Yogi
>> > > >>
>> > > >> On 23 March 2016 at 11:39, Chaitanya Chebolu <[email protected]> wrote:
>> > > >>
>> > > >> > This is a very useful feature.
>> > > >> > I would like to know how you are distributing the bandwidth in
>> > > >> > the situation below:
>> > > >> > - Two input operators, say i1 and i2, are deployed on the same
>> > > >> > node and both operators have bandwidthManager as a plugin.
>> > > >> >
>> > > >> > On Fri, Mar 18, 2016 at 5:43 PM, Priyanka Gugale <[email protected]> wrote:
>> > > >> >
>> > > >> > > Hi,
>> > > >> > >
>> > > >> > > Thanks for the inputs Sandeep, I will take care of those points.
>> > > >> > >
>> > > >> > > Here is the high level design we are considering. We would have
>> > > >> > > the following components:
>> > > >> > >
>> > > >> > > *1. BandwidthManager*
>> > > >> > > This keeps track of the current bandwidth usage of the system
>> > > >> > > and decides whether the requested data bandwidth can be used
>> > > >> > > right away or not. To do this it uses the Leaky bucket
>> > > >> > > <https://en.wikipedia.org/wiki/Leaky_bucket> algorithm, where it
>> > > >> > > emits data as long as it has not overused bandwidth (i.e. the
>> > > >> > > remaining bandwidth is >= 0) and then waits to accumulate
>> > > >> > > bandwidth for a while (till the bandwidth goes from a -ve value
>> > > >> > > to +ve).
>> > > >> > >
>> > > >> > > *2. BandwidthLimitingInputOperator*
>> > > >> > > Any input operator which wants bandwidth restriction should
>> > > >> > > implement BandwidthLimitingInputOperator. The operator has an
>> > > >> > > abstract method to initialize an instance of BandwidthManager
>> > > >> > > and a method to emit tuples with bandwidth restriction, i.e. to
>> > > >> > > emit tuples as per the available bandwidth.
>> > > >> > >
>> > > >> > > *3. BandwidthPartitioner*
>> > > >> > > The bandwidth partitioner is introduced for static partitioning.
>> > > >> > > If static partitioning is used, by default the
>> > > >> > > StatelessPartitioner class is initialized. With bandwidth
>> > > >> > > restriction we want to divide bandwidth equally amongst the
>> > > >> > > available partitions. BandwidthPartitioner should take care of
>> > > >> > > it. It extends StatelessPartitioner; it just sets the right
>> > > >> > > bandwidth on all partitions after StatelessPartitioner
>> > > >> > > creates/deletes partitions. In case of dynamic partitioning, the
>> > > >> > > operator implementing definePartitions should take care of
>> > > >> > > bandwidth distribution.
>> > > >> > >
>> > > >> > > This design takes care of basic bandwidth restriction, and also
>> > > >> > > takes care of partitions by equally distributing the available
>> > > >> > > bandwidth among all partitions. It is also open enough for
>> > > >> > > further modifications to take care of complex situations.
>> > > >> > >
>> > > >> > > Let me know your opinion on what else we can do to design it
>> > > >> > > better.
>> > > >> > >
>> > > >> > > -Priyanka
>> > > >> > >
>> > > >> > > On Thu, Mar 3, 2016 at 10:11 AM, Sandeep Deshmukh <[email protected]> wrote:
>> > > >> > >
>> > > >> > > > The main purpose is not to handle back pressure but to limit
>> > > >> > > > bandwidth usage by applications. This is useful in ingestion
>> > > >> > > > use cases. Typically the user needs to ingest, say, up to 1GB
>> > > >> > > > per sec and not more. The tuple size may vary: message-based
>> > > >> > > > tuples (few KBs) or block tuples for files (few MBs). The
>> > > >> > > > bandwidth manager will take the max bandwidth that can be
>> > > >> > > > utilized by the application and will take care of sharing that
>> > > >> > > > across partitions etc.
>> > > >> > > >
>> > > >> > > > Priyanka: You could also consider the following in your design
>> > > >> > > >
>> > > >> > > > 1. Limiting input rate (across partitions)
>> > > >> > > > 2. Limiting output rate (across partitions)
>> > > >> > > > 3. Specifying total bandwidth that the application can
>> > > >> > > >    utilize, including input and output? Not sure if this is
>> > > >> > > >    required. Need comments from others here.
>> > > >> > > > 4. Include a default implementation that will handle 1 and 2,
>> > > >> > > >    and anyone interested in having their own bandwidth manager
>> > > >> > > >    should be able to extend the default one.
>> > > >> > > > 5. Can you also look at including/extending tuples per sec, as
>> > > >> > > >    pointed out by Tim/Chinmay.
>> > > >> > > >
>> > > >> > > > Regards,
>> > > >> > > > Sandeep
>> > > >> > > >
>> > > >> > > > On Thu, Mar 3, 2016 at 12:23 AM, Timothy Farkas <[email protected]> wrote:
>> > > >> > > >
>> > > >> > > > > Not sure if this is helpful, but there is already a utility
>> > > >> > > > > in Malhar for converting tuples per second to tuples per
>> > > >> > > > > window. This allows the user to define a property in tuples
>> > > >> > > > > per second, then the operator can convert that to tuples per
>> > > >> > > > > window so it emits the correct number of tuples per window.
>> > > >> > > > >
>> > > >> > > > > https://github.com/apache/incubator-apex-malhar/blob/master/library/src/main/java/com/datatorrent/lib/util/time/WindowUtils.java
>> > > >> > > > >
>> > > >> > > > > On Wed, Mar 2, 2016 at 10:41 AM, Chinmay Kolhatkar <[email protected]> wrote:
>> > > >> > > > >
>> > > >> > > > > > Hi Priyanka,
>> > > >> > > > > >
>> > > >> > > > > > Indeed this is a useful feature.
>> > > >> > > > > >
>> > > >> > > > > > I believe the number of bytes consumed per sec can as well
>> > > >> > > > > > translate to the number of tuples consumed per sec.
>> > > >> > > > > >
>> > > >> > > > > > If the above is correct, won't the back pressure that is
>> > > >> > > > > > handled by bufferserver help in your use case?
>> > > >> > > > > >
>> > > >> > > > > > Thanks,
>> > > >> > > > > > Chinmay.
>> > > >> > > > > >
>> > > >> > > > > > On 2 Mar 2016 4:49 p.m., "Priyanka Gugale" <[email protected]> wrote:
>> > > >> > > > > >
>> > > >> > > > > > > Many times we need to put bandwidth restrictions or put
>> > > >> > > > > > > some limit on an input operator for the number of bytes
>> > > >> > > > > > > to be consumed per second. As I understand, in Apex
>> > > >> > > > > > > there is no direct support for this feature.
>> > > >> > > > > > >
>> > > >> > > > > > > I am planning to write a bandwidth manager which will
>> > > >> > > > > > > help in limiting bandwidth at the input operator. Let me
>> > > >> > > > > > > know if there are any better alternative ways. I will
>> > > >> > > > > > > soon publish the design for the Bandwidth Manager I am
>> > > >> > > > > > > planning to write.
>> > > >> > > > > > >
>> > > >> > > > > > > -Priyanka
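
For readers of this thread: the bandwidth accounting described in the quoted design mail (component 1, BandwidthManager) and the equal split across partitions (component 3) could be sketched roughly as below. This is only an illustration under assumptions, not the code in the pull request. The class name BandwidthBucketSketch and the methods tick, canEmit, consume, getBalance and perPartitionLimit are all hypothetical, and the cap of one second's worth of accumulated credit is an assumption of this sketch.

```java
/**
 * Hypothetical sketch of the accounting described in the design mail.
 * Not the BandwidthManager from the pull request; all names are made up.
 */
final class BandwidthBucketSketch {
    private final long bytesPerSecond; // configured bandwidth limit
    private long balance;              // bytes currently available; may go negative

    BandwidthBucketSketch(long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.balance = 0;
    }

    /** Credits the bandwidth accumulated over the elapsed time. */
    void tick(long elapsedMillis) {
        long credit = bytesPerSecond * elapsedMillis / 1000;
        // Assumption: never bank more than one second's worth of bandwidth.
        balance = Math.min(bytesPerSecond, balance + credit);
    }

    /** True while bandwidth has not been overused (balance >= 0). */
    boolean canEmit() {
        return balance >= 0;
    }

    /** Charges the size of an emitted tuple; the balance may go negative. */
    void consume(long bytes) {
        balance -= bytes;
    }

    long getBalance() {
        return balance;
    }

    /** Equal share of a total limit for each of the given partitions. */
    static long perPartitionLimit(long totalBytesPerSecond, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        return totalBytesPerSecond / partitions;
    }
}
```

With a limit of 1000 bytes per second, emitting a 250-byte tuple from a fresh bucket leaves the balance at -250, so the operator would hold back; after 300 ms of credit the balance is back at 50 and emission can resume.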
