Re: Implementing an input format that splits according to column size

2011-09-12 Thread Jonathan Ellis
On Mon, Sep 12, 2011 at 8:31 AM, Brandon Williams wrote:
> It's feasible, but not entirely easy. Essentially you need to page
> through the row since you can't know how large it is beforehand. IIRC
> though, this breaks the current input format contract, since an entire
> row is expected to be read […]
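The paging Brandon describes can be sketched roughly as follows. This is a minimal, hypothetical illustration of the approach, not Cassandra's actual Thrift or Hadoop API: a `TreeMap` stands in for the wide row, and `page`/`readAll` are invented helpers mimicking repeated slice queries that resume after the last column seen.

```java
import java.util.*;

// Sketch of paging through a wide row whose total size is unknown up front.
// A sorted in-memory map stands in for the row; each page mimics a slice
// query of up to pageSize columns starting after the previous page's last
// column. None of this is real Cassandra API.
public class RowPager {
    // One page of up to pageSize column names. An empty start means "from
    // the beginning of the row"; otherwise the slice starts strictly after
    // the given column, like resuming a slice with an exclusive start.
    static List<String> page(NavigableMap<String, byte[]> row, String start, int pageSize) {
        List<String> names = new ArrayList<>();
        for (String col : row.tailMap(start, start.isEmpty()).keySet()) {
            if (names.size() == pageSize) break;
            names.add(col);
        }
        return names;
    }

    // Read the entire row by repeated paging, never knowing its size
    // beforehand: stop only when a page comes back empty.
    static List<String> readAll(NavigableMap<String, byte[]> row, int pageSize) {
        List<String> all = new ArrayList<>();
        String start = "";
        while (true) {
            List<String> page = page(row, start, pageSize);
            if (page.isEmpty()) break;
            all.addAll(page);
            start = page.get(page.size() - 1); // resume after last column seen
        }
        return all;
    }

    public static void main(String[] args) {
        NavigableMap<String, byte[]> row = new TreeMap<>();
        for (int i = 0; i < 10; i++) row.put(String.format("col%02d", i), new byte[0]);
        System.out.println(readAll(row, 3));
    }
}
```

This is also why the existing input-format contract is awkward here: a record reader that hands a whole row to the mapper would have to hold all pages in memory, which defeats the point of paging.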

Re: Implementing an input format that splits according to column size

2011-09-12 Thread Brandon Williams
On Mon, Sep 12, 2011 at 1:54 PM, Tharindu Mathew wrote:
> Thanks Brandon for the clarification.
>
> I'd like to support a use case where an index is built in a row in a CF.

If you're just _building_ the row, the current state of things will work just fine. The trouble starts when you need to read […]

Re: Implementing an input format that splits according to column size

2011-09-12 Thread Tharindu Mathew
Thanks Brandon for the clarification. I'd like to support a use case where an index is built in a row in a CF. So, as a starting point for a query, a known row with a large number of columns will have to be selected. The split to the hadoop nodes should start at that level. Is this a common use case? […]
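Splitting "at that level" means generating Hadoop splits over column ranges of a single wide row rather than over row ranges. A minimal sketch under stated assumptions: `ColumnSplit` is an invented type (Cassandra's real `ColumnFamilyInputFormat` has no such thing), and the sampled column-name boundaries are assumed to come from elsewhere.

```java
import java.util.*;

// Hypothetical sketch: divide one wide row into contiguous column-range
// splits so each Hadoop task pages through only its own slice of the row.
// ColumnSplit and the boundary-sampling input are assumptions for
// illustration, not part of Cassandra's actual Hadoop integration.
public class ColumnSplitter {
    static final class ColumnSplit {
        final String startColumn; // exclusive; "" means start of row
        final String endColumn;   // inclusive; "" means end of row
        ColumnSplit(String s, String e) { startColumn = s; endColumn = e; }
        @Override public String toString() {
            return "(" + startColumn + ", " + endColumn + "]";
        }
    }

    // Turn sampled column-name boundaries into splits covering the whole
    // row: ("" .. b0], (b0 .. b1], ..., (bn .. ""].
    static List<ColumnSplit> split(List<String> boundaries) {
        List<ColumnSplit> splits = new ArrayList<>();
        String start = "";
        for (String b : boundaries) {
            splits.add(new ColumnSplit(start, b));
            start = b;
        }
        splits.add(new ColumnSplit(start, "")); // tail split to end of row
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(split(List.of("colA", "colM")));
    }
}
```

Each task would then page through its assigned `(startColumn, endColumn]` range, which is exactly where the paging problem from the earlier reply reappears.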

Re: Implementing an input format that splits according to column size

2011-09-12 Thread Brandon Williams
On Mon, Sep 12, 2011 at 12:35 AM, Tharindu Mathew wrote:
> Hi,
>
> I plan to do $subject and contribute.
>
> Right now, the hadoop integration splits according to the number of rows in
> a slice predicate. This doesn't scale if a row has a large number of
> columns.
>
> I'd like to know from the community […]