Re: Mapping output of Hourglass jobs to Hive tables

Matthew Hayes Wed, 12 Feb 2014 09:22:20 -0800

The jobs have methods getOutputSchemaName() and getOutputSchemaNamespace()
that can be overridden.  By default the strings are derived from the job
class and its package.  For example, you can extend
PartitionCollapsingIncrementalJob and override both methods.  I just filed
DATAFU-32 to make it easier to override the defaults.
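
Roughly, the override pattern looks like the sketch below.  To keep it
compilable on its own it uses a stand-in base class (IncrementalJobSketch)
rather than the real Hourglass class, so the defaulting logic, the method
visibility, and the names PageViewCountJob, DefaultNamedJob and
com.example.events are all assumptions for illustration.  Only the two method
names come from the library.  With the real library you would extend
PartitionCollapsingIncrementalJob instead.

```java
// Stand-in for a Hourglass job base class. NOT the real DataFu API:
// the default-name logic below is a guess modeled on the schema names
// mentioned in this thread (e.g. PartitionCollapsingIncrementalJobOutput).
abstract class IncrementalJobSketch {
    // Assumed default: schema name derived from the job class name.
    protected String getOutputSchemaName() {
        return getClass().getSimpleName() + "Output";
    }

    // Assumed default: namespace derived from the job class's package.
    protected String getOutputSchemaNamespace() {
        String name = getClass().getName();
        int lastDot = name.lastIndexOf('.');
        return lastDot < 0 ? "" : name.substring(0, lastDot);
    }
}

// Hypothetical job that keeps the defaults.
class DefaultNamedJob extends IncrementalJobSketch {
}

// Hypothetical job that overrides both strings so the output schema
// gets a readable, domain-specific name.
class PageViewCountJob extends IncrementalJobSketch {
    @Override
    protected String getOutputSchemaName() {
        return "PageViewCount";
    }

    @Override
    protected String getOutputSchemaNamespace() {
        return "com.example.events";
    }
}

public class SchemaNameOverrideDemo {
    public static void main(String[] args) {
        IncrementalJobSketch job = new PageViewCountJob();
        // Prints "com.example.events.PageViewCount"
        System.out.println(job.getOutputSchemaNamespace() + "." + job.getOutputSchemaName());
    }
}
```

Note that this only changes the name and namespace of the output schema.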


Regarding your other question about the key: when you construct the Hive
table, can you not simply ignore the key?


On Wed, Feb 12, 2014 at 2:06 AM, Abhishek Gayakwad <[email protected]> wrote:

> Hello,
>
> After running a partition collapsing or preserving job, the generated
> container file has the schema
> PartitionPreservingIncrementalJobOutput/PartitionCollapsingIncrementalJobOutput,
> which in turn contains key and value record types. When I create Hive
> tables from this data, they have two columns, key and value, both of struct
> type. This hurts readability and is not what I want. I want to store
> only the value object in the output file. Is there any way I can get rid of
> the Partition*JobOutput schema and avoid writing keys as well?
>
> Thanks
> Abhishek
