Thanks, Zhu, for your reply. Your points make sense to me.

Regards,
Abhishek

On Oct 3, 2012, at 8:14 PM, TianYi Zhu <[email protected]> wrote:

> Hi Abhishek,
>
> I don't know much about the optimizers. In my opinion, an SQL-like
> programming language is hard to optimize, so Hive may be slower than Pig in
> many cases. But in the end, for every Hadoop job there must be a best
> (in time or space) sequence of map/reduce phases. You should rewrite your
> Pig/Hive script following something like
> http://ofps.oreilly.com/titles/9781449302641/making_pig_fly.html
> so that it fits the optimizer well; then it might generate the best sequence.
>
> My suggestion is to leave Hive as a data warehouse and do most jobs in Pig.
> As for your earlier question, "1) what hive is good at?": if you have a
> complex join written in SQL, you can run it directly on Hive, but it will
> take you a lot of time to translate it to a Pig script.
>
> Thanks,
> TianYi
>
> On Thu, Oct 4, 2012 at 9:41 AM, Abhishek <[email protected]> wrote:
>
>> Hi Zhu,
>>
>> Thanks for the reply. I am running some queries where Hive is slower than Pig.
>>
>> I was also thinking that the Pig optimizer is better than the Hive optimizer.
>>
>> Regards
>> Abhi
>>
>> Sent from my iPhone
>>
>> On Oct 3, 2012, at 7:15 PM, TianYi Zhu <[email protected]> wrote:
>>
>>> From the Amazon web site:
>>> http://aws.amazon.com/elasticmapreduce/faqs/#hive-8
>>>
>>> Q: When should I use Hive vs. PIG?
>>>
>>> Hive and PIG both provide high level data-processing languages with
>>> support for complex data types for operating on large datasets. The Hive
>>> language is a variant of SQL and so is more accessible to people already
>>> familiar with SQL and relational databases. Hive has support for
>>> partitioned tables which allow Amazon Elastic MapReduce job flows to pull
>>> down only the table partition relevant to the query being executed rather
>>> than doing a full table scan. Both PIG and Hive have query plan
>>> optimization. PIG is able to optimize across an entire script while Hive
>>> queries are optimized at the statement level.
>>>
>>> Ultimately the choice of whether to use Hive or PIG will depend on the
>>> exact requirements of the application domain and the preferences of the
>>> implementers and those writing queries.
>>>
>>> On Thu, Oct 4, 2012 at 7:52 AM, Abhishek <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Can we discuss the performance of Pig vs Hive?
>>>>
>>>> 1) what hive is good at?
>>>> 2) what pig is good at?
>>>> 3) Hive optimizer vs pig optimizer
>>>> 4) hive limitations vs pig limitations
>>>>
>>>> Regards
>>>> Abhi
>>>>
>>>> Sent from my iPhone
