The example that reproduces the issue, along with the data, is attached to the very first email in this thread.

On Thursday, March 6, 2014, Cheolsoo Park <[email protected]> wrote:

> So that's backend. It has nothing to do with the filter extractor. The
> filter extractor is for predicate push down on the frontend.
>
> The code that you're showing is the entry point where Pig mapper begins. So
> it doesn't tell us much. The mapper is given a segment of physical plan
> (pipeline), and the getNext() call pulls records from roots to leaves one
> by one.
>
> You need to find where time is spent in the pipeline. If you're suspecting
> Filter By is slow, then it should be POFilter. Please take thread dump
> multiple times and see the stack traces. Unless you provide an example that
> reproduces the error, I cannot help you more.
>
> On Thu, Mar 6, 2014 at 6:03 PM, Suhas Satish <[email protected]> wrote:
>
> > Hi Cheolsoo,
> > This is where its hanging -
> > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> >
> > org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigGenericMapBase.java:
> >
> > protected void *runPipeline*(PhysicalOperator leaf) throws IOException,
> >         InterruptedException {
> >     while(true){
> >         Result res = leaf.getNext(DUMMYTUPLE);
> >         if(res.returnStatus==POStatus.STATUS_OK){
> >             collect(outputCollector,(Tuple)res.result);
> >             continue;
> >         }
> >         ....
> >
> > Cheers,
> > Suhas.
> >
> > On Thu, Mar 6, 2014 at 5:56 PM, Cheolsoo Park <[email protected]> wrote:
> >
> > > Hi Suhas,
> > >
> > > No. The issue with PIG-3461 is that Pig hangs at the query compilation
> > > with a big filter expression before the job is submitted.
> > > In addition, the filter extractor was totally rewritten in 0.12.
> > > https://issues.apache.org/jira/browse/PIG-3461
> > >
> > > Where exactly is your job hanging? Backend or frontend? Are you running
> > > it in local mode or remote mode?
> > >
> > > Thanks,
> > > Cheolsoo
> > >
> > > p.s.
> > > There are two known issues with the new filter extractor in 0.12.0
> > > although these are probably not related to your issue-
> > > https://issues.apache.org/jira/browse/PIG-3510
> > > https://issues.apache.org/jira/browse/PIG-3657
> > >
> > > On Thu, Mar 6, 2014 at 5:30 PM, Suhas Satish <[email protected]> wrote:
> > >
> > > > I seem to be hitting this issue in pig-0.12 although it claims to be
> > > > fixed in pig-0.12
> > > > https://issues.apache.org/jira/browse/PIG-3395
> > > > Large filter expression makes Pig hang
> > > >
> > > > Cheers,
> > > > Suhas.
> > > >
> > > > On Thu, Mar 6, 2014 at 4:26 PM, Suhas Satish <[email protected]> wrote:
> > > >
> > > > > This is the pig script -
> > > > >
> > > > > %default previousPeriod $pPeriod
> > > > >
> > > > > tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> > > > > DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> > > > >
> > > > > gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> > > > >
> > > > > *pWeek = FILTER gTWeek BY PERIOD == $previousPeriod;*
> > > > >
> > > > > pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> > > > >
> > > > > gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> > > > > store gpWeekRanked INTO 'gpWeekRanked';
> > > > > describe gpWeekRanked;
> > > > >
> > > > > Without the filter statement, the code runs without hanging.
> > > > >
> > > > > Cheers,
> > > > > Suhas.
> > > > >
> > > > > On Thu, Mar 6, 2014 at 3:05 PM, Suhas Satish <[email protected]> wrote:
> > > > >
> > > > >> Hi
> > > > >> I launched the attached pig job on pig-12 with hadoop MRv1 with the
> > > > >> attached data, but the FILTER function causes the job to get stuck
> > > > >> in an infinite loop.
> > > > >>
> > > > >> pig -p pPeriod=201312 -f test.pig
> > > > >>
> > > > >> The thread in question seems to be stuck forever inside while loop of
> > > > >> runPipel

--
Cheers,
Suhas.
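
For reference, the "take thread dump multiple times" suggestion quoted in this thread can be sketched in Java roughly as below. This is an illustration only and not part of Pig: `StackSampler` and `sample` are hypothetical names, the code inspects the JVM it runs in rather than a remote mapper, and it assumes Java 8 or later. For a running task JVM, the usual equivalent is to run the JDK's `jstack` tool against the task's process id several times and compare the dumps.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of Pig): snapshots the stack traces of all
// live threads in the current JVM several times and counts, per thread name,
// which method was on top of the stack in each snapshot.
class StackSampler {

    // Takes `rounds` snapshots, `intervalMs` apart, and returns top-frame counts.
    static Map<String, Integer> sample(int rounds, long intervalMs) throws InterruptedException {
        Map<String, Integer> topFrames = new TreeMap<>();
        for (int i = 0; i < rounds; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                if (stack.length == 0) {
                    continue; // no Java frames available for this thread right now
                }
                String key = e.getKey().getName() + " @ "
                        + stack[0].getClassName() + "." + stack[0].getMethodName();
                topFrames.merge(key, 1, Integer::sum);
            }
            Thread.sleep(intervalMs);
        }
        return topFrames;
    }

    public static void main(String[] args) throws InterruptedException {
        // A frame that is on top in every snapshot is a candidate for where time is spent.
        sample(5, 200).forEach((frame, hits) -> System.out.println(hits + "  " + frame));
    }
}
```

If the same operator frame (for example something under POFilter, as suggested in the quoted reply) shows up in every snapshot, that is a hint about where the pipeline is spending its time; a single dump cannot distinguish a hang from a slow step.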
