Hi,

I'm a beginner with Hadoop, and I hope my question isn't too silly...

I've loaded an Apache access log into HDFS, and I'm using Hive to query it.
select count(*) doesn't return anything, and I can't understand what's written
in the log file. I would be grateful if someone could help me see what went
wrong.

This is what I did.

(1) Create the table
CREATE TABLE apachelog (
ipaddress STRING,
identd STRING,
user STRING,
time STRING,
request STRING,
returncode INT,
size INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe'
WITH SERDEPROPERTIES (
'serialization.format'='org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol',
'quote.delim'='("|\\[|\\])',
'field.delim'=' ',
'serialization.null.format'='-')
STORED AS TEXTFILE;

(2) Load data into it
hive> load data local inpath
'/home/hoge/localhost_access_log.2012-07-25.txt.gz' into table apachelog;

(3) Check with select count(*)
hive> select count(*) from apachelog;
24693

Everything goes well up to here. Then I load another log with around 230,000
lines of data.

(4) Load more data
hive> load data local inpath
'/home/hoge/localhost_access_log.2012-07-26.txt.gz' into table apachelog;
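(In case it helps with diagnosing this: the raw line count of a gzipped log can be checked locally before loading it, along these lines. The sample file below is just a stand-in for my actual log path.)

```shell
# Count the lines in a gzipped access log without loading it into Hive.
# A throwaway 3-line sample stands in for the real localhost_access_log file.
printf 'line1\nline2\nline3\n' | gzip -c > /tmp/sample_access_log.txt.gz

# gzip -dc decompresses to stdout (portable equivalent of zcat).
gzip -dc /tmp/sample_access_log.txt.gz | wc -l   # prints 3
```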

(5) select count(*) again

hive> select count(*) from apachelog;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201208091522_0001, Tracking URL = http://192.XX.XX.XX:40030/jobdetails.jsp?jobid=job_201208091522_0001
Kill Command = /home/hadoop/download/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=192.168.232.151:56001 -kill job_201208091522_0001
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2012-08-09 15:25:05,751 Stage-1 map = 0%,  reduce = 0%
2012-08-09 15:25:17,840 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 2.61 sec
2012-08-09 15:25:18,846 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 2.61 sec
2012-08-09 15:25:19,888 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 2.61 sec
.......
2012-08-09 15:48:04,281 Stage-1 map = 50%,  reduce = 17%, Cumulative CPU 16.47 sec

The "Stage-1 map = 50%,  reduce = 17%, Cumulative CPU 16.47 sec" message
repeats for more than 30 minutes without any change.

When I checked the TaskTracker logs, the following line repeats endlessly:
2012-08-09 15:48:00,680 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201208091522_0001_r_000000_0 0.16666667% reduce > copy (1 of 2 at 0.00 MB/s) >

So it just keeps "transferring" data at 0.00 MB/s?

And after a long while it shows something like the following:
2012-08-09 16:20:12,581 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201208021116_0028_m_000000_3: Task attempt_201208021116_0028_m_000000_3 failed to report status for 600 seconds. Killing!


Can anyone help me with this? I'm stuck here.

Thank you so much for your attention.


-- 
View this message in context: 
http://old.nabble.com/running-select-count-in-hive-keeps-on-pending-tp34275126p34275126.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
