After configuring the hadoop-eclipse-plugin, which we covered in our previous blog post (Hadoop-eclipse-plugin installation), we will now write our first WordCount MapReduce program using Eclipse and Maven.
Before we jump into the program, let's understand how a job flows through the YARN implementation when a MapReduce program is submitted by a client.
In Hadoop 1.x, there are two major components that work in a master-slave fashion.
- Job Tracker: allocates the resources required to run a MapReduce job and handles scheduling activities.
- Task Tracker: these are initiated by the Job Tracker to process individual tasks.
Since the Job Tracker is responsible for both resource management (assigning resources to each job) and job scheduling (assigning tasks to Task Trackers and monitoring task progress) in a single node, there were scalability issues in large HDFS clusters with more than 4,000 nodes. YARN was implemented to overcome this issue.
YARN (Yet Another Resource Negotiator) is the framework responsible for providing the computational resources (e.g., CPUs, memory, etc.) needed for application executions.
- The fundamental idea of YARN is to split the two major responsibilities of the JobTracker, i.e. resource management and job scheduling/monitoring, into separate daemons: a global ResourceManager and a per-application ApplicationMaster.
- The Task Tracker is replaced by the Node Manager in YARN, a per-machine framework agent that is responsible for containers, monitors their resource usage (CPU, memory, disk, network) and reports it to the Resource Manager (see the sketch below).
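For orientation, this split is exactly what a cluster's yarn-site.xml configures. A minimal illustrative sketch follows; the hostname and memory value are placeholders for this explanation, not settings taken from this post.
yarn-site.xml (sketch)
<configuration>
  <!-- Global ResourceManager: cluster-wide resource management -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master-node</value>
  </property>
  <!-- Per-machine NodeManager: auxiliary shuffle service needed by MapReduce -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Memory (MB) this NodeManager may hand out to containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>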
Job Flow:
- The client submits the MapReduce job by interacting with Job objects (the client runs in its own JVM).
- The client's job code interacts with the Resource Manager to acquire application metadata, such as the application ID, copies all job-related resources to HDFS to make them available for the rest of the job, and then submits the application to the Resource Manager.
- Resource Manager chooses a Node Manager with available resources and requests a container for Application Master.
- The Node Manager allocates a container for the Application Master, and the Application Master (MRAppMaster) executes and coordinates the MapReduce job.
Role of an Application Master:
- As noted above, both map tasks and reduce tasks are created by the Application Master.
- If the submitted job is small, the Application Master runs it in the same JVM in which it is itself running; this avoids the overhead of creating new containers and running tasks in parallel. Such small jobs are called uber tasks.
- Whether a job qualifies as an uber task is decided by three configuration thresholds: fewer than 10 map tasks, at most one reduce task, and an input size no larger than one HDFS block. These thresholds can be tuned via the mapreduce.job.ubertask.maxmaps, mapreduce.job.ubertask.maxreduces and mapreduce.job.ubertask.maxbytes properties in mapred-site.xml (see the sketch after this list).
- If the job doesn't qualify as an uber task, the Application Master requests containers for all map and reduce tasks.
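A sketch of how these thresholds could be set in mapred-site.xml; the values shown are the usual defaults, so adjust them to your cluster.
mapred-site.xml (sketch)
<configuration>
  <!-- Allow small jobs to run inside the ApplicationMaster's own JVM -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- Job qualifies only with at most this many map tasks (default 9) -->
  <property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>9</value>
  </property>
  <!-- ...at most this many reduce tasks (default 1) -->
  <property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
  </property>
  <!-- ...and total input no larger than this many bytes (defaults to the HDFS block size) -->
  <property>
    <name>mapreduce.job.ubertask.maxbytes</name>
    <value>134217728</value>
  </property>
</configuration>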
Job Startup Phase:
- The call to job.waitForCompletion() in the main driver class is where execution starts; this call begins the communication with the Resource Manager (see the driver sketch after this list).
- The client retrieves a new job ID, or application ID, from the Resource Manager.
- The client copies the job resources specified via the -files, -archives and -libjars command-line arguments, as well as the job JAR file, to HDFS.
- Finally, the job is submitted by calling the submitApplication() method on the Resource Manager.
- The Resource Manager triggers its Scheduler sub-component, which allocates a container for the MapReduce job; the Resource Manager then starts the Application Master in the container provided by the Scheduler. From here onwards, this container is managed by its Node Manager.
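To make the startup steps concrete, here is a minimal driver sketch built around job.waitForCompletion(). It uses the newer org.apache.hadoop.mapreduce API and the class name DriverSketch is hypothetical; the WordCount program we build later in this post uses the older org.apache.hadoop.mapred API instead.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DriverSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "WordCounterJob");
        job.setJarByClass(DriverSketch.class);
        // Mapper/reducer classes would be set here with job.setMapperClass(...) / job.setReducerClass(...).
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // waitForCompletion() obtains the application ID from the Resource Manager, copies the
        // job resources to HDFS, submits the application, and then polls for progress until done.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}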
Task Execution Phase:
- In this phase, the input files are divided into splits, or chunks, whose size is influenced by the minimum split size (mapreduce.input.fileinputformat.split.minsize) property.
- Each split is passed to its own map task if the file is splittable; if the file is not splittable, the entire file is provided as input to a single map task.
- These map tasks are created by the MapReduce Application Master (the MRAppMaster class), which also creates the reduce tasks based on the mapreduce.job.reduces property.
- Once containers are assigned to the tasks, the Application Master starts them by notifying the corresponding Node Managers.
- Each Node Manager copies the job resources (such as the job JAR file) from HDFS via the distributed cache and runs the map or reduce tasks.
- The Application Master collects progress and status information from all tasks, and the aggregated values are propagated to the client.
- The client checks the Application Master for the job completion status at regular intervals, usually every 5 seconds when the job is submitted by calling the runJob() method. This interval can be configured via the mapreduce.client.completion.pollinterval property (see the configuration sketch after this list).
- Once the job is completed, the Application Master and the task containers clean up their working state. The job's OutputCommitter runs its cleanup method to handle any remaining cleanup activities.
- The job is archived by the Job History Server for future reference.
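For reference, the two properties called out in this list can also be set programmatically on the job's Configuration before the job is created. This is only a fragment meant to live inside a driver's main method, and the values shown are illustrative, not recommendations.
// In the driver, before the JobConf / Job object is created:
Configuration conf = new Configuration();
// Lower bound on the input split size, in bytes (here: 128 MB).
conf.setLong("mapreduce.input.fileinputformat.split.minsize", 134217728L);
// How often, in milliseconds, the client polls for job completion (default 5000 = 5 seconds).
conf.setLong("mapreduce.client.completion.pollinterval", 5000L);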
That's all for the theory. Let us now write a sample MapReduce program that counts the number of words in a given file.
Tools used
- Maven 3.3.9
- Eclipse Luna
- JDK 1.8
- Hadoop 2.7.1
Configure Maven
Download Maven from here and extract it to C:\maven\apache-maven-3.3.9.
Add MAVEN_HOME as a user variable and append %MAVEN_HOME%\bin to the Path variable.
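If you prefer doing this from a command prompt instead of the Environment Variables dialog, a rough equivalent is shown below (assuming the extract path above). Note that setx only affects command prompts opened afterwards and truncates values longer than 1024 characters, so the dialog is the safer choice for a long Path.
setx MAVEN_HOME "C:\maven\apache-maven-3.3.9"
setx PATH "%PATH%;C:\maven\apache-maven-3.3.9\bin"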
Open a command prompt as administrator and type the command below to verify the Maven installation.
C:\Windows\System32>mvn -version
You will see the log below if Maven is installed successfully:
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: C:\maven\apache-maven-3.3.9\bin\..
Java version: 1.8.0_101, vendor: Oracle Corporation
Java home: C:\java\jdk1.8.0_101\jre
Default locale: en_SG, platform encoding: Cp1252
OS name: "windows 10", version: "10.0", arch: "amd64", family: "dos"
Start Eclipse and go to Window -> Open Perspective -> Other. From the perspectives window, select "Java" and click "OK".
Right-click in the Package Explorer and select New -> Other -> Maven Project.
Click Next, check "Use default Workspace location" and click Next.
Select maven-archetype-quickstart 1.1 and click Next
Add the Group Id, Artifact Id and Package name as shown in the screen below and click Finish.
Now switch to the "Map/Reduce" perspective by clicking the icon at the top right-hand corner of the main Eclipse panel, as highlighted below.
After switching the perspective, you will see the project below in the Project Explorer along with the DFS Locations node.
Add the content below to pom.xml and save the file.
pom.xml
<?xml version="1.0"?>
<project
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<!-- groupId and version were missing from the original listing; the values below are
     assumed to match what was entered in the archetype wizard above -->
<groupId>com.hdp.madreduce.example</groupId>
<version>0.0.1-SNAPSHOT</version>
<artifactId>WordCountMR</artifactId>
<name>WordCountMR</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-annotations</artifactId>
<version>2.7.1</version>
</dependency>
<!-- Hadoop Mapreduce Client Core -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs-nfs</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-app</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-hs</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-hs-plugins</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-web-proxy</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-sharedcachemanager</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-resourcemanager</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-nodemanager</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-common</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-applicationhistoryservice</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-registry</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-client</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-applications-unmanaged-am-launcher</artifactId>
<version>2.7.1</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-applications-distributedshell</artifactId>
<version>2.7.1</version>
</dependency>
</dependencies>
</project>
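As a quick sanity check, the project should also build from the command line once the pom is saved. Run the command below from the project's root directory (the folder containing pom.xml); the first run will download the Hadoop dependencies, which can take a while.
mvn clean package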
Java Classes
Create the WordCountMapper.java class and add the content below.
WordCountMapper.java
package com.hdp.madreduce.example;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class WordCountMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emits a (word, 1) pair for every whitespace-separated token in the input line.
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> collector, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer st = new StringTokenizer(line, " ");
        while (st.hasMoreTokens()) {
            word.set(st.nextToken());
            collector.collect(word, one);
        }
    }
}
Create the WordCountReducer.java class and add the content below.
WordCountReducer.java
package com.hdp.madreduce.example;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class WordCountReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    // Sums all the counts received for a word and emits (word, total).
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> outputCollector,
            Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum = sum + values.next().get();
        }
        outputCollector.collect(key, new IntWritable(sum));
    }
}
Create the WordCount.java class and add the content below.
WordCount.java
// Same package as the mapper and reducer (the original listing omitted the package declaration).
package com.hdp.madreduce.example;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path inputPath = new Path("hdfs://127.0.0.1:9000/input/WordCountSample.txt");
        Path outputPath = new Path("hdfs://127.0.0.1:9000/output/result");

        // Configure the job: paths, output types, output format, mapper and reducer classes.
        JobConf job = new JobConf(conf, WordCount.class);
        job.setJarByClass(WordCount.class);
        job.setJobName("WordCounterJob");
        FileInputFormat.setInputPaths(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);

        // Delete the output directory if it already exists, otherwise the job will fail.
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://127.0.0.1:9000"), conf);
        if (hdfs.exists(outputPath))
            hdfs.delete(outputPath, true);

        // Submit the job and block until it completes.
        RunningJob runningJob = JobClient.runJob(job);
        System.out.println("job.isSuccessful " + runningJob.isSuccessful());
    }
}
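One optional tweak, not part of the original program: the counters further down show "Combine input records=0", i.e. no combiner is used. Because WordCountReducer simply sums counts, it could also be registered as a combiner to pre-aggregate map output locally and cut down shuffle traffic. If you want to try it, add this line to the driver before the job is submitted:
// Optional: reuse the reducer as a combiner (safe here because summing is associative).
job.setCombinerClass(WordCountReducer.class);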
After creating all the classes, your Project Explorer should look like the screenshot below.
Right-click on WordCount.java -> Run As -> Run on Hadoop.
If the program runs successfully, you should see the content below in the Eclipse console.
To get a more detailed log, add the hadoop-common-2.7.1-test.sources.jar file from C:\hadoop-2.7.1\share\hadoop\common\sources to the project's build path.
Rerun the WordCount.java class to see the log below.
Eclipse console
2016-09-10 09:31:28,117 INFO Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2016-09-10 09:31:28,124 INFO jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2016-09-10 09:31:28,442 WARN mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(64)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2016-09-10 09:31:28,494 WARN mapreduce.JobResourceUploader (JobResourceUploader.java:uploadFiles(171)) - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2016-09-10 09:31:28,513 INFO input.FileInputFormat (FileInputFormat.java:listStatus(283)) - Total input paths to process : 1
2016-09-10 09:31:28,660 INFO mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(198)) - number of splits:1
2016-09-10 09:31:28,907 INFO mapreduce.JobSubmitter (JobSubmitter.java:printTokens(287)) - Submitting tokens for job: job_local375951257_0001
2016-09-10 09:31:29,296 INFO mapreduce.Job (Job.java:submit(1294)) - The url to track the job: http://localhost:8080/
2016-09-10 09:31:29,297 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1339)) - Running job: job_local375951257_0001
2016-09-10 09:31:29,302 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2016-09-10 09:31:29,311 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(100)) - File Output Committer Algorithm version is 1
2016-09-10 09:31:29,315 INFO mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2016-09-10 09:31:29,415 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2016-09-10 09:31:29,416 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local375951257_0001_m_000000_0
2016-09-10 09:31:29,466 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(100)) - File Output Committer Algorithm version is 1
2016-09-10 09:31:29,478 INFO util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2016-09-10 09:31:29,594 INFO mapred.Task (Task.java:initialize(612)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@46ae4d2d
2016-09-10 09:31:29,605 INFO mapred.MapTask (MapTask.java:runNewMapper(756)) - Processing split: hdfs://127.0.0.1:9000/input/WordCountSample.txt:0+433
2016-09-10 09:31:29,672 INFO mapred.MapTask (MapTask.java:setEquator(1205)) - (EQUATOR) 0 kvi 26214396(104857584)
2016-09-10 09:31:29,672 INFO mapred.MapTask (MapTask.java:init(998)) - mapreduce.task.io.sort.mb: 100
2016-09-10 09:31:29,672 INFO mapred.MapTask (MapTask.java:init(999)) - soft limit at 83886080
2016-09-10 09:31:29,672 INFO mapred.MapTask (MapTask.java:init(1000)) - bufstart = 0; bufvoid = 104857600
2016-09-10 09:31:29,672 INFO mapred.MapTask (MapTask.java:init(1001)) - kvstart = 26214396; length = 6553600
2016-09-10 09:31:29,684 INFO mapred.MapTask (MapTask.java:createSortingCollector(403)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2016-09-10 09:31:29,816 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) -
2016-09-10 09:31:29,823 INFO mapred.MapTask (MapTask.java:flush(1460)) - Starting flush of map output
2016-09-10 09:31:29,823 INFO mapred.MapTask (MapTask.java:flush(1482)) - Spilling map output
2016-09-10 09:31:29,823 INFO mapred.MapTask (MapTask.java:flush(1483)) - bufstart = 0; bufend = 676; bufvoid = 104857600
2016-09-10 09:31:29,823 INFO mapred.MapTask (MapTask.java:flush(1485)) - kvstart = 26214396(104857584); kvend = 26214152(104856608); length = 245/6553600
2016-09-10 09:31:29,860 INFO mapred.MapTask (MapTask.java:sortAndSpill(1667)) - Finished spill 0
2016-09-10 09:31:29,875 INFO mapred.Task (Task.java:done(1038)) - Task:attempt_local375951257_0001_m_000000_0 is done. And is in the process of committing
2016-09-10 09:31:29,896 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - map
2016-09-10 09:31:29,896 INFO mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_local375951257_0001_m_000000_0' done.
2016-09-10 09:31:29,896 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(249)) - Finishing task: attempt_local375951257_0001_m_000000_0
2016-09-10 09:31:29,897 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2016-09-10 09:31:29,902 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for reduce tasks
2016-09-10 09:31:29,903 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(302)) - Starting task: attempt_local375951257_0001_r_000000_0
2016-09-10 09:31:29,912 INFO output.FileOutputCommitter (FileOutputCommitter.java:<init>(100)) - File Output Committer Algorithm version is 1
2016-09-10 09:31:29,914 INFO util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(192)) - ProcfsBasedProcessTree currently is supported only on Linux.
2016-09-10 09:31:30,006 INFO mapred.Task (Task.java:initialize(612)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@7ddb8f1c
2016-09-10 09:31:30,010 INFO mapred.ReduceTask (ReduceTask.java:run(362)) - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@46f15a4f
2016-09-10 09:31:30,028 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:<init>(196)) - MergerManager: memoryLimit=1314232704, maxSingleShuffleLimit=328558176, mergeThreshold=867393600, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2016-09-10 09:31:30,031 INFO reduce.EventFetcher (EventFetcher.java:run(61)) - attempt_local375951257_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2016-09-10 09:31:30,126 INFO reduce.LocalFetcher (LocalFetcher.java:copyMapOutput(144)) - localfetcher#1 about to shuffle output of map attempt_local375951257_0001_m_000000_0 decomp: 802 len: 806 to MEMORY
2016-09-10 09:31:30,136 INFO reduce.InMemoryMapOutput (InMemoryMapOutput.java:shuffle(100)) - Read 802 bytes from map-output for attempt_local375951257_0001_m_000000_0
2016-09-10 09:31:30,139 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:closeInMemoryFile(314)) - closeInMemoryFile -> map-output of size: 802, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->802
2016-09-10 09:31:30,141 INFO reduce.EventFetcher (EventFetcher.java:run(76)) - EventFetcher is interrupted.. Returning
2016-09-10 09:31:30,142 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-09-10 09:31:30,143 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(674)) - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2016-09-10 09:31:30,163 INFO mapred.Merger (Merger.java:merge(606)) - Merging 1 sorted segments
2016-09-10 09:31:30,163 INFO mapred.Merger (Merger.java:merge(705)) - Down to the last merge-pass, with 1 segments left of total size: 798 bytes
2016-09-10 09:31:30,168 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(751)) - Merged 1 segments, 802 bytes to disk to satisfy reduce memory limit
2016-09-10 09:31:30,170 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(781)) - Merging 1 files, 806 bytes from disk
2016-09-10 09:31:30,171 INFO reduce.MergeManagerImpl (MergeManagerImpl.java:finalMerge(796)) - Merging 0 segments, 0 bytes from memory into reduce
2016-09-10 09:31:30,171 INFO mapred.Merger (Merger.java:merge(606)) - Merging 1 sorted segments
2016-09-10 09:31:30,173 INFO mapred.Merger (Merger.java:merge(705)) - Down to the last merge-pass, with 1 segments left of total size: 798 bytes
2016-09-10 09:31:30,174 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-09-10 09:31:30,212 INFO Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2016-09-10 09:31:30,303 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1360)) - Job job_local375951257_0001 running in uber mode : false
2016-09-10 09:31:30,305 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 100% reduce 0%
2016-09-10 09:31:30,415 INFO mapred.Task (Task.java:done(1038)) - Task:attempt_local375951257_0001_r_000000_0 is done. And is in the process of committing
2016-09-10 09:31:30,419 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - 1 / 1 copied.
2016-09-10 09:31:30,420 INFO mapred.Task (Task.java:commit(1199)) - Task attempt_local375951257_0001_r_000000_0 is allowed to commit now
2016-09-10 09:31:30,434 INFO output.FileOutputCommitter (FileOutputCommitter.java:commitTask(482)) - Saved output of task 'attempt_local375951257_0001_r_000000_0' to hdfs://127.0.0.1:9000/output/result/_temporary/0/task_local375951257_0001_r_000000
2016-09-10 09:31:30,436 INFO mapred.LocalJobRunner (LocalJobRunner.java:statusUpdate(591)) - reduce > reduce
2016-09-10 09:31:30,436 INFO mapred.Task (Task.java:sendDone(1158)) - Task 'attempt_local375951257_0001_r_000000_0' done.
2016-09-10 09:31:30,436 INFO mapred.LocalJobRunner (LocalJobRunner.java:run(325)) - Finishing task: attempt_local375951257_0001_r_000000_0
2016-09-10 09:31:30,436 INFO mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - reduce task executor complete.
2016-09-10 09:31:31,306 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1367)) - map 100% reduce 100%
2016-09-10 09:31:31,307 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1378)) - Job job_local375951257_0001 completed successfully
2016-09-10 09:31:31,331 INFO mapreduce.Job (Job.java:monitorAndPrintJob(1385)) - Counters: 35
File System Counters
FILE: Number of bytes read=1974
FILE: Number of bytes written=632424
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=866
HDFS: Number of bytes written=416
HDFS: Number of read operations=15
HDFS: Number of large read operations=0
HDFS: Number of write operations=6
Map-Reduce Framework
Map input records=6
Map output records=62
Map output bytes=676
Map output materialized bytes=806
Input split bytes=112
Combine input records=0
Combine output records=0
Reduce input groups=45
Reduce shuffle bytes=806
Reduce input records=62
Reduce output records=45
Spilled Records=124
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
Total committed heap usage (bytes)=481296384
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=433
File Output Format Counters
Bytes Written=416
job.isSuccessful true
Double-click the part-r-00000 file under output -> result; you should see the output below.
Result file content:
A 1
Apache 1
Hadoop 3
across 2
allows 1
an 2
and 2
application 1
clusters 2
computation 2
computers 2
datasets 1
designed 1
distributed 2
each 1
environment 1
frame-worked 1
framework 1
from 1
in 2
is 2
java 1
large 1
local 1
machines 1
models 1
of 4
offering 1
open 1
processing 1
programming 1
provides 1
scale 1
server 1
simple 1
single 1
source 1
storage 2
that 2
thousands 1
to 2
up 1
using 1
works 1
written 1
WordCount Program in Debug Mode:
Place a breakpoint in the WordCountMapper class, then right-click on WordCount and select Debug As -> Java Application; you should be able to debug your program line by line at run time.
Source Code:
GitHub: https://github.com/naveenacharya1/Hadoop/tree/master/hadoop-map-reduce/WordCountMR