<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
<description>A base for other temporary directories</description>
</property>
</configuration>
The hadoop.tmp.dir path in this configuration must be created by hand; in other words, the /var/log/hadoop directory must exist before startup, or Hadoop will run into problems when it starts. The tables below explain each configuration item, including how to set the read/write buffer size.
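A quick way to prepare that directory before the first start (the path matches the core-site.xml above; the hadoop user/group in the comment is an assumption about your install):

```shell
# Create the hadoop.tmp.dir path referenced in core-site.xml before starting the cluster.
mkdir -p /var/log/hadoop/tmp

# If Hadoop runs as a dedicated user, hand the directory over to it, e.g.:
# chown -R hadoop:hadoop /var/log/hadoop
```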
| Parameter | Value | Notes |
| --- | --- | --- |
| fs.defaultFS | NameNode URI | hdfs://host:port/ |
| io.file.buffer.size | 131072 | Size of read/write buffer used in SequenceFiles. |
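For example, to set the 128 KB SequenceFile buffer shown in the table, one could add this property to the core-site.xml given earlier (a sketch, not a required setting):

```xml
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
```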
2.2.2 hdfs-site.xml
| Parameter | Value | Notes |
| --- | --- | --- |
| dfs.namenode.name.dir | Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently. | If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy. |
| dfs.hosts / dfs.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of allowable DataNodes. |
| dfs.blocksize | 268435456 | HDFS blocksize of 256 MB for large file-systems. |
| dfs.namenode.handler.count | 100 | More NameNode server threads to handle RPCs from a large number of DataNodes. |
| dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. | If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. |
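Putting a few of these together, a minimal hdfs-site.xml might look like the following sketch (the /data paths are illustrative; adjust them to the disks on your nodes):

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data1/hadoop/data,/data2/hadoop/data</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
</configuration>
```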
2.2.3 yarn-site.xml
| Parameter | Value | Notes |
| --- | --- | --- |
| yarn.acl.enable | true / false | Enable ACLs? Defaults to false. |
| yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just space means no one has access. |
| yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation. |
| yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port. If set, overrides the hostname set in yarn.resourcemanager.hostname. |
| yarn.resourcemanager.hostname | ResourceManager host. | host. A single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components. |
| yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler. |
| yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the ResourceManager. | In MBs. |
| yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the ResourceManager. | In MBs. |
| yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers. |
| yarn.nodemanager.resource.memory-mb | Resource, i.e. available physical memory, in MB, for a given NodeManager. | Defines the total resources on the NodeManager to be made available to running containers. |
| yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory. | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio. |
| yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk I/O. |
| yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk I/O. |
| yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled. |
| yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Requires appropriate permissions. Only applicable if log aggregation is enabled. |
| yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled. |
| yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for MapReduce applications. |
| yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregated logs before deleting them. -1 disables. Be careful: set this too small and you will spam the NameNode. |
| yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: set this too small and you will spam the NameNode. |
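As a sketch, a minimal yarn-site.xml that matches the hadoop-master host used in core-site.xml might be (the memory figure is illustrative, not a recommendation):

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
```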
2.2.4 mapred-site.xml
| Parameter | Value | Notes |
| --- | --- | --- |
| mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN. |
| mapreduce.map.memory.mb | 1536 | Larger resource limit for maps. |
| mapreduce.map.java.opts | -Xmx1024M | Larger heap size for child JVMs of maps. |
| mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces. |
| mapreduce.reduce.java.opts | -Xmx2560M | Larger heap size for child JVMs of reduces. |
| mapreduce.task.io.sort.mb | 512 | Higher memory limit while sorting data, for efficiency. |
| mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files. |
| mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps. |
| mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020. |
| mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888. |
| mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs. |
| mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server. |
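A corresponding mapred-site.xml sketch, using the values from the table (the memory settings should be tuned to your nodes):

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
</configuration>
```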
2.3 Hadoop cluster