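What is balance? As the name suggests, it's a balancer.
Its main job is to even out disk usage across the DataNodes.
Every article online sounds slicker than the last, yet some of them even get the commands wrong... and do you actually know what is being balanced?
So let's go straight to the official docs.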
-- Check the balancer bandwidth, i.e. how fast data may be moved between DataNodes during balancing
hdfs dfsadmin -getBalancerBandwidth node17:9867
Balancer bandwidth is 10485760 bytes per second.   -- 10 MB/s feels too slow, so set it to 20 MB/s
There was a permission problem here... had to authenticate as hdfs first
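-getBalancerBandwidth reports, and -setBalancerBandwidth expects, plain bytes per second, so 10485760 is 10 × 1024 × 1024 and 20 MB/s becomes 20971520. A tiny sketch of that conversion, assuming "MB" here means 1024 × 1024 bytes; `mib_to_bytes` is a made-up helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert MiB/s into the bytes-per-second value that
# `hdfs dfsadmin -setBalancerBandwidth` takes (1 MiB = 1024 * 1024 bytes).
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 10   # 10485760, the value reported above
mib_to_bytes 20   # 20971520, the value set below
```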
hdfs dfsadmin -setBalancerBandwidth 20971520
[root@worker01 /home/devuser]# hdfs dfsadmin -setBalancerBandwidth 20971520
NumberFormatException: For input string: " 20971520"
Usage: hdfs dfsadmin [-setBalancerBandwidth <bandwidth in bytes per second>]
[root@worker01 /home/devuser]# hdfs dfsadmin -setBalancerBandwidth 20971520
setBalancerBandwidth: Access denied for user hive. Superuser privilege is required
[root@worker01 /home/devuser]# kinit hdfs
Password for hdfs@CDH.COM:
[root@worker01 /home/devuser]# hdfs dfsadmin -setBalancerBandwidth 20971520
Balancer bandwidth is set to 20971520 for master.data.com/9.134.64.234:8020
Balancer bandwidth is set to 20971520 for node01.data.com/9.134.66.48:8020
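-- Something puzzled me here: I know the IPs, but what are these ports? The docs describe the argument of -getBalancerBandwidth as <datanode_host:ipc_port>, i.e. the DataNode IPC port (9867 in the command above). The 8020 in this output is a different port: it is the NameNode RPC port, because -setBalancerBandwidth is sent to the NameNodes (both NameNodes of the HA pair are listed), which then push the new value to the DataNodes.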
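Back to the first failed attempt: Java complained about the input string " 20971520", so something invisible in front of the digits became part of the argument. An ordinary ASCII space would simply be swallowed by the shell, so the likely culprit is a full-width or non-breaking space pasted from a web page (an assumption, the original character wasn't captured). A defensive sketch that keeps only ASCII digits before the value reaches dfsadmin; `clean_bandwidth` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the digits of a bandwidth value so that
# invisible characters (e.g. a pasted full-width or non-breaking space)
# never reach Java's number parser as part of the argument.
clean_bandwidth() {
  local raw="$1"
  local digits="${raw//[!0-9]/}"   # delete every non-digit character
  if [ -z "$digits" ]; then
    echo "no digits in bandwidth value: '$raw'" >&2
    return 1
  fi
  echo "$digits"
}

# Example: the argument starts with a full-width space (UTF-8 bytes E3 80 80).
clean_bandwidth $'\xe3\x80\x80''20971520'

# On the cluster, with the hdfs ticket from kinit:
# hdfs dfsadmin -setBalancerBandwidth "$(clean_bandwidth "$BW")"
```

The last line is left commented out because it needs a real cluster and superuser privileges; `BW` stands for wherever the value comes from.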


Getting ready to run the balancer
[root@worker01 /home/devuser]# hdfs balancer --help
Usage: hdfs balancer
[-policy <policy>] the balancing policy: datanode or blockpool
[-threshold <threshold>] Percentage of disk capacity
[-exclude [-f <hosts-file> | <comma-separated list of hosts>]] Excludes the specified datanodes.
[-include [-f <hosts-file> | <comma-separated list of hosts>]] Includes only the specified datanodes.
[-source [-f <hosts-file> | <comma-separated list of hosts>]] Pick only the specified datanodes as source nodes.
[-blockpools <comma-separated list of blockpool ids>] The balancer will only run on blockpools included in this list.
[-idleiterations <idleiterations>] Number of consecutive idle iterations (-1 for Infinite) before exit.
[-runDuringUpgrade] Whether to run the balancer during an ongoing HDFS upgrade.This is usually not desired since it will not affect used space on over-utilized machines.
Generic options supported are:
-conf <configuration file> specify an application configuration file
-D <property=value> define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port> specify a ResourceManager
-files <file1,...> specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...> specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...> specify a comma-separated list of archives to be unarchived on the compute machines
The general command line syntax is:
command [genericOptions] [commandOptions]
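The options above can be combined. A hypothetical dry-run wrapper (`build_balancer_cmd` is my own name, not part of Hadoop) that only assembles a command line from the documented -threshold and -exclude -f options and prints it, so it can be reviewed before actually launching the balancer:

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: build an `hdfs balancer` command line from the
# options shown in the help text and print it instead of running it.
#   $1 = threshold (percent), $2 = optional hosts file for -exclude -f
build_balancer_cmd() {
  local threshold="$1" exclude_file="${2:-}"
  local -a cmd=(hdfs balancer -threshold "$threshold")
  if [ -n "$exclude_file" ]; then
    cmd+=(-exclude -f "$exclude_file")
  fi
  # Joined with spaces for display only; run "${cmd[@]}" to execute it.
  echo "${cmd[*]}"
}

build_balancer_cmd 5
build_balancer_cmd 10 /tmp/exclude_hosts.txt
```

The exclude file path is just a placeholder for a file listing the DataNodes to leave out.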
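Now a question worth thinking about: what actually counts as balanced?
Say there are 10 DataNodes with 100 GB each, 1 TB in total, and 200 GB used overall, where dn1 uses 1 KB and dn2 uses 99 GB.
How should that be balanced? Should dn1 and dn2 both end up at 20 GB, or at 50 GB? I didn't know either.
Time to experiment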
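My reading of the Balancer (which matches the summary at the end) is that the target is a ratio, not an absolute size: under the default `datanode` policy each node's DFS used / configured capacity is compared with the cluster-wide average, total DFS used / total capacity. Here that average is 200 GB / 1 TB = 20%, so with equal 100 GB nodes everything moves toward roughly 20 GB, not 50 GB. A minimal awk sketch of the average; the input format, the helper names and the even split of the remaining 101 GB over dn3..dn10 are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Cluster average utilization = sum(DFS used) / sum(capacity) * 100.
# Input lines (hypothetical format): "<node> <used_GB> <capacity_GB>".
avg_utilization() {
  awk '{ used += $2; cap += $3 } END { printf "%.2f\n", used / cap * 100 }'
}

# The example from the text: 10 x 100 GB nodes, 200 GB used in total.
# dn1 holds ~1 KB, dn2 holds 99 GB; the remaining 101 GB is ASSUMED to be
# spread evenly over dn3..dn10 (12.625 GB each).
cluster_example() {
  echo "dn1 0.000001 100"
  echo "dn2 99 100"
  for i in 3 4 5 6 7 8 9 10; do
    echo "dn$i 12.625 100"
  done
}

cluster_example | avg_utilization   # prints 20.00
```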
threshold
hdfs balancer -threshold 5   -- threshold = 5, i.e. each DataNode's usage may differ from the cluster average by at most 5 percentage points
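With the average known, the threshold decides who moves data. As far as I can tell from the Balancer source, each node is put into one of four groups by comparing its utilization with the average: more than threshold above it is over-utilized (the main sources), more than threshold below it is under-utilized (the main targets), and everything in between is above- or below-average. A sketch of that bucketing, with the exact boundary handling as an assumption; `classify_node` is a hypothetical helper and the numbers reuse the 10-node example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: bucket one node relative to the cluster average.
#   $1 = node utilization %, $2 = cluster average %, $3 = threshold
# Assumed boundaries: |diff| > threshold means over/under-utilized;
# a node exactly at the average counts as below-average.
classify_node() {
  awk -v u="$1" -v a="$2" -v t="$3" 'BEGIN {
    d = u - a
    ad = (d < 0) ? -d : d
    if (d > 0) r = (ad > t) ? "over-utilized" : "above-average"
    else       r = (ad > t) ? "under-utilized" : "below-average"
    print r
  }'
}

# Threshold 5 around a 20% average: nodes between 15% and 25% are left alone.
classify_node 99 20 5         # dn2       -> over-utilized
classify_node 0.000001 20 5   # dn1       -> under-utilized
classify_node 12.625 20 5     # dn3..dn10 -> under-utilized
classify_node 22 20 5         # a node at 22% -> above-average
```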
[root@worker01 /home/devuser]# hdfs balancer -threshold 5
22/06/27 11:29:18 INFO balancer.Balancer: Using a threshold of 5.0
22/06/27 11:29:18 INFO balancer.Balancer: namenodes = [hdfs://s2cluster]
22/06/27 11:29:18 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 5.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
22/06/27 11:29:18 INFO balancer.Balancer: included nodes = []
22/06/27 11:29:18 INFO balancer.Balancer: excluded nodes = []
22/06/27 11:29:18 INFO balancer.Balancer: source nodes = []
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
22/06/27 11:29:19 INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
22/06/27 11:29:19 INFO block.BlockTokenSecretManager: Setting block keys
22/06/27 11:29:19 INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
22/06/27 11:29:19 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
22/06/27 11:29:19 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
22/06/27 11:29:19 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
22/06/27 11:29:19 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 50 (default=50)
22/06/27 11:29:19 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
22/06/27 11:29:19 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
22/06/27 11:29:19 INFO block.BlockTokenSecretManager: Setting block keys
22/06/27 11:29:19 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
22/06/27 11:29:19 INFO balancer.Balancer: dfs.blocksize = 134217728 (default=134217728)
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.117.90:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.68.200:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.80.60:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.81.221:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.124.14:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.122.87:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.83.33:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.163.60:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.115.141:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.124.36:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.71.192:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.123.37:1004
22/06/27 11:29:19 INFO net.NetworkTopology: Adding a new node: /default/9.134.117.73:1004
22/06/27 11:29:19 INFO

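This article records the HDFS data balancing process in detail, including setting the balancing threshold, adjusting the bandwidth and running the balancer command. It discusses how the balancer picks source and target nodes, and how the standby node problem that came up was handled. It also analyzes why DFSUsed% differs between DataNodes and looks at what balancing actually targets: the ratio of DFSUsed to ConfiguredCapacity.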
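Since the goal is DFSUsed relative to ConfiguredCapacity, a simple way to see how uneven the cluster is: look at the per-node "DFS Used%" in `hdfs dfsadmin -report`. A sketch that pulls those values out; `per_node_used_pct` is a hypothetical helper, and the "Name:" / "DFS Used%:" field layout is assumed from Hadoop 3.x report output, so check it against your version:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<node> <DFS Used%>" for each DataNode block in
# `hdfs dfsadmin -report` output read from stdin. The cluster summary at the
# top also contains a "DFS Used%:" line; it is skipped because no "Name:"
# line has been seen yet at that point.
per_node_used_pct() {
  awk '
    /^Name:/ {
      node = $0
      sub(/^Name: */, "", node)   # drop the field label
      sub(/ .*/, "", node)        # keep "ip:port", drop "(hostname)"
    }
    /^DFS Used%:/ && node != "" {
      pct = $0
      sub(/^DFS Used%: */, "", pct)
      sub(/%.*/, "", pct)         # strip the trailing percent sign
      print node, pct
    }
  '
}

# Usage on the cluster (may need the hdfs ticket from kinit):
# hdfs dfsadmin -report | per_node_used_pct
```

The printed percentages can then be compared against the cluster average, exactly as in the threshold discussion.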
https://hadoop.apache.org/docs/r3.2.2/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer




