OpenStack and Ceph (OSD deployment)

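This article describes how to deploy OSDs in a Ceph cluster: system initialization, disk preparation, creating and configuring OSDs, registering authentication keys, managing the CRUSH map, and verifying cluster health.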

OSD deployment (ceph-0.8.17)

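Goals

On each Ceph node, configure each of the 10 disks as its own single-disk RAID 0
Create one dedicated OSD per disk
Create the Ceph cluster
Keep 3 replicas of all data in Ceph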

Background

osd

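An OSD stores data.
You can think of an OSD as a small room where data is kept.
Each OSD corresponds to one dedicated disk.
A Ceph cluster is made up of many OSDs, but they are transparent to users.
Once the monitors are running, you need to add OSDs.
The cluster can only reach the active + clean state when it has enough OSDs to hold every replica of an object.

For example, with osd pool default size = 2 you need at least 2 OSDs.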

crush map

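The CRUSH map is the structural blueprint of the Ceph cluster.
It describes every storage node, the rack each node sits in, and the OSDs on each node.
After the monitors start, the cluster has a default CRUSH map.
At that point the CRUSH map does not yet map any Ceph OSD daemons to Ceph nodes.
See the CRUSH map documentation for details.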

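Creating OSDs

###1. System initialization

Install CentOS 7.1.
Define the host names and IP addresses of all cluster nodes in /etc/hosts.
Keep the clocks synchronized.
Make sure iptables and SELinux are both disabled.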
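
Part of this checklist can be verified with a small script. The following is a minimal sketch, not part of the original procedure: `missing_hosts` is a hypothetical helper that reports which node names have no entry in a hosts file.

```shell
#!/bin/bash
# Hypothetical helper: print every host name that has no entry in a hosts file.
# $1 = hosts file (normally /etc/hosts), remaining arguments = host names to look for
missing_hosts() {
	local hosts_file=$1 host
	shift
	for host in "$@"
	do
		# Skip comment lines, then look for the name in any column after the IP address.
		awk -v h="$host" '
			/^[[:space:]]*#/ { next }
			{ for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
			END { exit (found ? 0 : 1) }
		' "$hosts_file" || echo "$host"
	done
}
```

For example, `missing_hosts /etc/hosts node1 node2` prints nothing when both names resolve through /etc/hosts (the node names here are placeholders). On CentOS 7, `getenforce` reports the current SELinux mode.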

###2. Prepare the disks

Partition and format the disks that will be used for Ceph storage. The following script does this; run it on every node.

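#!/bin/bash
# Wipe, partition and format every data disk (all sd* disks except the system disk sda).
# WARNING: this destroys all data on those disks.
LANG=en_US
yum install -y hdparm
disks=$(fdisk -l | grep '^Disk' | grep sectors | grep sd | grep -v sda | awk -F'[: ]' '{print $2}' | sort)
for disk in $disks
do
	dd if=/dev/zero of="$disk" bs=1M count=100   # clear the old partition table
	parted -s "$disk" mklabel gpt
	parted -s "$disk" mkpart primary xfs 1 100%
	hdparm -z "$disk"                            # make the kernel re-read the partition table
	mkfs.xfs -f -i size=512 "${disk}1"
done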
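
Because the script wipes every disk it selects, it can help to preview the selection first. The sketch below wraps the same filter in a hypothetical function, `list_data_disks`, that reads `fdisk -l` output on standard input and changes nothing on disk.

```shell
#!/bin/bash
# Hypothetical helper: read `fdisk -l` output on stdin and print the disks that
# the preparation script would wipe (every sd* disk except the system disk sda).
list_data_disks() {
	grep '^Disk' | grep sectors | grep sd | grep -v sda | awk -F'[: ]' '{print $2}' | sort
}
```

Running `fdisk -l 2>/dev/null | list_data_disks` on a node lists the candidate disks so that they can be checked before anything is erased.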

###3. Create the OSDs

Every physical disk is now formatted.
We edit /etc/fstab to mount each disk on its own directory, which makes it part of the Ceph cluster storage.
Each disk gets one dedicated OSD.

#!/bin/bash
# Create one OSD per data partition on every node and mount it through /etc/fstab.
# Assumes `ceph osd create` hands out IDs 0, 1, 2, ... in order, matching $num.
LANG=en_US
num=0
for ip in XXX.XX.128.55 XXX.XX.128.56 XXX.XX.128.57 XXX.XX.128.73 XXX.XX.128.74 XXX.XX.128.75 XXX.XX.128.76
do
	diskpart=$(ssh $ip "fdisk -l  | grep GPT | grep -v sda" | awk '{print $1}' | sort)
	for partition in $diskpart
	do
		ssh $ip "ceph osd create"
		ssh $ip "mkdir -p /var/lib/ceph/osd/ceph-$num"
		ssh $ip "echo $partition  /var/lib/ceph/osd/ceph-$num   xfs defaults 0 0 >> /etc/fstab"
		let num++
	done
	ssh $ip "mount -a"
done

Reboot and verify that the automatic mounts work.
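
One way to check the mounts after a reboot is to compare /etc/fstab with the kernel's mount table. The following is a minimal sketch; `unmounted_osd_dirs` is a hypothetical helper, and it assumes the OSD directories follow the /var/lib/ceph/osd/ceph-N naming used in this article.

```shell
#!/bin/bash
# Hypothetical helper: print every OSD mount point that is listed in an fstab
# file but absent from a mounts table.
# $1 = fstab file (normally /etc/fstab), $2 = mounts file (normally /proc/mounts)
unmounted_osd_dirs() {
	local fstab=$1 mounts=$2 dir
	# Column 2 of fstab is the mount point; keep only the OSD data directories.
	awk '$2 ~ /^\/var\/lib\/ceph\/osd\/ceph-[0-9]+$/ {print $2}' "$fstab" |
	while read -r dir
	do
		# Column 2 of the mounts table is the mount point as well.
		awk -v d="$dir" '$2 == d {found = 1} END {exit (found ? 0 : 1)}' "$mounts" || echo "$dir"
	done
}
```

`unmounted_osd_dirs /etc/fstab /proc/mounts` prints nothing when every OSD directory is mounted.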

###4. Ceph configuration file

[global]
fsid = dc4f91c1-8792-4948-b68f-2fcea75f53b9
mon initial members = xx-xxx-xxxx-xxxxxxx15-128055, xx-xxx-xxxx-xxxxxxx17-128057, xx-xxx-xxxx-xxxxxxx24-128074
mon host = XXX.XX.128.55, XXX.XX.128.57, XXX.XX.128.74
# public network: serves client data requests
public network = XXX.XX.128.0/21
# cluster network: carries data synchronization traffic between Ceph daemons
cluster network = XXX.XX.128.0/21
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 1024
filestore xattr use omap = true
# keep 3 replicas of every object in the cluster
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 1024
osd pool default pgp num = 1024
osd crush chooseleaf type = 1

###5. Initialize the OSD data directories

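#!/bin/bash
# Initialize the data directory and key of every OSD, in the same order as step 3.
# NOTE: this passes the cluster fsid as the OSD UUID; normally each OSD gets
# its own UUID (for example from uuidgen).
LANG=en_US
num=0

for ip in XXX.XX.128.55 XXX.XX.128.56 XXX.XX.128.57 XXX.XX.128.73 XXX.XX.128.74 XXX.XX.128.75 XXX.XX.128.76
do
	diskpart=$(ssh $ip "fdisk -l  | grep GPT | grep -v sda" | awk '{print $1}' | sort)
	for partition in $diskpart
	do
		ssh $ip "ceph-osd -i $num --mkfs --mkkey --osd-uuid dc4f91c1-8792-4948-b68f-2fcea75f53b9"
		let num++
	done
done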

Check the result

[root@xx-xxx-xxxx-xxxxxxx15-128055 tmp]# ls /var/lib/ceph/osd/ceph*
/var/lib/ceph/osd/ceph-0:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-1:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-2:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-3:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-4:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-5:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-6:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-7:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-8:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
/var/lib/ceph/osd/ceph-9:
ceph_fsid  current  fsid  journal  keyring  magic  ready  store_version  superblock  whoami
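
Checking 70 directories by eye is error-prone, so the listing can be complemented with a scripted check. This is a rough sketch under the assumption that the `whoami` file holds the OSD number and that `--mkkey` leaves a `keyring` file in each directory, as the listing suggests; `check_osd_dirs` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: check OSD data directories created by `ceph-osd --mkfs`.
# $1 = base directory (normally /var/lib/ceph/osd)
# Prints one line per problem and returns non-zero if any directory looks wrong.
check_osd_dirs() {
	local base=$1 dir id status=0
	for dir in "$base"/ceph-*
	do
		[ -d "$dir" ] || continue
		id=${dir##*-}   # ceph-12 -> 12
		if [ ! -f "$dir/whoami" ] || [ "$(cat "$dir/whoami")" != "$id" ]
		then
			echo "$dir: whoami does not match osd id $id"
			status=1
		fi
		if [ ! -f "$dir/keyring" ]
		then
			echo "$dir: keyring missing"
			status=1
		fi
	done
	return $status
}
```

Run on each node as `check_osd_dirs /var/lib/ceph/osd`; no output and a zero exit status mean every directory passed both checks.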

###6. Register the OSD authentication keys

#!/bin/bash
# Register the keyring of every OSD with the cluster, in the same order as step 3.
LANG=en_US
num=0
for ip in XXX.XX.128.55 XXX.XX.128.56 XXX.XX.128.57 XXX.XX.128.73 XXX.XX.128.74 XXX.XX.128.75 XXX.XX.128.76
do
	diskpart=$(ssh $ip "fdisk -l  | grep GPT | grep -v sda" | awk '{print $1}' | sort)
	for partition in $diskpart
	do
		ssh $ip "ceph auth add osd.$num osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-$num/keyring"
		let num++
	done
done

Sample output

[root@xx-xxx-xxxx-xxxxxxx15-128055 tmp]# ./authosd.sh
added key for osd.0
added key for osd.1
added key for osd.2
added key for osd.3
added key for osd.4
...
...
added key for osd.63
added key for osd.64
added key for osd.65
added key for osd.66
added key for osd.67
added key for osd.68
added key for osd.69

###7. Ceph node management
Add the Ceph nodes to the CRUSH map and place them under the root bucket.

#!/bin/bash
for host in xx-xxx-xxxx-xxxxxxx15-128055 xx-xxx-xxxx-xxxxxxx16-128056 xx-xxx-xxxx-xxxxxxx17-128057 xx-xxx-xxxx-xxxxxxx23-128073 xx-xxx-xxxx-xxxxxxx24-128074 xx-xxx-xxxx-xxxxxxx25-128075 xx-xxx-xxxx-xxxxxxx26-128076
do
	ceph osd crush add-bucket $host host
	ceph osd crush move $host root=default
done

Sample output

[root@xx-xxx-xxxx-xxxxxxx15-128055 tmp]# ./hostmap.sh
added bucket xx-xxx-xxxx-xxxxxxx15-128055 type host to crush map
moved item id -2 name 'xx-xxx-xxxx-xxxxxxx15-128055' to location {root=default} in crush map
added bucket xx-xxx-xxxx-xxxxxxx16-128056 type host to crush map
moved item id -3 name 'xx-xxx-xxxx-xxxxxxx16-128056' to location {root=default} in crush map
added bucket xx-xxx-xxxx-xxxxxxx17-128057 type host to crush map
moved item id -4 name 'xx-xxx-xxxx-xxxxxxx17-128057' to location {root=default} in crush map
added bucket xx-xxx-xxxx-xxxxxxx23-128073 type host to crush map
moved item id -5 name 'xx-xxx-xxxx-xxxxxxx23-128073' to location {root=default} in crush map
added bucket xx-xxx-xxxx-xxxxxxx24-128074 type host to crush map
moved item id -6 name 'xx-xxx-xxxx-xxxxxxx24-128074' to location {root=default} in crush map
added bucket xx-xxx-xxxx-xxxxxxx25-128075 type host to crush map
moved item id -7 name 'xx-xxx-xxxx-xxxxxxx25-128075' to location {root=default} in crush map
added bucket xx-xxx-xxxx-xxxxxxx26-128076 type host to crush map
moved item id -8 name 'xx-xxx-xxxx-xxxxxxx26-128076' to location {root=default} in crush map

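###8. Managing OSDs in the CRUSH map
Add each OSD to the CRUSH map so that it can start receiving data. Alternatively, you can decompile the CRUSH map, add the OSD to the device list, add the host as a bucket, add the device as an item under that host, assign it a weight, then recompile the map and set it.

The script below takes the first approach and gives every OSD a weight of 1.0.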

#!/bin/bash
# Add every OSD to the CRUSH map under its host bucket with weight 1.0.
LANG=en_US
num=0
for ip in XXX.XX.128.55 XXX.XX.128.56 XXX.XX.128.57 XXX.XX.128.73 XXX.XX.128.74 XXX.XX.128.75 XXX.XX.128.76
do
	diskpart=$(ssh $ip "fdisk -l  | grep GPT | grep -v sda" | awk '{print $1}' | sort)
	hostname=$(ssh $ip hostname -s)   # short host name, must match the bucket added in step 7
	for partition in $diskpart
	do
		ceph osd crush add osd.$num 1.0 root=default host=$hostname
		let num++
	done
done
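
A fixed weight of 1.0 works here because all disks are identical. For mixed disk sizes, a common convention is to make the weight proportional to capacity, roughly 1.00 per terabyte. The sketch below assumes that convention and uses TiB as the unit; `crush_weight` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: turn a device size in bytes into a CRUSH weight,
# assuming the convention of weight 1.00 per TiB (2^40 bytes).
crush_weight() {
	awk -v bytes="$1" 'BEGIN { printf "%.2f\n", bytes / (1024 * 1024 * 1024 * 1024) }'
}
```

The size in bytes can be read with `blockdev --getsize64 /dev/sdb1`, and the result passed to `ceph osd crush add` in place of the literal 1.0.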

###9. Start the OSDs

#!/bin/bash
# Start every OSD daemon, in the same order as step 3.
LANG=en_US
num=0
for ip in XXX.XX.128.55 XXX.XX.128.56 XXX.XX.128.57 XXX.XX.128.73 XXX.XX.128.74 XXX.XX.128.75 XXX.XX.128.76
do
	diskpart=$(ssh $ip "fdisk -l  | grep GPT | grep -v sda" | awk '{print $1}' | sort)
	for partition in $diskpart
	do
		ssh $ip "touch /var/lib/ceph/osd/ceph-$num/sysvinit"   # mark the OSD as managed by the sysvinit script
		ssh $ip "/etc/init.d/ceph start osd.$num"
		let num++
	done
done
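
Steps 3, 5, 6, 8 and 9 each recompute the OSD number by enumerating the partitions again, which only stays consistent while the disk order never changes between runs. A possible alternative, sketched here with a hypothetical `assign_osd_ids` helper, is to number the partitions once and save the mapping to a file.

```shell
#!/bin/bash
# Hypothetical helper: read "ip partition" pairs on stdin and number them
# 0, 1, 2, ... in input order, printing "ip partition osd-id" lines.
assign_osd_ids() {
	awk 'NF >= 2 { print $1, $2, n++ }'
}
```

Each later step could then iterate over the saved file with `while read -r ip partition num` instead of calling `fdisk` again. When `ssh` is called inside such a loop, use `ssh -n` so that it does not consume the rest of the mapping from standard input.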

###10. Check the status

[root@xx-xxx-xxxx-xxxxxxx17-128057 ~]# ceph osd tree
# id    weight  type name       up/down reweight
-1      70      root default
-2      10              host xx-xxx-xxxx-xxxxxxx15-128055
0       1                       osd.0   up      1
1       1                       osd.1   up      1
2       1                       osd.2   up      1
3       1                       osd.3   up      1
4       1                       osd.4   up      1
5       1                       osd.5   up      1
6       1                       osd.6   up      1
7       1                       osd.7   up      1
8       1                       osd.8   up      1
9       1                       osd.9   up      1
-3      10              host xx-xxx-xxxx-xxxxxxx16-128056
10      1                       osd.10  up      1
11      1                       osd.11  up      1
12      1                       osd.12  up      1
13      1                       osd.13  up      1
14      1                       osd.14  up      1
15      1                       osd.15  up      1
16      1                       osd.16  up      1
17      1                       osd.17  up      1
18      1                       osd.18  up      1
19      1                       osd.19  up      1
-4      10              host xx-xxx-xxxx-xxxxxxx17-128057
20      1                       osd.20  up      1
21      1                       osd.21  up      1
22      1                       osd.22  up      1
23      1                       osd.23  up      1
24      1                       osd.24  up      1
25      1                       osd.25  up      1
26      1                       osd.26  up      1
27      1                       osd.27  up      1
28      1                       osd.28  up      1
29      1                       osd.29  up      1
-5      10              host xx-xxx-xxxx-xxxxxxx23-128073
30      1                       osd.30  up      1
31      1                       osd.31  up      1
32      1                       osd.32  up      1
33      1                       osd.33  up      1
34      1                       osd.34  up      1
35      1                       osd.35  up      1
36      1                       osd.36  up      1
37      1                       osd.37  up      1
38      1                       osd.38  up      1
39      1                       osd.39  up      1
-6      10              host xx-xxx-xxxx-xxxxxxx24-128074
40      1                       osd.40  up      1
41      1                       osd.41  up      1
42      1                       osd.42  up      1
43      1                       osd.43  up      1
44      1                       osd.44  up      1
45      1                       osd.45  up      1
46      1                       osd.46  up      1
47      1                       osd.47  up      1
48      1                       osd.48  up      1
49      1                       osd.49  up      1
-7      10              host xx-xxx-xxxx-xxxxxxx25-128075
50      1                       osd.50  up      1
51      1                       osd.51  up      1
52      1                       osd.52  up      1
53      1                       osd.53  up      1
54      1                       osd.54  up      1
55      1                       osd.55  up      1
56      1                       osd.56  up      1
57      1                       osd.57  up      1
58      1                       osd.58  up      1
59      1                       osd.59  up      1
-8      10              host xx-xxx-xxxx-xxxxxxx26-128076
60      1                       osd.60  up      1
61      1                       osd.61  up      1
62      1                       osd.62  up      1
63      1                       osd.63  up      1
64      1                       osd.64  up      1
65      1                       osd.65  up      1
66      1                       osd.66  up      1
67      1                       osd.67  up      1
68      1                       osd.68  up      1
69      1                       osd.69  up      1
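
With 70 OSDs the tree is long, so a per-host summary is easier to scan. The sketch below assumes the column layout shown above (id, weight, type or name, up/down, reweight); other Ceph releases print different columns, so treat `osds_per_host` as a hypothetical helper tied to this output format.

```shell
#!/bin/bash
# Hypothetical helper: read `ceph osd tree` output (in the format shown above)
# on stdin and print "host up-count total-count" for every host bucket.
osds_per_host() {
	awk '
		$3 == "host" { host = $4; order[++n] = host; next }
		$3 ~ /^osd\.[0-9]+$/ && host != "" {
			total[host]++
			if ($4 == "up") up[host]++
		}
		END {
			for (i = 1; i <= n; i++)
				print order[i], up[order[i]] + 0, total[order[i]] + 0
		}
	'
}
```

Used as `ceph osd tree | osds_per_host`, it prints one line per host; for the cluster in this article every host should show 10 OSDs up out of 10.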

###11. Verify Ceph health
[root@xx-xxx-xxxx-xxxxxxx15-128055 tmp]# ceph -s
cluster dc4f91c1-8792-4948-b68f-2fcea75f53b9
health HEALTH_WARN too few pgs per osd (2 < min 20)
monmap e1: 3 mons at {xx-xxx-xxxx-xxxxxxx15-128055=XXX.XX.128.55:6789/0,xx-xxx-xxxx-xxxxxxx17-128057=XXX.XX.128.57:6789/0,xx-xxx-xxxx-xxxxxxx24-128074=XXX.XX.128.74:6789/0}, election epoch 8, quorum 0,1,2 xx-xxx-xxxx-xxxxxxx15-128055,xx-xxx-xxxx-xxxxxxx17-128057,xx-xxx-xxxx-xxxxxxx24-128074
osdmap e226: 70 osds: 70 up, 70 in
pgmap v265: 192 pgs, 3 pools, 0 bytes data, 0 objects
74632 MB used, 254 TB / 254 TB avail
192 active+clean
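
The HEALTH_WARN in this output appears because 192 placement groups spread over 70 OSDs come to about 2 per OSD, below the reported minimum of 20. The sketch below reproduces that arithmetic and applies a commonly cited rule of thumb (about 100 PGs per OSD, divided by the replica count, rounded up to a power of two). Both function names are hypothetical, and the rule is a guideline rather than a fixed requirement.

```shell
#!/bin/bash
# Integer number of PGs per OSD, matching the figure in the warning.
# $1 = total PGs, $2 = number of OSDs
pgs_per_osd() {
	echo $(( $1 / $2 ))
}

# Rule-of-thumb total PG count: (OSDs * 100) / replicas, rounded up to a power of two.
# $1 = number of OSDs, $2 = replica count
suggested_pg_total() {
	local target=$(( $1 * 100 / $2 )) pgs=1
	while [ "$pgs" -lt "$target" ]
	do
		pgs=$(( pgs * 2 ))
	done
	echo "$pgs"
}
```

For this cluster, `pgs_per_osd 192 70` gives 2 and `suggested_pg_total 70 3` gives 4096. That total is shared by all pools; the PG count of an existing pool is raised with `ceph osd pool set <pool> pg_num <n>`, followed by the same change for `pgp_num`.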
