Manually creating CephFS on Ceph (Luminous)

Original article. First published 2020-12-09 18:33:14, last modified 2022-03-30 10:26:36.

This content is licensed under CC 4.0 BY-SA.

Tags

#ceph #cephfs

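This document describes how to manually create and configure the CephFS service in an existing Ceph environment. It covers how CephFS works, environment preparation, creating the MDS daemons, setting up the storage pools, client configuration and use, and troubleshooting. Creating CephFS involves creating the data and metadata pools, starting and managing the MDS servers, mounting on clients and setting permissions, and managing replicas and CRUSH rules.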

Goal

Add the MDS (CephFS) service to the existing Ceph environment


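Understanding

Clients can access CephFS over NFSv4 or with the native CephFS client
CephFS follows the common POSIX standard
To create CephFS you must create two pools in Ceph RADOS
data pool: stores the file data
metadata pool: stores the metadata of that data (you can think of it as holding the files' inode information)
A client that wants to access a file on CephFS must first connect to the MDS service
When a client needs to operate on a file, it first contacts the MDS server. The MDS journals the client's operations, reads the inode information from the metadata pool, and returns it to the client. The client then goes to the data pool to access the file data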
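The following sketch makes the link between the inode and the data pool concrete. It computes which RADOS object in the data pool holds a given byte offset of a file. The function name `cephfs_object_name` is hypothetical. The sketch assumes the default CephFS file layout (stripe_count 1, 4 MiB objects). Under that layout, data objects are named "<inode in hex>.<object index as 8 hex digits>".

```shell
#!/usr/bin/env bash
# Hypothetical helper: name of the data-pool object that holds byte <offset>
# of the file with inode number <ino> (given in decimal).
# Assumes the default layout: stripe_count=1, object_size=4 MiB (4194304).
cephfs_object_name() {
  local ino=$1 offset=$2 object_size=${3:-4194304}
  # object index = offset / object_size; name = "<ino hex>.<index as %08x>"
  printf '%x.%08x\n' "$ino" "$(( offset / object_size ))"
}
```

On a 64-bit kernel-client mount, `stat -c %i` usually reports the CephFS inode number. In that case, `rados -p cephfs_data stat "$(cephfs_object_name "$(stat -c %i /mnt/somefile)" 0)"` should find the file's first object. Files with a non-default layout need a different offset-to-object mapping.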

Environment

Ceph status

# ceph -s
  cluster:
    id:     7e720238-7xxxxxxxxxxxxxxd9d9a49ac4e4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ns-storage-020100,ns-storage-020101,ns-storage-020102
    mgr: ns-storage-020100(active), standbys: ns-storage-020101, ns-storage-020102
    osd: 18 osds: 18 up, 18 in

  data:
    pools:   3 pools, 1152 pgs
    objects: 250 objects, 631 MB
    usage:   40584 MB used, 66966 GB / 67006 GB avail
    pgs:     1152 active+clean

Ceph OSD status

# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                              STATUS REWEIGHT PRI-AFF
-12       24.00000 root noah
 -9        8.00000     host ns-storage-020100.vclound.com
 12   hdd  4.00000         osd.12                             up  1.00000 1.00000
 13   hdd  4.00000         osd.13                             up  1.00000 1.00000
-10        8.00000     host ns-storage-020101.vclound.com
 14   hdd  4.00000         osd.14                             up  1.00000 1.00000
 15   hdd  4.00000         osd.15                             up  1.00000 1.00000
-11        8.00000     host ns-storage-020102.vclound.com
 16        4.00000         osd.16                             up  1.00000 1.00000
 17        4.00000         osd.17                             up  1.00000 1.00000
 -1       47.63620 root default
 -2       15.63620     host ns-storage-020100
  0   hdd  3.63620         osd.0                              up  1.00000 1.00000
  1   hdd  4.00000         osd.1                              up  1.00000 1.00000
  2   hdd  4.00000         osd.2                              up  1.00000 1.00000
  3   hdd  4.00000         osd.3                              up  1.00000 1.00000
 -3       16.00000     host ns-storage-020101
  4   hdd  4.00000         osd.4                              up  1.00000 1.00000
  5   hdd  4.00000         osd.5                              up  1.00000 1.00000
  6   hdd  4.00000         osd.6                              up  1.00000 1.00000
  7   hdd  4.00000         osd.7                              up  1.00000 1.00000
 -4       16.00000     host ns-storage-020102
  8   hdd  4.00000         osd.8                              up  1.00000 1.00000
  9   hdd  4.00000         osd.9                              up  1.00000 1.00000
 10   hdd  4.00000         osd.10                             up  1.00000 1.00000
 11   hdd  4.00000         osd.11                             up  1.00000 1.00000

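Create the MDS

On each node, create the corresponding directory. The ID cannot be a plain number, because ceph-mds rejects IDs that start with a digit. Here the short hostname is used as the ID.

 e.g.: mkdir -p /var/lib/ceph/mds/ceph-{id}

Run on each machine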

mkdir -p  /var/lib/ceph/mds/ceph-$(hostname -s)
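The naming rule can be checked before the directory is created. This is a minimal sketch with hypothetical helper names. `MDS_ROOT` is an assumed override variable (default `/var/lib/ceph/mds`), so the function can be tried outside a real node. As far as I know, ceph-mds refuses an ID that is empty or begins with a digit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. MDS_ROOT is an assumed override (default: standard path).

# ceph-mds rejects an empty ID or one that starts with a digit (e.g. "0").
valid_mds_id() {
  local id=$1
  [[ -n $id && $id != [0-9]* ]]
}

# Create <root>/ceph-<id>, the directory that will hold the MDS keyring.
make_mds_dir() {
  local id=$1 root=${MDS_ROOT:-/var/lib/ceph/mds}
  if ! valid_mds_id "$id"; then
    echo "invalid MDS id: '$id'" >&2
    return 1
  fi
  mkdir -p "$root/ceph-$id"
}
```

On each node, `make_mds_dir "$(hostname -s)"` does the same as the mkdir above, and it refuses IDs such as `0`.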

Create a keyring for each machine
Run on each machine

ceph-authtool --create-keyring /var/lib/ceph/mds/ceph-{id}/keyring --gen-key -n mds.{id}

Executed:
# ceph-authtool --create-keyring /var/lib/ceph/mds/ceph-$(hostname -s)/keyring --gen-key -n mds.$(hostname -s )
creating /var/lib/ceph/mds/ceph-ns-storage-020100/keyring

# ceph-authtool --create-keyring /var/lib/ceph/mds/ceph-$(hostname -s)/keyring --gen-key -n mds.$(hostname -s )
creating /var/lib/ceph/mds/ceph-ns-storage-020102/keyring

# ceph-authtool --create-keyring /var/lib/ceph/mds/ceph-$(hostname -s)/keyring --gen-key -n mds.$(hostname -s )
creating /var/lib/ceph/mds/ceph-ns-storage-020102/keyring

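Authorize each machine
Run on each machine

# ceph auth add mds.0 osd "allow rwx" mds "allow" mon "allow profile mds" -i /var/lib/ceph/mds/ceph-$(hostname -s)/keyring
added key for mds.0
# ceph auth add mds.1 osd "allow rwx" mds "allow" mon "allow profile mds" -i /var/lib/ceph/mds/ceph-$(hostname -s)/keyring
added key for mds.1
# ceph auth add mds.2 osd "allow rwx" mds "allow" mon "allow profile mds" -i /var/lib/ceph/mds/ceph-$(hostname -s)/keyring
added key for mds.2

Note: the daemon started below as ceph-mds@<hostname> runs with --id <hostname> and authenticates as mds.<hostname>, which is also the name stored in the keyring created above. Registering the key as mds.$(hostname -s) keeps the two consistent. Numeric names such as mds.0 are not valid MDS daemon IDs.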
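This sketch covers the keyring and auth steps for one node. It keeps the entity name identical to the daemon ID so that `ceph-mds --id <id>` can authenticate. `run` and `register_mds_key` are hypothetical helpers. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then echo "$*"; else "$@"; fi
}

# Create the keyring and register it under mds.<id>, the same name the
# daemon started as ceph-mds@<id> authenticates with.
register_mds_key() {
  local id=$1
  local keyring="/var/lib/ceph/mds/ceph-$id/keyring"
  run ceph-authtool --create-keyring "$keyring" --gen-key -n "mds.$id"
  run ceph auth add "mds.$id" osd "allow rwx" mds "allow" mon "allow profile mds" -i "$keyring"
}
```

Run `DRY_RUN=1 register_mds_key "$(hostname -s)"` first to review the commands, then run it again without `DRY_RUN` on each node.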

Add the configuration on every server

ceph.conf
[mds.0]
host =  ns-storage-020100
[mds.1]
host =  ns-storage-020101
[mds.2]
host =  ns-storage-020102
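A `[mds.X]` section applies to the daemon named mds.X. If these sections are kept, naming them after the daemon ID ties each section to the daemon that actually reads it. Below is a small generator sketch; `mds_conf_sections` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one [mds.<id>] section per host, using the
# short hostname as the daemon ID (matching ceph-mds@<hostname>).
mds_conf_sections() {
  local host
  for host in "$@"; do
    printf '[mds.%s]\nhost = %s\n' "$host" "$host"
  done
}
```

For example, `mds_conf_sections ns-storage-020100 ns-storage-020101 ns-storage-020102 >> /etc/ceph/ceph.conf` appends the three sections.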

Remember to change the ownership

chown ceph:ceph /var/lib/ceph/mds -R

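Start the service

ceph-mds@.service is a systemd template unit, so systemctl can start an instance for any ID without copying the file. If you do copy it, the target name needs the .service suffix.

# cp /usr/lib/systemd/system/ceph-mds@.service /usr/lib/systemd/system/ceph-mds@$(hostname -s).service
# systemctl start ceph-mds@$(hostname -s)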

# systemctl  status  ceph-mds@$(hostname -s)
● ceph-mds@ns-storage-020100.service - Ceph metadata server daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mds@ns-storage-020100.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-12-10 09:53:52 CST; 6s ago
 Main PID: 60814 (ceph-mds)
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@ns-storage-020100.service
           └─60814 /usr/bin/ceph-mds -f --cluster ceph --id ns-storage-020100 --setuser ceph --setgroup ceph

Dec 10 09:53:52 ns-storage-020100.vclound.com systemd[1]: Started Ceph metadata server daemon.
Dec 10 09:53:52 ns-storage-020100.vclound.com systemd[1]: Starting Ceph metadata server daemon...
Dec 10 09:53:52 ns-storage-020100.vclound.com ceph-mds[60814]: starting mds.ns-storage-020100 at -
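The status output above shows the unit as disabled, so the MDS would not come back after a reboot. This sketch fixes ownership, then enables and starts the template instance. `start_mds` and `run` are hypothetical helpers, and `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then echo "$*"; else "$@"; fi
}

# Fix ownership, then enable (start at boot) and start the instance of the
# ceph-mds@.service template. No unit file needs to be copied for this.
start_mds() {
  local id=${1:-$(hostname -s)}
  run chown -R ceph:ceph /var/lib/ceph/mds
  run systemctl enable "ceph-mds@$id"
  run systemctl start "ceph-mds@$id"
}
```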

Manage the MDS

CephFS status (no filesystem exists yet, so every MDS is in standby)

# ceph fs status
+-------------------+
|    Standby MDS    |
+-------------------+
| ns-storage-020102 |
| ns-storage-020100 |
| ns-storage-020101 |
+-------------------+
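With several MDS daemons, it can help to pull the standby names out of this table in scripts. This sketch assumes the Luminous table layout shown above; later releases format `ceph fs status` differently. `standby_mds` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph fs status` output on stdin and print the
# daemon names listed in the "Standby MDS" table (Luminous layout).
standby_mds() {
  awk '
    /Standby MDS/ { in_tbl = 1; next }        # header row of the standby table
    in_tbl && /^\|/ {                         # data rows start with "|"
      gsub(/[| ]/, "")                        # strip borders and padding
      if ($0 != "") print
    }
  '
}
```

Typical use: `ceph fs status | standby_mds`.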

Ceph status (no filesystem has been created yet, so the services section does not list the MDS daemons)

# ceph -s
  cluster:
    id:     7e72023xxxxxxxxxxxxxxxxxxxxxxxd9d9a49ac4e4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ns-storage-020100,ns-storage-020101,ns-storage-020102
    mgr: ns-storage-020100(active), standbys: ns-storage-020101, ns-storage-020102
    osd: 18 osds: 18 up, 18 in

  data:
    pools:   3 pools, 1152 pgs
    objects: 250 objects, 631 MB
    usage:   40605 MB used, 66966 GB / 67006 GB avail
    pgs:     1152 active+clean

Create the dedicated CephFS pools

# ceph osd pool create cephfs_data 256 256       (for file data)
pool 'cephfs_data' created
# ceph osd pool create cephfs_metadata 256 256   (for metadata)
pool 'cephfs_metadata' created

Tag the pools for CephFS use

# ceph osd pool application enable cephfs_metadata cephfs
enabled application 'cephfs' on pool 'cephfs_metadata'
# ceph osd pool application enable cephfs_data cephfs
enabled application 'cephfs' on pool 'cephfs_data'

Reference: syntax for creating a CephFS filesystem

fs new <fs_name> <metadata> <data> {--force} {--allow-dangerous-metadata-overlay}

Create CephFS

# ceph fs new noah_fs cephfs_metadata cephfs_data
new fs with metadata pool 5 and data pool 4
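The pool and filesystem steps above can be wrapped into one function. This sketch assumes the same pool names and 256 PGs used here. `create_cephfs` and `run` are hypothetical helpers. Note the argument order of `fs new`: the metadata pool comes first, then the data pool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: with DRY_RUN=1, print each command instead of running it.
run() {
  if [[ -n ${DRY_RUN:-} ]]; then echo "$*"; else "$@"; fi
}

# create_cephfs <fs_name> [metadata_pool] [data_pool] [pg_num]
create_cephfs() {
  local fs=$1 meta=${2:-cephfs_metadata} data=${3:-cephfs_data} pgs=${4:-256}
  run ceph osd pool create "$data" "$pgs" "$pgs"
  run ceph osd pool create "$meta" "$pgs" "$pgs"
  run ceph osd pool application enable "$meta" cephfs
  run ceph osd pool application enable "$data" cephfs
  # fs new <fs_name> <metadata> <data>
  run ceph fs new "$fs" "$meta" "$data"
}
```

`DRY_RUN=1 create_cephfs noah_fs` prints the same sequence used in this article.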

Query the cluster status again

# ceph -s
  cluster:
    id:     7e72xxxxxxxxxxxxxxxxxxxxxxxxxx9d9a49ac4e4
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ns-storage-020100,ns-storage-020101,ns-storage-020102
    mgr: ns-storage-020100(active), standbys: ns-storage-020101, ns-storage-020102
    mds: noah_fs-1/1/1 up  {0=ns-storage-020101=up:active}, 2 up:standby
    osd: 18 osds: 18 up, 18 in

  data:
    pools:   5 pools, 1664 pgs
    objects: 271 objects, 631 MB
    usage:   40595 MB used, 66966 GB / 67006 GB avail
    pgs:     1664 active+clean

Query the CephFS information

# ceph fs ls
name: noah_fs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

Check the service status

# ceph fs status
noah_fs - 0 clients
=======
+------+--------+-------------------+---------------+-------+-------+
| Rank | State  |        MDS        |    Activity   |  dns  |  inos |
+------+--------+-------------------+---------------+-------+-------+
|  0   | active | ns-storage-020101 | Reqs:    0 /s |    0  |    1  |
+------+--------+-------------------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2246  | 45.1T |
|   cephfs_data   |   data   |    0  | 45.1T |
+-----------------+----------+-------+-------+

+-------------------+
|    Standby MDS    |
+-------------------+
| ns-storage-020102 |
| ns-storage-020100 |
+-------------------+
MDS version: ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)

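client.admin had not previously been given full MDS permissions (its mds cap is just "allow"), so authorize admin. Current caps of client.admin: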

client.admin
        key: AQD6FlpdpOhJHxAAzuWwYHkYC9NUKrT4GgM8iQ==
        auid: 0
        caps: [mds] allow
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

How to grant the caps

# ceph auth caps client.admin mon 'allow *' osd 'allow *' mds 'allow *' mgr 'allow *'
updated caps for client.admin

Verify the admin caps

client.admin
        key: AQD6FlpdpOhJHxAAzuWwYHkYC9NUKrT4GgM8iQ==
        auid: 0
        caps: [mds] allow *
        caps: [mgr] allow *
        caps: [mon] allow *
        caps: [osd] allow *

Authorize the client

# ceph fs authorize noah_fs client.terry /terry rw  /backupdata rw  
[client.terry]
        key = AQCCl9FfljvBOBAAf+JKomWC8djGk3qjUqyQFA==

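Because the terry directory does not exist by default, you must also create a user that can access / (it is used later to create the directory)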

# ceph fs authorize noah_fs client.mary / rw    
[client.mary]
       key =  AQBNwdFfZZGMERAAJ/CdMbLy7BvMqt49R2ywXg==

Query the authorizations

# ceph auth list | grep -A 6 terry
installed auth entries:

client.terry
        key: AQCClxxxxxxxxxxxxxxxxxxxxxdjGk3qjUqyQFA==
        caps: [mds] allow rw path=/terry, allow rw path=/backupdata
        caps: [mon] allow r
        caps: [osd] allow rw pool=cephfs_data

# ceph auth list | grep -A 6 mary
client.mary
        key: AQBNwdFxxxxxxxxxxxxxxxxxxxxxxxBvMqt49R2ywXg==
        caps: [mds] allow rw
        caps: [mon] allow r
        caps: [osd] allow rw pool=cephfs_data
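The path= clauses in the mds caps decide which directories a client may mount. Those directories must already exist in the filesystem, which is why mary (who has no path restriction) is used to create /terry later. This sketch lists the granted paths from a caps string. `mds_cap_paths` is hypothetical, and a grant without path= is reported as /.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the paths granted by an mds caps string such as
# "allow rw path=/terry, allow rw path=/backupdata".
# A grant without path= covers the whole filesystem and is printed as "/".
mds_cap_paths() {
  local cap=$1 grant re='path=([^[:space:]]+)'
  local -a grants
  IFS=',' read -ra grants <<< "$cap"   # one element per comma-separated grant
  for grant in "${grants[@]}"; do
    if [[ $grant =~ $re ]]; then
      echo "${BASH_REMATCH[1]}"
    else
      echo "/"
    fi
  done
}
```

For client.terry, `mds_cap_paths "allow rw path=/terry, allow rw path=/backupdata"` prints /terry and /backupdata. Both must exist before terry can mount them.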

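CephFS configuration reference

Cache configuration
 MDS configuration

Using CephFS on the client

Client configuration reference

Client configuration

The client needs a ceph.conf in order to connect to Ceph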

[global]
fsid = 7e7202xxxxxxxxxxxxxxxxx9a49ac4e4
mon initial members =  ns-storage-020100,ns-storage-020101,ns-storage-020102
mon host = IPADDR,IPADDR,IPADDR
public network = 1.1.1.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 2048
filestore xattr use omap = true
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 256
osd pool default pgp num = 256
osd crush chooseleaf type = 1
[osd]
osd journal size = 2048
osd heartbeat grace = 20
osd heartbeat interval = 5

[mds.0]
host =  ns-storage-020100
[mds.1]
host =  ns-storage-020101
[mds.2]
host =  ns-storage-020102
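Much of the file above is server-side configuration (OSD and MDS settings). A client mounting CephFS generally only needs the cluster fsid and the monitor addresses. This sketch writes such a minimal file; `minimal_client_conf` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal client-side ceph.conf.
# minimal_client_conf <fsid> <comma-separated monitor addresses>
minimal_client_conf() {
  local fsid=$1 mons=$2
  cat <<EOF
[global]
fsid = $fsid
mon host = $mons
auth client required = cephx
EOF
}
```

For example: `minimal_client_conf <your fsid> IPADDR,IPADDR,IPADDR > /etc/ceph/ceph.conf`.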

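Configure the secret key

Export the keyring of client mary and copy the file to the client

ceph auth get client.mary -o ceph.client.mary.keyring

If the directories /terry and /backupdata do not exist, client terry cannot mount and fails with the following error

mount error 2 = No such file or directory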

Mount

Mount CephFS as client mary and create the /terry directory

#  mount -t ceph IPADDR:6789,IPADDR:6789,IPADDR:6789:/ /mnt -o name=mary,secret=AQBNwdFfxxxxxxxxxxxxxxxxxxxXg==
Check the mount status
# mount | grep mnt
IPADDR:6789,IPADDR:6789,IPADDR:6789:/ on /mnt type ceph (rw,relatime,name=mary,secret=<hidden>,acl,wsize=16777216)
Create the terry directory in noah_fs
#  mkdir /mnt/terry
#  umount /mnt
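The device string for `mount -t ceph` lists every monitor as host:port, followed by :/path. This sketch builds that string, assuming the default monitor port 6789 used throughout this article. `ceph_mount_source` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ceph_mount_source <path> <mon1> [mon2 ...]
# prints "mon1:6789,mon2:6789,...:<path>" (6789 = default monitor port).
ceph_mount_source() {
  local path=$1; shift
  local out="" mon
  for mon in "$@"; do
    out+="${out:+,}$mon:6789"   # add a comma before every entry but the first
  done
  printf '%s:%s\n' "$out" "$path"
}
```

For example: `mount -t ceph "$(ceph_mount_source / IPADDR IPADDR IPADDR)" /mnt -o name=mary,secretfile=./mary.key`.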

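Use a secret key file

If you do not want to type the secret key in plain text on the command line
save the key to a file

echo "xxxxxxxxxyour_key_stringxxxxxxxxxxxxxx"  > mary.key

The mount command then becomes (secretfile must point to the file holding only the key, not to the keyring)

# mount -t ceph X.X.X.X:6789,X.X.X.X:6789,X.X.X.X:6789:/  /mnt -o name=mary,secretfile=./mary.key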
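secretfile= expects a file containing only the base64 key, not a full keyring. `ceph auth get-key client.mary > mary.key` should produce such a file directly. If only the exported keyring is at hand, a sketch like the following extracts the key and writes it with mode 600. `keyring_to_secretfile` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keyring_to_secretfile <keyring> <output file>
# Copies the value of the first "key = ..." line into a root-only file.
keyring_to_secretfile() {
  local keyring=$1 out=$2 key
  # strip everything up to the first "=" (the key itself may end in "==")
  key=$(awk '/^[ \t]*key[ \t]*=/ { sub(/^[^=]*=[ \t]*/, ""); print; exit }' "$keyring")
  if [[ -z $key ]]; then
    echo "no key found in $keyring" >&2
    return 1
  fi
  # umask 077 in a subshell: the new file is created with mode 600
  ( umask 077 && printf '%s\n' "$key" > "$out" )
}
```

For example, `keyring_to_secretfile ceph.client.mary.keyring mary.key` produces the file used in the mount command above.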

Test other users

Test the permissions of user terry

# ceph auth get client.terry -o  ceph.client.terry.keyring
exported keyring for client.terry
# cat ceph.client.terry.keyring
[client.terry]
        key =  AQA4vkNigzV9GBAAGHNifIRTCiFMsdzwzZQVmQ==
        caps mds = "allow rw path=/terry, allow rw path=/backupdata"
        caps mon = "allow r"
        caps osd = "allow rw pool=cephfs_data"

Create the key file

echo "AQA4vkNigzV9GBAAGHNifIRTCiFMsdzwzZQVmQ==" >  terry.key

Test the connection from the client

# ceph fs ls -c ./ceph.conf  -n client.terry -k ceph.client.terry.keyring
name: noah_fs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

Mount

# mount -t ceph  IPADDR:6789,IPADDR:6789,IPADDR:6789:/terry /mnt/ -o name=terry,secretfile=./terry.key
# mount | grep mnt
IPADDR:6789,IPADDR:6789,IPADDR:6789:/terry on /mnt type ceph (rw,relatime,name=terry,secret=<hidden>,acl,wsize=16777216)

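FAQ

How many replicas does CephFS use, and how is the layer beneath CephFS managed?

All CephFS data is actually stored in RADOS
CephFS maps onto its metadata pool and data pool
Managing those two pools is all that is needed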

Replicas

Query the current replica count of the pools

# ceph osd pool get cephfs_data size
size: 3
# ceph osd pool get cephfs_metadata size
size: 3

To change it (3 is the number of replicas)

# ceph osd pool set your_pool_name size 3
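After changing size, both CephFS pools can be checked in one go. In this sketch, `show_pool_sizes` is hypothetical. It relies on `ceph osd pool get <pool> size` printing a line like `size: 3`, as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pool> <replica count>" for each pool given.
show_pool_sizes() {
  local pool size
  for pool in "$@"; do
    # "ceph osd pool get <pool> size" prints a line like "size: 3"
    size=$(ceph osd pool get "$pool" size | awk '{ print $2 }')
    echo "$pool $size"
  done
}
```

For example: `show_pool_sizes cephfs_data cephfs_metadata`.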

Rule

Query the current pool information

#  ceph osd dump | grep "^pool" | grep "crush"
pool 1 'volumes' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 263 flags hashpspool stripe_width 0 application rbd
pool 2 'rbd' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 59 flags hashpspool stripe_width 0 application rbd
pool 3 'noahpool' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 350 flags hashpspool stripe_width 0 application rbd
pool 4 'cephfs_data' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 17798 flags hashpspool stripe_width 0 application cephfs
pool 5 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 17798 flags hashpspool stripe_width 0 application cephfs

Place cephfs_data and cephfs_metadata under different CRUSH roots
Check the current ceph osd tree
The plan is to store cephfs_data under the noah root
and keep cephfs_metadata under default (the default, no change needed)

# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME                              STATUS REWEIGHT PRI-AFF
-12       24.00000 root noah
 -9        8.00000     host ns-storage-020100.vclound.com
 12   hdd  4.00000         osd.12                             up  1.00000 1.00000
 13   hdd  4.00000         osd.13                             up  1.00000 1.00000
-10        8.00000     host ns-storage-020101.vclound.com
 14   hdd  4.00000         osd.14                             up  1.00000 1.00000
 15   hdd  4.00000         osd.15                             up  1.00000 1.00000
-11        8.00000     host ns-storage-020102.vclound.com
 16        4.00000         osd.16                             up  1.00000 1.00000
 17        4.00000         osd.17                             up  1.00000 1.00000
 -1       47.63620 root default
 -2       15.63620     host ns-storage-020100
  0   hdd  3.63620         osd.0                              up  1.00000 1.00000
  1   hdd  4.00000         osd.1                              up  1.00000 1.00000
  2   hdd  4.00000         osd.2                              up  1.00000 1.00000
  3   hdd  4.00000         osd.3                              up  1.00000 1.00000
 -3       16.00000     host ns-storage-020101
  4   hdd  4.00000         osd.4                              up  1.00000 1.00000
  5   hdd  4.00000         osd.5                              up  1.00000 1.00000
  6   hdd  4.00000         osd.6                              up  1.00000 1.00000
  7   hdd  4.00000         osd.7                              up  1.00000 1.00000
 -4       16.00000     host ns-storage-020102
  8   hdd  4.00000         osd.8                              up  1.00000 1.00000
  9   hdd  4.00000         osd.9                              up  1.00000 1.00000
 10   hdd  4.00000         osd.10                             up  1.00000 1.00000
 11   hdd  4.00000         osd.11                             up  1.00000 1.00000

Two CRUSH rules had been set up earlier

# ceph osd crush rule ls
replicated_rule    (uses the default root)
noah_rule          (uses the noah root)

Move cephfs_data to the noah root

# ceph osd pool set cephfs_data crush_rule noah_rule
set pool 4 crush_rule to noah_rule

Confirm the rule

#  ceph osd dump | grep "^pool" | grep "crush" | grep cephfs
pool 4 'cephfs_data' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 256 last_change 17801 flags hashpspool stripe_width 0 application cephfs
pool 5 'cephfs_metadata' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 last_change 17799 flags hashpspool stripe_width 0 application cephfs
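In the dump, crush_rule is the numeric rule ID. This sketch pulls that field for one pool out of `ceph osd dump` output; `pool_crush_rule` is hypothetical. `ceph osd pool get cephfs_data crush_rule` should also print the rule, by name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ceph osd dump` output on stdin and print the
# crush_rule ID of the named pool (pool names appear quoted, e.g. 'cephfs_data').
pool_crush_rule() {
  awk -v p="'$1'" '
    $1 == "pool" && $3 == p {
      for (i = 4; i < NF; i++)
        if ($i == "crush_rule") { print $(i + 1); exit }
    }'
}
```

After the change above, `ceph osd dump | pool_crush_rule cephfs_data` should print 1.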

Once the CRUSH rule is changed, the data is migrated between OSDs automatically. This shows up as recovery in ceph -s

# ceph -s
...
  data:
    pools:   5 pools, 1664 pgs
    objects: 1295 objects, 4727 MB
    usage:   53298 MB used, 66954 GB / 67006 GB avail
    pgs:     4726/3885 objects degraded (121.647%)
             1437 active+clean
             223  active+recovery_wait+degraded
             4    active+recovering+degraded
  io:
    recovery: 40888 kB/s, 9 objects/s
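Recovery after a rule change can take a while. This sketch polls `ceph health` until it reports HEALTH_OK or gives up. `wait_for_health_ok` and its parameters (poll interval in seconds, number of tries) are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait_for_health_ok [interval_seconds] [tries]
# Polls `ceph health`; returns 0 on HEALTH_OK, 1 after the last try.
wait_for_health_ok() {
  local interval=${1:-10} tries=${2:-360} status
  while (( tries-- > 0 )); do
    status=$(ceph health)
    if [[ $status == HEALTH_OK* ]]; then
      echo "$status"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out; last status: $status" >&2
  return 1
}
```

For example, `wait_for_health_ok 30 240` waits up to two hours for the migration to finish.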