Deploying a cluster with one primary, two standbys and one cascade standby fails with the following output:
[omm@host0001 script]$ gs_install -X /home/omm/install/cluster_config.xml
Parsing the configuration file.
Successfully checked gs_uninstall on every node.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Successfully created the backup directory.
begin deploy..
Installing the cluster.
begin prepare Install Cluster..
Checking the installation environment on all nodes.
begin install Cluster..
Installing applications on all nodes.
Successfully installed APP.
begin init Instance..
encrypt cipher and rand files for database.
Please enter password for database:
Please repeat for database:
begin to create CA cert files
The sslcert will be generated in /home/omm/cluster/app/share/sslcert/om
Create CA files for cm beginning.
Create CA files on directory [/home/omm/cluster/app_6285c0ef/share/sslcert/cm]. file list: ['cacert.pem', 'server.key', 'server.crt', 'client.key', 'client.crt', 'server.key.cipher', 'server.key.rand', 'client.key.cipher', 'client.key.rand']
Non-dss_ssl_enable, no need to create CA for DSS
Cluster installation is completed.
Configuring.
Deleting instances from all nodes.
Successfully deleted instances from all nodes.
Checking node configuration on all nodes.
Initializing instances on all nodes.
Updating instance configuration on all nodes.
Check consistence of memCheck and coresCheck on database nodes.
Successfully check consistence of memCheck and coresCheck on all nodes.
Configuring pg_hba on all nodes.
Configuration is completed.
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
............
............
............
............
........
Failed to start cluster in (300)s.
It will continue to start in the background.
If you want to see the cluster status, please try command gs_om -t status.
If you want to stop the cluster, please try command gs_om -t stop.
======================================================================
Successfully installed application.
end deploy..
Checking the status:
[omm@host0001 script]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
----------------------------------------------------------------------------------
1 host0001 192.168.122.11 1 /home/omm/cluster/data/cmserver/cm_server Standby
2 host0002 192.168.122.12 2 /home/omm/cluster/data/cmserver/cm_server Down
3 host0003 192.168.122.13 3 /home/omm/cluster/data/cmserver/cm_server Down
4 host0004 192.168.122.14 4 /home/omm/cluster/data/cmserver/cm_server Down
cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
Each node reports only its own CM server as Standby and every other CM server as Down. For example, on host0002:
[omm@host0002 ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
----------------------------------------------------------------------------------
1 host0001 192.168.122.11 1 /home/omm/cluster/data/cmserver/cm_server Down
2 host0002 192.168.122.12 2 /home/omm/cluster/data/cmserver/cm_server Standby
3 host0003 192.168.122.13 3 /home/omm/cluster/data/cmserver/cm_server Down
4 host0004 192.168.122.14 4 /home/omm/cluster/data/cmserver/cm_server Down
cm_ctl: can't connect to cm_server.
Maybe cm_server is not running, or timeout expired. Please try again.
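The official documentation did not solve it either. I went through these openGauss documentation pages without success:
Cluster unusable because CM does not satisfy the majority rule | openGauss documentation | openGauss community
Database exception after stopping two nodes of a one-primary, two-standby cluster | openGauss documentation | openGauss community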
After several retries, I found the following errors in the CM log:
2026-04-10 17:00:11.239 tid=32592 MAIN ERROR: invalid host(192.168.122.1), sockfd=149, remote_type=7.
2026-04-10 17:00:11.239 tid=32592 MAIN LOG: close connection sock [fd=149], type is 7, nodeid 2.
2026-04-10 17:00:11.251 tid=32592 MAIN LOG: process startup packet, remote_type 7, nodeid 3, node name host0003, postmaster is false.
2026-04-10 17:00:11.251 tid=32592 MAIN ERROR: invalid host(192.168.122.1), sockfd=149, remote_type=7.
2026-04-10 17:00:11.251 tid=32592 MAIN LOG: close connection sock [fd=149], type is 7, nodeid 3.
2026-04-10 17:00:11.414 tid=32592 MAIN LOG: process startup packet, remote_type 7, nodeid 4, node name host0004, postmaster is false.
2026-04-10 17:00:11.414 tid=32592 MAIN ERROR: invalid host(192.168.122.1), sockfd=149, remote_type=7.
2026-04-10 17:00:11.414 tid=32592 MAIN LOG: close connection sock [fd=149], type is 7, nodeid 4.
2026-04-10 17:00:11.442 tid=32592 MAIN LOG: process startup packet, remote_type 7, nodeid 1, node name host0001, postmaster is false.
2026-04-10 17:00:11.442 tid=32592 MAIN LOG: close connection sock [fd=149], type is 7, nodeid 1.
2026-04-10 17:00:11.445 tid=32592 MAIN LOG: process startup packet, remote_type 7, nodeid 2, node name host0002, postmaster is false.
2026-04-10 17:00:11.445 tid=32592 MAIN ERROR: invalid host(192.168.122.1), sockfd=149, remote_type=7.
2026-04-10 17:00:11.446 tid=32592 MAIN LOG: close connection sock [fd=149], type is 7, nodeid 2.
2026-04-10 17:00:11.456 tid=32592 MAIN LOG: process startup packet, remote_type 7, nodeid 3, node name host0003, postmaster is false.
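192.168.122.1 is the gateway address. cm_server rejects the connections from nodes 2, 3 and 4 because they appear to come from 192.168.122.1 rather than from those nodes' own addresses. Only the connection from the local node (nodeid 1) is accepted.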
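A small awk filter can confirm this pattern without reading the log by eye. This is a minimal sketch. rejected_peers is a hypothetical helper, and it assumes the message formats shown above: an "invalid host(...)" error followed by a "close connection ... nodeid N." line for the same socket.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read cm_server log lines on stdin and report which
# node connections were rejected with "invalid host(...)", together with the
# source address cm_server actually saw. Assumes the log format shown above.
rejected_peers() {
    awk '
        # Remember the address from an "invalid host(x.x.x.x)" error.
        /ERROR: invalid host\(/ {
            ip = $0
            sub(/.*invalid host\(/, "", ip)
            sub(/\).*/, "", ip)
            next
        }
        # The following "close connection ... nodeid N." line names the peer.
        /close connection/ && ip != "" {
            node = $0
            sub(/.*nodeid /, "", node)
            sub(/\..*/, "", node)
            print "nodeid " node " rejected, seen from " ip
            ip = ""
        }
    ' | sort -u
}
```

Usage, with a placeholder path (where the cm_server log lives depends on the installation): rejected_peers < /path/to/cm_server.log. On the log excerpt above, it prints one line each for nodeid 2, 3 and 4, all seen from 192.168.122.1, and nothing for nodeid 1.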
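It turned out to be a problem with my own environment:
1. The environment is a physical Linux host running four CentOS 7.9 virtual machines under KVM.
2. To give the VMs Internet access, I had run the following command on the physical host:
iptables -t nat -A POSTROUTING -j MASQUERADE
This rule masquerades (source-NATs) packets leaving the VMs. It has no source (-s) or output-interface (-o) match, so it applies to every packet in POSTROUTING. That apparently includes traffic between the VMs, which would explain why cm_server saw its peers as 192.168.122.1.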
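A catch-all rule like this can be spotted in the output of iptables -t nat -S POSTROUTING. Below is a minimal sketch of such a check. find_unscoped_masquerade is a hypothetical helper, and it treats any MASQUERADE rule without a -s or -o match as unscoped, which is a simplification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read rules in "iptables -S" format on stdin and print
# MASQUERADE rules that have neither a source (-s) nor an output-interface
# (-o) match. Such a rule rewrites the source address of every packet that
# passes POSTROUTING, not only traffic heading to the outside network.
find_unscoped_masquerade() {
    awk '/-j MASQUERADE/ && !/ -s / && !/ -o / { print }'
}
```

Usage, as root on the physical host: iptables -t nat -S POSTROUTING | find_unscoped_masquerade. With the original rule in place, this prints "-A POSTROUTING -j MASQUERADE". After the fix it should print nothing, unless the host has other unscoped rules.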
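Solution:
1. On the physical host, delete the rule above:
iptables -t nat -D POSTROUTING -j MASQUERADE
2. Add a new rule that only masquerades traffic from the VM subnet leaving through the uplink:
iptables -t nat -A POSTROUTING -s 192.168.122.0/24 -o br0 -j MASQUERADE
br0 is the physical host's outbound (uplink) interface.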
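The two steps above can be wrapped in a small, re-runnable function. This is a sketch under a few assumptions. fix_masquerade is a hypothetical helper. Its defaults are the subnet and interface used in this post. It relies on iptables -C (check), which is available in reasonably recent iptables versions. The IPTABLES variable exists only so the function can be pointed at a stub for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper, to be run as root on the physical (KVM) host.
# Replaces the catch-all MASQUERADE rule with one scoped to the VM subnet
# and the host's uplink interface. Defaults are the values used in this post.
# IPTABLES may name another binary or a stub function (e.g. for testing).
fix_masquerade() {
    local subnet="${1:-192.168.122.0/24}"
    local uplink="${2:-br0}"
    local ipt="${IPTABLES:-iptables}"

    # Delete every copy of the unscoped rule; -C succeeds while one exists.
    while "$ipt" -t nat -C POSTROUTING -j MASQUERADE 2>/dev/null; do
        "$ipt" -t nat -D POSTROUTING -j MASQUERADE || return 1
    done

    # Append the scoped rule unless an identical one is already present.
    if ! "$ipt" -t nat -C POSTROUTING -s "$subnet" -o "$uplink" -j MASQUERADE 2>/dev/null; then
        "$ipt" -t nat -A POSTROUTING -s "$subnet" -o "$uplink" -j MASQUERADE
    fi
}
```

Usage: fix_masquerade 192.168.122.0/24 br0. Running it a second time changes nothing. Rules added with iptables this way are lost on reboot. How to persist them (for example with iptables-save) depends on the distribution.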
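Final result. In the first status check after the fix, the CM servers have elected a primary, but the cluster is still Unavailable while the standbys rebuild. A later check shows every instance as Normal: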
[omm@host0003 ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
----------------------------------------------------------------------------------
1 host0001 192.168.122.11 1 /home/omm/cluster/data/cmserver/cm_server Standby
2 host0002 192.168.122.12 2 /home/omm/cluster/data/cmserver/cm_server Standby
3 host0003 192.168.122.13 3 /home/omm/cluster/data/cmserver/cm_server Standby
4 host0004 192.168.122.14 4 /home/omm/cluster/data/cmserver/cm_server Primary
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
-----------------------------------------------------------------------------------
1 host0001 192.168.122.11 6001 15400 /home/omm/cluster/data/dn P Primary Normal
2 host0002 192.168.122.12 6002 15400 /home/omm/cluster/data/dn S Standby Need repair(System id not matched)
3 host0003 192.168.122.13 6003 15400 /home/omm/cluster/data/dn S Standby Building(66%)
4 host0004 192.168.122.14 6004 15400 /home/omm/cluster/data/dn C Standby Building(67%)
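The status output above still shows two standbys building and one needing repair. Below is a minimal sketch for pulling such rows out of the status text. unhealthy_datanodes is a hypothetical helper. It assumes the column layout shown above: the host name in the second column, and the role and state from the eighth column on.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "gs_om -t status --detail" output on stdin and
# print every row under "[ Datanode State ]" whose last field is not "Normal"
# (for example "Building(66%)" or "Need repair(...)").
unhealthy_datanodes() {
    awk '
        /^\[ Datanode State \]/ { in_dn = 1; next }
        /^\[/                   { in_dn = 0 }   # any other section header
        in_dn && $1 ~ /^[0-9]+$/ && $NF != "Normal" {
            # $2 is the host name; fields 8.. hold the role and the state.
            line = $2 ":"
            for (i = 8; i <= NF; i++) line = line " " $i
            print line
        }
    '
}
```

Usage: gs_om -t status --detail | unhealthy_datanodes. On the output above, it prints "host0002: Standby Need repair(System id not matched)", "host0003: Standby Building(66%)" and "host0004: Standby Building(67%)". Once the cluster is Normal, it prints nothing.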
[omm@host0003 ~]$ gs_om -t status --detail
[ CMServer State ]
node node_ip instance state
----------------------------------------------------------------------------------
1 host0001 192.168.122.11 1 /home/omm/cluster/data/cmserver/cm_server Standby
2 host0002 192.168.122.12 2 /home/omm/cluster/data/cmserver/cm_server Standby
3 host0003 192.168.122.13 3 /home/omm/cluster/data/cmserver/cm_server Standby
4 host0004 192.168.122.14 4 /home/omm/cluster/data/cmserver/cm_server Primary
[ Cluster State ]
cluster_state : Normal
redistributing : No
balanced : Yes
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state
-----------------------------------------------------------------------------------
1 host0001 192.168.122.11 6001 15400 /home/omm/cluster/data/dn P Primary Normal
2 host0002 192.168.122.12 6002 15400 /home/omm/cluster/data/dn S Standby Normal
3 host0003 192.168.122.13 6003 15400 /home/omm/cluster/data/dn S Standby Normal
4 host0004 192.168.122.14 6004 15400 /home/omm/cluster/data/dn C Cascade Standby Normal
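Since the standbys need time to finish building, the cluster only moves from Unavailable to Normal after a while, as the two outputs show. Below is a sketch of a polling loop that waits for that transition. It assumes gs_om is on the cluster user's PATH and that the output contains a line starting with cluster_state, as above. cluster_state and wait_for_normal are hypothetical helpers, and the timeout and interval defaults are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: poll "gs_om -t status --detail" until cluster_state
# is Normal or the timeout (in seconds) expires. Run as the cluster user.
cluster_state() {
    # Print the value of the first "cluster_state : ..." line on stdin.
    awk -F': *' '/^cluster_state/ { v = $2; sub(/[ \t\r]+$/, "", v); print v; exit }'
}

wait_for_normal() {
    local timeout="${1:-600}" interval="${2:-10}" waited=0 state
    while [ "$waited" -lt "$timeout" ]; do
        state=$(gs_om -t status --detail 2>/dev/null | cluster_state)
        echo "cluster_state=${state:-unknown} after ${waited}s"
        [ "$state" = "Normal" ] && return 0
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}
```

Usage: wait_for_normal 900 15 || gs_om -t status --detail. This waits up to 15 minutes and prints the full status if the cluster never reaches Normal.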
