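Notice:
Reposting of this blog is welcome, but please retain the original author information!
Author: Lin Kai, cloud computing engineer at Huawei
Team: OpenStack community team, Huawei Hangzhou R&D Center
This article was compiled and summarized during my own study. Given limited time and ability, mistakes are inevitable, and corrections are welcome!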
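OpenStack Neutron is the project dedicated to providing networking services for OpenStack. For an introduction to each Neutron component, see this blog post: http://www.openstack.cn/p1745.html.
Quoting its description of the L2 Agent: an L2 Agent usually runs on the hypervisor, communicates with neutron-server over RPC, watches for and reports device changes, creates new devices to keep network segments correct, applies security group rules, and so on. For example, the OVS Agent uses Open vSwitch to implement VLAN, GRE, and VxLAN network isolation, and also controls the forwarding of network traffic.
This post walks through the startup source code of the OVS Agent component in Neutron.
The rough startup flow of the OVS Agent is shown in the figure below:
Now let us start analyzing the OVS Agent startup source code.
(1) main() in /neutron/plugins/openvswitch/agent/ovs_neutron_agent.py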
def main():
    cfg.CONF.register_opts(ip_lib.OPTS)
    common_config.init(sys.argv[1:])
    common_config.setup_logging(cfg.CONF)
    q_utils.log_opt_values(LOG)

    try:
        agent_config = create_agent_config_map(cfg.CONF)
    except ValueError as e:
        LOG.error(_('%s Agent terminated!'), e)
        sys.exit(1)

    is_xen_compute_host = 'rootwrap-xen-dom0' in agent_config['root_helper']
    if is_xen_compute_host:
        # Force ip_lib to always use the root helper to ensure that ip
        # commands target xen dom0 rather than domU.
        cfg.CONF.set_default('ip_lib_force_root', True)

    agent = OVSNeutronAgent(**agent_config)  # (1)
    signal.signal(signal.SIGTERM, agent._handle_sigterm)

    # Start everything.
    LOG.info(_("Agent initialized successfully, now running... "))
    agent.daemon_loop()  # (2)
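In the code above, the two most important calls are (1) and (2). Call (1) instantiates an OVS Agent and performs its initialization. Call (2) loops continuously, checking certain states and taking the corresponding action whenever a state changes.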
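The relationship between the SIGTERM handler and the daemon loop can be sketched as follows. This is a minimal illustration, not the Neutron code: `ToyAgent`, `handle_sigterm`, and the `max_iterations` guard are hypothetical names, and the signal is simulated by calling the handler directly instead of registering it with `signal.signal`.

```python
import signal


class ToyAgent:
    """Hypothetical stand-in for the agent; not the Neutron class."""

    def __init__(self):
        self.run_daemon_loop = True
        self.iterations = 0

    def handle_sigterm(self, signum, frame):
        # Only flip the flag; the loop notices it on its next check.
        self.run_daemon_loop = False

    def daemon_loop(self, max_iterations=1000):
        # max_iterations is a safety guard for this sketch only.
        while self.run_daemon_loop and self.iterations < max_iterations:
            self.iterations += 1
            if self.iterations == 3:
                # Simulate SIGTERM arriving during the third pass.
                self.handle_sigterm(signal.SIGTERM, None)
        return self.iterations


agent = ToyAgent()
completed = agent.daemon_loop()
```

The handler does no cleanup itself. It lets the current iteration finish, and the loop exits at the next check of the flag.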
Next, let us look closely at call (1), which instantiates the OVS Agent, and see what initialization it performs.
def __init__(self, integ_br, tun_br, local_ip,
             bridge_mappings, root_helper,
             polling_interval, tunnel_types=None,
             veth_mtu=None, l2_population=False,
             enable_distributed_routing=False,
             minimize_polling=False,
             ovsdb_monitor_respawn_interval=(
                 constants.DEFAULT_OVSDBMON_RESPAWN),
             arp_responder=False,
             use_veth_interconnection=False):
    super(OVSNeutronAgent, self).__init__()
    self.use_veth_interconnection = use_veth_interconnection
    self.veth_mtu = veth_mtu
    self.root_helper = root_helper
    self.available_local_vlans = set(moves.xrange(q_const.MIN_VLAN_TAG,
                                                  q_const.MAX_VLAN_TAG))
    self.tunnel_types = tunnel_types or []
    self.l2_pop = l2_population
    # TODO(ethuleau): Change ARP responder so it's not dependent on the
    #                 ML2 l2 population mechanism driver.
    # enable_distributed_routing: whether distributed routing is enabled
    self.enable_distributed_routing = enable_distributed_routing
    self.arp_responder_enabled = arp_responder and self.l2_pop
    self.agent_state = {
        'binary': 'neutron-openvswitch-agent',
        'host': cfg.CONF.host,
        'topic': q_const.L2_AGENT_TOPIC,
        'configurations': {'bridge_mappings': bridge_mappings,
                           'tunnel_types': self.tunnel_types,
                           'tunneling_ip': local_ip,
                           'l2_population': self.l2_pop,
                           'arp_responder_enabled':
                           self.arp_responder_enabled,
                           'enable_distributed_routing':
                           self.enable_distributed_routing},
        'agent_type': q_const.AGENT_TYPE_OVS,
        'start_flag': True}

    # Keep track of int_br's device count for use by _report_state()
    self.int_br_device_count = 0

    self.int_br = ovs_lib.OVSBridge(integ_br, self.root_helper)
    # setup_integration_br: set up the integration bridge (int_br),
    # create patch ports and remove all existing flows,
    # then add the basic flows
    self.setup_integration_br()  # (1)
    # Stores port update notifications for processing in main rpc loop
    self.updated_ports = set()
    # setup_rpc does the following:
    #   sets up plugin_rpc, used to communicate with neutron-server
    #   sets up state_rpc, used to report agent state
    #   sets up connection, used to receive messages from neutron-server
    #   starts the periodic state report
    self.setup_rpc()  # (2)
    self.bridge_mappings = bridge_mappings
    # Set up the physical network bridges and connect them to br-int
    self.setup_physical_bridges(self.bridge_mappings)  # (3)
    self.local_vlan_map = {}
    self.tun_br_ofports = {p_const.TYPE_GRE: {},
                           p_const.TYPE_VXLAN: {}}

    self.polling_interval = polling_interval
    self.minimize_polling = minimize_polling
    self.ovsdb_monitor_respawn_interval = ovsdb_monitor_respawn_interval

    if tunnel_types:
        self.enable_tunneling = True
    else:
        self.enable_tunneling = False
    self.local_ip = local_ip
    self.tunnel_count = 0
    self.vxlan_udp_port = cfg.CONF.AGENT.vxlan_udp_port
    self.dont_fragment = cfg.CONF.AGENT.dont_fragment
    self.tun_br = None
    self.patch_int_ofport = constants.OFPORT_INVALID
    self.patch_tun_ofport = constants.OFPORT_INVALID
    if self.enable_tunneling:
        # The patch_int_ofport and patch_tun_ofport are updated
        # here inside the call to setup_tunnel_br
        self.setup_tunnel_br(tun_br)

    self.dvr_agent = ovs_dvr_neutron_agent.OVSDVRNeutronAgent(
        self.context,
        self.plugin_rpc,
        self.int_br,
        self.tun_br,
        self.patch_int_ofport,
        self.patch_tun_ofport,
        cfg.CONF.host,
        self.enable_tunneling,
        self.enable_distributed_routing)  # (4)

    self.dvr_agent.setup_dvr_flows_on_integ_tun_br()

    # Collect additional bridges to monitor
    self.ancillary_brs = self.setup_ancillary_bridges(integ_br, tun_br)

    # Security group agent support
    self.sg_agent = OVSSecurityGroupAgent(self.context,
                                          self.plugin_rpc,
                                          root_helper)  # (5)
    # Initialize iteration counter
    self.iter_num = 0
    self.run_daemon_loop = True  # (6)
In the constructor, the statements marked (1) through (6) do the important initialization work. Let us first look at call (1), self.setup_integration_br():
def setup_integration_br(self):
    """Set up the integration bridge.

    Create patch ports and remove all existing flows,
    then add the basic flows.
    """
    # Ensure the integration bridge is created.
    # ovs_lib.OVSBridge.create() will run
    #   ovs-vsctl -- --may-exist add-br BRIDGE_NAME
    # which does nothing if bridge already exists.
    self.int_br.create()
    self.int_br.set_secure_mode()

    # Delete the patch port with del-port
    self.int_br.delete_port(cfg.CONF.OVS.int_peer_patch_port)
    # Remove all flows through ovs-ofctl
    self.int_br.remove_all_flows()
    # switch all traffic using L2 learning:
    # add a flow with priority 1 and actions "normal"
    self.int_br.add_flow(priority=1, actions="normal")
    # Add a canary flow to int_br to track OVS restarts
    # (priority 0, actions "drop")
    self.int_br.add_flow(table=constants.CANARY_TABLE, priority=0,
                         actions="drop")
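The function's purpose is clear: it sets up the integration bridge br-int. See the comments in the code for the individual steps. Once br-int exists, the existing flows are removed and two basic flows are added. What do these two flows do? The first has priority 1 and actions normal: it makes br-int behave as an ordinary L2 learning switch, so traffic between the devices attached to br-int is forwarded by MAC learning. The second has priority 0 and actions drop and is installed in the canary table. It is used to track OVS restarts, which is analyzed later in the loop.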
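The idea behind the canary flow can be sketched with a toy in-memory model. Everything here is an assumption made for illustration: `ToyBridge`, `simulate_ovs_restart`, `ovs_restarted`, and the value of `CANARY_TABLE` are hypothetical and do not reproduce ovs_lib or the agent's real check. The sketch relies only on the behavior described above: a restarted switch comes back without the flows the agent installed, so a missing canary flow signals a restart.

```python
CANARY_TABLE = 23  # arbitrary table number chosen for this sketch


class ToyBridge:
    """Hypothetical in-memory model of a bridge's flow tables."""

    def __init__(self):
        self.flows = []

    def add_flow(self, table=0, priority=0, actions="normal"):
        self.flows.append({"table": table, "priority": priority,
                           "actions": actions})

    def remove_all_flows(self):
        self.flows = []

    def flows_in_table(self, table):
        return [f for f in self.flows if f["table"] == table]

    def simulate_ovs_restart(self):
        # Assumption: a restarted switch loses the installed flows.
        self.flows = []


def setup_integration_br(bridge):
    # Mirror the two basic flows described in the text.
    bridge.remove_all_flows()
    bridge.add_flow(priority=1, actions="normal")
    bridge.add_flow(table=CANARY_TABLE, priority=0, actions="drop")


def ovs_restarted(bridge):
    # No canary flow left means the flows were wiped.
    return not bridge.flows_in_table(CANARY_TABLE)


br_int = ToyBridge()
setup_integration_br(br_int)
before = ovs_restarted(br_int)
br_int.simulate_ovs_restart()
after = ovs_restarted(br_int)
```

Placing the canary in its own table keeps it out of the way of normal forwarding, so its only role is to be present or absent.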
Next, let us look at the second call, self.setup_rpc().
def setup_rpc(self):
    self.agent_id = 'ovs-agent-%s' % cfg.CONF.host
    self.topic = topics.AGENT
    # Set up plugin_rpc, used to communicate with neutron-server
    self.plugin_rpc = OVSPluginApi(topics.PLUGIN)
    # Set up state_rpc, used to report agent state
    self.state_rpc = agent_rpc.PluginReportStateAPI(topics.PLUGIN)

    # Set up connection and add consumers, used to receive
    # messages from neutron-server
    # RPC network init
    self.context = context.get_admin_context_without_session()
    # Handle updates from service
    self.endpoints = [self]
    # Define the listening consumers for the agent
    consumers = [[topics.PORT, topics.UPDATE],
                 [topics.NETWORK, topics.DELETE],
                 [constants.TUNNEL, topics.UPDATE],
                 [topics.SECURITY_GROUP, topics.UPDATE],
                 [topics.DVR, topics.UPDATE]]
    if self.l2_pop:
        consumers.append([topics.L2POPULATION,
                          topics.UPDATE, cfg.CONF.host])
    self.connection = agent_rpc.create_consumers(self.endpoints,
                                                 self.topic,
                                                 consumers)
    # Start the periodic heartbeat report
    report_interval = cfg.CONF.AGENT.report_interval
    if report_interval:
        heartbeat = loopingcall.FixedIntervalLoopingCall(
            self._report_state)
        heartbeat.start(interval=report_interval)
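From the code we can see that this function sets up plugin_rpc for communicating with neutron-server, state_rpc for reporting agent state, and connection for receiving messages from neutron-server, and then starts the periodic heartbeat report, whose interval defaults to 30s. On the Neutron server side, rpc_listeners are started to listen for messages sent by agents. For heartbeats, each one received updates a timestamp in the database. If the timestamp stops being updated and the current time minus the last timestamp exceeds agent_down_time (75s by default), the agent is considered down.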
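The liveness rule described above comes down to one comparison. The sketch below uses the two defaults quoted in the text. The function name `is_agent_down` and the use of plain seconds instead of database timestamps are simplifications made for illustration, not the server's actual implementation.

```python
REPORT_INTERVAL = 30   # seconds between heartbeats (default quoted above)
AGENT_DOWN_TIME = 75   # seconds without a heartbeat before "down"


def is_agent_down(last_heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """Return True when the last heartbeat is older than agent_down_time.

    Both arguments are timestamps in seconds (a simplification).
    """
    return (now - last_heartbeat) > agent_down_time


# Number of whole report intervals that fit inside the down time.
tolerated_intervals = AGENT_DOWN_TIME // REPORT_INTERVAL

alive_after_one_missed = not is_agent_down(last_heartbeat=0, now=60)
down_after_long_silence = is_agent_down(last_heartbeat=0, now=76)
```

With these defaults, two full report intervals fit inside the down time, so a single late or lost heartbeat does not mark the agent as down.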
Next we analyze call (3), self.setup_physical_bridges(self.bridge_mappings), shown below:
def setup_physical_bridges(self, bridge_mappings):
    '''Setup the physical network bridges.

    Creates physical network bridges and links them to the
    integration bridge using veths.

    :param bridge_mappings: map physical network names to bridge names.
    '''
    # Set up the physical network bridges and connect them to br-int
    # using veths or patch ports
    self.phys_brs = {}
    self.int_ofports = {}
    self.phys_ofports = {}
    ip_wrapper = ip_lib.IPWrapper(self.root_helper)
    ovs_bridges = ovs_lib.get_bridges(self.root_helper)
    for physical_network, bridge in bridge_mappings.iteritems():
        LOG.info(_("Mapping physical network %(physical_network)s to "
                   "bridge %(bridge)s"),
                 {'physical_network': physical_network,
                  'bridge': bridge})
        # setup physical bridge
        if bridge not in ovs_bridges:
            LOG.error(_("Bridge %(bridge)s for physical network "
                        "%(physical_network)s does not exist. Agent "
                        "terminated!"),
                      {'physical_network': physical_network,
                       'bridge': bridge})
            sys.exit(1)
        br = ovs_lib.OVSBridge(bridge, self.root_helper)
        br.remove_all_flows()
        br.add_flow(priority=1, actions="normal")
        self.phys_brs[physical_network] = br

        # interconnect physical and integration bridges using veth/patchs,
        # e.g. br-eth1 with br-int: delete the old ports, then create
        # int-br-eth1 and phy-br-eth1 (visible with ovs-vsctl show)
        int_if_name = self.get_peer_name(constants.PEER_INTEGRATION_PREFIX,
                                         bridge)
        phys_if_name = self.get_peer_name(constants.PEER_PHYSICAL_PREFIX,
                                          bridge)
        self.int_br.delete_port(int_if_name)
        br.delete_port(phys_if_name)
        if self.use_veth_interconnection:
            if ip_lib.device_exists(int_if_name, self.root_helper):
                ip_lib.IPDevice(int_if_name,
                                self.root_helper).link.delete()
                # Give udev a chance to process its rules here, to avoid
                # race conditions between commands launched by udev rules
                # and the subsequent call to ip_wrapper.add_veth
                utils.execute(['/sbin/udevadm', 'settle', '--timeout=10'])
            # Add the veth pair through the ip link add command
            int_veth, phys_veth = ip_wrapper.add_veth(int_if_name,
                                                      phys_if_name)
            int_ofport = self.int_br.add_port(int_veth)
            phys_ofport = br.add_port(phys_veth)
        else:
            # Create patch ports without associating them in order to block
            # untranslated traffic before association
            int_ofport = self.int_br.add_patch_port(
                int_if_name, constants.NONEXISTENT_PEER)
            phys_ofport = br.add_patch_port(
                phys_if_name, constants.NONEXISTENT_PEER)

        self.int_ofports[physical_network] = int_ofport
        self.phys_ofports[physical_network] = phys_ofport

        # block all untranslated traffic between bridges
        self.int_br.add_flow(priority=2, in_port=int_ofport,
                             actions="drop")
        br.add_flow(priority=2, in_port=phys_ofport, actions="drop")

        if self.use_veth_interconnection:
            # enable veth to pass traffic
            int_veth.link.set_up()
            phys_veth.link.set_up()
            if self.veth_mtu:
                # set up mtu size for veth interfaces
                int_veth.link.set_mtu(self.veth_mtu)
                phys_veth.link.set_mtu(self.veth_mtu)
        else:
            # associate patch ports to pass traffic
            self.int_br.set_db_attribute('Interface', int_if_name,
                                         'options:peer', phys_if_name)
            br.set_db_attribute('Interface', phys_if_name,
                                'options:peer', int_if_name)
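The setup_physical_bridges function sets up the physical bridges br-eth*. Note that it does not create a missing bridge: if a bridge named in bridge_mappings does not exist, the agent logs an error and exits. As with br-int, it first removes all existing flows and adds the same normal flow to forward traffic. What differs from br-int comes next: depending on use_veth_interconnection, it connects the bridge to br-int with either a veth pair or patch ports, installs drop flows to block untranslated traffic between the bridges, and finally brings up the veth devices or associates the patch ports so that traffic can pass.
Calls (4) and (5) initialize the DVR Agent (distributed virtual router agent) and the Security Group Agent respectively, which handle DVR and security groups. These will be covered in a later blog post.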
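Why the priority 2 drop flow blocks traffic even though a priority 1 normal flow is present can be shown with a toy lookup. This is a simplified model of priority-based matching, not Open vSwitch itself: `pick_flow`, the dictionary layout, and port numbers 5 and 7 are hypothetical.

```python
def pick_flow(flows, in_port):
    """Return the highest-priority flow whose in_port matches.

    A flow with in_port None is treated as a wildcard that
    matches traffic from any port.
    """
    candidates = [f for f in flows if f["in_port"] in (None, in_port)]
    return max(candidates, key=lambda f: f["priority"])


flows = [
    # Basic flow: behave as a learning switch for everything.
    {"priority": 1, "in_port": None, "actions": "normal"},
    # Block whatever arrives on the inter-bridge port (port 5 here).
    {"priority": 2, "in_port": 5, "actions": "drop"},
]

from_peer_bridge = pick_flow(flows, in_port=5)["actions"]
from_other_port = pick_flow(flows, in_port=7)["actions"]
```

Traffic arriving on the inter-bridge port hits the drop flow, while traffic from any other port still falls through to normal.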
Finally, run_daemon_loop is set to True so that the polling loop can begin. With run_daemon_loop set to True, main() calls daemon_loop(), which in turn calls rpc_loop(). Let us see what rpc_loop() does.
def rpc_loop(self, polling_manager=None):
    if not polling_manager:
        polling_manager = polling.AlwaysPoll()

    # Initial settings
    sync = True
    ports = set()
    updated_ports_copy = set()
    ancillary_ports = set()
    tunnel_sync = True
    ovs_restarted = False
    # Enter the loop
    while self.run_daemon_loop:
        start = time.time()
        port_stats = {'regular': {'added': 0,
                                  'updated': 0,
                                  'removed': 0},
                      'ancillary': {'added': 0,
                                    'removed': 0}}
        LOG.debug(_("Agent rpc_loop - iteration:%d started"),
                  self.iter_num)
        if sync:
            LOG.info(_("Agent out of sync with plugin!"))
            ports.clear()
            ancillary_ports.clear()
            sync = False
            polling_manager.force_polling()
        # Decide whether restart handling is needed, based on whether the
        # canary flow installed earlier in br-int is still present
        ovs_restarted = self.check_ovs_restart()
        if ovs_restarted:
            ......
        # Notify the plugin of tunnel IP
        if self.enable_tunneling and tunnel_sync:
            ......
        if self._agent_has_updates(polling_manager) or ovs_restarted:
            try:
                LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d - "
                            "starting polling. Elapsed:%(elapsed).3f"),
                          {'iter_num': self.iter_num,
                           'elapsed': time.time() - start})
                updated_ports_copy = self.updated_ports
                self.updated_ports = set()
                reg_ports = (set() if ovs_restarted else ports)
                port_info = self.scan_ports(reg_ports,
                                            updated_ports_copy)  # (1)
                LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d - "
                            "port information retrieved. "
                            "Elapsed:%(elapsed).3f"),
                          {'iter_num': self.iter_num,
                           'elapsed': time.time() - start})
                # Secure and wire/unwire VIFs and update their status
                # on Neutron server
                if (self._port_info_has_changes(port_info) or
                    self.sg_agent.firewall_refresh_needed() or
                    ovs_restarted):
                    LOG.debug(_("Starting to process devices in:%s"),
                              port_info)
                    # If treat devices fails - must resync with plugin
                    sync = self.process_network_ports(port_info,
                                                      ovs_restarted)  # (2)
                    LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d -"
                                "ports processed. Elapsed:%(elapsed).3f"),
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})
                    port_stats['regular']['added'] = (
                        len(port_info.get('added', [])))
                    port_stats['regular']['updated'] = (
                        len(port_info.get('updated', [])))
                    port_stats['regular']['removed'] = (
                        len(port_info.get('removed', [])))
                ports = port_info['current']
                # Treat ancillary devices if they exist
                if self.ancillary_brs:
                    port_info = self.update_ancillary_ports(
                        ancillary_ports)
                    LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d -"
                                "ancillary port info retrieved. "
                                "Elapsed:%(elapsed).3f"),
                              {'iter_num': self.iter_num,
                               'elapsed': time.time() - start})

                    if port_info:
                        rc = self.process_ancillary_network_ports(
                            port_info)
                        LOG.debug(_("Agent rpc_loop - iteration:"
                                    "%(iter_num)d - ancillary ports "
                                    "processed. Elapsed:%(elapsed).3f"),
                                  {'iter_num': self.iter_num,
                                   'elapsed': time.time() - start})
                        ancillary_ports = port_info['current']
                        port_stats['ancillary']['added'] = (
                            len(port_info.get('added', [])))
                        port_stats['ancillary']['removed'] = (
                            len(port_info.get('removed', [])))
                        sync = sync | rc

                polling_manager.polling_completed()
            except Exception:
                LOG.exception(_("Error while processing VIF ports"))
                # Put the ports back in self.updated_port
                self.updated_ports |= updated_ports_copy
                sync = True

        # sleep till end of polling interval
        elapsed = (time.time() - start)
        LOG.debug(_("Agent rpc_loop - iteration:%(iter_num)d "
                    "completed. Processed ports statistics: "
                    "%(port_stats)s. Elapsed:%(elapsed).3f"),
                  {'iter_num': self.iter_num,
                   'port_stats': port_stats,
                   'elapsed': elapsed})
        if (elapsed < self.polling_interval):
            time.sleep(self.polling_interval - elapsed)
        else:
            LOG.debug(_("Loop iteration exceeded interval "
                        "(%(polling_interval)s vs. %(elapsed)s)!"),
                      {'polling_interval': self.polling_interval,
                       'elapsed': elapsed})
        self.iter_num = self.iter_num + 1
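What rpc_loop does is clear: it repeatedly polls certain states and acts on them. Its most important job is to scan the port information in the OVS database and then process it, so let us look at call (1) to see how it extracts the port information.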
def scan_ports(self, registered_ports, updated_ports=None):
    # Get the port information from the OVS database via ovs-vsctl
    cur_ports = self.int_br.get_vif_port_set()
    self.int_br_device_count = len(cur_ports)
    port_info = {'current': cur_ports}
    if updated_ports is None:
        updated_ports = set()
    # Add already registered ports whose VLAN has changed
    updated_ports.update(self.check_changed_vlans(registered_ports))
    if updated_ports:
        # Some updated ports might have been removed in the
        # meanwhile, and therefore should not be processed.
        # In this case the updated port won't be found among
        # current ports.
        updated_ports &= cur_ports
        # Record the updated ports
        if updated_ports:
            port_info['updated'] = updated_ports

    # FIXME(salv-orlando): It's not really necessary to return early
    # if nothing has changed.
    if cur_ports == registered_ports:
        # No added or removed ports to set, just return here
        return port_info

    # Record the added ports
    port_info['added'] = cur_ports - registered_ports
    # Remove all the known ports not found on the integration bridge
    port_info['removed'] = registered_ports - cur_ports
    return port_info
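The core of scan_ports is plain set arithmetic, which the following standalone sketch isolates. `diff_ports` and the port names are hypothetical, and the sketch deliberately leaves out the check_changed_vlans step and the device counter.

```python
def diff_ports(cur_ports, registered_ports, updated_ports=None):
    """Classify ports as current, updated, added, or removed."""
    cur_ports = set(cur_ports)
    registered_ports = set(registered_ports)
    port_info = {"current": cur_ports}

    # An updated port that has vanished in the meantime is ignored.
    updated = set(updated_ports or ()) & cur_ports
    if updated:
        port_info["updated"] = updated

    if cur_ports != registered_ports:
        port_info["added"] = cur_ports - registered_ports
        port_info["removed"] = registered_ports - cur_ports
    return port_info


port_info = diff_ports(
    cur_ports={"port-b", "port-c", "port-d"},
    registered_ports={"port-a", "port-b", "port-c"},
    updated_ports={"port-a", "port-c"},
)
```

Here port-d is reported as added and port-a as removed. Only port-c counts as updated, because port-a is no longer on the bridge.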
Once port_info has been obtained, the ports are actually operated on according to it. That happens in call (2), process_network_ports.
def process_network_ports(self, port_info, ovs_restarted):
    resync_a = False
    resync_b = False
    # Collect the added and updated ports
    devices_added_updated = (port_info.get('added', set()) |
                             port_info.get('updated', set()))
    if devices_added_updated:
        start = time.time()
        try:
            # treat_devices_added_or_updated handles both added and
            # updated ports. Devices it skips are returned in
            # skipped_devices. The others go through treat_vif_port,
            # which binds the port to net_uuid/lsw_id and sets up
            # the flows for it.
            skipped_devices = self.treat_devices_added_or_updated(
                devices_added_updated, ovs_restarted)
            LOG.debug(_("process_network_ports - iteration:%(iter_num)d -"
                        "treat_devices_added_or_updated completed. "
                        "Skipped %(num_skipped)d devices of "
                        "%(num_current)d devices currently available. "
                        "Time elapsed: %(elapsed).3f"),
                      {'iter_num': self.iter_num,
                       'num_skipped': len(skipped_devices),
                       'num_current': len(port_info['current']),
                       'elapsed': time.time() - start})
            # Update the list of current ports storing only those which
            # have been actually processed.
            port_info['current'] = (port_info['current'] -
                                    set(skipped_devices))
        except DeviceListRetrievalError:
            # Need to resync as there was an error with server
            # communication.
            LOG.exception(_("process_network_ports - iteration:%d - "
                            "failure while retrieving port details "
                            "from server"), self.iter_num)
            resync_a = True
    if 'removed' in port_info:
        start = time.time()
        # Handle removed ports by sending RPC calls to the Neutron server
        resync_b = self.treat_devices_removed(port_info['removed'])
        LOG.debug(_("process_network_ports - iteration:%(iter_num)d -"
                    "treat_devices_removed completed in %(elapsed).3f"),
                  {'iter_num': self.iter_num,
                   'elapsed': time.time() - start})
    # If one of the above operations fails => resync with plugin
    return (resync_a | resync_b)
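As the comments in the code show, process_network_ports handles adding, removing, and updating ports. Afterwards the loop checks whether the polling interval has elapsed. If not, it sleeps for the remaining time and then continues with the next iteration.
This completes our walkthrough of the OVS Agent startup source code.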
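The pacing at the end of each iteration can be sketched as a small pure function. `sleep_needed` is a hypothetical helper, and the 2 second interval is only an example value. The sketch returns the time to sleep instead of sleeping, so that it is easy to check.

```python
def sleep_needed(start, now, polling_interval):
    """Seconds left until the polling interval ends, never negative."""
    elapsed = now - start
    if elapsed < polling_interval:
        return polling_interval - elapsed
    # The iteration overran the interval: start the next one at once.
    return 0.0


fast_iteration = sleep_needed(start=100.0, now=100.5, polling_interval=2)
slow_iteration = sleep_needed(start=100.0, now=103.0, polling_interval=2)
```

A fast iteration sleeps for the remainder of the interval, while one that overruns it proceeds immediately, which matches the two branches at the end of rpc_loop.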





