The Nacos Heartbeat Mechanism

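This article walks through the heartbeat mechanism used during Nacos service registration. Starting from NacosServiceRegistry.register(), it shows how the client assembles the heartbeat packet BeatInfo and sends heartbeats through BeatReactor, a ScheduledExecutorService, and the BeatTask runnable. It then analyzes how the server uses heartbeats to decide whether an instance is alive: HealthCheckReactor schedules the periodic ClientBeatCheckTask, which checks each instance's last heartbeat and, depending on how long ago it was, marks the instance unhealthy or deletes it. Finally, it covers the logic in InstanceController.beat() that automatically creates an instance that does not exist yet, which also starts the heartbeat check.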

To understand the Nacos heartbeat mechanism, you first need to understand how Nacos service registration works; see https://blog.csdn.net/LiaoHongHB/article/details/103993074 first.

When a service registers with Nacos, NacosServiceRegistry calls its register() method, which delegates the actual registration logic to namingService.registerInstance().

@Override
	public void register(Registration registration) {

		if (StringUtils.isEmpty(registration.getServiceId())) {
			log.warn("No service to register for nacos client...");
			return;
		}

		String serviceId = registration.getServiceId();

		Instance instance = new Instance();
		instance.setIp(registration.getHost());
		instance.setPort(registration.getPort());
		instance.setWeight(nacosDiscoveryProperties.getWeight());
		instance.setClusterName(nacosDiscoveryProperties.getClusterName());
		instance.setMetadata(registration.getMetadata());

		try {
			namingService.registerInstance(serviceId, instance);
			log.info("nacos registry, {} {}:{} register finished", serviceId,
					instance.getIp(), instance.getPort());
		}
		catch (Exception e) {
			log.error("nacos registry, {} register failed...{},", serviceId,
					registration.toString(), e);
		}
	}

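NacosNamingService implements the NamingService interface. Its registerInstance() method does two things. The first is to assemble the heartbeat packet BeatInfo and hand it to the BeatReactor so that heartbeats get sent (this happens only for ephemeral instances):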

public void registerInstance(String serviceName, String groupName, Instance instance) throws NacosException {
        if (instance.isEphemeral()) {
            BeatInfo beatInfo = new BeatInfo();
            beatInfo.setServiceName(NamingUtils.getGroupedName(serviceName, groupName));
            beatInfo.setIp(instance.getIp());
            beatInfo.setPort(instance.getPort());
            beatInfo.setCluster(instance.getClusterName());
            beatInfo.setWeight(instance.getWeight());
            beatInfo.setMetadata(instance.getMetadata());
            beatInfo.setScheduled(false);
            this.beatReactor.addBeatInfo(NamingUtils.getGroupedName(serviceName, groupName), beatInfo);
        }

        this.serverProxy.registerService(NamingUtils.getGroupedName(serviceName, groupName), groupName, instance);
    }
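
The two steps and their order can be sketched without any Nacos classes. This is only an illustration: RegisterFlowSketch and its step strings are hypothetical, and the "@@" separator used for the grouped name is an assumption about what NamingUtils.getGroupedName() produces.

```java
import java.util.ArrayList;
import java.util.List;

class RegisterFlowSketch {
    // Assumption: stands in for NamingUtils.getGroupedName(); the "@@" separator is assumed.
    static String groupedName(String serviceName, String groupName) {
        return groupName + "@@" + serviceName;
    }

    /** Returns the steps the client would take, in order. */
    static List<String> registerInstance(String serviceName, String groupName, boolean ephemeral) {
        List<String> steps = new ArrayList<>();
        String grouped = groupedName(serviceName, groupName);
        if (ephemeral) {
            // Step 1: only ephemeral instances get a heartbeat entry.
            steps.add("addBeatInfo " + grouped);
        }
        // Step 2: the registration request is sent in every case.
        steps.add("registerService " + grouped);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(registerInstance("orders", "DEFAULT_GROUP", true));
        System.out.println(registerInstance("orders", "DEFAULT_GROUP", false));
    }
}
```

The point of the sketch is that a non-ephemeral instance skips the heartbeat step entirely and goes straight to registration.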

The NacosNamingService constructor calls init(), and init() creates a BeatReactor, the component that sends heartbeats.

The NacosNamingService constructor and init() method:

public NacosNamingService(Properties properties) {
        this.init(properties);
    }

    private void init(Properties properties) {
        this.serverList = properties.getProperty("serverAddr");
        this.initNamespace(properties);
        this.initEndpoint(properties);
        this.initWebRootContext();
        this.initCacheDir();
        this.initLogName(properties);
        this.eventDispatcher = new EventDispatcher();
        this.serverProxy = new NamingProxy(this.namespace, this.endpoint, this.serverList);
        this.serverProxy.setProperties(properties);
        // the component that sends heartbeats
        this.beatReactor = new BeatReactor(this.serverProxy, this.initClientBeatThreadCount(properties));
        this.hostReactor = new HostReactor(this.eventDispatcher, this.serverProxy, this.cacheDir, this.isLoadCacheAtStart(properties), this.initPollingThreadCount(properties));
    }

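The BeatReactor constructor creates a ScheduledExecutorService and schedules a BeatReactor.BeatProcessor on it. BeatProcessor.run() in turn schedules a BeatTask for every BeatInfo that is not already scheduled, and BeatTask.run() calls sendBeat() with the heartbeat packet as its argument. BeatProcessor and BeatTask are Runnable tasks executed by the pool's threads, not threads themselves.

The BeatReactor constructor, which creates the scheduled thread pool and schedules the BeatProcessor: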

public BeatReactor(NamingProxy serverProxy, int threadCount) {
        this.clientBeatInterval = 5000L;
        this.dom2Beat = new ConcurrentHashMap();
        this.serverProxy = serverProxy;
        // create the scheduled thread pool, then schedule a BeatProcessor on it
        this.executorService = new ScheduledThreadPoolExecutor(threadCount, new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r);
                thread.setDaemon(true);
                thread.setName("com.alibaba.nacos.naming.beat.sender");
                return thread;
            }
        });
        this.executorService.schedule(new BeatReactor.BeatProcessor(), 0L, TimeUnit.MILLISECONDS);
    }

BeatProcessor.run(), which schedules a BeatTask for each pending BeatInfo and then reschedules itself after clientBeatInterval:

public void run() {
            try {
                Iterator var1 = BeatReactor.this.dom2Beat.entrySet().iterator();

                while(var1.hasNext()) {
                    Entry<String, BeatInfo> entry = (Entry)var1.next();
                    BeatInfo beatInfo = (BeatInfo)entry.getValue();
                    if (!beatInfo.isScheduled()) {
                        beatInfo.setScheduled(true);
                        // schedule a BeatTask for this BeatInfo
                        BeatReactor.this.executorService.schedule(BeatReactor.this.new BeatTask(beatInfo), 0L, TimeUnit.MILLISECONDS);
                    }
                }
            } catch (Exception var7) {
                LogUtils.NAMING_LOGGER.error("[CLIENT-BEAT] Exception while scheduling beat.", var7);
            } finally {
                BeatReactor.this.executorService.schedule(this, BeatReactor.this.clientBeatInterval, TimeUnit.MILLISECONDS);
            }

        }
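
The pattern in BeatProcessor (scan the map, claim entries that are not in flight, then reschedule yourself in a finally block) can be reproduced with the JDK alone. The following is a minimal sketch under that assumption; BeatSchedulingSketch, Beat, and claimUnscheduled are hypothetical names, and send() only prints instead of calling a server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BeatSchedulingSketch {
    static final class Beat {
        public final String key;
        public volatile boolean scheduled;
        public Beat(String key) { this.key = key; }
    }

    public final Map<String, Beat> beats = new ConcurrentHashMap<>();
    volatile long intervalMillis = 5000L;

    // Daemon thread, like the sender thread in the listing above.
    private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(1, r -> {
        Thread t = new Thread(r, "beat-sketch");
        t.setDaemon(true);
        return t;
    });

    /** One scan: claim every beat that is not already in flight. */
    public List<Beat> claimUnscheduled() {
        List<Beat> claimed = new ArrayList<>();
        for (Beat beat : beats.values()) {
            if (!beat.scheduled) {
                beat.scheduled = true;
                claimed.add(beat);
            }
        }
        return claimed;
    }

    public void start() {
        executor.schedule(this::scanAndReschedule, 0L, TimeUnit.MILLISECONDS);
    }

    private void scanAndReschedule() {
        try {
            for (Beat beat : claimUnscheduled()) {
                executor.schedule(() -> send(beat), 0L, TimeUnit.MILLISECONDS);
            }
        } finally {
            // A one-shot task that re-submits itself, so the delay can change between runs.
            executor.schedule(this::scanAndReschedule, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    private void send(Beat beat) {
        System.out.println("sending beat for " + beat.key);
        beat.scheduled = false; // release the claim so the next scan picks it up again
    }

    public static void main(String[] args) throws InterruptedException {
        BeatSchedulingSketch sketch = new BeatSchedulingSketch();
        sketch.beats.put("demo", new Beat("demo"));
        sketch.start();
        Thread.sleep(200L); // worker is a daemon thread, so the JVM exits afterwards
    }
}
```

Rescheduling a one-shot task, rather than using scheduleAtFixedRate, is what lets the interval be adjusted at runtime: each run reads the current value of the interval field.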

BeatTask, whose run() calls sendBeat():

class BeatTask implements Runnable {
        BeatInfo beatInfo;

        public BeatTask(BeatInfo beatInfo) {
            this.beatInfo = beatInfo;
        }

        public void run() {
            // send the heartbeat; a positive result is the interval returned by the server
            long result = BeatReactor.this.serverProxy.sendBeat(this.beatInfo);
            this.beatInfo.setScheduled(false);
            if (result > 0L) {
                BeatReactor.this.clientBeatInterval = result;
            }

        }
    }

sendBeat() makes an HTTP PUT request to /instance/beat, which is handled on the server by InstanceController.beat() and acknowledges the heartbeat:

public long sendBeat(BeatInfo beatInfo) {
        try {
            LogUtils.NAMING_LOGGER.info("[BEAT] {} sending beat to server: {}", this.namespaceId, beatInfo.toString());
            Map<String, String> params = new HashMap(4);
            params.put("beat", JSON.toJSONString(beatInfo));
            params.put("namespaceId", this.namespaceId);
            params.put("serviceName", beatInfo.getServiceName());
            // remote call over HTTP
            String result = this.reqAPI(UtilAndComs.NACOS_URL_BASE + "/instance/beat", params, (String)"PUT");
            JSONObject jsonObject = JSON.parseObject(result);
            if (jsonObject != null) {
                return jsonObject.getLong("clientBeatInterval").longValue();
            }
        } catch (Exception var5) {
            LogUtils.NAMING_LOGGER.error("[CLIENT-BEAT] failed to send beat: " + JSON.toJSONString(beatInfo), var5);
        }

        return 0L;
    }
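
Taken together, BeatTask and sendBeat() treat a return value of 0 as "no new interval": any failure leaves clientBeatInterval unchanged, and only a positive value from the server replaces it. A minimal sketch of that contract, assuming the HTTP call is replaced by a hypothetical Callable that returns the already-parsed response as a Map:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Callable;

class BeatIntervalSketch {
    private long clientBeatInterval = 5000L;

    /** Returns the interval from the response, or 0 if the call failed or carried none. */
    public static long sendBeat(Callable<Map<String, Object>> transport) {
        try {
            Map<String, Object> response = transport.call();
            if (response != null && response.get("clientBeatInterval") instanceof Number) {
                return ((Number) response.get("clientBeatInterval")).longValue();
            }
        } catch (Exception e) {
            System.err.println("failed to send beat: " + e);
        }
        return 0L;
    }

    /** Mirrors BeatTask.run(): adopt the interval only when it is positive. */
    public void runBeatTask(Callable<Map<String, Object>> transport) {
        long result = sendBeat(transport);
        if (result > 0L) {
            clientBeatInterval = result;
        }
    }

    public long interval() {
        return clientBeatInterval;
    }

    public static void main(String[] args) {
        BeatIntervalSketch sketch = new BeatIntervalSketch();
        sketch.runBeatTask(() -> Collections.<String, Object>singletonMap("clientBeatInterval", 3000L));
        System.out.println(sketch.interval());
    }
}
```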

The InstanceController.beat() method

InstanceController.beat() calls service.processClientBeat(clientBeat), which calls HealthCheckReactor.scheduleNow(clientBeatProcessor) to run a ClientBeatProcessor task. That task finds the instance matching the heartbeat's ip and port and calls setLastBeat() to record the current time as the instance's last heartbeat; if the instance was unhealthy, it is also marked healthy again:

InstanceController.beat():

service.processClientBeat(clientBeat);

service.processClientBeat():

public void processClientBeat(final RsInfo rsInfo) {
        ClientBeatProcessor clientBeatProcessor = new ClientBeatProcessor();
        clientBeatProcessor.setService(this);
        clientBeatProcessor.setRsInfo(rsInfo);
        // run the ClientBeatProcessor task right away
        HealthCheckReactor.scheduleNow(clientBeatProcessor);
    }

 HealthCheckReactor.scheduleNow:

 public static ScheduledFuture<?> scheduleNow(Runnable task) {
        return EXECUTOR.schedule(task, 0, TimeUnit.MILLISECONDS);
    }

ClientBeatProcessor.run():

public void run() {
        Service service = this.service;
        if (Loggers.EVT_LOG.isDebugEnabled()) {
            Loggers.EVT_LOG.debug("[CLIENT-BEAT] processing beat: {}", rsInfo.toString());
        }

        String ip = rsInfo.getIp();
        String clusterName = rsInfo.getCluster();
        int port = rsInfo.getPort();
        Cluster cluster = service.getClusterMap().get(clusterName);
        List<Instance> instances = cluster.allIPs(true);

        for (Instance instance : instances) {
            // find the instance with the matching ip and port
            if (instance.getIp().equals(ip) && instance.getPort() == port) {
                if (Loggers.EVT_LOG.isDebugEnabled()) {
                    Loggers.EVT_LOG.debug("[CLIENT-BEAT] refresh beat: {}", rsInfo.toString());
                }
                // record the time of this heartbeat
                instance.setLastBeat(System.currentTimeMillis());
                if (!instance.isMarked()) {
                    if (!instance.isHealthy()) {
                        instance.setHealthy(true);
                        Loggers.EVT_LOG.info("service: {} {POS} {IP-ENABLED} valid: {}:{}@{}, region: {}, msg: client beat ok",
                            cluster.getService().getName(), ip, port, cluster.getName(), UtilsAndCommons.LOCALHOST_SITE);
                        getPushService().serviceChanged(service);
                    }
                }
            }
        }
    }
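
Stripped of logging and Nacos types, the server-side handling of one heartbeat reduces to "refresh the timestamp, and revive the instance if it was unhealthy". A small sketch of that logic; BeatRefreshSketch and Inst are hypothetical stand-ins, and the boolean return value stands in for the push notification in the listing above.

```java
import java.util.Arrays;
import java.util.List;

class BeatRefreshSketch {
    static final class Inst {
        public final String ip;
        public final int port;
        public long lastBeat;
        public boolean healthy;
        public boolean marked;

        public Inst(String ip, int port, boolean healthy) {
            this.ip = ip;
            this.port = port;
            this.healthy = healthy;
        }
    }

    /** Returns true when an instance flipped from unhealthy to healthy. */
    public static boolean onBeat(List<Inst> instances, String ip, int port, long now) {
        boolean changed = false;
        for (Inst inst : instances) {
            if (inst.ip.equals(ip) && inst.port == port) {
                inst.lastBeat = now; // always refresh the timestamp
                if (!inst.marked && !inst.healthy) {
                    inst.healthy = true; // only a state change triggers a notification
                    changed = true;
                }
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Inst> instances = Arrays.asList(new Inst("10.0.0.1", 8080, false));
        System.out.println(onBeat(instances, "10.0.0.1", 8080, System.currentTimeMillis()));
    }
}
```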

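That completes the heartbeat sending path in Nacos.

Next, we look at how Nacos periodically uses heartbeats to decide whether an instance is still alive.

As mentioned earlier, namingService.registerInstance() does two things, the first of which is assembling the BeatInfo heartbeat packet and sending heartbeats.

The second is registering the instance with the Nacos server, again over HTTP; the request is handled by InstanceController.register():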

@PostMapping
    public String register(HttpServletRequest request) throws Exception {

        String serviceName = WebUtils.required(request, CommonParams.SERVICE_NAME);
        String namespaceId = WebUtils.optional(request, CommonParams.NAMESPACE_ID, Constants.DEFAULT_NAMESPACE_ID);

        serviceManager.registerInstance(namespaceId, serviceName, parseInstance(request));
        return "ok";
    }

This method calls serviceManager.registerInstance(), whose logic is:

public void registerInstance(String namespaceId, String serviceName, Instance instance) throws NacosException {
        // create the Service object if it does not exist yet
        createEmptyService(namespaceId, serviceName, instance.isEphemeral());

        Service service = getService(namespaceId, serviceName);

        if (service == null) {
            throw new NacosException(NacosException.INVALID_PARAM,
                "service not found, namespace: " + namespaceId + ", service: " + serviceName);
        }
        // add the instance to the service
        addInstance(namespaceId, serviceName, instance.isEphemeral(), instance);
    }

It first creates the Service object if necessary and then adds the instance to that service. The service is created as follows:

public void createEmptyService(String namespaceId, String serviceName, boolean local) throws NacosException {
        createServiceIfAbsent(namespaceId, serviceName, local, null);
    }
public void createServiceIfAbsent(String namespaceId, String serviceName, boolean local, Cluster cluster) throws NacosException {
        Service service = getService(namespaceId, serviceName);
        if (service == null) {

            Loggers.SRV_LOG.info("creating empty service {}:{}", namespaceId, serviceName);
            service = new Service();
            service.setName(serviceName);
            service.setNamespaceId(namespaceId);
            service.setGroupName(NamingUtils.getGroupName(serviceName));
            // now validate the service. if failed, exception will be thrown
            service.setLastModifiedMillis(System.currentTimeMillis());
            service.recalculateChecksum();
            if (cluster != null) {
                cluster.setService(service);
                service.getClusterMap().put(cluster.getName(), cluster);
            }
            service.validate();

            putServiceAndInit(service);
            if (!local) {
                addOrReplaceService(service);
            }
        }
    }

When a new Service object is created, putServiceAndInit() is called:

private void putServiceAndInit(Service service) throws NacosException {
        putService(service);
        service.init();
        consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), true), service);
        consistencyService.listen(KeyBuilder.buildInstanceListKey(service.getNamespaceId(), service.getName(), false), service);
        Loggers.SRV_LOG.info("[NEW-SERVICE] {}", service.toJSON());
    }

The important call is service.init():

public void init() {

        HealthCheckReactor.scheduleCheck(clientBeatCheckTask);

        for (Map.Entry<String, Cluster> entry : clusterMap.entrySet()) {
            entry.getValue().setService(this);
            entry.getValue().init();
        }
    }

init() passes clientBeatCheckTask to HealthCheckReactor.scheduleCheck(). The scheduleCheck() method:

 public static void scheduleCheck(ClientBeatCheckTask task) {
        futureMap.putIfAbsent(task.taskKey(), EXECUTOR.scheduleWithFixedDelay(task, 5000, 5000, TimeUnit.MILLISECONDS));
    }
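
One Java detail is worth keeping in mind when reading this line: the argument to putIfAbsent is evaluated before the call, so scheduleWithFixedDelay runs even if the key is already present. Whether that matters in Nacos depends on how often scheduleCheck() is reached for the same task key, which this article does not establish. The sketch below only demonstrates the language behavior with a hypothetical counter in place of the executor, and contrasts it with computeIfAbsent.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ScheduleOnceSketch {
    public static final AtomicInteger scheduled = new AtomicInteger();

    // Stand-in for EXECUTOR.scheduleWithFixedDelay(...): counts how often it is called.
    public static String scheduleTask(String key) {
        return key + "-future-" + scheduled.incrementAndGet();
    }

    public static void eager(Map<String, String> futures, String key) {
        // scheduleTask runs first, then putIfAbsent decides whether to keep the result.
        futures.putIfAbsent(key, scheduleTask(key));
    }

    public static void lazy(Map<String, String> futures, String key) {
        // The mapping function runs only when the key is absent.
        futures.computeIfAbsent(key, ScheduleOnceSketch::scheduleTask);
    }

    public static void main(String[] args) {
        Map<String, String> futures = new ConcurrentHashMap<>();
        eager(futures, "svc");
        eager(futures, "svc");
        System.out.println("tasks scheduled by eager(): " + scheduled.get());
    }
}
```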

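This starts a periodic task: ClientBeatCheckTask first runs after 5 seconds and then again 5 seconds after each run completes (scheduleWithFixedDelay). Next, ClientBeatCheckTask.run():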

@Override
    public void run() {
        try {
            if (!getDistroMapper().responsible(service.getName())) {
                return;
            }

            if (!getSwitchDomain().isHealthCheckEnabled()) {
                return;
            }

            List<Instance> instances = service.allIPs(true);

            // first set health status of instances:
            for (Instance instance : instances) {
                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getInstanceHeartBeatTimeOut()) {
                    if (!instance.isMarked()) {
                        if (instance.isHealthy()) {
                            instance.setHealthy(false);
                            Loggers.EVT_LOG.info("{POS} {IP-DISABLED} valid: {}:{}@{}@{}, region: {}, msg: client timeout after {}, last beat: {}",
                                instance.getIp(), instance.getPort(), instance.getClusterName(), service.getName(),
                                UtilsAndCommons.LOCALHOST_SITE, instance.getInstanceHeartBeatTimeOut(), instance.getLastBeat());
                            getPushService().serviceChanged(service);
                            SpringContext.getAppContext().publishEvent(new InstanceHeartbeatTimeoutEvent(this, instance));
                        }
                    }
                }
            }

            if (!getGlobalConfig().isExpireInstance()) {
                return;
            }

            // then remove obsolete instances:
            for (Instance instance : instances) {

                if (instance.isMarked()) {
                    continue;
                }

                if (System.currentTimeMillis() - instance.getLastBeat() > instance.getIpDeleteTimeout()) {
                    // delete instance
                    Loggers.SRV_LOG.info("[AUTO-DELETE-IP] service: {}, ip: {}", service.getName(), JSON.toJSONString(instance));
                    deleteIP(instance);
                }
            }

        } catch (Exception e) {
            Loggers.SRV_LOG.warn("Exception while processing client beat time out.", e);
        }

    }

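ClientBeatCheckTask.run() does two main things:

First, it iterates over all instances and checks whether the time since the last heartbeat exceeds the instance's heartbeat timeout (getInstanceHeartBeatTimeOut()); if so, and the instance is currently healthy, it sets the instance's healthy flag to false and pushes the change.

Second, if instance expiration is enabled, it iterates over all instances again and checks whether the time since the last heartbeat exceeds the delete timeout (getIpDeleteTimeout()); if so, it deletes the instance from the service. Instances with marked set are skipped in both passes.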
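
The two passes can be sketched as a pure function over a list of instances. This is an illustration only: BeatCheckSketch and Inst are hypothetical, and the 15 s and 30 s thresholds are assumed values chosen for the example rather than taken from the listing, where they come from each instance.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class BeatCheckSketch {
    // Assumed thresholds for illustration; the listing reads them from each instance.
    static final long HEARTBEAT_TIMEOUT_MS = 15_000L;
    static final long DELETE_TIMEOUT_MS = 30_000L;

    static final class Inst {
        public final String id;
        public final long lastBeat;
        public boolean healthy = true;
        public boolean marked;

        public Inst(String id, long lastBeat) {
            this.id = id;
            this.lastBeat = lastBeat;
        }
    }

    /** Runs both passes and returns the instances that were removed. */
    public static List<Inst> check(List<Inst> instances, long now, boolean expireEnabled) {
        // Pass 1: flag instances whose heartbeat is overdue.
        for (Inst inst : instances) {
            if (now - inst.lastBeat > HEARTBEAT_TIMEOUT_MS && !inst.marked && inst.healthy) {
                inst.healthy = false;
            }
        }
        List<Inst> removed = new ArrayList<>();
        if (!expireEnabled) {
            return removed;
        }
        // Pass 2: remove instances that have been silent for even longer.
        for (Iterator<Inst> it = instances.iterator(); it.hasNext();) {
            Inst inst = it.next();
            if (!inst.marked && now - inst.lastBeat > DELETE_TIMEOUT_MS) {
                it.remove();
                removed.add(inst);
            }
        }
        return removed;
    }
}
```

With these assumed thresholds, an instance that stops sending heartbeats is first marked unhealthy and only later removed, so there is a window in which it is still registered but flagged as unhealthy.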

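Note that InstanceController.beat() also creates the instance automatically when it does not exist, by calling the same serviceManager.registerInstance() that InstanceController.register() uses, so this is another entry point that starts the periodic heartbeat check.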

Instance instance = serviceManager.getInstance(namespaceId, serviceName, clusterName, ip, port);

        if (instance == null) {
            if (clientBeat == null) {
                result.put(CommonParams.CODE, NamingResponseCode.RESOURCE_NOT_FOUND);
                return result;
            }
            instance = new Instance();
            instance.setPort(clientBeat.getPort());
            instance.setIp(clientBeat.getIp());
            instance.setWeight(clientBeat.getWeight());
            instance.setMetadata(clientBeat.getMetadata());
            instance.setClusterName(clusterName);
            instance.setServiceName(serviceName);
            instance.setInstanceId(instance.getInstanceId());
            instance.setEphemeral(clientBeat.isEphemeral());

            serviceManager.registerInstance(namespaceId, serviceName, instance);
        }

        Service service = serviceManager.getService(namespaceId, serviceName);

        if (service == null) {
            throw new NacosException(NacosException.SERVER_ERROR,
                "service not found: " + serviceName + "@" + namespaceId);
        }
        if (clientBeat == null) {
            clientBeat = new RsInfo();
            clientBeat.setIp(ip);
            clientBeat.setPort(port);
            clientBeat.setCluster(clusterName);
        }
        service.processClientBeat(clientBeat);

        result.put(CommonParams.CODE, NamingResponseCode.OK);
        result.put("clientBeatInterval", instance.getInstanceHeartBeatInterval());
        result.put(SwitchEntry.LIGHT_BEAT_ENABLED, switchDomain.isLightBeatEnabled());
        return result;
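
The effect of this fallback is that a heartbeat from an unknown instance behaves like a registration, while creating a service still happens once per service. A minimal sketch of that behavior with plain collections; BeatAutoRegisterSketch, its string keys, and the checkTasksStarted counter are hypothetical stand-ins for ServiceManager and the scheduled check task.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BeatAutoRegisterSketch {
    private final Map<String, Set<String>> services = new HashMap<>();
    private int checkTasksStarted;

    public void register(String service, String address) {
        Set<String> instances = services.get(service);
        if (instances == null) {
            // Creating the service is the point where the periodic check is started.
            instances = new HashSet<>();
            services.put(service, instances);
            checkTasksStarted++;
        }
        instances.add(address);
    }

    /** Returns true when the beat had to register the instance first. */
    public boolean beat(String service, String address) {
        Set<String> instances = services.get(service);
        boolean known = instances != null && instances.contains(address);
        if (!known) {
            register(service, address);
        }
        return !known;
    }

    public int checkTasksStarted() {
        return checkTasksStarted;
    }

    public static void main(String[] args) {
        BeatAutoRegisterSketch registry = new BeatAutoRegisterSketch();
        System.out.println(registry.beat("orders", "10.0.0.1:8080"));
        System.out.println(registry.beat("orders", "10.0.0.1:8080"));
    }
}
```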

 
