Learn C++ with Me, Intermediate Series: Analyzing Compile-Time and Runtime Performance Handling

This content is licensed under the CC 4.0 BY-SA license.

1. Performance Handling

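When optimizing performance, developers usually work at runtime, for example by reusing memory or reducing CPU usage. In fact, quite a few optimizations can be moved to compile time; the most common case is an algorithm whose computation the compiler can unroll and evaluate directly. As newer C++ standards have steadily broadened compile-time support (for example with the introduction of constexpr), many scenarios that used to require complex techniques such as templates or metaprogramming can now also be optimized at compile time. Broadly, optimizations fall into two categories: runtime optimization and compile-time optimization. The two are not mutually exclusive; developers can choose either one, or combine them, depending on the situation.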
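As a minimal sketch of moving a computation to compile time (the function names here are hypothetical), a constexpr function is evaluated by the compiler whenever its argument is a constant expression, so the result is stored in the binary as a constant; the same function still works with runtime arguments:

```cpp
#include <cstdint>

// A constexpr function: evaluated at compile time when the argument is a
// constant expression, and as an ordinary function otherwise.
constexpr std::uint64_t factorial(unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}

// Forced compile-time evaluation: the value is a constant in the binary,
// so no loop runs at runtime.
constexpr std::uint64_t kFact10 = factorial(10);
static_assert(kFact10 == 3628800, "computed by the compiler");

// Runtime evaluation: n is only known while the program runs.
std::uint64_t factorial_at_runtime(unsigned n) {
    return factorial(n);
}
```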

2. Runtime Performance and Related Issues

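Consider a state-dispatch scenario. A developer might handle it in one of the following ways:

Use polymorphism for dynamic dispatch
Cast the state to a concrete value and handle it with conditional statements (if, switch, etc.)
Use a table (such as a map) that binds each state to its handler and jumps to it automatically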
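As a minimal sketch of the first two approaches (the State enum and the handler types are hypothetical, for illustration only), virtual dispatch and switch-based dispatch might look like this:

```cpp
#include <string>

// Hypothetical states used only for illustration.
enum class State { Idle, Running, Stopped };

// Approach 1: dynamic dispatch through virtual functions.
struct StateHandler {
    virtual ~StateHandler() = default;
    virtual std::string handle() const = 0;
};
struct IdleHandler : StateHandler {
    std::string handle() const override { return "idle"; }
};
struct RunningHandler : StateHandler {
    std::string handle() const override { return "running"; }
};
struct StoppedHandler : StateHandler {
    std::string handle() const override { return "stopped"; }
};

// The call goes through the vtable; in general the target is only known at runtime.
std::string dispatch_virtual(const StateHandler& h) {
    return h.handle();
}

// Approach 2: a switch on the state value.
std::string dispatch_switch(State s) {
    switch (s) {
        case State::Idle:    return "idle";
        case State::Running: return "running";
        case State::Stopped: return "stopped";
    }
    return "unknown";  // not reached for valid enumerator values
}
```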
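All of the approaches above work at runtime. They can be implemented in many ways (and combined freely), but they are essentially similar. As analyzed in an earlier article on the performance of if and switch, once you settle on a specific approach, its basic performance upper and lower bounds are largely fixed. Taking the three approaches above as examples, their characteristics are:
Polymorphism
Relatively complex to write. A virtual call adds an indirection and usually prevents inlining, which noticeably slows dispatch, and every object carries a vtable pointer, which adds memory overhead. Because it relies on dynamic binding at runtime, it can rarely be optimized at compile time (except where the compiler manages to devirtualize a call). Most importantly, it makes poor use of the CPU cache: objects are typically heap-allocated and reached through pointers, and each call is an indirect jump.
Conditional statements
Relatively cumbersome, especially with many conditions. A cast may be required, which is not type-safe. Poorly structured conditions can hurt performance (for example, a long if/else-if chain). They are inconvenient to extend and prone to misclassifying or missing cases. Branch prediction is often hard to get right. Cache usage is average (and improves greatly when branches are predicted accurately). Support for compile-time optimization is also average.
Table-based dispatch
Whether the table is the jump table a compiler generates internally for a switch or one built with a map or a similar container, performance is generally good: an array-indexed table dispatches in constant time, while a std::map lookup costs O(log n). An array-based table is also relatively cache-friendly, and its contents can be compile-time constants. The downside is that it may demand more of the developer.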
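A table version might look like the sketch below, assuming a hypothetical State enum whose values double as table indices and handlers that share one signature. The array of function pointers is a constant built at compile time; at runtime only the indexed call remains:

```cpp
#include <array>
#include <cstddef>

// Hypothetical states; the enumerator values double as table indices.
enum class State : std::size_t { Idle, Running, Stopped, Count };

// Handlers share one signature so they can live in the same table.
constexpr int on_idle(int x)    { return x; }
constexpr int on_running(int x) { return x * 2; }
constexpr int on_stopped(int)   { return 0; }

using Handler = int (*)(int);

// The table itself is a compile-time constant, indexed by State.
constexpr std::array<Handler, static_cast<std::size_t>(State::Count)> kTable = {
    on_idle, on_running, on_stopped
};

// Runtime part: one array lookup plus one call, no branch chain.
int dispatch_table(State s, int x) {
    return kTable[static_cast<std::size_t>(s)](x);
}
```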

3. Advantages of Compile Time

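Having analyzed some runtime scenarios, some solutions already suggest themselves. For example, optimizing table-based dispatch with constant expressions is precisely a way of moving runtime work to compile time. In early C++ programming, compile-time processing mainly took two forms:

Macros
Fine for simple uses, but complex macro code is almost impossible for developers to master quickly; both the learning cost and the usage cost are high. A typical example is Microsoft's MFC (for instance its message-map macros).
Templates, especially template metaprogramming
Polymorphism can be dynamic or static, and templates can be regarded as static polymorphism (for example the CRTP mentioned earlier). The SFINAE-based dispatch analyzed earlier can also implement concrete functionality. The biggest problem with these techniques is not any single detail but their overall complexity. Most C++ programmers have probably never used template programming in a real project, let alone metaprogramming.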
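A minimal CRTP sketch (the Shape, Square and Rect names are hypothetical) shows the static polymorphism mentioned above: the base class template knows the derived type, so area() is resolved at compile time with no vtable:

```cpp
// CRTP base: Derived is known at compile time, so calls are resolved
// statically and can be inlined.
template <typename Derived>
struct Shape {
    double area() const {
        // Static downcast to the derived type; no virtual call involved.
        return static_cast<const Derived&>(*this).area_impl();
    }
};

struct Square : Shape<Square> {
    double side;
    explicit Square(double s) : side(s) {}
    double area_impl() const { return side * side; }
};

struct Rect : Shape<Rect> {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area_impl() const { return w * h; }
};

// Generic code over any CRTP shape; one instantiation per Derived type.
template <typename Derived>
double twice_area(const Shape<Derived>& s) {
    return 2.0 * s.area();
}
```

Each distinct Derived produces its own copy of twice_area. That is where static polymorphism gets its speed, and also where template code bloat comes from.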

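Since C++11, more and more compile-time-friendly facilities have been added: constexpr (C++11), if constexpr (C++17), consteval (C++20), if consteval (C++23), concepts (C++20), variable templates (C++14), extended non-type template parameters (C++17/C++20), compile-time containers (constexpr std::vector and std::string in C++20), inline variables (C++17), fold expressions (C++17), and reflection (C++26). Of these, constexpr is the easiest to understand and adopt. Here is a simple example built on if constexpr: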

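```cpp
#include <string>
#include <type_traits>

// if constexpr discards the branch that does not match T at compile time,
// so std::to_string is only instantiated for integral T.
template <typename T>
auto to_string(T t) {
    if constexpr (std::is_integral_v<T>) {
        return std::to_string(t);
    } else {
        return t;  // non-integral values are returned unchanged
    }
}
```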

Examples of these features, including if consteval, have been covered in earlier articles; interested readers can go back to them to reinforce the ideas.
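To connect several of the listed features, here is a small C++17 sketch (all names are hypothetical) that combines a constexpr function, a variable template and a fold expression; the static_assert lines confirm that the values are produced by the compiler rather than at runtime:

```cpp
#include <array>
#include <cstddef>

// Builds a table of squares at compile time. Since C++17 the non-const
// std::array::operator[] is constexpr, so the array can be filled in a loop.
template <std::size_t N>
constexpr std::array<int, N> make_square_table() {
    std::array<int, N> table{};
    for (std::size_t i = 0; i < N; ++i) {
        table[i] = static_cast<int>(i * i);
    }
    return table;
}

// Variable template holding the finished table; no runtime initialization.
template <std::size_t N>
constexpr auto kSquares = make_square_table<N>();

// Fold expression: sums any number of arguments, and is evaluated at
// compile time when all arguments are constants.
template <typename... Ts>
constexpr auto sum(Ts... xs) {
    return (xs + ... + 0);
}

static_assert(kSquares<8>[7] == 49, "table built by the compiler");
static_assert(sum(1, 2, 3, 4) == 10, "folded by the compiler");
```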

4. A Typical Use in the Standard Library: std::visit

The STL has an interface that embodies this kind of compile-time optimization: std::visit. Its core characteristics are:

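Static (compile-time) polymorphism
Table-based access
Inlining optimization
Cache-friendliness
Dead-branch elimination and constant propagation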
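A typical use of std::visit might look like the sketch below. The overloaded helper is a widely used idiom, not part of the standard library, and the Event type is hypothetical:

```cpp
#include <string>
#include <variant>

// "overloaded" idiom (not in the standard library): merges several lambdas
// into one visitor with an overloaded operator().
template <typename... Fs>
struct overloaded : Fs... {
    using Fs::operator()...;
};
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;  // deduction guide, needed in C++17

// Hypothetical event type; the active alternative is stored inside the variant.
using Event = std::variant<int, double, std::string>;

// The alternatives are fixed at compile time; std::visit picks the matching
// lambda from e.index() without any virtual functions.
std::string describe(const Event& e) {
    return std::visit(overloaded{
        [](int i)                { return "int:" + std::to_string(i); },
        [](double)               { return std::string("double"); },
        [](const std::string& s) { return "string:" + s; },
    }, e);
}
```

Because every possible call target is known to the compiler, the inlining and dead-branch elimination listed above become possible.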

Here is how std::visit is implemented internally in libstdc++ (GCC's standard library):


```cpp
  template<typename _Visitor, typename... _Variants>
    constexpr __detail::__variant::__visit_result_t<_Visitor, _Variants...>
    visit(_Visitor&& __visitor, _Variants&&... __variants)
    {
      namespace __variant = std::__detail::__variant;

      if ((__variant::__as(__variants).valueless_by_exception() || ...))
        __throw_bad_variant_access("std::visit: variant is valueless");

      using _Result_type
        = __detail::__variant::__visit_result_t<_Visitor, _Variants...>;

      using _Tag = __detail::__variant::__deduce_visit_result<_Result_type>;

      if constexpr (sizeof...(_Variants) == 1)
        {
          using _Vp = decltype(__variant::__as(std::declval<_Variants>()...));

          constexpr bool __visit_rettypes_match = __detail::__variant::
            __check_visitor_results<_Visitor, _Vp>(
              make_index_sequence<variant_size_v<remove_reference_t<_Vp>>>());
          if constexpr (!__visit_rettypes_match)
            {
              static_assert(__visit_rettypes_match,
                            "std::visit requires the visitor to have the same "
                            "return type for all alternatives of a variant");
              return;
            }
          else
            return std::__do_visit<_Tag>(
              std::forward<_Visitor>(__visitor),
              static_cast<_Vp>(__variants)...);
        }
      else
        return std::__do_visit<_Tag>(
          std::forward<_Visitor>(__visitor),
          __variant::__as(std::forward<_Variants>(__variants))...);
    }

  template<typename _Result_type, typename _Visitor, typename... _Variants>
    constexpr decltype(auto)
    __do_visit(_Visitor&& __visitor, _Variants&&... __variants)
    {
      // Get the silly case of visiting no variants out of the way first.
      if constexpr (sizeof...(_Variants) == 0)
        {
          if constexpr (is_void_v<_Result_type>)
            return (void) std::forward<_Visitor>(__visitor)();
          else
            return std::forward<_Visitor>(__visitor)();
        }
      else
        {
          constexpr size_t __max = 11; // "These go to eleven."

          // The type of the first variant in the pack.
          using _V0 = typename _Nth_type<0, _Variants...>::type;
          // The number of alternatives in that first variant.
          constexpr auto __n = variant_size_v<remove_reference_t<_V0>>;

          if constexpr (sizeof...(_Variants) > 1 || __n > __max)
            {
              // Use a jump table for the general case.
              constexpr auto& __vtable = __detail::__variant::__gen_vtable<
                _Result_type, _Visitor&&, _Variants&&...>::_S_vtable;

              auto __func_ptr = __vtable._M_access(__variants.index()...);
              return (*__func_ptr)(std::forward<_Visitor>(__visitor),
                                   std::forward<_Variants>(__variants)...);
            }
          else // We have a single variant with a small number of alternatives.
            {
              // A name for the first variant in the pack.
              _V0& __v0
                = [](_V0& __v, ...) -> _V0& { return __v; }(__variants...);

              using __detail::__variant::_Multi_array;
              using __detail::__variant::__gen_vtable_impl;
              using _Ma = _Multi_array<_Result_type (*)(_Visitor&&, _V0&&)>;

#ifdef _GLIBCXX_DEBUG
# define _GLIBCXX_VISIT_UNREACHABLE __builtin_trap
#else
# define _GLIBCXX_VISIT_UNREACHABLE __builtin_unreachable
#endif

#define _GLIBCXX_VISIT_CASE(N) \
  case N: \
  { \
    if constexpr (N < __n) \
      { \
        return __gen_vtable_impl<_Ma, index_sequence<N>>:: \
          __visit_invoke(std::forward<_Visitor>(__visitor), \
                         std::forward<_V0>(__v0)); \
      } \
    else _GLIBCXX_VISIT_UNREACHABLE(); \
  }

              switch (__v0.index())
                {
                  _GLIBCXX_VISIT_CASE(0)
                  _GLIBCXX_VISIT_CASE(1)
                  _GLIBCXX_VISIT_CASE(2)
                  _GLIBCXX_VISIT_CASE(3)
                  _GLIBCXX_VISIT_CASE(4)
                  _GLIBCXX_VISIT_CASE(5)
                  _GLIBCXX_VISIT_CASE(6)
                  _GLIBCXX_VISIT_CASE(7)
                  _GLIBCXX_VISIT_CASE(8)
                  _GLIBCXX_VISIT_CASE(9)
                  _GLIBCXX_VISIT_CASE(10)
                case variant_npos:
                  using __detail::__variant::__variant_idx_cookie;
                  using __detail::__variant::__variant_cookie;
                  if constexpr (is_same_v<_Result_type, __variant_idx_cookie>
                                || is_same_v<_Result_type, __variant_cookie>)
                    {
                      using _Npos = index_sequence<variant_npos>;
                      return __gen_vtable_impl<_Ma, _Npos>::
                        __visit_invoke(std::forward<_Visitor>(__visitor),
                                       std::forward<_V0>(__v0));
                    }
                  else
                    _GLIBCXX_VISIT_UNREACHABLE();
                default:
                  _GLIBCXX_VISIT_UNREACHABLE();
                }
#undef _GLIBCXX_VISIT_CASE
#undef _GLIBCXX_VISIT_UNREACHABLE
            }
        }
    }
```

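Looks intimidating, doesn't it? In essence it is still implemented by looking up a table and jumping: for a single variant with at most 11 alternatives it switches on index() (which compilers typically lower to a jump table), and otherwise it indexes a function-pointer table (_S_vtable) generated at compile time. A simplified version looks like this: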

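```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <type_traits>
#include <utility>
#include <variant>

// One table entry: call the visitor with alternative I of the variant.
template <std::size_t I, typename Visitor, typename Variant>
void invoke_alternative(Visitor&& visitor, Variant&& var) {
    std::invoke(std::forward<Visitor>(visitor),
                std::get<I>(std::forward<Variant>(var)));
}

// Build the jump table at compile time (one function pointer per alternative),
// then make a single indexed call at runtime. Simplified: void visitors only, no valueless check.
template <typename Visitor, typename Variant, std::size_t... I>
void visit_impl(Visitor&& visitor, Variant&& var, std::index_sequence<I...>) {
    static constexpr std::array<void (*)(Visitor&&, Variant&&), sizeof...(I)>
        jump_table = {&invoke_alternative<I, Visitor, Variant>...};

    jump_table[var.index()](std::forward<Visitor>(visitor),
                            std::forward<Variant>(var));
}

template <typename Visitor, typename Variant>
void visit_simple(Visitor&& visitor, Variant&& var) {
    constexpr std::size_t n =
        std::variant_size_v<std::remove_reference_t<Variant>>;
    visit_impl(std::forward<Visitor>(visitor), std::forward<Variant>(var),
               std::make_index_sequence<n>{});
}
```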

For the vast majority of developers, a look at the code above is enough; there is little value in digging deeper.
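The analysis above shows that because std::visit is expanded statically at compile time, the call into the visitor can usually be inlined (especially on the switch path), so dispatch becomes a direct call, which can noticeably improve performance. In addition, std::variant stores the active alternative inline rather than on the heap, which gives a cache-friendly memory layout and makes data access more efficient.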
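To illustrate why inlining is possible, here is a hand-written sketch in the spirit of the small-variant switch path in the libstdc++ code above, for a variant with exactly three alternatives. The names visit3, Value and kind_of are hypothetical; each case calls the visitor with a concrete type, so every call target is known at compile time:

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Hypothetical helper for a variant with exactly three alternatives:
// a switch on index(), with a concretely typed call in each case.
template <typename Visitor, typename A, typename B, typename C>
decltype(auto) visit3(Visitor&& vis, std::variant<A, B, C>& v) {
    switch (v.index()) {
        case 0:  return std::forward<Visitor>(vis)(std::get<0>(v));
        case 1:  return std::forward<Visitor>(vis)(std::get<1>(v));
        case 2:  return std::forward<Visitor>(vis)(std::get<2>(v));
        default: throw std::bad_variant_access{};  // valueless variant
    }
}

using Value = std::variant<int, double, std::string>;

// Usage: if constexpr removes the branches that do not apply to each type.
std::string kind_of(Value& v) {
    return visit3([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, int>) return "int";
        else if constexpr (std::is_same_v<T, double>) return "double";
        else return "string";
    }, v);
}
```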
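Of course, the libstdc++ implementation above also makes the drawbacks of std::visit clear: both templates and inlining can cause code bloat, which brings problems of its own, and compilation may become slower. Every technique involves trade-offs, and this is one example.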