High-Performance C++ Coding

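#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 30000
#define M 1000

typedef struct
{
	int a[N];
} Node;

// Print the clock ticks elapsed since the last checkpoint, then reset the checkpoint.
#define OUTCLOCK \
    printf("%ld ", (long)(clock() - theClock)); \
    theClock = clock();

int main()
{
	clock_t theClock = clock();
	Node *p = (Node *)malloc(sizeof(Node) * M);
	OUTCLOCK
	// j outer, i inner: consecutive writes are sizeof(Node) bytes apart.
	for (int j = 0; j < N; j++) for (int i = 0; i < M; i++) p[i].a[j] = i * j + 1;
	OUTCLOCK
	// i outer, j inner: consecutive writes are contiguous in memory.
	for (int i = 0; i < M; i++) for (int j = 0; j < N; j++) p[i].a[j] = i * j + 1;
	OUTCLOCK
	free(p);
	return 0;
}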

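Debug build output: 17 81 68

Release build output: 0 45 51

Analysis:

In general, the traversal for (int i = 0; i < M; i++) for (int j = 0; j < N; j++) is faster because it is cache-friendly: the inner loop walks contiguous memory. The Debug result is consistent with this.

In the Release build, however, the form for (int j = 0; j < N; j++) for (int i = 0; i < M; i++) apparently triggered a compiler optimization, and the optimized code ran faster than the for (int i = 0; i < M; i++) for (int j = 0; j < N; j++) traversal.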

2. Bulk memory copy

For bulk memory copies, use memcpy instead of element-by-element assignment.

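// Reuses Node, N, M, and OUTCLOCK from the listing above; memcpy additionally needs <string.h>.
int main()
{
    clock_t theClock=clock();
    Node *p=(Node *)malloc(sizeof(Node)*M);
    int *p2=(int *)malloc(sizeof(int)*N*M);
    OUTCLOCK
    // Element-by-element copy (p is never initialized; only the copy time matters here).
    for(int i=0;i<M;i++)for(int j=0;j<N;j++)p2[i*N+j]=p[i].a[j];
    OUTCLOCK
    // The same copy as a single memcpy; this works because Node is just N contiguous ints.
    memcpy(p2,p, sizeof(int)*N*M);
    OUTCLOCK
    free(p2);
    free(p);
    return 0;
}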

Output:

0 2811 276

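III. Memory

1. Memory model

See the post on storage areas.

2. Differences between new and malloc

(1) malloc pairs with realloc, which can grow an existing block directly; new has no direct counterpart, so growing means allocating a new block and copying.
(2) operator new/delete can be overloaded to customize allocation logic; malloc/free cannot be overloaded.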
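Both points can be sketched in a few lines. The helper names grow_with_realloc and grow_with_new and the Tracked type are made up for this illustration; they are not from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <new>

// Point (1): realloc grows a malloc'd block and preserves its contents.
// Returns nullptr on failure, in which case the old block is still valid.
int* grow_with_realloc(int* buf, std::size_t new_count) {
    return static_cast<int*>(std::realloc(buf, new_count * sizeof(int)));
}

// With new[] there is no realloc: allocate, copy, then release the old block.
int* grow_with_new(int* buf, std::size_t old_count, std::size_t new_count) {
    assert(new_count >= old_count);
    int* bigger = new int[new_count];
    std::copy(buf, buf + old_count, bigger);
    delete[] buf;
    return bigger;
}

// Point (2): a class-specific operator new/delete that counts live objects
// and then forwards to the global allocation functions.
struct Tracked {
    static int live;
    int value = 0;

    static void* operator new(std::size_t size) {
        ++live;
        return ::operator new(size);
    }
    static void operator delete(void* p) {
        --live;
        ::operator delete(p);
    }
};
int Tracked::live = 0;
```

After `Tracked* t = new Tracked;` the counter Tracked::live is 1, and it drops back to 0 after `delete t;`.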

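IV. Multithreaded concurrency

1. False sharing

False sharing occurs when multiple threads each modify different, logically independent variables that happen to sit in the same cache line. Every write invalidates that line in the other cores' caches, which causes unnecessary coherence traffic and memory accesses and degrades performance.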
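A common way to avoid this is to give each hot variable its own cache line. The sketch below assumes a 64-byte line, which is typical on x86-64 but not guaranteed; PaddedCounter and run_padded are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// alignas pads the struct, so two adjacent counters never share a line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};
static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");

// Two threads each increment their own counter; returns the combined total.
long run_padded(long iterations) {
    assert(iterations >= 0);
    PaddedCounter counters[2];
    auto work = [iterations](PaddedCounter& c) {
        for (long i = 0; i < iterations; ++i) {
            c.value.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t0(work, std::ref(counters[0]));
    std::thread t1(work, std::ref(counters[1]));
    t0.join();
    t1.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Removing the alignas leaves the result unchanged but lets both counters land in one line, which is the layout that exhibits false sharing.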

2. Coroutines (to be updated)

A C++20 coroutine is a special kind of function: it can be suspended and later resumed from the point where it left off.

3. Locks and lock-free programming

See below.

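4. Condition variables

A condition variable lets one or more threads be woken when a particular condition occurs.

There are two kinds: std::condition_variable and std::condition_variable_any.

std::condition_variable works only with std::unique_lock<std::mutex>, while std::condition_variable_any works with any lock type. The latter is more flexible but may be slower than std::condition_variable.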
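As a minimal sketch of that flexibility, the following waits on a std::timed_mutex, which std::condition_variable would reject. The function name wait_with_timed_mutex and the payload value are made up for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Waits on a condition_variable_any guarded by a std::timed_mutex.
int wait_with_timed_mutex() {
    std::timed_mutex mtx;               // deliberately not std::mutex
    std::condition_variable_any cv;
    bool ready = false;
    int payload = 0;

    std::thread producer([&] {
        {
            std::lock_guard<std::timed_mutex> lock(mtx);
            payload = 7;
            ready = true;
        }
        cv.notify_one();
    });

    std::unique_lock<std::timed_mutex> lock(mtx);
    cv.wait(lock, [&] { return ready; });   // predicate guards against early or spurious wakeups
    assert(ready);
    int result = payload;
    lock.unlock();

    producer.join();
    return result;
}
```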

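For usage, see "Thread-safe queue" below.

Pitfalls to watch for: spurious wakeups and lost wakeups.

This part draws on an article by a Zhihu user.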

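5. Spurious wakeups and lost wakeups

(1) Spurious wakeup

Definition: a waiting thread is woken even though you did not intend to wake it, so it resumes while the condition it was waiting for may not hold.

Spurious wakeups have two kinds of causes: kernel level (caused by the operating system) and application level (caused by user code).

At the kernel level, when you call notify_one (or pthread_cond_signal), the operating system does not guarantee that exactly one thread is woken. This is a deliberate design choice made for performance.

Incorrect application code can cause the same problem, for example a producer that creates a single element but wakes several consumers with notify_all.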
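The usual defense is to re-check the condition in a loop after every wakeup. The sketch below deliberately wakes every consumer with notify_all even when there are fewer items than consumers; consume_with_recheck is a hypothetical name used only here.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Starts `consumers` threads, publishes `items` elements, wakes everyone,
// and returns how many elements were actually consumed.
int consume_with_recheck(int consumers, int items) {
    assert(consumers > 0 && items >= 0);
    std::mutex mtx;
    std::condition_variable cv;
    std::queue<int> q;
    bool done = false;
    int consumed = 0;

    std::vector<std::thread> pool;
    for (int c = 0; c < consumers; ++c) {
        pool.emplace_back([&] {
            std::unique_lock<std::mutex> lock(mtx);
            for (;;) {
                // Re-check after every wakeup: being woken proves nothing.
                while (q.empty() && !done) {
                    cv.wait(lock);
                }
                if (q.empty()) {
                    return;          // producer finished and nothing is left
                }
                q.pop();
                ++consumed;          // protected by mtx
            }
        });
    }

    {
        std::lock_guard<std::mutex> lock(mtx);
        for (int i = 0; i < items; ++i) {
            q.push(i);
        }
        done = true;
    }
    cv.notify_all();                 // wakes all consumers on purpose

    for (auto& t : pool) {
        t.join();
    }
    return consumed;
}
```

With four consumers and one item, exactly one consumer pops the element; the others wake, find the queue empty, and exit without touching it.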

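(2) Lost wakeup

In short, a lost wakeup means a notification was sent but the intended waiter never received it, typically because it was sent before the waiter started waiting.

A lost wakeup can leave a thread blocked forever.

Example code that hangs this way: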

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>
using namespace std::chrono_literals;
std::condition_variable cv;
std::mutex mtx;

void Producer()
{
    std::cout << "Ready Send notification." << std::endl;
    cv.notify_one();   // send the notification (nobody is waiting yet)
}

void Consumer()
{
    std::this_thread::sleep_for(2000ms);   // arrive after the notification was sent
    std::cout << "Wait for notification." << std::endl;
    std::unique_lock<std::mutex> lck(mtx);
    cv.wait(lck);    // the notification is already gone, so this normally blocks forever
    std::cout << "Process." << std::endl;
}

int main()
{
    std::thread producer(Producer);
    std::thread consumer(Consumer);
    producer.join();
    consumer.join();
    return 0;
}
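One way to avoid the hang, sketched under the assumption that a shared flag is acceptable, is to record the event in a variable protected by the mutex and wait with a predicate. The consumer then sees the flag even if the notification arrived first. The name notify_before_wait and the 50 ms delay are illustrative only.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// The producer notifies before the consumer starts waiting, yet nothing hangs.
bool notify_before_wait() {
    std::mutex mtx;
    std::condition_variable cv;
    bool ready = false;       // the state that the notification stands for
    bool processed = false;

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lock(mtx);
            ready = true;     // record the event under the lock
        }
        cv.notify_one();
    });

    std::thread consumer([&] {
        // Arrive late on purpose so the notification has already been sent.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [&] { return ready; });   // returns at once if ready is already true
        assert(ready);
        processed = true;
    });

    producer.join();
    consumer.join();
    return processed;
}
```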

6. Thread interfaces

See below.

7. Thread-safe queue

Using only a lock:

#include <queue>
#include <mutex>
#include <thread>

template<typename T>
class ThreadSafeQueue {
private:
    std::queue<T> queue;
    mutable std::mutex mutex;

public:
    ThreadSafeQueue() {}

    void push(const T& value) {
        std::lock_guard<std::mutex> lock(mutex);
        queue.push(value);
    }

    void pop(T& value) {
        std::lock_guard<std::mutex> lock(mutex);
        if (!queue.empty()) {
            value = std::move(queue.front());
            queue.pop();
        }
    }

    bool empty() const {
        std::lock_guard<std::mutex> lock(mutex);
        return queue.empty();
    }
};

Using a lock and a condition variable to wrap std::queue as a thread-safe queue:

Concise version:

template<typename T>
class ThreadSafeQueue {
    std::queue<T> data_queue;
    mutable std::mutex mtx;
    std::condition_variable cv;
public:
    void push(T value) {
        std::lock_guard<std::mutex> lock(mtx);
        data_queue.push(std::move(value));
        cv.notify_one();
    }
    bool try_pop(T& value) {
        std::lock_guard<std::mutex> lock(mtx);
        if (data_queue.empty()) return false;
        value = std::move(data_queue.front());
        data_queue.pop();
        return true;
    }
    void wait_and_pop(T& value) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return !data_queue.empty(); });
        value = std::move(data_queue.front());
        data_queue.pop();
    }
};

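In cv.wait(lock, func), func is a lambda that returns bool. With the lock held, wait checks func; if it returns false, wait unlocks the mutex and suspends the thread until it is woken, then re-locks the mutex and checks func again. It returns only once func returns true.

Full version: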

#include <queue>
#include <mutex>
#include <condition_variable>
template <typename T>
class ThreadSafeQueue {
private:
   std::queue<T> queue_;
   mutable std::mutex mutex_;
   std::condition_variable cond_var_;
public:
   // Add an element to the queue
   void push(const T& value) {
       std::lock_guard<std::mutex> lock(mutex_);
       queue_.push(value);
       cond_var_.notify_one(); // notify a waiting thread
   }
   // Pop an element, blocking until one is available
   T wait_and_pop() {
       std::unique_lock<std::mutex> lock(mutex_);
       cond_var_.wait(lock, [this] { return !queue_.empty(); });
       T value = queue_.front();
       queue_.pop();
       return value;
   }
   // Try to pop an element (non-blocking)
   bool try_pop(T& value) {
       std::lock_guard<std::mutex> lock(mutex_);
       if (queue_.empty()) {
           return false;
       }
       value = queue_.front();
       queue_.pop();
       return true;
   }
   // Check whether the queue is empty
   bool empty() const {
       std::lock_guard<std::mutex> lock(mutex_);
       return queue_.empty();
   }
   // Get the queue size
   size_t size() const {
       std::lock_guard<std::mutex> lock(mutex_);
       return queue_.size();
   }
};

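V. Locks

1. Lock types (to be updated)

(1) Mutex (std::mutex)

It provides three interfaces:

lock(): blocks until the lock is acquired (if the mutex is currently held by another thread, the calling thread blocks)

unlock(): releases the lock

try_lock(): tries to acquire the lock without blocking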
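The difference between lock() and try_lock() can be seen in a small sketch: while one thread holds the mutex, try_lock() from another thread returns false immediately instead of blocking. The function name try_lock_fails_while_held is made up for this example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Returns true if try_lock() was refused while the mutex was held elsewhere.
bool try_lock_fails_while_held() {
    std::mutex mtx;
    bool acquired = true;

    mtx.lock();                        // this thread now owns the mutex
    std::thread other([&] {
        acquired = mtx.try_lock();     // does not block; returns false here
        if (acquired) {
            mtx.unlock();
        }
    });
    other.join();
    mtx.unlock();

    assert(!acquired);
    return !acquired;
}
```

Note that the standard allows try_lock() to fail spuriously even when the mutex is free, so a false result should be treated as "try again later" rather than as proof that another thread holds the lock.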

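(2) Recursive mutex (std::recursive_mutex)

(3) Timed mutex (std::timed_mutex)

(4) Recursive timed mutex (std::recursive_timed_mutex)

(5) Reader-writer lock (std::shared_mutex)

(6) Spinlock

The core idea of a spinlock is busy-waiting: when a thread tries to acquire a lock that is already held, it repeatedly checks the lock state (spins) instead of giving up the CPU. This suits cases where the lock is held only briefly and a context switch would cost more than the wait.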
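The standard library has no spinlock type, so the following is only a minimal sketch built on std::atomic_flag. SpinLock and count_with_spinlock are names invented here.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// A minimal spinlock: test_and_set returns the previous value, so the loop
// keeps spinning for as long as some other thread already holds the flag.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() {
        flag_.clear(std::memory_order_release);
    }
};

// Several threads increment a plain counter under the spinlock.
long count_with_spinlock(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(lock);  // works with any lock()/unlock() type
                ++counter;
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter;
}
```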

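2. lock_guard

An RAII wrapper: it locks on construction and unlocks on destruction. It does not support manual unlocking.

3. unique_lock

It also locks and unlocks automatically, but is more flexible than lock_guard: it supports manual unlocking and re-locking.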
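A short sketch of the two wrappers side by side; the function name demo_lock_wrappers is hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Increments a counter three times: once under lock_guard and twice under a
// unique_lock that is manually released and re-acquired in between.
int demo_lock_wrappers() {
    std::mutex mtx;
    int x = 0;

    {
        std::lock_guard<std::mutex> guard(mtx);   // locked for the whole scope
        ++x;
    }                                             // unlocked here automatically

    std::unique_lock<std::mutex> lock(mtx);
    ++x;
    lock.unlock();                 // manual unlock, e.g. before slow work
    assert(!lock.owns_lock());
    lock.lock();                   // re-acquire the same mutex
    assert(lock.owns_lock());
    ++x;

    return x;                      // unique_lock's destructor unlocks on return
}
```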

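VI. Thread interfaces (to be updated)

There are two sets of thread-management interfaces: pthread and std::thread.

pthread is more flexible and allows finer control, but it is a POSIX interface, available on Linux and other Unix-like systems rather than everywhere. std::thread is cross-platform.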

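1. Common interfaces

(1) pthread

Create a thread: pthread_create

Exit the current thread: pthread_exit

Cancel another thread: pthread_cancel

Mutex: pthread_mutex_t

Condition variable: pthread_cond_t

Lock and unlock: pthread_mutex_lock, pthread_mutex_unlock

Join and detach: pthread_join, pthread_detach

Yield the CPU: pthread_yield (non-standard; the POSIX call is sched_yield)

Wake: pthread_cond_signal, pthread_cond_broadcast

(2) std::thread

Create a thread: std::thread t(myFunction) (constructing the object starts the thread; it is not joined automatically, so call join or detach before the object is destroyed)

Exit the current thread: return

Cancel another thread: no equivalent

Mutex: std::mutex

Condition variable: std::condition_variable

Lock and unlock: lock_guard

Join and detach: join, detach

Yield the CPU: std::this_thread::yield

Wake: notify_one, notify_all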
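A minimal std::thread example, with the hypothetical name sum_in_thread, shows the create-then-join pattern from the list above.

```cpp
#include <cassert>
#include <thread>

// Runs a small computation on another thread and waits for it to finish.
int sum_in_thread(int a, int b) {
    int result = 0;
    std::thread t([&] { result = a + b; });   // the thread starts running here
    assert(t.joinable());
    t.join();          // required: destroying a joinable std::thread calls std::terminate
    return result;     // safe to read, because join() synchronizes with the thread's end
}
```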

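2. yield

yield() is how a thread voluntarily offers to give up the CPU, but it does not guarantee that the thread pauses immediately. It is only a hint to the scheduler: "I am willing to give up my current time slice."

It is useful for easing contention among threads, but it cannot be used to control execution order precisely.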
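As a small sketch, a thread that polls a flag can call yield() on each iteration so that it does not monopolize a core while waiting. The name wait_politely is made up, and the ordering between threads still comes from the atomic flag, not from yield().

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Polls a flag set by a worker thread, yielding between checks.
int wait_politely() {
    std::atomic<bool> ready{false};
    int value = 0;

    std::thread worker([&] {
        value = 5;
        ready.store(true, std::memory_order_release);
    });

    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();   // a hint only; the scheduler may ignore it
    }
    assert(ready.load());

    worker.join();
    return value;
}
```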

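VII. Lock-free programming

1. Atomic operations (std::atomic)

Using atomic operations instead of locks avoids thread blocking and context switches, which can improve performance considerably, especially under contention.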
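For instance, a shared counter needs no mutex if it is a std::atomic. This is a minimal sketch with the invented name atomic_count.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Several threads increment one counter with no lock at all.
long atomic_count(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write; relaxed is enough because only
                // the final total matters, not ordering with other data.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load();
}
```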

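2. Memory orders

The six memory orders, roughly from weakest to strongest:

(1) memory_order_relaxed

Guarantees only the atomicity of the operation itself; it provides no cross-thread synchronization or ordering.

(2) memory_order_consume

Supports dependency ordering. Not recommended; use memory_order_acquire instead.

(3) memory_order_acquire

Used for reads (such as load). Guarantees that no read or write that comes after this load in the current thread's program order is reordered before it.

(4) memory_order_release

Used for writes (such as store). Guarantees that no read or write that comes before this store in the current thread's program order is reordered after it.

(5) memory_order_acq_rel

Has both acquire and release semantics.

(6) memory_order_seq_cst

The strongest memory order and the default: in addition to acquire/release semantics, all seq_cst operations appear in a single total order that every thread agrees on.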

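3. Memory fences (std::atomic_thread_fence)

A fence does not operate on any data itself. Instead, its memory-order argument restricts how the compiler and the processor may reorder the memory accesses around it.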
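As a sketch of how fences are typically paired, the atomic accesses below are all relaxed and the ordering comes entirely from a release fence in the writer and an acquire fence in the reader. The function name publish_with_fences is made up.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Publishes a plain int through a relaxed flag, ordered by two fences.
int publish_with_fences() {
    int data = 0;
    std::atomic<bool> ready{false};

    std::thread writer([&] {
        data = 42;                                             // plain write
        std::atomic_thread_fence(std::memory_order_release);   // earlier writes stay before the store below
        ready.store(true, std::memory_order_relaxed);
    });

    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is visible
    }
    std::atomic_thread_fence(std::memory_order_acquire);       // later reads stay after the load above
    int seen = data;                                           // ordered after the writer's data = 42
    assert(seen == 42);

    writer.join();
    return seen;
}
```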

4. Producer-consumer model

This model shows intuitively what release (on the write side) and acquire (on the read side) mean.

#include <atomic>
#include <thread>
#include <iostream>

std::atomic<bool> ready{false};
int data = 0;

// Producer thread
void writer() {
    data = 42;                    // non-atomic write
    ready.store(true, std::memory_order_release); // release: data = 42 cannot be reordered after this store
}

// Consumer thread
void reader() {
    while (!ready.load(std::memory_order_acquire)) { // acquire: later reads cannot be reordered before this load
        // wait
    }
    std::cout << "Data: " << data << std::endl; // safe read, guaranteed to see 42
}

int main() {
    std::thread t1(writer);
    std::thread t2(reader);
    t1.join();
    t2.join();
    return 0;
}

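5. CAS operations

CAS stands for Compare And Swap.

(1) Interface

There are two interfaces: compare_exchange_weak and compare_exchange_strong.

(2) Semantics

Semantics of x.compare_exchange_weak(a, b), where the whole body executes as one atomic step:

bool weakFunc(x,a,b)
{
	if (x == a) {
		if (spurious failure) {
			return false;
		}
		x = b;
		return true;
	}
	a = x;	// on a mismatch, a (the expected value) is overwritten with the current value of x
	return false;
}

Semantics of x.compare_exchange_strong(a, b):

bool strongFunc(x,a,b)
{
	if (x == a) {
		x = b;
		return true;
	}
	a = x;	// on a mismatch, a (the expected value) is overwritten with the current value of x
	return false;
}

The difference is that compare_exchange_weak may occasionally fail spuriously.

compare_exchange_strong gives the stronger guarantee, but it may be slightly slower on some hardware.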

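(3) Spurious failure

Only compare_exchange_weak is subject to spurious failure: the operation should have succeeded, but fails because of some hardware limitation.

How do you keep spurious failures from affecting your code?

Retry until the spurious failure no longer occurs.

(4) When to use which

In general, compare_exchange_weak is used inside a loop, and compare_exchange_strong is used for one-shot attempts.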
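Both patterns can be sketched briefly. atomic_store_max and claim_once are hypothetical helper names, not standard functions.

```cpp
#include <atomic>
#include <cassert>

// Loop pattern: raise `target` to at least `candidate`.
// A spurious failure of compare_exchange_weak just costs one more iteration.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int current = target.load(std::memory_order_relaxed);
    while (current < candidate &&
           !target.compare_exchange_weak(current, candidate)) {
        // On failure, `current` now holds the latest value of target,
        // so the loop condition re-evaluates against fresh data.
    }
    assert(target.load() >= candidate);
}

// One-shot pattern: exactly one caller wins the flag.
// compare_exchange_strong is used because there is no retry loop
// to absorb a spurious failure.
bool claim_once(std::atomic<bool>& flag) {
    bool expected = false;
    return flag.compare_exchange_strong(expected, true);
}
```

Calling atomic_store_max(m, 10) on an atomic holding 3 leaves it at 10, and a later atomic_store_max(m, 5) changes nothing. The first claim_once call on a false flag returns true and every later call returns false.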

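(5) Memory order

Taking compare_exchange_weak as an example, there are two overloads:

bool compare_exchange_weak(T& expected, T val, memory_order sync = memory_order_seq_cst) noexcept;

bool compare_exchange_weak(T& expected, T val, memory_order success, memory_order failure) noexcept;

Passing one or two memory-order arguments gives the single-order and dual-order forms respectively. Passing none uses the default memory_order_seq_cst, which is also the single-order form.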
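A minimal sketch of the dual-order form is a lock-free stack push: release on success so the node's contents are published together with the new head, relaxed on failure because the loop simply retries. StackNode and push are names made up for this example, and a matching pop is intentionally left out.

```cpp
#include <atomic>
#include <cassert>

struct StackNode {
    int value;
    StackNode* next;
};

// Pushes `node` onto the stack whose top pointer is `head`.
void push(std::atomic<StackNode*>& head, StackNode* node) {
    assert(node != nullptr);
    node->next = head.load(std::memory_order_relaxed);
    // On failure, node->next is refreshed with the current head and we retry.
    while (!head.compare_exchange_weak(node->next, node,
                                       std::memory_order_release,    // success order
                                       std::memory_order_relaxed)) { // failure order
    }
}
```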

VIII. Zero-copy (to be updated)

See here.