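When simulating large-model training, do single- and double-precision outputs disagree? Does that worsen hallucination, or even cause it?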

Original post · last modified 2024-11-01 12:48:59

This content is licensed under CC 4.0 BY-SA.

Tags

#LLM #hallucination #self-confirming-errors #word-vectors #GPU

First published 2024-11-01 10:32:06

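Below is some Python code. Using the same random data, it runs a simulated training and prediction pass once in single precision and once in double precision, then compares the predicted values. They turn out not to match.

Take a look: is there anything wrong with the code?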

import os
import random

# Set these before importing NumPy/TensorFlow; set afterwards, they may be read too late.
os.environ['OMP_NUM_THREADS'] = '1'      # keep OpenMP-backed libraries single-threaded
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # hide GPUs so TensorFlow computes on the CPU

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers.legacy import SGD

# Fix all random seeds
np.random.seed(42)
tf.random.set_seed(42)
random.seed(42)

# Disable non-deterministic op behavior
tf.config.experimental.enable_op_determinism()

# Make TensorFlow run single-threaded
tf.config.threading.set_inter_op_parallelism_threads(1)
tf.config.threading.set_intra_op_parallelism_threads(1)

# Generate simulated data: the label is 1 wherever the input digit is 1,
# and also at the position right after it.
def generate_data(num_samples, sequence_length):
    int_x = np.random.randint(0, 10, size=(num_samples, sequence_length))
    x = int_x.astype(np.float64)  # double-precision inputs
    y = np.zeros((num_samples, sequence_length), dtype=np.float64)  # double-precision labels
    for i in range(num_samples):
        for j in range(sequence_length):
            if int_x[i][j] == 1:
                y[i][j] = 1
                if j + 1 < sequence_length:
                    y[i][j + 1] = 1
    return x, y

# Hyperparameters
num_samples = 10000
sequence_length = 10
batch_size = 32
epochs = 10
learning_rate = 0.001

# Build the model in the given dtype.
# Assumption to verify: a shared seed may not give bit-identical initial weights
# in float32 and float64; compare model.get_weights() of both models to check.
def build_model(dtype):
    tf.keras.backend.clear_session()
    tf.random.set_seed(42)
    model = Sequential()
    model.add(LSTM(128, input_shape=(sequence_length, 1), return_sequences=True, dtype=dtype,
                   kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
                   recurrent_initializer=tf.keras.initializers.Orthogonal(seed=42),
                   bias_initializer=tf.keras.initializers.Zeros()))
    model.add(Dense(1, activation='sigmoid', dtype=dtype,
                    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
                    bias_initializer=tf.keras.initializers.Zeros()))
    return model

# Deterministic optimizer: plain SGD without momentum.
# A fresh instance is created per model so the two runs share no optimizer state.
def make_optimizer():
    return SGD(learning_rate=learning_rate, momentum=0.0, nesterov=False)

# Prepare the data
x, y = generate_data(num_samples, sequence_length)
x_float64 = x.reshape(num_samples, sequence_length, 1)
y_float64 = y.reshape(num_samples, sequence_length, 1)

# Double-precision training (shuffle=False keeps the batch order identical in both runs)
model_float64 = build_model(tf.float64)
model_float64.compile(loss='binary_crossentropy', optimizer=make_optimizer(), metrics=['accuracy'])
history_float64 = model_float64.fit(x_float64, y_float64, batch_size=batch_size, epochs=epochs,
                                    shuffle=False, verbose=1)
predictions_float64 = model_float64.predict(x_float64)

# Single-precision training
x_float32 = x_float64.astype(np.float32)
y_float32 = y_float64.astype(np.float32)
model_float32 = build_model(tf.float32)
model_float32.compile(loss='binary_crossentropy', optimizer=make_optimizer(), metrics=['accuracy'])
history_float32 = model_float32.fit(x_float32, y_float32, batch_size=batch_size, epochs=epochs,
                                    shuffle=False, verbose=1)
predictions_float32 = model_float32.predict(x_float32)

# Compare the predictions
print("First five flattened double-precision predictions:")
print(predictions_float64.flatten()[:5])
print("First five flattened single-precision predictions:")
print(predictions_float32.flatten()[:5])

# Check whether they agree within tolerance (not bit-for-bit equality).
# np.allclose also applies its default rtol=1e-5, so the bound is atol + rtol * |float32 value|.
if np.allclose(predictions_float64, predictions_float32, atol=1e-7):
    print("Predictions are consistent between double and single precision.")
else:
    print("Predictions are not consistent between double and single precision.")

Output after running:

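Commentary: is there anything wrong with this code? If not, the final error is fairly large: the single- and double-precision results share only one or two significant digits. What does that imply? Incorrect word vectors.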
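One thing worth checking before blaming the model is how much disagreement precision alone should produce. The sketch below is not the LSTM above. It is a toy logistic regression in plain NumPy (the function names and sizes are made up for illustration), trained from identical initial weights and identical data in both precisions. It prints float32 machine epsilon, which is about 1.19e-7 and therefore already larger than the `atol=1e-7` used in the check, next to the largest gap between the two sets of predictions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_and_predict(x, y, w0, lr=0.1, steps=200):
    """Gradient descent on the logistic loss, carried out entirely in x.dtype."""
    dtype = x.dtype.type
    w = w0.astype(x.dtype)
    for _ in range(steps):
        grad = x.T @ (sigmoid(x @ w) - y) / dtype(len(x))
        w = w - dtype(lr) * grad
    return sigmoid(x @ w)

# Hypothetical toy data: 1000 samples, 8 features, linearly separable labels.
rng = np.random.default_rng(0)
x64 = rng.standard_normal((1000, 8))
y64 = (x64 @ rng.standard_normal(8) > 0).astype(np.float64)
w0 = 0.1 * rng.standard_normal(8)  # the same starting weights for both runs

p64 = train_and_predict(x64, y64, w0)
p32 = train_and_predict(x64.astype(np.float32), y64.astype(np.float32), w0)

eps32 = np.finfo(np.float32).eps
max_diff = np.max(np.abs(p64 - p32.astype(np.float64)))
print("float32 machine epsilon:", eps32)
print("max |p64 - p32|:", max_diff)
print("allclose with atol=1e-7:", np.allclose(p64, p32, atol=1e-7))
```

If a toy like this agrees to many digits while the LSTM runs agree to only one or two, the gap probably comes from something other than rounding, for example different starting weights or a different batch order in the two runs.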

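Suppose the correct answer from a large model is 100 characters long, meaning the model has to emit 100 characters one after another. If the computation for the 20th character is off by just a tiny amount, the 20th character comes out wrong, the 21st is more wrong, the 22nd more wrong still, and so on, until the remaining 80 characters are all wrong: a complete mess, that is, nonsense delivered with a straight face. The road was supposed to run dead straight, but a slight turn at the 20th character means the further you go, the further off course you end up.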
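The cascade described above can be sketched with a hand-built toy. This is not a real language model: the 4-token vocabulary and the `LOGITS` table are hypothetical, and the next token depends only on the previous one. One row holds a near-tie (1.0 versus 1.0 + 1e-9) that float64 can distinguish and float32 cannot, so greedy decoding picks a different token there, and every later token follows a different path.

```python
import numpy as np

# Hypothetical toy "model": row = previous token, column = logit of each candidate next token.
LOGITS = np.array([
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0 + 1e-9],  # near-tie: collapses to an exact tie in float32
    [0.0, 0.0, 3.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
], dtype=np.float64)

def greedy_decode(logits, start_token, num_tokens):
    """Repeatedly pick the highest-logit next token given the previous token."""
    tokens = []
    prev = start_token
    for _ in range(num_tokens):
        prev = int(np.argmax(logits[prev]))  # on an exact tie, argmax returns the first index
        tokens.append(prev)
    return tokens

tokens64 = greedy_decode(LOGITS, 0, 8)
tokens32 = greedy_decode(LOGITS.astype(np.float32), 0, 8)
first_diff = next(i for i, (a, b) in enumerate(zip(tokens64, tokens32)) if a != b)

print("float64:", tokens64)  # [1, 3, 0, 1, 3, 0, 1, 3]
print("float32:", tokens32)  # [1, 2, 2, 2, 2, 2, 2, 2]
print("first differing position:", first_diff)  # 1
```

Note what the toy needs in order to diverge: two candidates whose scores are closer together than the rounding error. Where the top candidate wins by a clear margin, a tiny numerical error leaves the chosen token unchanged.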