ffplay source code analysis: audio_decode_frame()

Original post, last modified 2025-06-18 15:52:18


This content is licensed under CC 4.0 BY-SA.


Tags: #ffmpeg #audio-video #ffplay

First published 2025-05-02 17:45:08

Included in the "ffplay source code analysis" column.



Introduction

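The main job of audio_decode_frame() is to take decoded audio data out of the FrameQueue and decide whether it needs resampling; if it does, the data is resampled before being handed to SDL. audio_decode_frame() can be divided into three parts: the first reads decoded audio data from the audio FrameQueue, the second decides whether resampling is needed, and the third performs the resampling.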

I. Resampling functions

1. What is resampling

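Resampling means changing audio parameters such as the sample rate, sample format, channel count or channel layout so that the output matches the parameters we want. Note that resampling does not change the playback duration of the audio stream.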
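The duration claim can be checked with a little arithmetic. The sketch below is plain C with hypothetical helper names (no FFmpeg calls): it converts a sample count at one rate into the equivalent count at another rate and computes both durations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: duration in seconds of nb_samples (per channel)
 * at sample_rate Hz. */
static double duration_sec(int64_t nb_samples, int sample_rate)
{
    assert(sample_rate > 0);
    return (double)nb_samples / sample_rate;
}

/* Hypothetical helper: number of samples at dst_rate that span the same
 * time as nb_samples at src_rate, rounded to the nearest integer. */
static int64_t equivalent_samples(int64_t nb_samples, int src_rate, int dst_rate)
{
    assert(src_rate > 0 && dst_rate > 0);
    return (nb_samples * dst_rate + src_rate / 2) / src_rate;
}
```

For example, equivalent_samples(1024, 48000, 44100) returns 941, and both 1024 samples at 48000 Hz and 941 samples at 44100 Hz last about 21.3 ms.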

2. Why resample

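The original audio parameters may not match what we need. Take playback with SDL 2.0 as an example: SDL does not accept planar (non-interleaved) sample data, and ffplay opens the SDL device with 16-bit signed integer samples (AUDIO_S16SYS), whereas many FFmpeg decoders, such as the native AAC decoder, output AV_SAMPLE_FMT_FLTP (planar float). The audio therefore has to be resampled before SDL 2.0 can play it.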
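To make the planar/packed difference concrete, here is a minimal sketch, assuming float input nominally in [-1, 1], of the format-conversion part of what the resampler does when turning FLTP into packed S16. The function name is hypothetical, and libswresample's real conversion (scaling, rounding, optional dithering) may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: convert planar float samples (one array per channel,
 * the FLTP layout) into packed/interleaved signed 16-bit samples
 * (L R L R ..., the S16 layout). Sample rate and layout are unchanged. */
static void fltp_to_s16_interleaved(const float *const *planes, int channels,
                                    int nb_samples, int16_t *out)
{
    assert(planes && out && channels > 0 && nb_samples >= 0);
    for (int i = 0; i < nb_samples; i++) {
        for (int ch = 0; ch < channels; ch++) {
            float v = planes[ch][i];
            /* Clip to [-1, 1] before scaling so the cast cannot overflow. */
            if (v > 1.0f)
                v = 1.0f;
            if (v < -1.0f)
                v = -1.0f;
            /* Sample i of channel ch lands at position i * channels + ch. */
            out[i * channels + ch] = (int16_t)(v * 32767.0f);
        }
    }
}
```

The key point is the index i * channels + ch: in planar data each channel has its own array, while in packed data the channels of one sample instant sit next to each other in a single buffer.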

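ffplay uses the resampling functions swr_alloc_set_opts(), swr_init(), swr_convert() and swr_free(). (In newer FFmpeg releases swr_alloc_set_opts() is deprecated in favor of swr_alloc_set_opts2(), which takes AVChannelLayout parameters; the ffplay code analyzed here still uses the older API.)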

3. swr_alloc_set_opts()

This function allocates a SwrContext if needed and sets or resets the resampling parameters. Its prototype is:

struct SwrContext *swr_alloc_set_opts(
    struct SwrContext *s,                // resampling context; if NULL, a new one is allocated internally
    int64_t out_ch_layout,               // output channel layout, e.g. 5.1
    enum AVSampleFormat out_sample_fmt,  // output sample format, e.g. float or S16; S16 is supported by most sound cards
    int out_sample_rate,                 // output sample rate
    int64_t in_ch_layout,                // input channel layout
    enum AVSampleFormat in_sample_fmt,   // input sample format
    int in_sample_rate,                  // input sample rate
    int log_offset,                      // logging level offset; 0 is fine here
    void *log_ctx                        // parent logging context; NULL is fine here
);

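The parameters are:

struct SwrContext *s: the resampling context. If NULL is passed, memory is allocated internally; passing an existing context reuses it instead of allocating new memory.
int64_t out_ch_layout: target channel layout
enum AVSampleFormat out_sample_fmt: target sample format
int out_sample_rate: target sample rate
int64_t in_ch_layout: source channel layout
enum AVSampleFormat in_sample_fmt: source sample format
int in_sample_rate: source sample rate
int log_offset: logging-related; 0 is fine here.
void *log_ctx: logging-related; NULL is fine here.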

After calling swr_alloc_set_opts(), call swr_init() so that the parameters take effect.

4. swr_init() and swr_free()

swr_init() initializes the resampling context. If the context's options are changed, swr_init() must be called again for the change to take effect. Its prototype is:

/**
 * Initialize context after user parameters have been set.
 * @note The context must be configured using the AVOption API.
 *
 * @see av_opt_set_int()
 * @see av_opt_set_dict()
 *
 * @param[in,out]   s Swr context to initialize
 * @return AVERROR error code in case of failure.
 */
int swr_init(struct SwrContext *s);

swr_free() frees the SwrContext and sets the pointer to NULL.

5. swr_convert()

Resampling is performed by calling swr_convert() repeatedly:

/** Convert audio.
 *
 * in and in_count can be set to 0 to flush the last few samples out at the
 * end.
 *
 * If more input is provided than output space, then the input will be buffered.
 * You can avoid this buffering by using swr_get_out_samples() to retrieve an
 * upper bound on the required number of output samples for the given number of
 * input samples. Conversion will run directly without copying whenever possible.
 *
 * @param s         allocated Swr context, with parameters set
 * @param out       output buffers, only the first one need be set in case of packed audio
 * @param out_count amount of space available for output in samples per channel
 * @param in        input buffers, only the first one need to be set in case of packed audio
 * @param in_count  number of input samples available in one channel
 *
 * @return number of samples output per channel, negative value on error
 */
int swr_convert(struct SwrContext *s, uint8_t **out, int out_count,
                                const uint8_t **in , int in_count);

The parameters are:

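struct SwrContext *s: the resampling context.
uint8_t **out: output buffer pointers (for packed audio only the first one needs to be set).
int out_count: the number of samples per channel the output buffer can hold. It is usually advisable to make this somewhat larger than the expected output: whatever does not fit is buffered inside the resampler and can keep accumulating.
const uint8_t **in: input buffer pointers (for packed audio only the first one needs to be set).
int in_count: the number of input samples available per channel.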

swr_convert() returns the number of samples per channel actually output, or a negative value on error.

II. Reading data from the audio FrameQueue

do {
#if defined(_WIN32)
    while (frame_queue_nb_remaining(&is->sampq) == 0) {
        if ((av_gettime_relative() - audio_callback_time) > 1000000LL * is->audio_hw_buf_size / is->audio_tgt.bytes_per_sec / 2)
            return -1;
        av_usleep (1000);
    }
#endif
    if (!(af = frame_queue_peek_readable(&is->sampq)))
        return -1;
    frame_queue_next(&is->sampq);
} while (af->serial != is->audioq.serial);

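This code runs in a do{}while() loop. On Windows only (note the #if defined(_WIN32) guard), frame_queue_nb_remaining() first checks whether the queue has data; if it is empty, the thread sleeps for 1000 microseconds (1 ms) and checks again. While waiting, it also evaluates the following condition: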

if ((av_gettime_relative() - audio_callback_time) > 1000000LL * is->audio_hw_buf_size / is->audio_tgt.bytes_per_sec / 2)

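av_gettime_relative() can be regarded as the current system timestamp in microseconds, audio_callback_time is the timestamp taken when the audio callback was entered, is->audio_hw_buf_size is the size of the SDL audio buffer in bytes, and is->audio_tgt.bytes_per_sec is the number of bytes played per second. So is->audio_hw_buf_size / is->audio_tgt.bytes_per_sec is the time in seconds needed to play the whole buffer, and dividing by 2 gives the time needed to play half of it. The factor 1000000LL converts seconds to microseconds and is applied first, so that integer division does not truncate the result to whole seconds. The following code therefore means: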

if ((av_gettime_relative() - audio_callback_time) > 1000000LL * is->audio_hw_buf_size / is->audio_tgt.bytes_per_sec / 2)
    return -1;

If no data has arrived after waiting as long as it takes to play half the buffer, the function stops waiting and returns -1.
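A small worked example of the timeout expression, written as a plain C helper with a hypothetical name. The numbers used below (a 4096-byte SDL buffer and 44100 Hz stereo S16 output, i.e. 44100 * 2 * 2 = 176400 bytes per second) are illustrative assumptions, not values taken from ffplay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: timeout in microseconds used by the wait loop, i.e.
 * the time needed to play half of an SDL buffer of hw_buf_size bytes at
 * bytes_per_sec. Multiplying by 1000000 first keeps integer division from
 * truncating the result to whole seconds. */
static int64_t half_buffer_timeout_us(int hw_buf_size, int bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return 1000000LL * hw_buf_size / bytes_per_sec / 2;
}
```

With those numbers, half_buffer_timeout_us(4096, 176400) evaluates to 11609, so the loop gives up after roughly 11.6 ms.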

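If there is data, frame_queue_peek_readable() returns the next readable Frame (it blocks until one is available and returns NULL if the queue has been aborted).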

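frame_queue_next() updates the queue's read index and size. When the audio FrameQueue is initialized, the fourth argument (keep_last) is 1, meaning the last frame is kept. So the first call to frame_queue_next() only sets f->rindex_shown to 1; only subsequent calls advance the read index and decrement the queue size.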

static void frame_queue_next(FrameQueue *f)
{
    if (f->keep_last && !f->rindex_shown) {
        f->rindex_shown = 1;
        return;
    }
    frame_queue_unref_item(&f->queue[f->rindex]);
    if (++f->rindex == f->max_size)
        f->rindex = 0;
    SDL_LockMutex(f->mutex);
    f->size--;
    SDL_CondSignal(f->cond);
    SDL_UnlockMutex(f->mutex);
}
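The keep_last behaviour is easier to follow in a stripped-down model. The sketch below is hypothetical (ToyFrameQueue and the toy_* functions are not ffplay names): it keeps only the index bookkeeping of frame_queue_next(), plus the peek index and remaining count computed the way frame_queue_peek_readable() and frame_queue_nb_remaining() compute them. Locking, waiting and frame storage are left out.

```c
#include <assert.h>

/* Hypothetical model of FrameQueue's read side: only the indices. */
typedef struct ToyFrameQueue {
    int rindex;        /* index of the oldest frame still in the queue */
    int rindex_shown;  /* 1 once the frame at rindex has been consumed */
    int size;          /* number of frames in the queue */
    int max_size;      /* ring buffer capacity */
    int keep_last;     /* 1: keep the last consumed frame in the queue */
} ToyFrameQueue;

/* Same index logic as frame_queue_next(), without unref and locking. */
static void toy_frame_queue_next(ToyFrameQueue *f)
{
    if (f->keep_last && !f->rindex_shown) {
        f->rindex_shown = 1;   /* first call: only mark as shown */
        return;
    }
    if (++f->rindex == f->max_size)
        f->rindex = 0;         /* wrap around the ring buffer */
    f->size--;
}

/* Slot that the next peek would read. */
static int toy_peek_index(const ToyFrameQueue *f)
{
    return (f->rindex + f->rindex_shown) % f->max_size;
}

/* Frames not yet consumed. */
static int toy_nb_remaining(const ToyFrameQueue *f)
{
    return f->size - f->rindex_shown;
}
```

Starting from rindex = 0, rindex_shown = 0 and size = 3, the first peek reads slot 0 and the first call only sets rindex_shown; the second peek reads slot 1, and the second call releases slot 0 and moves rindex to 1. The most recently consumed frame therefore stays in the queue until the next one is consumed.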

Next, the while() condition contains the following check:

do {
......
} while (af->serial != is->audioq.serial);

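This checks whether the serial of the Frame just read matches the serial of the PacketQueue. If they differ, the loop keeps reading until they match. Serials exist to support seeking; if no seek has happened, the serial here is always 1.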

For how serials are assigned, see:

ffplay source code analysis: decoder_decode_frame, the decoding function (CSDN blog)
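A hypothetical model of the serial check: given the serials of the queued frames and the current PacketQueue serial, it returns the index of the first frame the loop would accept. Unlike the real loop, which blocks in frame_queue_peek_readable() until a matching frame arrives, this sketch simply returns -1 when there is none.

```c
#include <assert.h>

/* Hypothetical sketch: frames carry the PacketQueue serial that was current
 * when they were decoded. After a seek the PacketQueue serial is incremented,
 * so frames with an older serial are skipped. */
static int first_current_frame(const int *frame_serials, int nb_frames,
                               int queue_serial)
{
    assert(nb_frames >= 0);
    for (int i = 0; i < nb_frames; i++) {
        if (frame_serials[i] == queue_serial)
            return i;          /* first frame that would be played */
    }
    return -1;                 /* no frame with the current serial yet */
}
```

With serials {1, 1, 1, 2, 2} and a queue serial of 2 (i.e. after one seek), the three stale frames are skipped and index 3 is returned.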

III. Deciding whether resampling is needed

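This step checks whether the AVFrame's sample format, channel layout, sample rate and so on match what the SDL audio device expects; if they do not, resampling is needed.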

1. Computing the byte size of one audio frame

data_size = av_samples_get_buffer_size(NULL, af->frame->channels,
                                           af->frame->nb_samples,
                                           af->frame->format, 1);

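av_samples_get_buffer_size() computes the number of bytes in one audio frame. af->frame->nb_samples is the number of samples per channel in the frame, so data_size is the number of bytes the frame occupies. The last argument, align = 1, means no padding is added.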
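With align = 1 the size is simply channels × samples per channel × bytes per sample, for both packed and planar layouts. The sketch below reproduces that arithmetic with a hypothetical enum and helper names; it is not FFmpeg's implementation, which also handles other alignments.

```c
#include <assert.h>

/* Hypothetical enum mirroring a few AVSampleFormat names. */
typedef enum { FMT_U8, FMT_S16, FMT_S32, FMT_FLT, FMT_DBL } ToySampleFmt;

static int toy_bytes_per_sample(ToySampleFmt fmt)
{
    switch (fmt) {
    case FMT_U8:  return 1;
    case FMT_S16: return 2;
    case FMT_S32: return 4;
    case FMT_FLT: return 4;
    case FMT_DBL: return 8;
    }
    return 0;
}

/* Bytes for nb_samples per channel with no padding (align = 1). */
static int toy_frame_bytes(int channels, int nb_samples, ToySampleFmt fmt)
{
    assert(channels >= 0 && nb_samples >= 0);
    return channels * nb_samples * toy_bytes_per_sample(fmt);
}
```

For a stereo FLTP frame of 1024 samples, toy_frame_bytes(2, 1024, FMT_FLT) gives 2 × 1024 × 4 = 8192 bytes.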

2. Determining the channel layout

dec_channel_layout =
        (af->frame->channel_layout && af->frame->channels == av_get_channel_layout_nb_channels(af->frame->channel_layout)) ?
        af->frame->channel_layout : av_get_default_channel_layout(af->frame->channels);
wanted_nb_samples = synchronize_audio(is, af->frame->nb_samples);

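If the AVFrame carries a channel layout and that layout's channel count matches the frame's channel count, the frame's layout is used; otherwise the default layout for that number of channels is used.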

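wanted_nb_samples: the number of samples wanted after audio synchronization. synchronize_audio() only changes the sample count when the master clock is the video clock or the external clock. This article only analyzes the case where audio is the master clock, so wanted_nb_samples equals the AVFrame's sample count.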
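The channel-layout fallback can be sketched without FFmpeg. The masks below use the same bit values as AV_CH_FRONT_LEFT, AV_CH_FRONT_RIGHT and AV_CH_FRONT_CENTER, but toy_default_layout() is a hypothetical stand-in for av_get_default_channel_layout() that only knows mono and stereo.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit values as AV_CH_FRONT_LEFT, AV_CH_FRONT_RIGHT, AV_CH_FRONT_CENTER. */
#define TOY_CH_FL 0x1ULL
#define TOY_CH_FR 0x2ULL
#define TOY_CH_FC 0x4ULL

/* Number of channels in a layout mask (number of set bits). */
static int toy_layout_nb_channels(uint64_t layout)
{
    int n = 0;
    for (; layout; layout &= layout - 1)  /* clear the lowest set bit */
        n++;
    return n;
}

/* Hypothetical default layout: mono and stereo only, 0 means unknown. */
static uint64_t toy_default_layout(int channels)
{
    switch (channels) {
    case 1:  return TOY_CH_FC;
    case 2:  return TOY_CH_FL | TOY_CH_FR;
    default: return 0;
    }
}

/* Same decision as ffplay's dec_channel_layout expression. */
static uint64_t toy_dec_channel_layout(uint64_t frame_layout, int frame_channels)
{
    return (frame_layout && frame_channels == toy_layout_nb_channels(frame_layout))
               ? frame_layout
               : toy_default_layout(frame_channels);
}
```

A stereo frame with channel_layout = 0, or with a mono layout that contradicts its channel count of 2, falls back to the stereo default in both cases.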

3. Deciding whether to resample

if (af->frame->format        != is->audio_src.fmt            ||
    dec_channel_layout       != is->audio_src.channel_layout ||
    af->frame->sample_rate   != is->audio_src.freq           ||
    (wanted_nb_samples       != af->frame->nb_samples && !is->swr_ctx)) 
    {...}

is->swr_ctx is the resampling context; it starts out as NULL.

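is->audio_src is initialized from is->audio_tgt, which is filled in when audio_open() is called. Both are AudioParams structs, and audio_tgt holds the parameters the SDL audio device was opened with. So, at least for the first frame, this if statement checks whether the AVFrame's parameters match those of the SDL audio device. Four things are compared: sample format, channel layout, sample rate and sample count, where the sample-count condition only counts while no resampling context exists yet. If any condition is true, the resampling context is (re)created: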

swr_free(&is->swr_ctx);
is->swr_ctx = swr_alloc_set_opts(NULL,
                                 is->audio_tgt.channel_layout, is->audio_tgt.fmt, is->audio_tgt.freq,
                                 dec_channel_layout,           af->frame->format, af->frame->sample_rate,
                                 0, NULL);
if (!is->swr_ctx || swr_init(is->swr_ctx) < 0) {
    av_log(NULL, AV_LOG_ERROR,
           "Cannot create sample rate converter for conversion of %d Hz %s %d channels to %d Hz %s %d channels!\n",
            af->frame->sample_rate, av_get_sample_fmt_name(af->frame->format), af->frame->channels,
            is->audio_tgt.freq, av_get_sample_fmt_name(is->audio_tgt.fmt), is->audio_tgt.channels);
    swr_free(&is->swr_ctx);
    return -1;
}

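After the resampling parameters are set and is->swr_ctx is initialized (ffplay then also copies the frame's format, channel layout and sample rate into is->audio_src, so later frames with the same parameters reuse the context), the audio data is resampled.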

IV. Resampling

1. Computing the resampling parameters

const uint8_t **in = (const uint8_t **)af->frame->extended_data;
uint8_t **out = &is->audio_buf1;

const uint8_t **in: the AVFrame's extended_data holds the audio data of all channels and serves as the input.

uint8_t **out: is->audio_buf1 serves as the output buffer.

int out_count = (int64_t)wanted_nb_samples * is->audio_tgt.freq / af->frame->sample_rate + 256;
int out_size  = av_samples_get_buffer_size(NULL, is->audio_tgt.channels, out_count, is->audio_tgt.fmt, 0);
av_fast_malloc(&is->audio_buf1, &is->audio_buf1_size, out_size);

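int out_count: the output capacity in samples. For example, with a source rate of 48000 Hz and a target rate of 44100 Hz, 1024 input samples become 1024 * 44100 / 48000 = 940.8 samples. swr_convert() sometimes outputs more than 940 samples (for example when it releases samples buffered from earlier calls), so it is advisable to make this value somewhat larger; otherwise the output that does not fit is buffered inside the resampler and keeps accumulating. That is why ffplay adds 256.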

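out_size: the buffer size needed for out_count samples in the target format; av_fast_malloc() then allocates the output buffer, or reuses the existing one if it is already large enough.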
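The buffer sizing can be checked numerically. The helpers below are hypothetical and assume packed output; toy_min_out_size() ignores any padding that av_samples_get_buffer_size() may add when called with align = 0, as ffplay does, so it is a lower bound.

```c
#include <assert.h>
#include <stdint.h>

/* Output capacity requested from swr_convert(): the expected number of
 * output samples (truncated) plus a 256-sample safety margin. */
static int toy_out_count(int wanted_nb_samples, int src_rate, int dst_rate)
{
    assert(src_rate > 0);
    return (int)((int64_t)wanted_nb_samples * dst_rate / src_rate + 256);
}

/* Minimum bytes for out_count packed samples (no alignment padding). */
static int toy_min_out_size(int out_count, int channels, int bytes_per_sample)
{
    return out_count * channels * bytes_per_sample;
}
```

For 1024 samples converted from 48000 Hz to 44100 Hz stereo S16: out_count = 940 + 256 = 1196, and the buffer needs at least 1196 × 2 × 2 = 4784 bytes.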

2. Resampling

len2 = swr_convert(is->swr_ctx, out, out_count, in, af->frame->nb_samples);
if (len2 < 0) {
    av_log(NULL, AV_LOG_ERROR, "swr_convert() failed\n");
    return -1;
}
if (len2 == out_count) {
    av_log(NULL, AV_LOG_WARNING, "audio buffer is probably too small\n");
    if (swr_init(is->swr_ctx) < 0)
        swr_free(&is->swr_ctx);
}

Each resampling step calls swr_convert(); its return value is the number of samples per channel that were output.

if (len2 == out_count) {
    av_log(NULL, AV_LOG_WARNING, "audio buffer is probably too small\n");
    if (swr_init(is->swr_ctx) < 0)
        swr_free(&is->swr_ctx);
}

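len2 is the number of samples produced and out_count is the output buffer's capacity in samples. If they are equal, the output buffer was probably too small and some output may still be buffered inside the resampler, so ffplay calls swr_init() again, which resets the context (and frees it if that fails). This rarely happens in practice.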

is->audio_buf = is->audio_buf1;
resampled_data_size = len2 * is->audio_tgt.channels * av_get_bytes_per_sample(is->audio_tgt.fmt);

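is->audio_buf now points to the same memory as is->audio_buf1. resampled_data_size is the number of bytes after resampling (samples per channel × channels × bytes per sample) and is the function's return value.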

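If no resampling is needed, the AVFrame's own data buffer and byte size are used directly. af->frame->data[0] and af->frame->extended_data[0] point to the same memory, so af->frame->data[0] is used here; data_size is the byte count computed earlier for one frame.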

else { // no resampling needed
    is->audio_buf = af->frame->data[0];
    resampled_data_size = data_size;
}

3. Computing the audio clock

if (!isnan(af->pts))
    is->audio_clock = af->pts + (double) af->frame->nb_samples / af->frame->sample_rate;
else
    is->audio_clock = NAN;
is->audio_clock_serial = af->serial;
return resampled_data_size;

af->pts is the frame's presentation time in seconds, and (double) af->frame->nb_samples / af->frame->sample_rate is the frame's duration, so is->audio_clock is the time at which the frame finishes playing.

is->audio_clock_serial: the audio clock's serial, set to the Frame's serial.

resampled_data_size: the number of bytes after resampling, which is returned.
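The clock update reduces to one line of arithmetic. The hypothetical helper below mirrors it, including the NAN case.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: time in seconds at which a frame finishes playing,
 * i.e. its pts plus its duration. An unknown (NAN) pts stays unknown. */
static double toy_audio_clock(double pts, int nb_samples, int sample_rate)
{
    assert(sample_rate > 0);
    if (isnan(pts))
        return NAN;
    return pts + (double)nb_samples / sample_rate;
}
```

A frame with pts = 1.0 s holding 1024 samples at 48000 Hz gives an audio clock of about 1.0213 s.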