Streaming video with FFmpeg and DirectX 11

This content is licensed under CC 4.0 BY-SA.

Original article: https://medium.com/@nehoraigold/streaming-video-with-ffmpeg-and-directx-11-7395fcb372c4

This article explains how to combine FFmpeg and DirectX 11 to stream video in real time.

A few months ago at work, I was tasked with developing a custom, low-latency video player. Prior to this, I had worked only briefly with FFmpeg and not at all with DirectX 11. But I figured it shouldn’t be that hard. FFmpeg is pretty popular, DirectX 11 has been around for a while now, and it’s not like I needed to create intense 3D graphics or anything (yet).

Surely there would be tons of examples on how to do something basic like decode and render video, right?

Nope. Hence this article.

So that the next poor soul who needs to do this without experience in FFmpeg or DirectX 11 won’t have to bash their head into a wall just to spit out some video onto a screen.

Okay. Just a few housekeeping things before we get to the juicy stuff.

The code samples provided are very simplified. I’ve left out return code checking, error handling, and, well, a bunch of stuff. My point is that the code samples are just that: samples. (I would have provided more fleshed-out examples, but you know. Intellectual property and all that.)

I won’t cover the principles of hardware-accelerated video decoding/rendering because it’s a little outside of the scope of this article. Besides, there are plenty of other resources that explain it far better than I could.

FFmpeg supports pretty much all protocols and encoding formats. Both RTSP and UDP worked with these samples, as well as video encoded in H264 and H265. I’m sure tons of others will work, too.

The project I created was CMake-based and doesn’t rely on Visual Studio’s build system (since we need to support non-DX renderers as well). It made things a tad more difficult, which is why I thought I’d mention it.

Without further ado, let’s get started!

Step #1: Set up the stream source and video decoder.

This is pretty much exclusively FFmpeg stuff. Just a matter of setting up the format context, codec context, and all the other structs that FFmpeg needs you to. For the setup, I relied pretty heavily on this example and the source code from another project called Moonlight.

Note that you have to provide the hardware device type in some way to the AVCodecContext. I opted to do this the same way the FFmpeg example does: a basic string.

// initialize stream
const std::string hw_device_name = "d3d11va";
AVHWDeviceType device_type = av_hwdevice_find_type_by_name(hw_device_name.c_str());

// set up codec context
AVBufferRef* hw_device_ctx;
av_hwdevice_ctx_create(&hw_device_ctx, device_type, nullptr, nullptr, 0);
codec_ctx->hw_device_ctx = av_buffer_ref(hw_device_ctx);

// open stream

Once the setup is done, the actual decoding is pretty straightforward: it’s just a matter of retrieving AVPackets from the stream source and decoding them into AVFrames with the codec.

AVPacket* packet = av_packet_alloc();
av_read_frame(format_ctx, packet);
avcodec_send_packet(codec_ctx, packet);

AVFrame* frame = av_frame_alloc();
avcodec_receive_frame(codec_ctx, frame);

These are simplifications, but still, it didn’t take much to cobble something together. While I couldn’t render anything to the screen yet, I wanted to verify that I was producing valid decoded frames, so I thought I’d just write them to a bitmap file and check them that way.

There was one slight problem.

Step #2: Converting NV12 to RGBA.

To create a bitmap (and, as it turns out, to render to a DX11 swapchain), I needed the frames to be in RGBA format. The decoder, however, was spitting out frames in NV12 format, so I used FFmpeg’s swscale to convert from AV_PIX_FMT_NV12 to AV_PIX_FMT_RGBA.

Setting up the SwsContext can be as easy as a single function call.

SwsContext* conversion_ctx = sws_getContext(
        SRC_WIDTH, SRC_HEIGHT, AV_PIX_FMT_NV12,
        DST_WIDTH, DST_HEIGHT, AV_PIX_FMT_RGBA,
        SWS_BICUBLIN | SWS_BITEXACT, nullptr, nullptr, nullptr);

Of course, in order to use sws_scale(), we need to transfer the frame from the GPU to the CPU. I did this with FFmpeg’s built-in av_hwframe_transfer_data(). There are loads of examples of this.

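// decode frame
AVFrame* sw_frame = av_frame_alloc();
av_hwframe_transfer_data(sw_frame, frame, 0);

sws_scale(conversion_ctx, sw_frame->data, sw_frame->linesize,
          0, sw_frame->height, dst_data, dst_linesize);

// point the frame at the converted RGBA buffer
// (assumes dst_data and dst_linesize are the usual 4-element arrays, e.g. from av_image_alloc();
//  packed RGBA only uses plane 0, and the frame's arrays cannot be assigned wholesale)
sw_frame->data[0] = dst_data[0];
sw_frame->linesize[0] = dst_linesize[0];
sw_frame->format = AV_PIX_FMT_RGBA;
sw_frame->width = DST_WIDTH;
sw_frame->height = DST_HEIGHT;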

This worked fine for the time being, but there were two main issues with this as a long-term solution.

First, what I wanted from the AVFrame at this point was a straightforward, no-nonsense byte array, and using “d3d11va” as the hardware device name gives us something other than a simple byte array. So instead, I changed the hardware device name to “dxva2”. That way, the frame->data is just a bitmap in uint8_t* form. It works for now, but as a long-term solution, not using “d3d11va” basically misses the point.

Second, in order to call sws_scale() and convert the frame to RGBA, we need to move the frame from the GPU to the CPU. Again, fine for now, but definitely something we want to remove in the future.

So not perfect by any means, but at least we now have decoded frames that we can throw onto a bitmap and see with our own eyes.

That’s it for the FFmpeg portion (for now). On to rendering in DirectX 11.

Step #3: Setting up DirectX 11 rendering.

In case you don’t already know, here’s your warning: DX11 is nothing like DX9. Nothing. At. All.

After many failed attempts to display anything other than a green or black screen, I copied and pasted this example just so I could start out with working code. After that, the disproportionately complicated task of turning the triangle into a square. (I went for the four-vertices-six-indices option.)

Additionally, rather than compile the shaders at runtime, I opted to compile them during, well, compile time. For a second, I thought I’d have to include a third party library to do this, but all it required was a couple of lines in the CMakeLists.txt file. Find the fxc.exe executable, and execute the command with the appropriate options to compile your shaders. (I used /Fh to compile them into autogenerated headers.)

Step #4: Swapping color for texture.

Once I got a rainbow square working, it was just a matter of switching COLOR for TEXCOORD in the defined input layout. Obviously, this meant changing a few things:

The vertex struct now has an XMFLOAT2 (x, y) for the texture coordinate instead of XMFLOAT4 (r, g, b, a) for color.

The pixel shader needs to sample the color from the texture rather than just using the provided color. This means needing a sampler.

Also, keep in mind that texture coordinates and position coordinates are different. I didn’t know this initially, and it caused me a ton of needless grief.
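
To make the difference between the two coordinate systems concrete, here is a minimal sketch in plain C++ of the four-vertices-six-indices quad. It is not the article's actual code: the `Vertex` struct and the helper names are hypothetical stand-ins for the DirectXMath-based struct, and the sketch assumes the quad fills the whole viewport and that the rasterizer uses Direct3D's default clockwise front faces. Positions run from -1 to +1 with y pointing up, while texture coordinates run from 0 to 1 with v pointing down.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical vertex layout: clip-space position plus texture coordinate.
struct Vertex {
    float x, y, z;  // position: x and y run from -1 to +1, y points up
    float u, v;     // texcoord: u and v run from 0 to 1, v points down
};

// Map a clip-space coordinate to the texture coordinate that stretches the
// whole texture across a quad filling the whole viewport.
inline float position_to_u(float x) {
    assert(x >= -1.0f && x <= 1.0f);
    return (x + 1.0f) * 0.5f;
}

inline float position_to_v(float y) {
    assert(y >= -1.0f && y <= 1.0f);
    return (1.0f - y) * 0.5f;  // flipped: top of the screen is v = 0
}

inline Vertex make_vertex(float x, float y) {
    return Vertex{x, y, 0.0f, position_to_u(x), position_to_v(y)};
}

// Four vertices for a full-viewport quad.
inline std::array<Vertex, 4> make_fullscreen_quad() {
    return {
        make_vertex(-1.0f,  1.0f),  // 0: top-left
        make_vertex( 1.0f,  1.0f),  // 1: top-right
        make_vertex( 1.0f, -1.0f),  // 2: bottom-right
        make_vertex(-1.0f, -1.0f),  // 3: bottom-left
    };
}

// Six indices: two clockwise triangles sharing the top-left/bottom-right diagonal.
constexpr std::array<std::uint16_t, 6> kQuadIndices = {0, 1, 2, 0, 2, 3};
```

The top-left corner of the screen sits at position (-1, 1) but at texture coordinate (0, 0), which is exactly the kind of mismatch that produces flipped or blank output if the two systems are confused.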

Once I was able to render a basic, static JPEG image, I knew I was getting close. All that remained was transferring the actual bitmap from the frame to the shared texture.

Step #5: Rendering actual frames.

Since our frames are still straightforward byte arrays in RGBA format, and our ID3D11Texture2D was in DXGI_FORMAT_R8G8B8A8_UNORM format, a simple memcpy did the trick. The array length we need to copy is just a calculation of bytes in our frame: width_in_pixels * height_in_pixels * bytes_per_pixel.

Note that we also need to call the device context’s Map() to get a pointer that allows us to access the texture’s underlying data.

// decode and convert frame
static constexpr int BYTES_IN_RGBA_PIXEL = 4;

D3D11_MAPPED_SUBRESOURCE ms;
device_context->Map(m_texture.Get(), 0, D3D11_MAP_WRITE_DISCARD, 0, &ms);
// note: a single memcpy is only correct when ms.RowPitch equals frame->width * BYTES_IN_RGBA_PIXEL
memcpy(ms.pData, frame->data[0], frame->width * frame->height * BYTES_IN_RGBA_PIXEL);
device_context->Unmap(m_texture.Get(), 0);

// clear the render target view, draw the indices, present the swapchain
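
One caveat with the single memcpy: a mapped texture reports a RowPitch that can be larger than width * bytes_per_pixel, and an AVFrame's linesize[0] can be padded too. The sketch below is a plain C++ illustration of a pitch-aware copy, not the article's code. The function name `copy_rows` and its parameters are hypothetical, and it assumes both buffers store rows top to bottom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `rows` rows of `row_bytes` bytes each between two buffers whose rows
// may be padded. `dst_pitch` and `src_pitch` are the distances in bytes
// between the starts of consecutive rows (think RowPitch and linesize[0]).
inline void copy_rows(std::uint8_t* dst, std::size_t dst_pitch,
                      const std::uint8_t* src, std::size_t src_pitch,
                      std::size_t row_bytes, std::size_t rows) {
    assert(row_bytes <= dst_pitch && row_bytes <= src_pitch);

    if (dst_pitch == row_bytes && src_pitch == row_bytes) {
        // Tightly packed on both sides: one memcpy is enough.
        std::memcpy(dst, src, row_bytes * rows);
        return;
    }

    // Otherwise copy row by row, skipping the padding at the end of each row.
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst + y * dst_pitch, src + y * src_pitch, row_bytes);
    }
}
```

In the snippet above, this would presumably be called with ms.pData and ms.RowPitch as the destination, frame->data[0] and frame->linesize[0] as the source, frame->width * BYTES_IN_RGBA_PIXEL as the row size, and frame->height as the row count.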

Getting to this point and seeing live video on the screen was practically euphoric. Seriously, I raised my hands in the air and praised the coding gods for having guided me thus far.

But alas. My work was far from over. Now, it was time to go back and fix the two issues I caused back in Step #2.

Step #6: Rendering actual frames… but, like, properly this time.

I knew from the beginning of my research that providing FFmpeg with the hardware device name “d3d11va” should output the AVFrame in such a way that the DirectX 11 renderer can easily digest it. But how could I make this happen?

We need to properly initialize the d3d11va hardware device context. Basically, the FFmpeg decoder needs to know about the D3D11 device it’s working with.

AVBufferRef* hw_device_ctx = av_hwdevice_ctx_alloc(AV_HWDEVICE_TYPE_D3D11VA);
AVHWDeviceContext* device_ctx = reinterpret_cast<AVHWDeviceContext*>(hw_device_ctx->data);
AVD3D11VADeviceContext* d3d11va_device_ctx = reinterpret_cast<AVD3D11VADeviceContext*>(device_ctx->hwctx);

// m_device is our ComPtr<ID3D11Device>
// note: FFmpeg releases this interface when the device context is freed,
// so a real implementation should AddRef() the device before handing it over
d3d11va_device_ctx->device = m_device.Get();

// codec_ctx is a pointer to our FFmpeg AVCodecContext
codec_ctx->hw_device_ctx = av_buffer_ref(hw_device_ctx);
av_hwdevice_ctx_init(codec_ctx->hw_device_ctx);

It looks like a lot of setup, but ultimately, all we’re doing here is stashing a pointer to our renderer’s ID3D11Device in the decoder’s AVCodecContext. This is what allows the decoder to output frames as DX11 textures.

So now, when we send our decoded frames to the renderer, we don’t need to transfer them to the CPU, and we don’t need to convert them to RGBA. We can simply do this:

ComPtr<ID3D11Texture2D> texture = (ID3D11Texture2D*)frame->data[0];

But are we done? Nope. Not even close.

We need to move the pixel format conversion to the GPU. Our swap chain didn’t start magically being able to render NV12 frames, which means the conversion from NV12 to RGBA still has to happen somewhere. Now, instead of happening in the CPU, it’ll happen in the GPU. Specifically, in the pixel shader.

This makes logical sense; we can’t just sample a location in our texture anymore because our texture is no longer in RGBA. For our pixel shader to return the right RGBA value for every pixel, it’ll need to calculate it from the texture’s YUV values.

What that means is that we need to upgrade our pixel shader to take in NV12 and output RGBA. You could derive such a shader yourself, or just use one that’s already been written.
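
For reference, the per-pixel arithmetic such a shader performs can be sketched on the CPU in plain C++. This is an illustration under stated assumptions, not the shader the article used: it assumes limited-range BT.601 video and uses the common integer approximation of that matrix, while HD streams are often BT.709 and would need different coefficients. The names `Rgba`, `yuv_to_rgba`, and `nv12_pixel_to_rgba` are hypothetical, and the buffer is assumed to be tightly packed NV12 (a full-resolution Y plane followed by interleaved U,V pairs at half resolution).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Rgba { std::uint8_t r, g, b, a; };

inline std::uint8_t clamp_to_byte(int value) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, value)));
}

// Convert one limited-range BT.601 YUV sample to RGBA (integer approximation).
inline Rgba yuv_to_rgba(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const int c = y - 16;   // luma offset
    const int d = u - 128;  // blue-difference chroma, centered on zero
    const int e = v - 128;  // red-difference chroma, centered on zero
    return Rgba{
        clamp_to_byte((298 * c + 409 * e + 128) / 256),
        clamp_to_byte((298 * c - 100 * d - 208 * e + 128) / 256),
        clamp_to_byte((298 * c + 516 * d + 128) / 256),
        255};
}

// Read the pixel at (x, y) from a tightly packed NV12 buffer and convert it.
// Each 2x2 block of luma samples shares one U,V pair.
inline Rgba nv12_pixel_to_rgba(const std::uint8_t* nv12, int width, int height,
                               int x, int y) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(x >= 0 && x < width && y >= 0 && y < height);
    const std::uint8_t* y_plane = nv12;
    const std::uint8_t* uv_plane = nv12 + width * height;
    const std::uint8_t luma = y_plane[y * width + x];
    const std::uint8_t* uv = uv_plane + (y / 2) * width + (x / 2) * 2;
    return yuv_to_rgba(luma, uv[0], uv[1]);
}
```

A pixel shader does the same thing per fragment, except that the luma and chroma values come from texture samples rather than pointer arithmetic.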

Add another shader resource view. While the RGBA pixel shader takes a single shader resource view as input, the NV12 pixel shader actually needs two: chrominance and luminance. So we’ll need to split our one texture into two shader resource views. (Before this moment, I didn’t understand why DirectX needed to distinguish between textures and shader resource views. Boy, am I glad they did.)

// DXGI_FORMAT_R8_UNORM for NV12 luminance channel
D3D11_SHADER_RESOURCE_VIEW_DESC luminance_desc = CD3D11_SHADER_RESOURCE_VIEW_DESC(m_texture.Get(), D3D11_SRV_DIMENSION_TEXTURE2D, DXGI_FORMAT_R8_UNORM);
m_device->CreateShaderResourceView(m_texture.Get(), &luminance_desc, &m_luminance_shader_resource_view);

// DXGI_FORMAT_R8G8_UNORM for NV12 chrominance channel
D3D11_SHADER_RESOURCE_VIEW_DESC chrominance_desc = CD3D11_SHADER_RESOURCE_VIEW_DESC(m_texture.Get(), D3D11_SRV_DIMENSION_TEXTURE2D, DXGI_FORMAT_R8G8_UNORM);
m_device->CreateShaderResourceView(m_texture.Get(), &chrominance_desc, &m_chrominance_shader_resource_view);
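
Why two views of one texture? An NV12 surface stores a full-resolution plane of one-byte luminance samples followed by a half-resolution plane of interleaved two-byte U,V pairs, which is what the R8 and R8G8 view formats line up with. The plain C++ sketch below just does that bookkeeping; `PlaneView` and the function names are hypothetical, and it assumes even frame dimensions, which NV12 requires.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions and per-texel size of one NV12 plane as a shader would see it.
struct PlaneView {
    int width;
    int height;
    int bytes_per_texel;
};

// Luminance: one byte per pixel at full resolution (lines up with R8_UNORM).
inline PlaneView nv12_luminance_view(int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    return PlaneView{width, height, 1};
}

// Chrominance: one U,V pair per 2x2 pixel block (lines up with R8G8_UNORM).
inline PlaneView nv12_chrominance_view(int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    return PlaneView{width / 2, height / 2, 2};
}

// Bytes occupied by a plane, ignoring any row padding.
inline std::size_t plane_bytes(const PlaneView& view) {
    return static_cast<std::size_t>(view.width) * view.height * view.bytes_per_texel;
}
```

For a 1920x1080 frame this gives a 1920x1080 luminance view and a 960x540 chrominance view, adding up to 1.5 bytes per pixel overall.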

Of course, we also need to make sure to allow our pixel shader to access these chrominance and luminance channels.

m_device_context->PSSetShaderResources(0, 1, m_luminance_shader_resource_view.GetAddressOf());
m_device_context->PSSetShaderResources(1, 1, m_chrominance_shader_resource_view.GetAddressOf());

We need to open our texture as a shared resource. The ID3D11Texture2D object we keep in the renderer is the true bridge between the FFmpeg frame and the shader resource views. We copy the new frames into it and extract the shader resource views out of it. It’s a shared resource, and we need to treat it as such.

ComPtr<IDXGIResource> dxgi_resource;
m_texture->QueryInterface(__uuidof(IDXGIResource), reinterpret_cast<void**>(dxgi_resource.GetAddressOf()));
dxgi_resource->GetSharedHandle(&m_shared_handle);
m_device->OpenSharedResource(m_shared_handle, __uuidof(ID3D11Texture2D), reinterpret_cast<void**>(m_texture.GetAddressOf()));

We need to change how we copy the received texture. It’s obviously costly to create new shader resource views every time a frame is rendered, and memcpy isn’t an option anymore since we can’t access our texture’s underlying data easily. I figured the right way to copy the received frame to the texture was to use built-in DirectX functions, like CopySubresourceRegion().

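ComPtr<ID3D11Texture2D> new_texture = (ID3D11Texture2D*)frame->data[0];
// data[1] carries the texture array index stored as an intptr_t, so it needs a cast
const int texture_index = static_cast<int>(reinterpret_cast<intptr_t>(frame->data[1]));

m_device_context->CopySubresourceRegion(
        m_texture.Get(), 0, 0, 0, 0,
        new_texture.Get(), texture_index, nullptr);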

After these changes, I could safely kiss those av_hwframe_transfer_data() and sws_scale() functions goodbye, and at long, long last, say hello to a fully integrated FFmpeg-DirectX11 video player.

Fin.
