Funsound Speech Recognition Series: Building a Domain-Specific Speech Dataset (crawling Bilibili audio/video + whisper/funasr ASR pre-annotation + manual correction UI)

Original article · last modified 2024-08-29 11:19:21


This content is licensed under CC 4.0 BY-SA.

Tags: #audio-video #whisper #speech-recognition #funasr

First published 2024-08-20 16:03:04


This article uses the construction of a primary-school classroom audio dataset as its running example.


1. Search by keyword to collect audio/video links


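from playwright.sync_api import sync_playwright

# BLVideoSearch is the author's Bilibili search helper; its definition is not shown in this post.

if __name__ == "__main__":

    with sync_playwright() as playwright:
        searcher = BLVideoSearch(playwright, headless=True)
        # Build the search URL for the keyword "小学公开课" (primary-school open class)
        url = searcher.make_url(keyword=["小学公开课"])
        # Run the search and write the collected video links to videos_url.txt
        searcher.run(url, outfile="videos_url.txt")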

This yields the list of video links, written to videos_url.txt.
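The link list feeds the download step, so it can help to clean it first. The sketch below assumes that videos_url.txt holds one link per line (the post does not show the file) and that the BV id inside a link is a reasonable utterance ID to pass to you-get later as `utt`. `load_links` and `utt_from_url` are hypothetical helper names, not part of funsound.

```python
import re

# Bilibili video ids look like "BV" followed by 10 alphanumeric characters.
BV_PATTERN = re.compile(r"BV[0-9A-Za-z]{10}")


def load_links(lines):
    """Return unique, non-empty links in first-seen order."""
    seen = set()
    links = []
    for line in lines:
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            links.append(url)
    return links


def utt_from_url(url, index):
    """Use the BV id as the utterance ID; fall back to a zero-padded counter."""
    match = BV_PATTERN.search(url)
    return match.group(0) if match else f"utt{index:06d}"
```

With the file from step 1, this could be used as `links = load_links(open("videos_url.txt", encoding="utf-8"))`, followed by `utt_from_url(url, i)` for each link.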

2. Batch download and on-the-fly video-to-audio conversion

you-get: downloads the video file for a given link
ffmpeg: converts each video to audio on the fly
subprocess: runs the two commands above as child processes

2.1 Multi-threaded batch download (you-get)

The you-get subprocess:

# -o: output directory, -O: output file name (utt); task is the link to download
command = [YOUGET, "-o", self.video_dir, "-O", utt, task]
subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
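The snippet above shows only the per-task command, not the threading around it. A minimal sketch of how the batch could be spread over threads follows, assuming the `you-get` executable is on PATH and that each task is a `(utt, url)` pair. `download_one`, `download_all`, and the `runner` parameter are hypothetical names introduced for illustration; the author's class may be organized differently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed

YOUGET = "you-get"  # assumption: the you-get CLI is available on PATH


def build_download_command(url, utt, video_dir):
    # Same flags as in the post: -o output directory, -O output file name
    return [YOUGET, "-o", video_dir, "-O", utt, url]


def download_one(url, utt, video_dir, runner=subprocess.run):
    """Download one link; return (utt, True) on success, (utt, False) on failure."""
    command = build_download_command(url, utt, video_dir)
    try:
        runner(command, check=True,
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return utt, True
    except (subprocess.CalledProcessError, OSError):
        return utt, False


def download_all(tasks, video_dir, num_threads=4, runner=subprocess.run):
    """Run the downloads on a thread pool; tasks is a list of (utt, url) pairs."""
    results = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(download_one, url, utt, video_dir, runner)
                   for utt, url in tasks]
        for future in as_completed(futures):
            utt, ok = future.result()
            results[utt] = ok
    return results
```

Threads are a reasonable fit here because each worker spends its time waiting on a child process, so the GIL is not a bottleneck. Catching the failure per task keeps one bad link from aborting the whole batch.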

2.2 On-the-fly video-to-audio conversion

The ffmpeg subprocess:

# -ac 1: downmix to mono, -ar 16000: resample to 16 kHz
command = [FFMPEG, "-i", video_file, "-ac", "1", "-ar", "16000", audio_file]
subprocess.run(command, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
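Converting each video as soon as it is downloaded makes it possible to delete the video right away and keep disk usage low. The sketch below is one possible way to do that, assuming `ffmpeg` is on PATH. It adds two standard ffmpeg flags that the original command does not use: `-y` (overwrite an existing output file) and `-vn` (drop the video stream). `convert_and_cleanup` and its `keep_video` and `runner` parameters are hypothetical.

```python
import os
import subprocess

FFMPEG = "ffmpeg"  # assumption: the ffmpeg CLI is available on PATH


def build_ffmpeg_command(video_file, audio_file, sample_rate=16000):
    # Mono, 16 kHz output as in the post; -y and -vn are additions of this sketch
    return [FFMPEG, "-y", "-i", video_file, "-vn",
            "-ac", "1", "-ar", str(sample_rate), audio_file]


def convert_and_cleanup(video_file, audio_file, keep_video=False,
                        runner=subprocess.run):
    """Convert one video to audio; remove the video only if conversion succeeded."""
    command = build_ffmpeg_command(video_file, audio_file)
    try:
        runner(command, check=True,
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        return False
    if not keep_video and os.path.exists(video_file):
        os.remove(video_file)
    return True
```

The video is removed only after ffmpeg exits successfully, so a failed conversion can be retried without downloading again.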

Each link is first downloaded as a video file and is ultimately saved as an audio file.
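Before transcription it can be worth confirming that every saved file really is 16 kHz mono. The sketch below uses Python's standard `wave` module and assumes the audio was written as PCM `.wav` (the post does not state the extension of `audio_file`). `check_wav_format` is a hypothetical helper.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Read the WAV header and report whether it matches the expected format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration_sec = wav.getnframes() / float(rate)
    return {
        "ok": rate == expected_rate and channels == expected_channels,
        "rate": rate,
        "channels": channels,
        "duration_sec": duration_sec,
    }
```

The reported duration also gives a quick way to total the hours of audio collected for the dataset.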

3. Multi-worker transcription with whisper or funasr

funsound supports multi-worker offline transcription; the backend can be either whisper or funasr.

import time

from funsound.funasr.onnx.offline.asr import ASR
from funsound.common.executor import Worker, launch, get_worker_status, submit_task, get_task_progress
from funsound.utils import *

def init_engine(wid):
    # One ASR engine per worker, each with its own log file
    engine = ASR(cfg_file='conf/funasr_onnx.yaml',
                 log_file=f'log/funasr-{wid}.log')
    engine.init_state()
    return engine

def processor(self, params):
    # params[0] is the path of the audio file to transcribe
    audio_file = params[0]
    result = self.engine.inference(audio_file,
                                   make_sentence_split="punc")
    return result

Worker.processor = processor


if __name__ == "__main__":

    nj = 3  # start 3 parallel workers
    workers = []
    for wid in range(nj):
        engine = init_engine(wid)
        worker = Worker(wid=wid, log_file=f'log/worker-{wid}.log')
        worker.load_engine(engine=engine)
        workers.append(worker)
    launch(workers)
    print(get_worker_status(workers))


    audio_file = "/opt/wangwei/funsound_onnx/funsound/examples/test1.wav"
    task_id = submit_task(workers,params=[audio_file])

    # Poll once per second until the task succeeds or fails
    while True:
        prgs = get_task_progress(task_id)
        print(prgs)
        if prgs['status'] in ["SUCCESS", "FAIL"]:
            if prgs['status'] == "SUCCESS":
                for line in prgs['result']:
                    print(line)
            break
        time.sleep(1)
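The example above transcribes a single file. For a whole dataset, one option is to submit every audio file first and then poll all tasks in one loop. The sketch below stays independent of funsound by taking the submit and progress functions as arguments. It assumes only what the example shows: the progress object is a dict with a `status` of "SUCCESS" or "FAIL" and, on success, a `result`. `transcribe_batch` and the `timeout` parameter are hypothetical, and whether funsound queues more tasks than there are workers is not confirmed by the post.

```python
import time


def transcribe_batch(audio_files, submit, get_progress,
                     poll_interval=1.0, timeout=3600.0):
    """Submit all files, then poll until each task finishes or the timeout expires.

    Returns {audio_file: result}; failed or timed-out files map to None.
    """
    pending = {submit(audio_file): audio_file for audio_file in audio_files}
    results = {}
    deadline = time.monotonic() + timeout
    while pending and time.monotonic() < deadline:
        for task_id in list(pending):
            progress = get_progress(task_id)
            if progress["status"] in ("SUCCESS", "FAIL"):
                audio_file = pending.pop(task_id)
                succeeded = progress["status"] == "SUCCESS"
                results[audio_file] = progress.get("result") if succeeded else None
        if pending:
            time.sleep(poll_interval)
    # Anything still pending at this point has timed out
    for audio_file in pending.values():
        results[audio_file] = None
    return results
```

With the workers from the example, a call might look like `transcribe_batch(files, lambda f: submit_task(workers, params=[f]), get_task_progress)`. The timeout guards against a task that never reaches a final status, which the plain polling loop would wait on forever.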