Memory requirements? #5

neutralinsomniac · 2022-09-21T18:18:43Z


I attempted to run whisper on an audio file using the medium model, and I got this:

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling transformers.utils.move_cache().

Which eventually ran out of memory (this machine has 8GB) and was killed by the OOM killer. Would it be possible to document the estimated memory requirements for running whisper?

EDIT: it looks like the cache migration worked, but it's whisper itself that's ballooning memory.

Answered by jongwook

Sep 21, 2022


R4ZZ3 · 2022-09-21T19:02:18Z


I had no problems running the medium model on an 8 GB card (GTX 1070).

0 replies

NLLAPPS · 2022-09-21T19:03:50Z


I'm interested in this too. What would be a reasonable time to process 2 minutes of a recorded phone conversation? I am testing on a Windows 11 virtual machine with 4 GB of RAM; the host has an i9-9900K CPU.

It takes quite a while to process 2 minutes of audio. The medium model throws a "not enough memory" error.

4 replies

jongwook Sep 21, 2022
Maintainer

I'd highly recommend running the code on a GPU, which will be significantly faster.

DrMemoryFish Sep 28, 2022

@jongwook I'm a noob at coding; would you mind telling us how one would do this? I ran `whisper The-Change-Will-Be-Permanent-If-You-Do-It-Correctly-[QrsSz2ym3Ds].mp3 --model base --device cuda` but I get `RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.` How can one fix this error?

jongwook Sep 28, 2022
Maintainer

@AbdullahJames If you have an NVIDIA GPU on your system, you'll need to reinstall PyTorch to a version that supports CUDA. Select the install command that matches your system. If unsure, the following will likely work:

pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

DrMemoryFish Sep 29, 2022

@jongwook OK, I tried `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113` this morning and it worked perfectly; the GPU was used at 100%. Thanks for the help. However, I came home today, tried a second time, and now I'm getting this error:

```
E:\AI\whisper>whisper How-Strange-Arabic-Poetry-[oGqpHPjLX9A].mp3 --device cuda
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\Scripts\whisper-script.py", line 33, in <module>
    sys.exit(load_entry_point('whisper==1.0', 'console_scripts', 'whisper')())
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 297, in cli
    model = load_model(model_name, device=device)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\__init__.py", line 103, in load_model
    checkpoint = torch.load(fp, map_location=device)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 970, in restore_location
    return default_restore_location(storage, map_location)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "C:\Users\USER\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

E:\AI\whisper>
```

I tried `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113` multiple times and got the same error. What's happening now?

jongwook · 2022-09-21T20:41:48Z

Maintainer

I've just added Available models and languages section in README.md; to quote:

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

The VRAM requirements are from simulations using torch.cuda.set_per_process_memory_fraction(), so they may not reflect exactly what happens on, e.g., a GPU with exactly 5 GB of VRAM.
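
As a rough way to use the table above, the sketch below picks the largest model whose approximate VRAM figure fits a given budget. The figures are the README estimates quoted above; the function name `largest_model_that_fits` and the `headroom_gb` safety margin are illustrative assumptions, not part of Whisper.

```python
# Approximate VRAM needs in GB, copied from the README table quoted above.
# Ordered from smallest to largest model.
REQUIRED_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_that_fits(available_gb, headroom_gb=0.5):
    """Return the largest model whose estimate plus headroom fits, or None.

    headroom_gb is a hypothetical safety margin; the table values are only
    estimates, so real usage may differ.
    """
    best = None
    for name, need_gb in REQUIRED_VRAM_GB.items():
        if need_gb + headroom_gb <= available_gb:
            best = name
    return best


if __name__ == "__main__":
    for budget in (4, 6, 8, 12):
        print(budget, "GB ->", largest_model_that_fits(budget))
```

With an 8 GB budget this returns `medium`. Treat the result as a starting point only, since the estimates come from simulation rather than from measurements on real cards.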

5 replies

NLLAPPS Sep 21, 2022

Please excuse my ignorance, but how would one run it on a GPU?

sunsetkookaburra Sep 21, 2022

@NLLAPPS I tried the --device option (`whisper --device opencl audio.wav`, and likewise with opengl and vulkan), which I thought might work, but I got the error `PyTorch is not linked with support for opencl devices` (and the same error for the others). It may just be a Windows issue, though, or because I have an AMD GPU. Give it a go if you're on Linux or a Mac, or have an NVIDIA card with CUDA.

jongwook Sep 22, 2022
Maintainer

We haven't tested on devices other than CPU and CUDA, although it may just work on MPS or OpenCL with a proper build of PyTorch, given that it's a pure PyTorch implementation. YMMV!

@NLLAPPS The implementation will choose CUDA if it's available from PyTorch. CUDA is available by default when you do pip install torch on Linux, but on Windows it seems that you need to do one of:

# Using conda
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

# Using pip
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

prior to installing the packages in this repo. Once installed, you can check if torch.cuda.is_available() is True on your Python REPL:

In [1]: import torch

In [2]: torch.cuda.is_available()
Out[2]: True
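
The same check can be wrapped in a small helper that falls back to the CPU when CUDA is not usable. This is a minimal sketch: `pick_device` is a hypothetical helper name, and the commented usage assumes the `whisper.load_model(name, device=...)` call that appears in the tracebacks in this thread.

```python
def pick_device():
    """Return "cuda" if PyTorch reports a usable CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("Selected device:", device)
    # Assumed usage with Whisper installed:
    #   import whisper
    #   model = whisper.load_model("base", device=device)
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only and needs to be replaced with a CUDA build as described above.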

Zunnen Oct 16, 2022

I've tried all of the tips, and the result of the test is positive:

C:\Users\Zunnen>python
Python 3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True

Yet when I try to transcribe a file with whisper, it tells me this:

C:\Users\Zunnen>whisper leslie_freetime.mp3 --model base --device cuda
Traceback (most recent call last):
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\Scripts\whisper-script.py", line 33, in <module>
    sys.exit(load_entry_point('whisper==1.0', 'console_scripts', 'whisper')())
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\transcribe.py", line 304, in cli
    model = load_model(model_name, device=device, download_root=model_dir)
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\whisper\__init__.py", line 106, in load_model
    checkpoint = torch.load(fp, map_location=device)
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 1001, in load_tensor
    wrap_storage=restore_location(storage, location),
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 970, in restore_location
    return default_restore_location(storage, map_location)
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 175, in default_restore_location
    result = fn(storage, location)
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 152, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "C:\Users\Zunnen\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\serialization.py", line 136, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I don't know what to do, honestly.
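
One possible explanation for seeing `True` in the REPL but `False` under the `whisper` command is that the two are running under different Python environments or PyTorch builds. That is an assumption, not something confirmed in this thread. The sketch below gathers the facts needed to compare them; `torch_report` is a hypothetical helper name.

```python
import sys


def torch_report():
    """Collect which interpreter and which PyTorch build are in use."""
    info = {"python": sys.executable}
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is missing from this interpreter
        return info
    info["torch"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    info["built_with_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in torch_report().items():
        print(f"{key}: {value}")
```

Running this with the interpreter named in the traceback paths, and comparing it with the interpreter used for the REPL test, shows whether both see the same PyTorch build.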

edalfon Nov 20, 2022


Does this mean that it would not be possible (or extremely unlikely) to use the large model on an RTX 3060 with just 6 GB? Is there any workaround?

I am using whisper to transcribe audio on my laptop. I first tried with just the CPU, and it works great but is painfully slow. Then I tried the rather entry-level GPU I have (RTX 3060), and it is indeed much, much faster (20 seconds instead of 8 minutes to transcribe the same audio!). But I only managed to use the tiny model. The large model always throws an OutOfMemoryError.
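
Short of a real workaround for fitting the large model, one pragmatic option is to try models from largest to smallest and keep the first one that loads. The following is only a sketch of that idea: `load_with_fallback`, its parameters, and the choice of exception types are assumptions for illustration, and a model that loads can still run out of memory later during transcription.

```python
def load_with_fallback(load, names=("large", "medium", "small", "base", "tiny"),
                       oom_errors=(MemoryError,)):
    """Try each model name in order; return (name, model) for the first success.

    load:       callable taking a model name and returning a loaded model.
    oom_errors: exception types to treat as "did not fit, try a smaller one".
    """
    for name in names:
        try:
            return name, load(name)
        except oom_errors:
            continue  # this model did not fit; fall through to the next size
    raise MemoryError("none of the candidate models could be loaded")


# Assumed usage with Whisper and a recent PyTorch installed:
#   import torch, whisper
#   name, model = load_with_fallback(
#       lambda n: whisper.load_model(n, device="cuda"),
#       oom_errors=(torch.cuda.OutOfMemoryError,),
#   )
```

Passing the loader in as a callable keeps the helper independent of Whisper itself, so it can be tested without a GPU.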

ArtyomZemlyak · 2022-09-26T02:26:38Z


Additional testing on:

CPU: i7 11800H
GPU: RTX 3080 Laptop
Python: 3.8.12
torch: 1.8.2

Model	Time, s	CPU/GPU	RAM, GB	VRAM, GB	DISK, GB
tiny	488	CPU	0.2		0.074
base	564	CPU	1		0.142
small	*3	CPU	2.5		0.472
medium	*20	CPU	6		1.492
large	*30	CPU	10		3.014
tiny	24	GPU	3.4	2.7	0.074
base	29	GPU	3.4	2.7	0.142
small	41	GPU	3.6	3.5	0.472
medium	89	GPU	4.3	6.1	1.492
large	-	GPU	-	-	3.014
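
To make the CPU versus GPU gap in the table explicit, here is a small worked calculation. It uses only the tiny and base rows, since those are the CPU timings given as plain seconds; the starred CPU entries are left out because their meaning is not stated.

```python
# Timings in seconds, copied from the tiny and base rows of the table above.
cpu_seconds = {"tiny": 488, "base": 564}
gpu_seconds = {"tiny": 24, "base": 29}

# Speedup = CPU time divided by GPU time for the same model.
speedup = {model: cpu_seconds[model] / gpu_seconds[model] for model in cpu_seconds}

for model, factor in speedup.items():
    print(f"{model}: {factor:.1f}x faster on GPU")
# tiny: 20.3x faster on GPU
# base: 19.4x faster on GPU
```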

7 replies

wenchaoliu-93 Sep 22, 2025

So, using the GPU would be roughly 20x faster, but not only does it require a lot more VRAM than the RAM it'd require for CPU, it also requires more RAM.

I am a bit curious about the speedup. I have tried with my system (AMD 3600 and GT 1030) and have found that the speed is roughly the same.

wenchaoliu-93 Sep 22, 2025

Additionally, I am not sure the tiny model uses 2.7 GB of VRAM. It depends on the audio, but having tried close to a hundred files, VRAM usage is usually under 1 GB for me.

wenchaoliu-93 Sep 22, 2025

The RAM usage for the CPU-only runs seems a bit low. Again, for the tiny model, with 8 GB of RAM, mine shows 10 to 15 percent memory usage. That makes more sense to me, as it's about the same as the VRAM usage.

wenchaoliu-93 Sep 22, 2025

If you look at only medium and small rows, you will see roughly the same RAM and VRAM usage. I wonder why other models are different.


ArtyomZemlyak · 2022-09-29T01:44:31Z


And more testing on different GPUs:

Same task: processing 200 s audio (7 files)

8vCORE avx512 T4 16GB

Model	Time, s	CPU/GPU
tiny	311	CPU
tiny	39	GPU
large	158	GPU

24vCORE avx512 RTX A5000 24GB

Model	Time, s	CPU/GPU
tiny	162	CPU
tiny	31	GPU
large	54	GPU

24vCORE avx512 A30 24GB

Model	Time, s	CPU/GPU
tiny	35	GPU
large	58	GPU

12vCORE avx512 A2 14GB

Model	Time, s	CPU/GPU
tiny	44	GPU
large	240	GPU

6vCORE avx512 A100 40GB

Model	Time, s	CPU/GPU
tiny	27	GPU
large	58	GPU
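
These timings can also be read as a real-time factor: seconds of audio transcribed per second of processing. The sketch below assumes the 200 s figure is the total audio length across the 7 files, which the post does not state explicitly; `realtime_factor` is a hypothetical helper name.

```python
AUDIO_SECONDS = 200  # assumed to be the total length of the 7 files

# (hardware, model, processing time in seconds), copied from the GPU rows above.
gpu_runs = [
    ("T4 16GB", "large", 158),
    ("RTX A5000 24GB", "large", 54),
    ("A30 24GB", "large", 58),
    ("A2 14GB", "large", 240),
    ("A100 40GB", "large", 58),
]


def realtime_factor(processing_seconds, audio_seconds=AUDIO_SECONDS):
    """Audio seconds handled per second of compute; above 1 beats real time."""
    return audio_seconds / processing_seconds


for hardware, model, seconds in gpu_runs:
    print(f"{hardware} ({model}): {realtime_factor(seconds):.2f}x real time")
```

Under that assumption, the large model on the A2 comes out below 1x (slower than real time), while the other cards listed stay above it.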

1 reply

rvizn Mar 13, 2023


Hey, do you know how much GPU power is needed to transcribe multiple/batch files? I’m trying to figure out what type of serverless GPUs I need

Harsh28050 · 2022-11-21T07:33:23Z


Is it possible to do batch processing on the audio files so that we can transcribe more audio files in less time?
@jongwook @NLLAPPS @neutralinsomniac
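
Nothing in this thread confirms true batched inference, but a simpler way to save time across many files is to load the model once and reuse it for each file in turn. The sketch below shows that pattern; `transcribe_many` is a hypothetical helper, and the commented usage assumes Whisper's `load_model` and `model.transcribe(path)` calls.

```python
def transcribe_many(paths, transcribe_one):
    """Run transcribe_one on each path in order, reusing whatever it closes over.

    This is sequential reuse of one loaded model, not batched inference.
    """
    results = {}
    for path in paths:
        results[str(path)] = transcribe_one(str(path))
    return results


# Assumed usage with Whisper installed (the model is loaded only once):
#   import whisper
#   model = whisper.load_model("base")
#   texts = transcribe_many(
#       ["a.mp3", "b.mp3"],
#       lambda p: model.transcribe(p)["text"],
#   )
```

This avoids paying the model loading cost per file, which is what happens when the `whisper` command is invoked separately for each one.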

0 replies

NiniChibi · 2023-01-10T10:17:09Z


@n ➜ /workspaces/whisper (main ✗) $ whisper Jtest.mp3 --model small
/usr/local/python/3.10.4/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detecting language using up to the first 30 seconds. Use --language to specify the language
Detected language: Japanese
Killed

Why was the process killed? Weird.

1 reply

neutralinsomniac Jan 10, 2023
Author

Check dmesg, but it was probably killed by the kernel after running out of memory.
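
On Linux, it can help to check how much memory is free before starting a run. The following is a minimal sketch that reads the `MemAvailable` field from `/proc/meminfo`; the helper names are hypothetical, and on other operating systems the file does not exist, in which case `None` is returned.

```python
def mem_available_gb(meminfo_text):
    """Parse the MemAvailable line (reported in kB) and return it in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    return None  # field not present


def read_mem_available_gb(path="/proc/meminfo"):
    """Read the file and parse it; return None if it cannot be opened."""
    try:
        with open(path) as handle:
            return mem_available_gb(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("Available memory (GiB):", read_mem_available_gb())
```

Comparing this number with the RAM figures reported earlier in the thread gives a rough idea of whether a given model is likely to be killed.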

jingchang0623-crypto · 2026-04-03T12:07:10Z


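Hahaha, reading this thread is like seeing myself three months ago!

That night, the OOM killer and I stared each other down for a good two hours.

There is a kind of programmer who wrestles with memory late at night. Not in search of love, but in search of whisper code that actually runs.

My setup at the time:

8GB RAM ✨
A confident smile 😎
A soul about to be slapped in the face by reality 💀

When I ran the medium model, my RAM stick promptly put on a show of "ascending to heaven on the spot".

Then it dawned on me: in the world of AI there is no such thing as "enough" memory, only "more expensive" cloud APIs.

Practical advice (serious face):

The tiny/base models are fairly friendly to 8 GB of RAM
If that still doesn't work, you can use a quantized version, trading accuracy for memory
Or... go grab a free GPU on Google Colab? 🤫

I wrote up my full record of pitfalls here; have a look if you're interested:
👉 https://miaoquai.com/stories/ai-hallucination-troubles.html

(That post is mainly about AI hallucinations, but at heart they are all stories of "you thought it would work, and it didn't" 😂)

P.S. I have already converted to the church of cloud APIs, and it's great.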

0 replies
