Czkawka/Krokiet 9.0 — Find duplicates faster than ever before

Mar 16, 2025

Some say that Czkawka has one mode for removing duplicates and another for removing similar images. Nonsense. Both modes are for removing duplicates.

The current version primarily focuses on refining existing features and improving performance rather than introducing any spectacular new additions.

With each new release, it seems that I am slowly reaching the limits — of my patience, Rust’s performance, and the possibilities for further optimization.

Czkawka is now at a stage where, at first glance, it’s hard to see what exactly can still be optimized, though, of course, it’s not impossible.

Changes and Optimizations

For example, below is a performance graph from Hotspot showing duplicate scanning in the current version on a 24-thread processor with an NVMe drive. The application uses all available threads while searching for duplicates among 1.5 million files and computing hashes for 70 GB of data.

As you can see, the application is highly parallelized. However, in my case, using every thread does not mean that each thread is busy 100% of the time. Despite the fast NVMe drive, the application spends most of its time waiting for the next batch of data to arrive from the disk before it can process it.

Above: RAM usage while hashing ~1 million files (max ~300MB).
Below: CPU usage ~10–40%

Still, on the left side of the Hotspot graph, there is a visible 250 ms bar where only a single thread is active, handling memory operations (deallocation, vector operations). At the moment, I’m not entirely sure what causes this, but I believe it can be at least partially improved.

One of the more unusual performance issues I encountered was that while scanning around a million files, the algorithm initially distributed tasks evenly across all available threads. However, in the second half of the process, it used only one thread.

An initial analysis indicated that Rayon's workload distribution algorithm eventually moves all the “leftover” tasks onto a single thread, overloading it with work.

files.into_par_iter().map(|e|…) // files is BTreeMap<String, FileEntry>

Since these tasks are time-consuming (reading data from disk), I figured that limiting each group of elements handed to a thread to at most 3 might help.

let new_files = files.into_par_iter().collect::<Vec<_>>(); // Conversion to Vec<>, because with_max_len is only available on indexed parallel iterators, which the BTreeMap iterator is not
new_files.into_par_iter().with_max_len(3).map(|e|…) // Each group of work handed to a thread holds at most 3 elements

And indeed — this solution eliminated most of the issues, though I’m still not entirely sure why they occurred in the first place or why the problem wasn’t always present.
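To illustrate why a small maximum group length helps, here is a minimal standard-library sketch of the same idea. It is not Rayon's scheduler and not Czkawka's code: the function name `process_in_small_batches` and its parameters are made up for this example. Worker threads claim at most `max_len` items at a time from a shared cursor, so no thread can be left holding a long tail of leftover work.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Processes `items` on `threads` worker threads. Each worker repeatedly
/// claims the next batch of at most `max_len` items from a shared cursor.
/// Results are returned in the original input order.
pub fn process_in_small_batches<T, R, F>(
    items: &[T],
    threads: usize,
    max_len: usize,
    work: F,
) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let max_len = max_len.max(1);
    let cursor = AtomicUsize::new(0);

    let mut indexed: Vec<(usize, R)> = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..threads.max(1) {
            handles.push(s.spawn(|| {
                let mut out = Vec::new();
                loop {
                    // Claim the next small batch; stop when nothing is left.
                    let start = cursor.fetch_add(max_len, Ordering::Relaxed);
                    if start >= items.len() {
                        break;
                    }
                    let end = start.saturating_add(max_len).min(items.len());
                    for i in start..end {
                        out.push((i, work(&items[i])));
                    }
                }
                out
            }));
        }
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });

    // Workers finish in arbitrary order, so restore the input order.
    indexed.sort_by_key(|(i, _)| *i);
    indexed.into_iter().map(|(_, r)| r).collect()
}
```

With `max_len` set to 3, a thread that hits a slow file delays at most two other items, while the remaining threads keep pulling new batches. The price is more frequent synchronization, which is negligible when every item involves disk I/O.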

Increasing the buffer size for file I/O operations

Since the very beginning, file hashing has been performed by reading 16 KB chunks of a file’s content and then applying the hashing operation. This generally didn’t cause major issues (aside from stack overflow on some systems when using a 128 KB buffer).

However, some users reported that on HDDs under heavy load, the drive head jumped frequently between the files being scanned, hurting performance and making the disk noticeably noisier.

Data read times for different buffer sizes

The simplest solution turned out to be increasing the buffer size to 2 MB — the downside is that each thread searching for duplicates will consume more memory. However, I find this trade-off fully acceptable, considering the performance boost it provides.

Tests showed that when scanning many files ranging from 50 KB to 50 MB on an HDD, scan time could be reduced by up to 50%, while on an SSD the improvement was much smaller, from a few to several percent.
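The reading loop can be sketched roughly as follows. This is an illustration under assumptions, not Czkawka's actual code: `Fnv1a` is a tiny stand-in hash chosen only to keep the example dependency-free, and `hash_file` is a hypothetical name. The buffer is owned by the caller and lives on the heap, so a single 2 MB allocation can be reused for every file a thread processes, and a large buffer cannot overflow the stack.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// FNV-1a, used here only as a small stand-in for a real file hash.
pub struct Fnv1a(u64);

impl Fnv1a {
    pub fn new() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325)
    }

    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    pub fn finish(&self) -> u64 {
        self.0
    }
}

/// Hashes a file by reading it in chunks of up to `buffer.len()` bytes.
/// A larger buffer means fewer, larger reads per file.
pub fn hash_file(path: &Path, buffer: &mut [u8]) -> io::Result<u64> {
    let mut file = File::open(path)?;
    let mut hasher = Fnv1a::new();
    loop {
        let n = file.read(buffer)?;
        if n == 0 {
            break; // end of file
        }
        hasher.update(&buffer[..n]);
    }
    Ok(hasher.finish())
}
```

A worker thread would allocate the buffer once, for example `let mut buffer = vec![0u8; 2 * 1024 * 1024];`, and pass `&mut buffer` to every call. The hash itself does not depend on the buffer size, only the number of read calls does.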

To further enhance performance, I considered using semaphores to ensure that only one thread would read from a given disk at a time, minimizing head movement on HDDs.

However, tests showed no noticeable performance gains, so I abandoned the idea. That said, if you manage to convince me with a new implementation, I’m open to adding it in the future.

Interrupting Hash Calculations

In version 1.0, the entire GUI would freeze during hash calculations, making it impossible to monitor scan progress or stop the process.

By version 8.0, a scanned-file counter already existed, but clicking Stop made the application wait for all ongoing hash calculations to complete. If hashing of a 4 GB file had already started, the user had to wait until it was fully processed.

In the latest version, I managed to implement near-instant scan interruption without any noticeable performance loss.
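The article does not show how the interruption is implemented. One common approach, sketched here purely as an assumption, is to check a shared stop flag between buffer reads, so that a cancelled scan gives up after the current chunk instead of after the current file. The name `hash_until_stopped` and the inline FNV-1a hash are placeholders for this example.

```rust
use std::io::{self, Read};
use std::sync::atomic::{AtomicBool, Ordering};

/// Hashes everything from `reader`, checking `stop` before each read.
/// Returns `Ok(None)` if the scan was cancelled before the end of the data.
pub fn hash_until_stopped<R: Read>(
    mut reader: R,
    buffer: &mut [u8],
    stop: &AtomicBool,
) -> io::Result<Option<u64>> {
    // FNV-1a stand-in for a real hash function.
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    loop {
        if stop.load(Ordering::Relaxed) {
            return Ok(None); // cancelled: discard the partial hash
        }
        let n = reader.read(buffer)?;
        if n == 0 {
            return Ok(Some(hash)); // end of data
        }
        for &b in &buffer[..n] {
            hash = (hash ^ u64::from(b)).wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
}
```

The extra cost is one atomic load per buffer read, which is why this kind of check should not noticeably affect throughput. The GUI thread only has to call `stop.store(true, Ordering::Relaxed)` when the user clicks Stop.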

Additionally, in some modes, scan progress is now displayed in greater detail: not just the number of scanned files but also the total number of bytes processed.

This provides a much clearer indication of how long the scan will take. Previously, the GUI sometimes displayed that nearly all files had been scanned, yet the process was still running, and the progress bar remained unchanged for a while.
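A small sketch shows why counting bytes gives a more honest progress bar than counting files. The `ScanProgress` type and its methods are hypothetical names for this example, not Czkawka's API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counters shared between the hashing threads and the GUI thread.
#[derive(Default)]
pub struct ScanProgress {
    files_done: AtomicU64,
    bytes_done: AtomicU64,
}

impl ScanProgress {
    /// Called by a worker after each chunk it has hashed.
    pub fn add_bytes(&self, n: u64) {
        self.bytes_done.fetch_add(n, Ordering::Relaxed);
    }

    /// Called by a worker when a whole file is finished.
    pub fn finish_file(&self) {
        self.files_done.fetch_add(1, Ordering::Relaxed);
    }

    pub fn file_fraction(&self, total_files: u64) -> f64 {
        fraction(self.files_done.load(Ordering::Relaxed), total_files)
    }

    pub fn byte_fraction(&self, total_bytes: u64) -> f64 {
        fraction(self.bytes_done.load(Ordering::Relaxed), total_bytes)
    }
}

/// Completed share in the range 0.0..=1.0; an empty scan counts as done.
fn fraction(done: u64, total: u64) -> f64 {
    if total == 0 {
        1.0
    } else {
        done as f64 / total as f64
    }
}
```

Take a scan of nine 1 KB files and one 1 GB file. Once the nine small files are done, the file-based fraction is 0.9, while the byte-based fraction is still close to zero, which matches the time that actually remains.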

Multi-threading

Probably the simplest way to improve performance is to process tasks in parallel across multiple threads.

One area where I applied this was in checking whether files within a given group are linked by hard links.
This operation requires access to disk resources, so in my case, checking 500,000 files initially took 4 seconds. After implementing the changes, the time dropped to around 1 second, suggesting that the task was no longer CPU-bound but instead limited by disk performance.

In practice, this modification came down to changing just a few lines of code, mainly replacing into_iter with its parallel equivalent into_par_iter. Since this is Rust, I was able to make this change without the slightest concern that it would introduce difficult-to-debug threading issues.
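The actual change relied on Rayon's `into_par_iter`. The sketch below uses only the standard library so that it stays dependency-free, and it is an assumption about the general shape of the check rather than Czkawka's code: `drop_hard_links`, `file_id`, and `FileId` are hypothetical names, and keeping files whose metadata cannot be read is a choice made for this example. The slow metadata lookups run on several threads, and the cheap deduplication by (device, inode) happens afterwards on one thread.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::thread;

/// (device id, inode number): two paths with the same pair are hard links
/// to the same file.
pub type FileId = (u64, u64);

/// Real lookup on Unix-like systems. It touches the disk, so it is the slow part.
#[cfg(unix)]
pub fn file_id(path: &Path) -> Option<FileId> {
    use std::os::unix::fs::MetadataExt;
    let meta = std::fs::metadata(path).ok()?;
    Some((meta.dev(), meta.ino()))
}

/// Runs `lookup` for every path on up to `threads` worker threads, then
/// keeps only the first path seen for each file id.
pub fn drop_hard_links<F>(paths: &[PathBuf], threads: usize, lookup: F) -> Vec<PathBuf>
where
    F: Fn(&Path) -> Option<FileId> + Sync,
{
    let threads = threads.max(1);
    let chunk_len = ((paths.len() + threads - 1) / threads).max(1);

    // Parallel part: one contiguous chunk of paths per thread.
    let ids: Vec<Option<FileId>> = thread::scope(|s| {
        let mut handles = Vec::new();
        for chunk in paths.chunks(chunk_len) {
            let lookup = &lookup;
            handles.push(s.spawn(move || {
                chunk.iter().map(|p| lookup(p.as_path())).collect::<Vec<_>>()
            }));
        }
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });

    // Sequential part: cheap in-memory deduplication.
    let mut seen = HashSet::new();
    paths
        .iter()
        .zip(ids)
        .filter(|(_, id)| match id {
            Some(id) => seen.insert(*id),
            None => true, // unreadable metadata: keep the file
        })
        .map(|(p, _)| p.clone())
        .collect()
}
```

On a Unix-like system this could be called as `drop_hard_links(&paths, 8, file_id)`. Taking the lookup as a parameter also makes the function easy to test without touching the disk.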

Faster Image Resizing

In the similar image search mode, the most demanding operations are reading the image from disk and subsequently resizing it (e.g., to 8x8, 16x16, or 32x32).
The larger the image, the slower these operations become.

There’s not much that can be done about the first operation: image-rs is already one of the most optimized libraries in this category (though it could likely be made even faster).

However, for image resizing, I discovered the fast-image-resize library, which — while introducing minimal differences in the appearance of the resized images compared to image-rs — can speed up the process by up to three times for large files.

Because of this, I decided to enable it by default, though users still have the option to disable it during compilation.

Resize time for 91 images (4000x3000) depending on the algorithm used

Portable mode

Czkawka/Krokiet were designed to be portable applications by utilizing Rust and minimizing the number of dynamically linked libraries (mainly in the case of Krokiet). The goal was to ensure they run identically across different systems without requiring the installation of additional dependencies.

However, some users wanted the ability to store cache and settings on an external drive. I initially found this request somewhat unusual since each system typically has its own configuration and cache.

Nevertheless, since this was a frequently requested feature, I added the option to change the directory where the application creates its configuration and cache files. This makes it easy to transfer settings between computers without issues. When launching the application from the terminal, these paths are displayed there.
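According to the changelog, the directories are configured through the `CZKAWKA_CONFIG_PATH` and `CZKAWKA_CACHE_PATH` environment variables. How Czkawka resolves them internally is not shown here, so the following is only a sketch of the usual pattern: use the variable if it is set, otherwise fall back to a per-system default. The function names are hypothetical, and treating an empty value as "not set" is an assumption of this example.

```rust
use std::env;
use std::ffi::OsString;
use std::path::PathBuf;

/// Picks the override if one is given and non-empty, else the default.
pub fn resolve_dir(override_value: Option<OsString>, default_dir: PathBuf) -> PathBuf {
    match override_value {
        Some(value) if !value.is_empty() => PathBuf::from(value),
        _ => default_dir,
    }
}

/// Configuration directory, optionally redirected by the environment.
pub fn config_dir(default_dir: PathBuf) -> PathBuf {
    resolve_dir(env::var_os("CZKAWKA_CONFIG_PATH"), default_dir)
}

/// Cache directory, optionally redirected by the environment.
pub fn cache_dir(default_dir: PathBuf) -> PathBuf {
    resolve_dir(env::var_os("CZKAWKA_CACHE_PATH"), default_dir)
}
```

Keeping the decision in a pure function (`resolve_dir`) separates it from the process environment, so the fallback logic can be checked without modifying any variables.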

Changes in the current version

Breaking changes

Video, Duplicate (smaller prehash size), and Image cache (EXIF orientation + faster resize implementation) are incompatible with previous versions and need to be regenerated.

Core

Automatically rotating all images based on their EXIF orientation
Fixed a crash caused by negative time values on some operating systems
Updated `vid_dup_finder`; it can now detect similar videos shorter than 30 seconds
Added support for more JXL image formats (using a built-in JXL → image-rs converter)
Improved duplicate file detection by using a larger, reusable buffer for file reading
Added an option for significantly faster image resizing to speed up image hashing
Logs now include information about the operating system and compiled app features (x86_64 versions only)
Added size progress tracking in certain modes
Ability to stop hash calculations for large files mid-process
Implemented multithreading to speed up filtering of hard links
Reduced prehash read file size to a maximum of 4 KB
Fixed a slowdown at the end of scans when searching for duplicates on systems with a high number of CPU cores
Improved scan cancellation speed when collecting files to check
Added support for configuring config/cache paths using the `CZKAWKA_CONFIG_PATH` and `CZKAWKA_CACHE_PATH` environment variables
Fixed a crash in debug mode when checking broken files named `.mp3`
Panics from symphonia are now caught in broken files mode
A warning is now printed when using `panic=abort` (which may speed up the app but can cause occasional crashes)

Krokiet

Changed the default tab to “Duplicate Files”

GTK GUI

Added a window icon in Wayland
Disabled the broken sort button

CLI

Added `-N` and `-M` flags to suppress printing results/warnings to the console
Fixed an issue where messages were not cleared at the end of a scan
Added the ability to disable the cache via the `-H` flag (useful for benchmarking)

Prebuilt binaries

This is the last release that supports Ubuntu 20.04, because GitHub Actions is dropping this OS from its runners
Linux and Mac binaries are now provided in two variants: x86_64 and arm64
Arm Linux builds need at least Ubuntu 24.04
GTK 4.12 is now used to build the Windows GTK GUI instead of GTK 4.10
Dropped support for snap builds: they are too time-consuming to maintain and test (and are currently broken)
Removed the native Windows build of Krokiet: only the version cross-compiled from Linux is now available (there should be no difference)

Next version

In the next version, I will likely focus on implementing missing features in Krokiet that are already available in Czkawka, such as selecting multiple items using the mouse and keyboard or comparing images.

Although I generally view the transition from GTK to Slint positively, I still run into issues that require extra effort for things that worked seamlessly in GTK. These include problems with popups and the need to create some widgets almost from scratch, due to the lack of documentation and examples for what I consider basic components, such as an equivalent of GTK’s TreeView.

Price — free, so take it for yourself, your friends, and your family. Licensed under MIT/GPL

Repository — https://github.com/qarmin/czkawka

Files to download — https://github.com/qarmin/czkawka/releases