About 2x faster on my 4-core ARM server, without any significant parallelism overhead:
$ time ffmpeg_threading/ffmpeg -i input.mp4 -ar 1000 -vn -acodec flac -f flac -y /dev/null -hide_banner -loglevel quiet
14.90s user 2.08s system 218% cpu 7.771 total
$ time ffmpeg -i input.mp4 -ar 1000 -vn -acodec flac -f flac -y /dev/null -hide_banner -loglevel quiet
14.05s user 1.80s system 114% cpu 13.841 total
You're not using hardware acceleration on the decoding side, and you're dropping the video output here. I wonder what happens if we use hardware acceleration for both video decoding and encoding, i.e. something like this on an NVIDIA card:
ffmpeg -hwaccel cuda -i $inputFile -codec:a copy -codec:v hevc_nvenc $output
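Something like this should also keep the decoded frames in GPU memory instead of copying them back to system RAM between the decoder and NVENC (untested sketch, assuming an ffmpeg build with CUDA/NVENC support):
$ ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i "$inputFile" -c:a copy -c:v hevc_nvenc "$output"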
What's notable about hardware-accelerated transcoding with NVENC is that it has a resolution limit, so if you try to transcode something like 8K VR video with it, it'll choke.
But what part gets multithreading? The video compression is already multithreaded. Video decompression I'm not sure about. And I think anything else is fairly small in comparison in terms of performance cost. All improvements are welcome, but I would expect the impact to be fairly immaterial in practice.
Well, that's the very specific command I'm using in one of my webapps (https://datethis.app), and it's one of the main performance hotspots, so it's very *not* immaterial.
Very interesting! I had seen the "learn more" video already, but it stayed in a corner of my mind.
To compare any given piece of sound with reference sounds for ENF analysis, the references must have been recorded to start with.
The fact that a webapp like yours can exist... does it mean that we, indeed, have recordings of electrical hum spanning years and years? Are they freely available, or are they commercial products?
It seems so crazy to me that someone decided to put a recorder next to a humming line just to be able to match that sound against other recordings at some point in the future...
For Europe, there are academic and public organizations that publish this ENF data, with a backlog going back to about 2017.
For the US, I couldn't find any open dataset. For those regions, I'm basically recording the sound of an A/C motor to get the reference data, but I only have a few months of backlog.
This is removing the video stream (-vn) so that's not involved. Not sure which parts are in parallel here, but I'm guessing decoding and encoding the audio.
Threading depends on the implementation of each encoder/decoder - most video encoders and decoders are multithreaded, audio ones not so much. At least that was the state of the world the last time I looked into ffmpeg internals.
Video compression (at least x264/x265) has a maximum number of threads it can use depending on the video resolution. This means that e.g. for 1080p ffmpeg cannot fully utilize a 64-thread CPU.
Nice, great presentation! Curious what he has in mind for the "dynamic pipelines" and "scripting (Lua?)" he mentions in the "Future directions" section. I'm imagining something more powerful for animating properties?
Man, I really want to watch this presentation, but the piss-poor audio just causes my brain to have a fit. How, in this day and age, is it still possible to screw this up so badly?
Agreed. Maybe some are not as sensitive to this, but it is a major energy suck for me. A little post-processing on noise and compression would go a long way. This recording is as raw as the Ramsey meme.
If you weren't on a different continent separated by a very large body of water, I'd be there. I'll donate by suggesting the use of a lav mic vs a podium mic.
It is very difficult for some people to clearly understand voices that are muddled by off-axis audio recording. It's a real condition. I have a hard time hearing voices in a crowded room from people across the table from me. We spend time worrying about the ARIA tags in our markup, but we just assume that everyone has the same hearing abilities? I get that most people probably don't think about this when they don't have a hearing condition, but being dismissive about it is an entirely different level of egregiousness.
Could my initial criticism have been delivered with more tact? Absolutely. But after the mental exhaustion of that video, that was all the energy I could afford at the time.
What are you talking about... the audio might not be professional-studio-level 10/10, but I don't see anything significantly wrong with it - it's more like a standard presentation mic. It's clearly good enough.
Every time he turns away from the mic and continues talking while looking at the projection his volume goes way down and at best sounds like a mumble. It is very taxing to keep up with him when he's turned away. It's the wrong mic for the task.
His work is not primarily about multithreading but about cleaning up ffmpeg to be true to its own architecture so that normal human beings have a chance of being able to maintain it. Things like making data flow one way in a pipeline, separating public and private state and having clearly defined interfaces.
Things had got so bad that every change was super difficult to make.
Multithreading comes out as a natural benefit of the cleanup.
^ This: I just spent the ~20 minutes necessary to watch that part of the talk at a reasonable 1.5x speed, and that's the summary. FFmpeg was suffering from 20 years of incremental change and a lack of cleanup/refactoring, and that's what he's spent two years addressing.
A couple of great lines including "my test for deprecating an option is if it's been broken for years and nobody is complaining, then definitely nobody is using it".
The verdict of the presentation was that many options are (bad) duplicates of the filter graph, and you should configure the software through the filter graph.
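As a rough illustration of what "configure through the filter graph" means (my own example, not from the talk): instead of a convenience output option like -s, you state the scaling explicitly in the graph:
$ ffmpeg -i input.mp4 -s 1280x720 output.mp4
$ ffmpeg -i input.mp4 -vf scale=1280:720 output.mp4
The second form is the one that composes cleanly with other filters.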
We saw in openssl what the consequences of never removing any code for decades were. It has a real cost.
You can't expect open source developers to be omniscient and know you want to use a specific feature if you don't communicate that to them.
Would you rather have them add telemetry?
> "multi-threading it's not really all about just multi-threading - I will say more about that later - but that is the marketable term"
That's what's said in the video, at least in the first 10 seconds, so it might be that multi-threading is just too trivial a term for the work here. (I haven't watched the full video yet, so this is just an observation.)
If I recall correctly, he meant that multi-threading wasn't a direct goal, but a "side effect" of rewriting some parts of the codebase to match the expected conceptual flow, and paying off some re-write debt that had been accumulated after years of new features being "tacked on".
There are many different types of pipelined processing tasks with many differing kinds of threading approaches, and I guess the video clears up which approaches work best for transcoding.
Is there any video editing software that takes advantage of ffmpeg? I once thought about making something to draw geometry through SVG and then feed it to ffmpeg, maybe with some UI, or just to add text, but I never started.
Avidemux feels a bit like that.
Since ffmpeg internals are quite raw and not written to be accessed through a GUI, any video editor based on it would probably be quite clunky and weird and hard to maintain.
Maybe an editor that uses modules to build some kind of preview, with a command explainer or a pipeline viewer.
ffmpeg is quite powerful, but it's a bit stuck because it only works from the command line, which is fine, but I guess that prevents some people from using it.
I've already written a python script that takes a number of clips and builds a mosaic with the xstack filter. It was not easy.
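For the curious, the kind of command such a script ends up generating looks roughly like this for a 2x2 mosaic (a sketch; the file names are made up and it assumes all four clips share the same resolution):
$ ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 \
    -filter_complex "[0:v][1:v][2:v][3:v]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0[v]" \
    -map "[v]" -an mosaic.mp4
Generating the layout string and the input list for an arbitrary number of clips is exactly the part that gets fiddly.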
VapourSynth is intended to be the middle ground you might be seeking. You manipulate the video in Python instead of ffmpeg's CLI, and it's often more extensible and powerful than pure ffmpeg thanks to its extensions.
I've been looking for more ways to speed up the transcoding process. One solution I found was GPU acceleration; another was using more threads, but it's hard to find the optimal amount to provide.
Can't you just use Hyperparameter Optimization to find the best value? Tools like Sherpa or Scikit-optimize can be used to explore a search space of n-threads/types of input/CPU type (which might be fixed on your machine).
I don't think "just" is appropriate here, that makes it sound like this should be a trivial task for anyone while it is not. Using "just" like this minimizes work and makes people feel stupid which leads to various negative outcomes.
For most workloads, setting the number of threads to the number of vCPUs (i.e. count each hyperthreaded core as 2) works. But GPU acceleration is much better if it's available to you.
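ffmpeg already defaults to automatic thread counts for most codecs, but if you want to verify what works best on a particular box, a crude sweep is usually enough to find the knee of the curve (bash sketch, made-up input file, encoding to the null muxer so nothing is written to disk):
for t in 1 2 4 8 16; do
  echo "== $t threads =="
  time ffmpeg -hide_banner -loglevel error -i input.mp4 -c:v libx264 -threads "$t" -f null -
done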
Though in my tests I found that GPU acceleration of video decoding actually hurts performance. It seems software decoding is faster than hardware for some codecs. That's of course not the case for encoding.
That heavily depends on the GPU being used and whether or not it has hardware support for your codec. Maybe your GPU is just old/weak compared to your CPU?
That's possible, but if you look at NVIDIA, the whole range uses the same hardware decoder block, so at most it's a difference in chip generation, not so much GPU model.
I'm not saying GPU hardware decoding isn't useful - it certainly is in terms of power consumption, and the CPU might be better used for something else happening at the same time. But in terms of raw throughput, it's not clear that a GPU beats a recent CPU.
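An easy way to compare raw decode throughput on a given machine is to decode to the null muxer with and without hwaccel and compare the timings (sketch, made-up file name):
$ ffmpeg -benchmark -i input.mp4 -an -f null -
$ ffmpeg -benchmark -hwaccel cuda -i input.mp4 -an -f null -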
"FFmpeg is licensed under the GNU Lesser General Public License (LGPL) version 2.1 or later. However, FFmpeg incorporates several optional parts and optimizations that are covered by the GNU General Public License (GPL) version 2 or later. If those parts get used the GPL applies to all of FFmpeg. "
The general situation for ffmpeg inside proprietary software is that the version of ffmpeg used either gets statically compiled and linked into the software's executable, or it ships as a separate executable called by the proprietary software.
Can you drill down a bit more into this? I would consider static linking to be including unmodified ffmpeg with my application bundle and calling it from my code (either as a pre-built binary from ffmpeg official or compiled by us for whatever reason, and called either via a code interface or from a child process using a command line interface). bsenftner's comment seems to roughly confirm this, though their original comment does make the distinction between the two modes.
Static linking means combining compiled object files (e.g. your program and ffmpeg) into a single executable. Loading a .so or .dll file at runtime would be dynamic linking. Invoking through a child process is not linking at all.
Basically you must allow the user to swap out the ffmpeg portion with their own version. So you can dynamically link with a .dll/.so, which the user can replace, and you can invoke a CLI command, which the user can replace. Any modifications you make to the ffmpeg code itself must be provided.
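In practice the dynamic-linking route just means building against the shared FFmpeg libraries, e.g. something like this (sketch, assuming pkg-config can find the libav* .pc files):
$ cc -o myapp myapp.c $(pkg-config --cflags --libs libavformat libavcodec libavutil)
The resulting binary loads libavformat.so/libavcodec.so at runtime, so the user can swap in their own builds.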
It is widely known and accepted that you need to dynamically link to satisfy the LGPL (you can static link if you are willing to provide your object files on request). There is a tl;dr here that isn't bad: https://fossa.com/blog/open-source-software-licenses-101-lgp...
If one statically links ffmpeg into a larger proprietary application, the only source files that need to be supplied are the ffmpeg sources, modified or not. The rest of the application's source does not have to be released. In my (now ex) employer's case, only the low level av_read_frame() function was modified. The entire ffmpeg version used, plus a notice about that being the only modification, is in the software as well as on the employer's web site in multiple places. They're a US DOD contractor, so their legal team is pretty serious.
a) […] if the work is an executable linked with the Library, [accompany the work] with the complete machine-readable ‘work that uses the Library’, as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. […]
b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.
To @keepamovin, "called either via a code interface or from a child process using a command line interface" -- regardless of the license terms, fork()/exec()'d programs "could never" impose any licensing requirements on the parent because the resulting interaction among parent/child is not a derived work. As usual: IANAL, this probably pertains more to USC than other jurisdictions.
I used to work at an FR video security company, where our product was in a significant percentage of the world's airports and high traffic hubs. Statically linked ffmpeg for the win.
Not a video encoding expert, but for live streams you can't merge the output until you process all the N parts, so you introduce delays. And if any part of the input pipeline, like an overlay containing a logo or text, is generated dynamically i.e. not a static mp4, it basically counts as a live stream.
Why not cut the image in rectangles and process those simultaneously? Wouldn't that work for live streams? (There may be artefacts at the seams though?)
You're right, but ideally you should first perform scene change detection (to put keyframes at the proper positions), and that alone takes quite some processing.
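FFmpeg's select filter can give you candidate split points fairly cheaply; something like this (sketch, threshold picked arbitrarily) prints the timestamps of detected scene changes, which you could then use as segment boundaries:
$ ffmpeg -i input.mp4 -an -vf "select='gt(scene\,0.4)',showinfo" -f null - 2>&1 | grep pts_time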