The most surprising part of uv's success to me isn't Rust at all, it's how much speed we "unlocked" just by finally treating Python packaging as a well-specified systems problem instead of a pile of historical accidents. If uv had been written in Go or even highly optimized CPython, but with the same design decisions (PEP 517/518/621/658 focus, HTTP range tricks, aggressive wheel-first strategy, ignoring obviously defensive upper bounds, etc.), I strongly suspect we'd be debating a 1.3× vs 1.5× speedup instead of a 10× headline — but the conversation here keeps collapsing back to "Rust rewrite good/bad." That feels like cargo-culting the toolchain instead of asking the uncomfortable question: why did it take a greenfield project to give Python the package manager behavior people clearly wanted for the last decade?
It's not just greenfield-ness but the fact it's a commercial endeavor (even if the code is open-source).
Building a commercial product means you pay money (or something they value equally) to people to do your bidding. You don't have to worry about politics, licensing, and all the usual FOSS-related drama. You pay them to set their opinions aside and build what you want, not what they want (and if that doesn't work, it just means you need to offer more money).
In this case it's a company that believes they can make a "good" package manager they can sell/monetize somehow and so built that "good" package manager. Turns out it's at least good enough that other people now like it too.
This would never work in a FOSS world because the project will be stuck in endless planning as everyone will have an opinion on how it should be done and nothing will actually get done.
Similar story with systemd - all the bitching you hear about it (to this day!) is the stuff that would've happened during its development phase had it been developed as a typical FOSS project, and it would ultimately have gone nowhere - but instead it's one guy who just did what he wanted and shared it with the world, and enough other people liked it and started building upon it.
I don't know what you think "typical FOSS projects" are, but in my experience they are exactly like your systemd example: one person who does what they want and shares it with the world. The rest of your argument doesn't really make any sense with that in mind.
That's no longer as true as it once was. I get the feeling that quite a few people would consider "benevolent dictator for life" an outdated model for open source communities. For better or worse, there's a lot of push to transition popular projects towards being led by committee. Results are mixed (literally: I see both successes and failures), but that doesn't seem to have any effect on the trend.
Only a very, very small fraction of open source projects get to the point where they legitimately need committees and working groups and maintainer politics/drama.
> quite a few people would consider "benevolent dictator for life" an outdated model for open source communities.
I think what most people dislike are rugpulls and when commercial interests override what contributors/users/maintainers are trying to get out of a project.
For example, we use forgejo at my company because it was not clear to us to what extent gitea would play nicely with us if we externalized a hosted version/deployment of their open source software (they somewhat recently formed a company around it, which led to forgejo forking it under the GPL). I'm also not a fan of what minio did recently to that effect, and am skeptical but hopeful that seaweedfs is not going to do something similar.
We ourselves are building out a community around our static site generator https://github.com/accretional/statue as FOSS with commercial backing. The difference is that we're open and transparent about it from the beginning, and static site generators/component libraries are probably some of the least painful projects to fork or to push back on over direction, vs critical infrastructure like a distributed system's storage layer.
Bottom line is, BDFL works when 1. you aren't asking people to bet their business on you staying benevolent 2. you remain benevolent.
> Only a very, very small fraction of open source projects get to the point where they legitimately need committees and working groups and maintainer politics/drama.
You’re not wrong, but those are the projects we’re talking about in this thread. uv has become large enough to enter this realm.
> Bottom line is, BDFL works when 1. you aren't asking people to bet their business on you staying benevolent 2. you remain benevolent.
That second point is doing a lot of heavy lifting. All of the BDFL models depend on that one person remaining aligned, interested, and open to new ideas. A lot of the small projects I’ve worked with have had BDFL models where even simple issues like the BDFL becoming busy or losing interest became the death knell of the project. On the other hand, I can think of a few committee-style projects where everything collapsed under infighting and drama from the committee.
It depends on governance, for want of a better word: if a project has a benevolent dictator then that project will likely be more productive than one that requires consensus building.
That's what I'm saying. Benevolent dictator is the rule, not the exception, in FOSS. Which is why GP's argument that private companies good, FOSS bad, makes no sense.
I think OP is directing their ire towards projects with multiple maintainers, which are more likely to be hamstrung by consensus building and thus less productive. It does seem like we've been swamped with drama posts about large open-source projects and their governance, notably with Rust itself, Linux incorporating Rust, Pebble, etc. It's not hard to imagine this firehose of dev-drama (that's not even about actual code) overshadowing the fact that the overwhelming majority of code ever written has a benevolent dictator model.
The argument isn't about proprietary vs open, but that design by committee, whether that committee be a bunch of open source heads that we like or some group that we've been told to other and hate, has limitations that have been exhibited here.
> You don't have to worry about politics, licensing, and all the usual FOSS-related drama. You pay them to set their opinions aside and build what you want, not what they want (and if that doesn't work, it just means you need to offer more money).
Money is indeed a great lubricator.
However, it's not black-and-white: office politics is a long standing term for a reason.
Office politics happen when people determine they can get more money by engaging in politics instead of working. This is just an indicator people aren't being paid enough money (since people politicking around is detrimental to the company, it is better off paying them whatever it takes for them not to engage in such behavior). "You get what you pay for" applies yet again.
> In large companies people engage in politics because it becomes necessary to accomplish large things.
At a large company, your job after a certain level depends on your “impact” and “value delivered”. The challenge is getting 20 other teams to work on your priorities and not their priorities. They too need to play to win to keep their job or get that promotion.
For software engineering, “impact” or “value delivered” are pretty much always your job unless you work somewhere really dysfunctional that’s measuring lines of code or some other nonsense. But that does become a lot about politics after some level.
I would not say it’s about getting other people aligned with your priorities instead of theirs but rather finding ways such that your priorities are aligned. There’s always the “your boss says it needs to help me” sort of priority alignment but much better is to find shared priorities. e.g. “We both need X; let’s work together.” “You need Foo which you could more easily achieve by investing your efforts into my platform Bar.”
If you are a fresh grad, you can mostly just chug along with your tickets and churn out code. Your boss (if you have a good boss) will help you make sure the other people work with you.
When you are higher up, that is when you become said good boss, or that boss's boss, the dynamics of the grandfather comment kick in fully.
Agree. A fresh grad is still measured on “impact” but that impact is generally localized. e.g. Quality of individual code and design vs ability to wrangle others to work with you.
Impact is a handwavy way of saying “is your work good for the company”.
Exceeds 1. Politics is the craft of influence. And, debatably, there's politics even when the population size is 1, between your subconscious instinctive mind (eat the entire box of donuts) and your conscious mind (don't spike your blood sugar).
I think the "too many people" problem happens because a company would rather hire 10 "market rate" people than 3 well-compensated ones. Headcount inflation dilutes responsibility and rewards, so even if one of the "market rate" guys does the best work possible they won't get rewarded proportionally... so if hard work isn't going to get them adequate comp, maybe politics will.
Alternatively, companies hire multiple subject domain experts, and pay them handsomely.
The experts believe they've been hired for the value of their opinions, rather than for being 'yes-people', and have differing opinions to each other.
At a certain pay threshold, there are multiple people whose motivation is not "how do I maximise my compensation?" but instead "how do I do the best work I can?" Sometimes this presents as vocal disagreements between experts.
> a company would rather hire 10 "market rate" people than 3 well-compensated ones
The former is probably easier. They don't have to justify or determine the salaries, and don't have to figure out who's worth the money, and don't have to figure out how to figure that out.
It also follows that the well-compensated people are probably well compensated because they know how to advocate for their worth, which usually includes a list of things they will tolerate and a list they will not, whereas the "market rate" hires are just happy to be there and more inclined to go along with, ya know, whatever.
I believe incompetence is the key. When someone cannot compete (or the office does not use a yardstick that can be measured), politics is the only way to get ahead.
Kind of like when the Nobel prize goes to the man instead of the woman who did the work … sometimes. Take the credit and get the promotion.
It's a question of what you want to invest your time in. Everyone creates output, whether it's lines of code, a smoke screen to hide your social media time, or a set of ongoing conversations and perceptions that you are useful to the organization.
Sounds like you’re really down on FOSS and think FOSS projects don’t get stuff done and have no success? You might want to think about that a bit more.
FOSS can sometimes get stuff done but I'd argue it gets stuff done in spite of all the bickering, not because of it. If all the energy spent on arguments or "design by committee" was spent productively FOSS would go much farther (hell maybe we'd finally get that "year of the Linux desktop").
It doesn't have to make money now. But it's clearly pouring commercial-project levels of resources into uv, on the belief that it will somehow recoup that investment later on.
nah, a lot of people working on `uv` have a massive amount of experience working in the rust ecosystem, including on `cargo`, the rust package manager. `uv` is even advertised as `cargo` for python. And what is `cargo`? A FLOSS project.
Lots of lessons from other FLOSS package managers helped `cargo` become great, and then this knowledge helped shape `uv`.
Keep in mind that "making money" doesn't have to be from people paying to use uv.
It could be that they calculate the existence of uv saves their team more time (and therefore expense) in their other work than it cost to create. It could be that recognition for making the tool is worth the cost as a marketing expense. It could be that other companies donate money to them, either ahead of time in order to get uv made, or after it was made to encourage more useful tools to be made. Etc.
«« I don't want to charge people money to use our tools, and I don't want to create an incentive structure whereby our open source offerings are competing with any commercial offerings (which is what you see with a lot of hosted-open-source-SaaS business models).
What I want to do is build software that vertically integrates with our open source tools, and sell that software to companies that are already using Ruff, uv, etc. Alternatives to things that companies already pay for today.
An example of what this might look like (we may not do this, but it's helpful to have a concrete example of the strategy) would be something like an enterprise-focused private package registry. A lot of big companies use uv. We spend time talking to them. They all spend money on private package registries, and have issues with them. We could build a private registry that integrates well with uv, and sell it to those companies. [...]
But the core of what I want to do is this: build great tools, hopefully people like them, hopefully they grow, hopefully companies adopt them; then sell software to those companies that represents the natural next thing they need when building with Python. Hopefully we can build something better than the alternatives by playing well with our OSS, and hopefully we are the natural choice if they're already using our OSS. »»
> This would never work in a FOSS world because the project will be stuck in endless planning as everyone will have an opinion on how it should be done and nothing will actually get done.
numpy is the de-facto foundation for data science in python, which is one of the main reasons, if not the main reason, why people use python
I largely agree but don't want to entirely discount the effect that using a compiled language had.
At least in my limited experience, the selling point with the most traction is that you don't need to already have a working python install to get UV. And once you have UV, you can just go!
If I had a dollar for every time I've helped somebody untangle the mess of python environment libraries created by an undocumented mix of python delivered through the distributions package management versus native pip versus manually installed...
At least on paper, both poetry and UV have a pretty similar feature set. You do, however, need a working python environment to install and use poetry.
> the selling point with the most traction is that you don't already need a working python install to get UV. And once you have UV, you can just go!
I still genuinely do not understand why this is a serious selling point. Linux systems commonly already provide (and heavily depend upon) a Python distribution which is perfectly suitable for creating virtual environments, and Python on Windows is provided by a traditional installer following the usual idioms for Windows end users. (To install uv on Windows I would be expected to use the PowerShell equivalent of a curl | sh trick; many people trying to learn to use Python on Windows have to be taught what cmd.exe is, never mind PowerShell.) If anything, new Python-on-Windows users are getting tripped up by the moving target of attempts to make it even easier (in part because of things Microsoft messed up when trying to coordinate with the CPython team; see for example https://stackoverflow.com/questions/58754860/cmd-opens-windo... when it originally happened in Python 3.7).
> If I had a dollar for every time I've helped somebody untangle the mess of python environment libraries created by an undocumented mix of python delivered through the distributions package management versus native pip versus manually installed...
Sure, but that has everything to do with not understanding (or caring about) virtual environments (which are fundamental, and used by uv under the hood because there is really no viable alternative), and nothing to do with getting Python in the first place. I also don't know what you mean about "native pip" here; it seems like you're conflating the Python installation process with the package installation process.
Linux systems commonly already provide an outdated system Python you don’t want to use, and it can’t be used to create a venv of a version you want to use. A single Python version for the entire system fundamentally doesn’t work for many people thanks to shitty compat story in the vast ecosystem.
Even languages with a great compat story are moving to support multi-toolchains natively. For instance, go 1.22 on Ubuntu 24.04 LTS is outdated, but it will automatically download the 1.25 toolchain when it sees go 1.25.0 in go.mod.
> Linux systems commonly already provide an outdated system Python you don’t want to use
They can be a bit long in the tooth, yes, but from past experience another Python version I don't want to use is anything ending in .0, so I can cope with them being a little older.
That's in quite a bit of contrast to something like Go, where I will happily update on the day a new version comes out. Some care is still needed - they allow security changes particularly to be breaking, but at least those tend to be deliberate changes.
> Linux systems commonly already provide an outdated system Python you don’t want to use
Even if you update LTS Ubuntu only at its EOL, its Python will not be EOL most of the time.
> A single Python version for the entire system fundamentally doesn’t work for many people thanks to shitty compat story in the vast ecosystem.
My experience has been radically different. Everyone is trying their hardest to provide wheels for a wide range of platforms, and all the most popular projects succeed. Try adding `--only-binary=:all:` to your pip invocations and let me know the next time that actually causes a failure.
Besides which, I was very specifically talking about the user story for people who are just learning to program and will use Python for it. Because otherwise this problem is trivially solved by anyone competent. In particular, building and installing Python from source is just the standard configure / make / make install dance, and it Just Works. I have done it many times and never needed any help to figure it out even though it was the first thing I tried to build from C source after switching to Linux.
For much of the ML/scientific ecosystem, you're lucky to get all your deps working with the latest minor version of Python six months to a year after its release. Random ML projects with hundreds to thousands of stars on GitHub may only work with a specific, rather ancient version of Python.
> Because otherwise this problem is trivially solved by anyone competent. In particular, building and installing Python from source is just the standard configure / make / make install dance, and it Just Works. I have done it many times and never needed any help to figure it out even though it was the first thing I tried to build from C source after switching to Linux.
I compiled the latest GCC many times with the standard configure / make / make install dance when I just started learning *nix command line. I even compiled gmp, mpfr, etc. many times. It Just Works. Do you compile your GCC every time before you compile your Python? Why not? It Just Works.
Time. CPython compiles in a few minutes on an underpowered laptop. I don't recall last time I compiled GCC, but I had to compile LLVM and Clang recently, and it took significantly longer than "a few minutes" on a high-end desktop.
Why not just use a Python container rather than rely on having the latest binary installed on the system? Then venv inside the container. That would get you the “venv of a version” that you are referring to
Our firm uses python extensively, and a virtual environment for every script is ... difficult. We have dozens of python scripts running for team research and in production, from small maintenance tools to rather complex daemons. Add to that the hundreds of Jupyter notebooks used by various people. Some have a handful of dependencies, some dozens of dependencies. While most of those scripts/notebooks are only used by a handful of people, many are used company-wide.
Further, we have a rather largish set of internal libraries most of our python programs rely on. And some of those rely on external 3rd-party APIs (often REST). When we find a bug or something changes, more often than not we want to roll out the changed internal lib so that all programs that use it get the fix. Having to get everyone to rebuild and/or redeploy everything is a non-starter as many of the people involved are not primarily software developers.
We usually install into the system dirs and have a dependency problem maybe once a year. And it's usually trivially resolved (the biggest problem was with some google libs which had internally inconsistent dependencies at one point).
I can understand encouraging the use of virtual environments, but this movement towards requiring them ignores what, I think, is a very common use case. In short, no one way is suitable for everyone.
It's more complex and heavier than using uv. I see docker/vm/vagrant/etc as something I reach for when the environment I want is too big, too fancy or too nondeterministic to manually set up locally; but the entire point is that "plain Python with some dependencies" really shouldn't qualify as any of these (just like the build environment for a random Rust library).
Also, what do you do when you want to locally test your codebase across many Python versions? Do you keep track of several different containers? If you start writing some tool to wrap that, you're back at square one.
'we can't ship the Python version you want for your OS so we'll ship the whole OS' is a solution, but the 'we can't' part was embarrassing in 2015 already.
So basically, it avoids the whole chicken-and-egg problem. With UV you've simply always got "UV -> project Python 1.23 -> project". UV is your dependency manager, and your Python is just another dependency.
With other dependency managers you end up with "system Python 3.45 -> dep manager -> project Python 1.23 -> project". Or worse, "system Python 1.23 -> dep manager -> project Python 1.23 -> project". And of course there will be people who read about the problem and install their own Python manager, so they end up with a "system Python -> virtualenv Python -> poetry Python -> project" stack. Or the other way around, and they'll end up installing their project dependencies globally...
Sorry, but that is simply incorrect, on many levels.
Virtual environments are the fundamental way of setting up a Python project, whether or not you use uv, which creates and manages them for you. And these virtual environments can freely either use or not use the system environment, whether or not you use uv to create them. It's literally a single-line difference in the `pyvenv.cfg` file, which is a standard required part of the environment (see https://peps.python.org/pep-0405/), created whether or not you use uv.
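For illustration, here's a minimal sketch of that using only the standard library (the directory names are placeholders I made up); the relevant difference in the resulting pyvenv.cfg is the single include-system-site-packages line:
    import venv
    from pathlib import Path

    # Isolated from the base interpreter's site-packages (the default):
    venv.create("isolated-env")
    # Can see the base interpreter's site-packages:
    venv.create("shared-env", system_site_packages=True)

    for env in ("isolated-env", "shared-env"):
        # Both environments get a pyvenv.cfg; only one line differs.
        print(env, (Path(env) / "pyvenv.cfg").read_text(), sep="\n")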
Most of the time you don't need a different Python version from the system one. When you do, uv can install one for you, but it doesn't change what your dependency chain actually is.
Python-native tools like Poetry, Hatch etc. also work by managing standards-defined virtual environments (which can be created using the standard library, and you don't even have to bootstrap pip into them if you don't want to) in fundamentally the same way that uv does. Some of them can even grab Python builds for you the same way that uv does (of course, uv doesn't need a "system Python" to exist first). "system Python -> virtualenv Python -> poetry Python -> project" is complete nonsense. The "virtualenv Python" is the system Python — either a symlink or a stub executable that launches that Python — and the project will be installed into that virtual environment. A tool like Poetry might use the system Python directly, or it might install into its own separate virtual environment; but either way it doesn't cause any actual complication.
Anyone who "ends up installing their project dependencies globally" has simply not read and understood Contemporary Python Development 101. In fact, anyone doing this on a reasonably new Linux has gone far out of the way to avoid learning that, by forcefully bypassing multiple warnings (such as described in https://peps.python.org/pep-0668/).
No matter what your tooling, the only sensible "stack" to end up with, for almost any project, is: base Python (usually the system Python but may be a separately installed Python) -> virtual environment (into which both the project and its dependencies are installed). The base Python provides the standard library; often there will be no third-party libraries, and even if there are they will usually be cut off intentionally. (If your Linux comes with pre-installed third-party libraries, they exist primarily to service tools that are part of your Linux distribution; you may be able to use them for some useful local hacking, but they are not appropriate for serious, publishable development.)
Your tooling sits parallel to, and isolated from, that as long as it is literally anything other than pip — and even with pip you can have that isolation (it's flawed but it works for common cases; see for example https://zahlman.github.io/posts/2025/02/28/python-packaging-... for how I set it up using a vendored copy of pip provided by Pipx), and have been able to for three years now.
> Most of the time you don't need a different Python version from the system one.
Except for literally anytime you’re collaborating with anyone, ever? I can’t even begin to imagine working on a project where folks just use whatever python version their OS happens to ship with. Do you also just ship the latest version of whatever container because most of the time nothing has changed?
> has simply not read and understood Contemporary Python Development 101.
They haven't. At the end of the day, they just want their program to work. You and I can design a utopian packaging system, but the physics PhD with a hand-me-down windows laptop and access to her university's Linux research cluster doesn't care about python other than that it has a PITA library situation that UV addresses.
You misunderstand. The physicists are developing their own software to analyze their experimental data. They typically have little software development experience, but there is seldom someone more knowledgeable available to support them. Making matters worse, they often are not at all interested in software development and thus also don't invest the time to learn more than the absolute minimum necessary to solve their current problem, even if it could save them a lot of time in the long run. (Even though I find the situation frustrating, I can't say I don't relate, given that I feel the same way about LaTeX.)
Conda has slowly but surely gone down the drain as well. It used to be bullet proof but there too you now get absolutely unsolvable circular dependencies.
Good question, I can't backtrack right now but it was apmplanner that I had to compile from source, and it contains some python that gets executed during the build process (I haven't seen it try to run it during normal execution yet).
Probably either python-serial or python-pexpect judging by the file dates, and neither of these is so exciting that there should have been any version conflicts at all.
And the only reason I had to rebuild it at all was due to another version conflict in the apm distribution that expects a particular version of pixbuf to be present on the system and all hell breaks loose if it isn't, and you can't install that version on a modern system because that breaks other packages.
It is insane how bad all this package management crap is. The GNU project and the linux kernel are the only ones that have never given me any trouble.
They're not applications developers, but they need to write code. That's the whole point. Python is popular within academia because it replaces R/Excel/VB.Net, not Java/C++.
If I want to install Python on Windows and start using pip, I grab an installer from python.org and follow a wizard. On Linux, I almost certainly already have it anyway.
If I want to bootstrap from uv on Windows, the simplest option offered involves Powershell.
Either way, I can write quite a bit with just the standard library before I have to understand what uv really is (or what pip is). At that point, yes, the pip UX is quite a bit messier. But I already have Python, and pip itself was also trivially installable (e.g. via the standard library `ensurepip`, or from a Linux system package manager — yes, still using the command line, but this hypothetical is conditioned on being a Linux user).
Not many normal people want to install python. Instead, author of the software they are trying to use wants them to install python. So they follow readme, download windows installer as you say, pip this pipx, pipx that conda, conda this requirements.txt, and five minutes later they have magic error telling that tensorflow version they are installing is not compatible with pytorch version they are installing or some such.
The aftertaste python leaves is lasting-disgusting.
Scenarios like that occur daily. I do quite a bit of software development and whenever I come across something that really needs python I mentally prepare for a day of battle with the various (all subtly broken) package managers, dependency hell and circular nonsense to the point that I am also ready to give up on it after a day of trying.
Just recently: a build of a piece of software that itself wasn't written in python but that urgently needed a very particular version of it, with a whole bunch of dependencies that refused to play nice with Anaconda for some reason (which, in spite of the fact that it too is becoming less reliable, is probably still the better one). The solution? Temporarily move anaconda to a backup directory, remove the venv activation code from .bashrc and compile the project, then restore everything to the way it was before (which I need it to be because I have some other stuff on the stove that is built using python because there isn't anything else).
And let's not go into bluetooth device support in python, anything involving networking that is a little bit off the beaten path and so on.
A traditional Windows install didn't include things Microsoft doesn't make. But any PC distributor could always include Python as part of their base Windows install, with all the other stuff that bloats the typical third-party Windows install. They don't, which indicates the market doesn't want it. Your indictment of the lack of Python out of the box is less on Windows than on the "distro" served by PC manufacturers.
I don't think this makes a meaningful difference. The installation is a `curl | sh`, which downloads a tarball, which gets extracted to some directory in $PATH.
It currently includes two executables, but having it contain two executables and a bunch of .so libraries would be a fairly trivial change. It only gets messy when you want it to make use of system-provided versions of the libraries, rather than simply vendoring them all yourself.
It gets messy not just in that way, but also because someone can have a weird LD_LIBRARY_PATH that starts to cause problems. Statically linking drastically simplifies distribution, and you'd have to have distributed zero software to end users to believe otherwise. The only platform where this isn't the case is Apple, because they natively supported app bundles. I don't know if Flatpak solves the distribution problem because I've not seen a whole lot of it in the ecosystem - most people seem to generally still rely on the system package manager, and commercial entities don't seem to really target Flatpak.
When you're shipping software, you have full control over LD_LIBRARY_PATH. Your entry point can be e.g. a shell script that sets it.
There is not so much difference between shipping a statically linked binary, and a dynamically linked binary that brings its own shared object files.
But if they are equivalent, static linking has the benefit of simplicity: Why create and ship N files that load each other in fancy ways, when you can do 1 that doesn't have this complexity?
That’s precisely my point. It’s insanely weird to have a shell script to setup the path for an executable binary that can’t do it for itself. I guess you could go the RPATH route but boy have I only experienced pain from that.
By definition, "greenfield" literally means free from constraints.
So the answer is in your question:
Why did it take a team unbound by constraints to try something new, as compared to a project with millions of existing stakeholders?
Single vision. Smaller team. What they landed on is a hit (no guarantee of that in advance!)
Conversely, with so many stakeholders, getting everyone to rally around a change (in advance) is hard.
In my experience this is about human nature/organisation and spans all types of organisations, not just python or open source etc.
It also looks like python would have got there, given the foundations put in place as noted in the article.
> the conversation here keeps collapsing back to "Rust rewrite good/bad." That feels like cargo-culting the toolchain instead of asking the uncomfortable question: why did it take a greenfield project to give Python the package manager behavior people clearly wanted for the last decade?
I think there's a few things going on here:
- If you're going to have a project that's obsessed with speed, you might as well use rust/c/c++/zig/etc to develop the project, otherwise you're always going to have python and the python ecosystem as a speed bottleneck. rust/c/c++/zig ecosystems generally care a lot about speed, so you can use a library and know that it's probably going to be fast.
- For example, the entire python ecosystem generally does not put much emphasis on startup time. I know there's been some recent work here on the interpreter itself, but even modules in the standard library will pre-compile regular expressions at import time, even if they're never used, like the "email" module (there's a small sketch of this cost after the list).
- Because the python ecosystem doesn't generally optimize for speed (especially startup), the slowdowns end up being contagious. If you import a library that doesn't care about startup time, why should your library care about startup time? The same could maybe be said for memory usage.
- The bootstrapping problem is also mostly solved by using a compiled language like c/rust/go. If the package manager is written in python (or even node/javascript), you first have to have python+dependencies installed before you can install python and your dependencies. With uv, you copy/install a single binary file which can then install python + dependencies and automatically do the right thing.
- I think it's possible to write a pretty fast implementation using python, but you'd need to "greenfield" it by rewriting all of the dependencies yourself so you can optimize startup time and bootstrapping.
- Also, as the article mentions, there are _some_ improvements that have happened in the standards/PEPs that should eventually make their way into pip, though it probably won't be quite the gamechanger that uv is.
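The startup-time sketch promised above (my own illustration, not code from the stdlib; the pattern and function are made up): the difference is whether the regex work happens at import time or on first use.
    import re

    # Eager: this runs at import time for every consumer of the module,
    # whether or not parse_header() is ever called.
    HEADER_RE = re.compile(r"^(?P<key>[A-Za-z-]+):\s*(?P<value>.*)$")

    # Lazy: pay for compilation on first use instead (re keeps its own cache
    # of compiled patterns), so merely importing the module stays cheap.
    def parse_header(line: str):
        return re.match(r"^(?P<key>[A-Za-z-]+):\s*(?P<value>.*)$", line)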
> the entire python ecosystem generally does not put much emphasis on startup time.
You'd think PyPy would be more popular, then.
> even modules in the standard library will pre-compile regular expressions at import time, even if they're never used, like the "email" module.
Hmm, that is slower than I realized (although still just a fraction of typical module import time):
    $ python -m timeit --setup 'import re' 're.compile("foo.*bar"); re.purge()'
    10000 loops, best of 5: 26.5 usec per loop
    $ python -m timeit --setup 'import sys' 'import re; del sys.modules["re"]'
    500 loops, best of 5: 428 usec per loop
I agree the email module is atrocious in general, which specifically matters because it's used by pip for parsing "compiled" metadata (PKG-INFO in sdists, when present, and METADATA in wheels). The format is intended to look like email headers and be parseable that way; but the RFC mandates all kinds of things that are irrelevant to package metadata, and despite the streaming interface it's hard to actually parse only the things you really need to know.
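To make that concrete, this is roughly what reading that format through the stdlib email machinery looks like (my own sketch, not pip's actual code; the metadata text is an invented example):
    from email import message_from_string

    # Invented example of the METADATA / PKG-INFO header format.
    METADATA = """\
    Metadata-Version: 2.1
    Name: example-package
    Version: 1.0.0
    Requires-Dist: requests>=2.0
    Requires-Dist: rich; extra == "cli"
    """

    msg = message_from_string(METADATA)
    print(msg["Name"], msg["Version"])   # example-package 1.0.0
    print(msg.get_all("Requires-Dist"))  # repeated header, so [] won't do; use get_all()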
> Because the python ecosystem doesn't generally optimize for speed (especially startup), the slowdowns end up being contagious. If you import a library that doesn't care about startup time, why should your library care about startup time? The same could maybe be said for memory usage.
I'm trying to fight this, by raising awareness and by choosing my dependencies carefully.
> you first have to have python+dependencies installed before you can install python and your dependencies
It's unusual that you actually need to install Python again after initially having "python+dependencies installed". And pip vendors all its own dependencies except for what's in the standard library. (Which is highly relevant to Debian getting away with the repackaging that it does.)
> I think it's possible to write a pretty fast implementation using python, but you'd need to "greenfield" it by rewriting all of the dependencies yourself so you can optimize startup time and bootstrapping.
This is my current main project btw. (No, I don't really care that uv already exists. I'll have to blog about why.)
> there are _some_ improvements that have happened in the standards/PEPs that should eventually make their way into pip
Most of them already have, along with other changes. The 2025 pip experience is, believe it or not, much better than the ~2018 pip experience, notwithstanding higher expectations for ecosystem complexity.
PyPy is hamstrung by limited (previously, a lack of) compatibility with compiled Python modules. If it had been a drop-in replacement for the equivalent Python versions, it'd probably have been much more popular.
> I agree the email module is atrocious in general
Hah. Yes sounds like we are very much on the same page here. Python stdlib could really use a simple generic email/http header parser.
> It's unusual that you actually need to install Python again after initially having "python+dependencies installed".
I’m thinking about 3rd party installers like poetry, pip-tools, pdm, etc, where your installer needs python+dependencies installed before it can start installing.
> “write a pretty fast implementation using python” This is my current main project btw. (No, I don't really care that uv already exists. I'll have to blog about why.)
Do you have anything public yet? I’m totally curious. I started doing this for flake8 and pip back in 2021/2022, but when ruff+uv came along I figured it wasn’t worth my time any more.
Note that the advantages of Rust are not just execution speed: it's also a good language for expressing one's thoughts, and thus makes it easier to find and unlock the algorithmic speedups that really increase speed.
But yeah. Python packaging has been dumb for decades and successive Python package managers recapitulated the same idiocies over and over. Anyone who had used both Python and a serious programming language knew it, the problem was getting anyone to do anything about it. I can't help thinking that maybe the main reason using Rust worked is that it forced anyone who wanted to contribute to it to experience what using a language with a non-awful package manager is like.
Cargo is not really good. The very much non-zero frequency of something with cargo not working for opaque reasons and then suddenly working again after "cargo clean", the "no, I invoke your binaries" mentality (try running a benchmark without either ^C'ing out of bench to copy the binary name or parsing some internal JSON metadata) because "cargo build" is the only build system in the world which will never tell you what it built, the whole mess with features, default-features, no-default-features, of course bindgen/sys dependency conflicts, "I'll just use the wrong -L libpath for the bin crate but if I'm building tests I remember the ...64", cargo randomly deciding that it now has to rebuild everything or 50% of everything for reasons which are never to be known, builds not being reproducible, cargo just never cleaning garbage up, and so on.
rustdoc has only slightly changed since the 2010s, it's still very hard to figure out generic/trait-oriented APIs, and it still only does API documentation in mostly the same basic 1:1 "list of items" style. Most projects end up with two totally disjointed sets of documentation, usually one somewhere on github pages and the rustdoc.
Rust is overall a good language, don't get me wrong. But it and the ecosystem also have a ton of issues (and that's without even mentioning async), and most of these have been sticking around since basically 1.0.
(However, the rules around initialization are just stupid and unsafe is no good. Rust also tends to favor a very allocation-heavy style of writing code, because avoiding allocations tends to be possible but often annoying and difficult in unique-to-rust ways. For somewhat related reasons, trivial things are at times really hard in Rust for no discernible reason. As a concrete, simplistic but also real-world example, Vec::push is an incredibly pessimistic method, but if you want to get around it, you either have to initialize the whole Vec, which is a complete waste of cycles, or you yolo it with reserve+set_len, which is invalid Rust because you didn't properly use MaybeUninit for locations which are only ever written.)
I have empathy for anyone who was required to use cargo on a nfs mounted fs. The number of files and random IO cargo uses makes any large project unusable.
I've told people to stop syncing their cargo env around NFS so many times, but sometimes they have no choice.
> That feels like cargo-culting the toolchain [...]
Pun intended?
Jokes aside, what you describe is a common pattern. It's also why, internally at Google, they used to get decent speedups from rewriting some old C++ project in Go for a while: the magic was mostly in the rewrite-with-hindsight.
If you put effort into it, you can also get there via an incremental refactoring of an existing system. But the rewrite is probably easier to find motivation for, I guess.
I can't find the quote for this, but I remember Python maintainers wanted package installing and management to be separate things. uv did the opposite, and instead it's more like npm.
I don't know the problem space and I'm sure that the language-agnostic algorithmic improvements are massive. But to me, there's just something about rust that promotes fast code. It's easy to avoid copies and pointer-chasing, for example. In python, you never have any idea when you're copying, when you're chasing a pointer, when you're allocating, and so on. (Or maybe you do, but I certainly don't.) You're so far from hardware that you start thinking more abstractly and not worrying about performance. For some things, that's probably perfect. But for writing fast code, it's not the right mindset.
The thing is that a lot of the bottlenecks in pip are entirely artificial, and a lot of the rest can't really be improved by rewriting in Rust per se, because they're already written in C (within the Python interpreter itself).
> That feels like cargo-culting the toolchain instead of asking the uncomfortable question: why did it take a greenfield project to give Python the package manager behavior people clearly wanted for the last decade?
This feels like a very unfair take to me. Uv didn’t happen in isolation, and wasn’t the first alternative to pip. It’s built on a lot of hard work by the community to put the standards in place, through the PEP process, that make it possible.
I suspect that the non-Rust improvements are vastly more important than you’re giving credit for. I think the go version would be 5x or 8x compared to the 10x, maybe closer. It’s not that the Rust parts are insignificant but the algorithmic changes eliminate huge bottlenecks.
It just has to do with values. If you value perf you aren't going to write it in Python. And if you value perf then everything else becomes a no brainer as well.
It's the same way in JS land. You can make a game in a few kilobytes, but most web pages are still many megabytes for what should have been no JS at all.
Poetry largely accomplished the same thing first with most of the speedups (except managing your python installations) and had the disadvantage of starting before the PEPs you mentioned were standardized.
Because it broke backwards compatibility? It's worth noting that setuptools is in a similar situation to pip, where any change has a high chance of breaking things (as can be seen by perusing the setuptools and pip bug trackers). PEP 517/518 removed the implementation-defined nature of the ecosystem (which had caused issues for at least a decade, see e.g. the failures of distutils2 and bento), instead replacing it with a system where users complain about which backend to use (which is at least an improvement on the previous situation)...
That's TensorRT-LLM in its entirety at 1.2.0rc6, locked to run on Ubuntu or NixOS with full MPI and `nvshmem`, the DGX container Jensen's Desk edition (I know because I also rip apart and `autopatchelf` NGC containers for repackaging on Grace/SBSA).
It's... arduous. And the benefit is what exactly? A very mixed collection of maintainers have asserted that software behavior is monotonic along a single axis most of which they can't see and we ran a solver over those guesses?
I think the future is collections of wheels that have been through a process the consumer regards as credible.
> it's how much speed we "unlocked" just by finally treating Python packaging as a well-specified systems problem instead of a pile of historical accidents.
A lot of that, in turn, boils down to realizing that it could be fast, and then expecting that and caring enough about it.
> but with the same design decisions (PEP 517/518/621/658 focus, HTTP range tricks, aggressive wheel-first strategy, ignoring obviously defensive upper bounds, etc.), I strongly suspect we'd be debating a 1.3× vs 1.5× speedup instead of a 10× headline
I'm doing a project of this sort (although I'm hoping not to reinvent the wheel (heh) for the actual resolution algorithm). I fully expect that some things will be barely improved or even slower, but many things will be nearly as fast as with uv.
For example, installing from cache (the focus for the first round) mainly relies on tools in the standard library that are written in C and have to make system calls and interact with the filesystem; Rust can't do a whole lot to improve on that. On the other hand, a new project can improve by storing unpacked files in the cache (like uv) instead of just the artifact (I'm storing both; pip stores the artifact, but with a msgpack header) and hard-linking them instead of copying them (so that the system calls do less I/O). It can also improve by actually making the cached data accessible without a network call (pip's cache is an HTTP cache; contacting PyPI tells it what the original download URL is for the file it downloaded, which is then hashed to determine its path).
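A minimal sketch of the hard-link-with-fallback idea (my own illustration, not uv's or pip's actual code; the paths are placeholders):
    import os
    import shutil

    def place_file(cached: str, target: str) -> None:
        parent = os.path.dirname(target)
        if parent:
            os.makedirs(parent, exist_ok=True)
        try:
            os.link(cached, target)   # new directory entry only, no data copied
        except OSError:               # e.g. cross-device link, or FS without hard links
            shutil.copy2(cached, target)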
For another example, pre-compiling bytecode can be parallelized; there's even already code in the standard library for it. Pip hasn't been taking advantage of that all this time, but to my understanding it will soon feature its own logic (like uv does) to assign files to compile to worker processes. But Rust can't really help with the actual logic being parallelized, because that, too, is written purely in C (at least for CPython), within the interpreter.
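The standard-library code in question is, I assume, compileall; roughly (a sketch, with a placeholder path):
    import compileall
    import os

    # The site-packages path is a placeholder; workers=0 also means "use all cores".
    compileall.compile_dir(
        "/path/to/venv/lib/python3.12/site-packages",
        workers=os.cpu_count() or 1,
        quiet=1,
    )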
> why did it take a greenfield project to give Python the package manager behavior people clearly wanted for the last decade?
(Zeroth, pip has been doing HTTP range tricks, or at least trying, for quite a while. And the exact point of PEP 658 is to obsolete them. It just doesn't really work for sdists with the current level of metadata expressive power, as in other PEPs like 440 and 508. Which is why we have more PEPs in the pipeline trying to fix that, like 725. And discussions and summaries like https://pypackaging-native.github.io/.)
First, you have to write the standards. People in the community expect interoperability. PEP 518 exists specifically so that people could start working on alternatives to Setuptools as a build backend, and PEP 517 exists so that such alternatives could have the option of providing just the build backend functionality. (But the people making things like Poetry and Hatch had grander ideas anyway.)
But also, consider the alternative: the only other viable way would have been for pip to totally rip apart established code paths and possibly break compatibility. And, well, if you used and talked about Python at any point between 2006 and 2020, you should have the first-hand experience required to complete that thought.
I think this post does a really good job of covering how multi-pronged performance is: it certainly doesn't hurt uv to be written in Rust, but it benefits immensely from a decade of thoughtful standardization efforts in Python that lifted the ecosystem away from needing `setup.py` on the hot path for most packages.
Someone once told me a benefit of staffing a project for Haskell was it made it easy to select for the types of programmers that went out of their way to become experts in Haskell.
Tapping the Rust community is a decent reason to do a project in Rust.
It's an interesting debate. The flip side of this coin is getting hires who are more interested in the language or approach than the problem space and tend to either burn out, actively dislike the work at hand, or create problems that don't exist in order to use the language to solve them.
With that said, Rust was a good language for this in my experience. Like any "interesting" thing, there was a moderate bit of language-nerd side quest thrown in, but overall, a good selection metric. I do think it's one of the best Rewrite it in X languages available today due to the availability of good developers with Rewrite in Rust project experience.
The Haskell commentary is curious to me. I've used Haskell professionally but never tried to hire for it. With that said, the other FP-heavy languages that were popular ~2010-2015 were absolutely horrible for this in my experience. I generally subscribe to a vague notion that "skill in a more esoteric programming language will usually indicate a combination of ability to learn/plasticity and interest in the trade," however, using this concept, I had really bad experiences hiring both Scala and Clojure engineers; there was _way_ too much academic interest in language concepts and way too little practical interest in doing work. YMMV :)
If you're doing something forgettable, what makes you think the workaday Java or Python programmer would find it innately motivating?
Alternately, if you have the sort of work or culture that taps into people's intrinsic motivation, why would that work worse with Haskell or Clojure programmers than anybody else?
People are interested in different things along different dimensions. The way somebody is motivated by what they're doing and the way somebody is motivated by how they're doing it really don't seem all that correlated to me.
> there was way too much academic interest in language concepts and way too little practical interest in doing work.
They are communicating something real, but perhaps misattributing the root cause.
The purely abstract ‘ideal’ form of software development is unconstrained by business requirements. In this abstraction, perfect software would be created to purely express an idea. Academia allows for this, and to a lesser extent some open source projects.
In the real world, the creation of software must always be subordinate to the goals of the business. The goals are the purpose, and the software is the means.
Languages that are academically interesting, unsurprisingly, attract a greater preponderance of academically minded individuals. Of these, only a percentage have the desire or ability to let go of the pure abstract, and instead focus on the business domain. So it inevitably creates a management challenge; not an insurmountable one, but a challenge.
Hence the simplified ‘these people won’t do the work!’.
Paul Graham said the same thing about Python 20 years ago [1], and back then it was true. But once a programming language hits mainstream, this ceases to be a good filter.
This is important. The benefit here isn't the language itself. It's the fact that you're pulling from an esoteric language. People should not overfit and feel that whichever language is achieving that effect today is special in this regard.
In "python paradox", 'knows python' is an indication that the developer is interested in something technically interesting but otherwise impractical. Hence, it's a 'paradox' that you end up practically better off by selecting for something impractical.
These days, Python is surely a practical choice, so doesn't really resemble the "interested in something technically interesting but impractical".
In my experience this is definitely where rust shined. The language wasn't really what made the project succeed so much as having relatively curious, meticulous, detail-oriented people on hand who were interested in solving hard problems.
Sometimes I thought our teams would be a terrible fit for more cookie-cutter applications where rapid development and deployment was the primary objective. We got into the weeds all the time (sometimes because of rust itself), but it happened to be important to do so.
Had we built those projects with JavaScript or Python I suspect the outcomes would have been worse for reasons apart from the language choice.
Rust is also a systems language. I am still wrapping my mind around why it is so popular for so many end projects when its main use case and goals were basically writing a browser and maybe OS drivers.
But that’s precisely why it is good for developer tools. And it turns out people who write systems code are really damn good at writing tools code.
As someone who cut my teeth on C and low level systems stuff I really ought to learn Rust one of these days but Python is just so damn nice for high level stuff and all my embedded projects still seem to require C so here I am, rustless.
I write scripts in rust as a replacement for bash. It's really quite good at it. Aside from perl, it's the only scripting language that can directly make syscalls. It's got great libraries for parsing, configuration management, and declarative CLIs built right into it.
Sure, it's a little more verbose than bash one-liners, but if you need any kind of error handling and recovery, it's way more effective than bash and doesn't break when you switch platforms (i.e. mac/bsd utility incompatibilities with gnu utilities).
My only complaint would be that dealing with OsString is more difficult than necessary. Way too much of the stdlib encourages programmers to just pretend "non-utf8 paths don't exist" and panic/ignore when encountering one. (Not a malady exclusive to rust, but I wish they'd gotten it right.)
Paths are hard because they usually look like printable text, but don't have to be text. POSIX filenames are octet strings not containing 0x2F or 0x00. They aren't required to contain any "printable" characters, or even be valid text in any particular encoding. Most of the Rust stdlib you're thinking of is for handling text strings, but paths aren't text strings. Python also has the same split between Pathlib paths & all other strings.
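A small illustration of that split on Linux (my own example; the filename bytes and temp dir are made up):
    import os
    import tempfile

    d = tempfile.mkdtemp()
    raw_name = b"caf\xe9.txt"          # legal POSIX filename octets, not valid UTF-8
    open(os.path.join(os.fsencode(d), raw_name), "wb").close()

    print(os.listdir(d))               # ['caf\udce9.txt'] -- surrogateescape stand-in
    print(os.listdir(os.fsencode(d)))  # [b'caf\xe9.txt']  -- the actual octets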
If python's painpoints don't bother you enough (or you are already comfortable with all the workarounds,) then I'm not sure Rust will do much for you.
What I like about Rust is ADTs, pattern matching, execution speed. The things that really give me confidence are error handling (right balance between "you can't accidentally ignore errors" of checked exceptions with easy escape hatches for when you want to YOLO,) and the rarity of "looks right, but is subtly wrong in dangerous ways" that I ran into a lot in dynamic languages and more footgun languages.
I rarely if ever encounter bugs that type checking would have fixed. Most common types of bugs for me are things like forgetting that two different code paths access a specific type of database record and when they do both need to do something special to keep data cohesive. Or things like concurrency. Or worst of all things like fragile subprocesses (ffmpeg does not like being controlled by a supervisor process). I think all in all I have encountered about a dozen bugs in Python that were due to wrong types over the past 17 years of writing code in this language. Maybe slightly more than that in JS. The reason I would switch is performance.
Same. I like the type hints -- they're nice reminders of what things are supposed to be -- but I've essentially ~never run into bugs caused by types, either. I've been coding professionally in Python for 10+ years at this point.
It just doesn't come up in the web and devtools development worlds. Either you're dealing with user input, which is completely untrusted and has to be validated anyways, or you're passing around known validated data.
The closest is maybe ETL pipelines, but type checking can't help there either since your entire goal is to wrestle with horrors.
> having relatively curious, meticulous, detail-oriented people on hand who were interested in solving hard problems.... Had we built those projects with JavaScript or Python I suspect the outcomes would have been worse for reasons apart from the language choice.
I genuinely can't understand why you suppose that has to do with the implementation language at all.
Different programming languages come with different schools of thought about programming and different communities of practice around programming.
If you take a group of people who are squarely in the enterprise Java school of thought and have them write Rust, the language won't make much of a difference. They will eventually be influenced by the broader Rust community and the Rust philosophy towards programming, but, unless they're already interested in changing their approach, this will be a small, gradual difference. So you'll end up with Enterprise Java™ code, just in Rust.
But if you hire from the Rust community, you will get people who have a fundamentally different set of practices and expectations around programming. They will not only have a stronger grasp of Rust and Rust idioms but will also have explicit knowledge based on Rust (eg Rust-flavored design patterns and programming techniques) and, crucially, tacit knowledge based on Rust (Rust-flavored ways of programming that don't break down into easy-to-explain rules). And, roughly speaking, the same is going to be true for whatever other language you substitute for "Rust".
(I say roughly because there doesn't have to be a 1:1 relationship between programming languages, schools of thought and communities of practice. A single language can have totally different communities—just compare web Python vs data scientist Python—and some communities/schools can span multiple languages. But, as an over-simplified model, seeing a language as a community is not the worst starting point.)
> I genuinely can't understand why you suppose that has to do with the implementation language at all.
Languages that attract novice programmers (JS is an obvious one; PHP was one 20 years ago) have a higher noise-to-signal ratio than ones that attract intermediate and above programmers.
If you grabbed an average Assembly programmer today, and an average JavaScript programmer today, who do you think is more careful about programming? The one who needs to learn arcane shit to do basic things and then has to compile it in order to test it out, or the one who can open up Chrome's console and console.log("i love boobies")
How many embedded systems programmers suck vs full stack devs? I'm not saying full stack devs are inferior. I'm saying that more weak coders are attracted to the latter because the barriers to entry are SO much easier to get past.
npm isn't the issue there; it's the TS/JS community and its desire to use a library for everything. In communities that don't consider dependencies to be a risk, you will find this showing up in time.
The Node supply chain attacks are also not unique to the Node community; you see them happening on crates.io and many other places. In fact, the build-time scripts that cause issues in node modules are probably worse off with the flexibility of Cargo build scripts, and they're going to be harder to work around than in npm.
That argument is FUD. The people who created the NPM package manager are not the people who wrote your dependencies. Further, supply chain attacks occur for reasons that are entirely outside NPM's control. Fundamentally they're a matter of trust in the ecosystem — in the very idea of installing the packages in the first place.
Lack of stronger trust controls is part of the larger issue with npm. Pip, Maven and Go are not immune either, but they do things structurally better to shift the problem.
Go: Enforces global, append-only integrity via a checksum database and version immutability; once a module version exists, its contents cannot be silently altered without detection, shifting attacks away from artifact substitution toward “publish a malicious new version” or bypass the proxy/sumdb.
Maven: Requires structured namespace ownership and signed artifacts, making identity more explicit at publish time; this raises the bar for casual impersonation but still fundamentally trusts that the key holder and build pipeline were not compromised.
I think a lot of rust rewrites have this benefit; if you start with hindsight you can do better more easily. Of course, rust is also often beneficial for its own sake, so it's a one-two punch:)
In Python's case, as the article describes quite clearly, the issue is that the design of "working software" (particularly setup.py) was bad to the point of insane (in much the same way as the NPM characteristics that enabled the recent Shai Hulud supply chain attacks, but even worse). At some point, compatibility with insanity has got to go.
Helpfully, though, uv retains compatibility with newer (but still well-established) standards in the Python community that don't share this insanity!
I mean, you're on the right track in that they did cut out other insanity. But it's unclear how much of the speedup is necessarily tied to breaking backward compat (are there a lot of ".egg" files in the wild?).
Not as far as I can tell, except perhaps in extended-support legacy environments (for example, ActiveState is still maintaining a Python 2.x distribution).
> uv is fast because of what it doesn’t do, not because of what language it’s written in. The standards work of PEP 518, 517, 621, and 658 made fast package management possible. Dropping eggs, pip.conf, and permissive parsing made it achievable. Rust makes it a bit faster still.
Isn't assigning out what all made things fast presumptive without benchmarks? Yes, I imagine a lot is gained by the work of those PEPs. I'm more questioning how much weight is put on dropping of compatibility compared to the other items. There is also no coverage for decisions influenced by language choice which likely influences "Optimizations that don’t need Rust".
This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.
> Isn't assigning out what all made things fast presumptive without benchmarks?
We also have the benchmark of "pip now vs. pip years ago". That has to be controlled for pip version and Python version, but the former hasn't seen a lot of changes that are relevant for most cases, as far as I can tell.
> This also doesn't cover subtle things. Unsure if rkyv is being used to reduce the number of times that TOML is parsed but TOML parse times do show up in benchmarks in Cargo and Cargo/uv's TOML parser is much faster than Python's (note: Cargo team member, `toml` maintainer). I wish the TOML comparison page was still up and showed actual numbers to be able to point to.
This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML. A package installer really only needs to look at it at all when building from source, and part of what these standards have done for us is improve wheel coverage. (Other relevant PEPs here include 600 and its predecessors.) Although that has also largely been driven by education within the community, things like e.g. https://blog.ganssle.io/articles/2021/10/setup-py-deprecated... and https://pradyunsg.me/blog/2022/12/31/wheels-are-faster-pure-... .
> This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML.
I don't know the details of Python's resolution algorithm, but for Cargo (which is where epage is coming from) a lockfile (which is encoded in TOML) can be somewhat large-ish, maybe pushing 100 kilobytes (to the point where I'm curious if epage has benchmarked to see if lockfile parsing is noticeable in the flamegraph).
But once you have a lock file there is no resolution needed, is there? It lists all needed libs and their versions. Given how toml is written, I imagine you can read it incrementally - once a lib section is parsed, you can download it in parallel, even if you didn't parse the whole file yet.
(not sure how uv does it, just guessing what can be done)
I'm not sure if it even makes sense for a TOML file to be "read incrementally", because of the weird feature of TOML (inherited from INI conventions) that allows tables to be defined in a piecemeal, out-of-order fashion. Here's an example that the TOML spec calls "valid, but discouraged":
[fruit.apple]
[animal]
[fruit.orange]
So the only way to know that you have all the keys in a given table is to literally read the entire file. This is one of those unfortunate things in TOML that I would honestly ignore if I were writing my own TOML parser, even if it meant I wasn't "compliant".
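(For what it's worth, Python's stdlib parser makes the point concrete; a quick sketch, requiring Python 3.11+ for tomllib:)

import tomllib

doc = """
[fruit.apple]
[animal]
[fruit.orange]
"""

print(tomllib.loads(doc))
# {'fruit': {'apple': {}, 'orange': {}}, 'animal': {}}
# [fruit.orange] lands back inside 'fruit' even though [animal] came in
# between, so a streaming reader that emitted 'fruit' after seeing
# [animal] would have emitted it incomplete.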
To be fair, the whole post isn't very good IMO, regardless of ChatGPT involvement, and it's weird how some people seem to treat it like some kind of revelation.
I mean, of course it wasn't specifically Rust that made it fast; it's really a banal statement. You need only moderately serious programming experience to know that rewriting a legacy system from scratch can make it faster even if you rewrite it in a "slower" language. There have been C++ systems that became faster when rewritten in Python, for god's sake. That's what makes a system a "legacy" system: it does a ton of things and nobody really knows what it does anymore.
But when listing things that made uv faster it really mentions some silly things, among others. Like, it doesn't parse pip.conf. Right, sure, the secret of uv's speed lies in not parsing another package manager's config files. Great.
So all in all, yes, no doubt that hundreds of little things contributed to making uv faster, but listing a few dozen of them (surely a non-exhaustive list) doesn't really enable you to make any conclusions about the relative importance of different improvements whatsoever. I suppose the mentioned talk[0] (even though it's more than a year old now) would serve as a better technical report.
The content is nice and insightful! But God I wish people stopped using LLMs to 'improve' their prose... Ironically, some day we might employ LLMs to re-humanize texts that have already been massacred.
I definitely found the thesis insightful. The actual content stopped feeling insightful to me in the “What uv drops” section, where cut features were all listed as if they had equal weight, all in the same breathless LLM style
The author's blog was on HN a few days ago as well, for an article on SBOMs and lockfiles. They've done a lot of work on the supply-chain security side and are clearly knowledgeable, and yet the blog post got similarly "fuzzified" by the LLM.
> PEP 658 went live on PyPI in May 2023. uv launched in February 2024. uv could be fast because the ecosystem finally had the infrastructure to support it. A tool like uv couldn’t have shipped in 2020. The standards weren’t there yet.
In 2020 you could still have a whole bunch of performance wins before the PEP 658 optimization. There's also the "HTTP range requests" optimization which is the next best thing. (and the uv tool itself is really good with "uv run" and "uv python".)
> What uv drops: Virtual environments required. pip lets you install into system Python by default. uv inverts this, refusing to touch system Python without explicit flags. This removes a whole category of permission checks and safety code.
pip also refuses to touch system Python without explicit flags?
For uv, there are flags that allow it, so it doesn't really remove "a whole category of permission checks and safety code", does it? uv still has permission checks and safety code to decide whether it's the system Python. I don't think uv has "dropped" anything here.
> Optimizations that don’t need Rust: Python-free resolution. pip needs Python running to do anything.
This seems to me to be implying that python is inherently slow, so yes, this optimization requires a faster language? Or maybe I don't get the full point.
> Where Rust actually matters: No interpreter startup. ... uv is a single static binary with no runtime to initialize.
I used to rely on this, and still mostly do - but you’d be surprised how quickly this has entered the normal vernacular! I hear people using it in conversation unprompted all the time.
To me, unless it is egregious, I would be very sensitive to avoid false positives before saying something is LLM aided. If it is clearly just slop, then okay, but I definitely think there is going to be a point where people claim well-written, straightforward posts as LLM aided. (Or even the opposite, which already happens, where people purposely put errors in prose to seem genuine).
there is going to be a point where people have read so much slop that they will start regurgitating the same style without even realising it. or we could already be at that point
I have reached a point where any AI smell (of which this article has many) makes me want to exit immediately. It feels torturous to my reading sensibilities.
I blame fixed AI system prompts: they forcibly collapse all inputs into the same output space. Truly disappointing that OpenAI et al. have no desire to change this before everything on the internet sounds the same forever.
You're probably right about the latter point, but I do wonder how hard it'd be to mask the default "marketing copywriter" tone of the LLM by asking it to assume some other tone in your prompt.
As you said, reading this stuff is taxing. What's more, this is a daily occurrence by now. If there's a silver lining, it's that the LLM smells are so obvious at the moment; I can close the tab as soon as I notice one.
> do wonder how hard it'd be to mask the default "marketing copywriter" tone of the LLM by asking it to assume some other tone in your prompt.
Fairly easy, in my wife's experience. She repeatedly got accused of using ChatGPT in her original writing (she's not a native English speaker, and was taught to use many of the same idioms that LLMs use) until she started actually using ChatGPT with about two pages of instructions for tone to "humanize" her writing. The irony is staggering.
It’s pretty easy. I’ve written a fairly detailed guide to help Claude write in my tone of voice. It also coaxes it to avoid the obvious AI tells such as ‘It’s not X it’s Y’ sentences, American English and overuse of emojis and em dashes.
It’s really useful for taking my first drafts and cleaning them up ready for a final polish.
https://ember.dev 's deeper pages (not the blog, but the "resume-like" project pages) were written by Claude with guidance and a substantial corpus of my own writing, and I still couldn't squash out all the GPT-isms in the generation passes. Probably a net waste of time, for me, for writing.
It’s definitely partially solved by extensive custom prompting, as evidenced by sibling comments. But that’s a lot of effort for normal users and not a panacea either. I’d rather AI companies introduce noise/randomness themselves to solve this at scale.
The problem isn’t the surface tics—em dashes, short exclamatory sentences, lists of three, “Not X: Y!”.
Those are symptoms of the deep, statistically-built tissue of LLM “understanding” of “how to write a technical blog post”.
If you randomize the surface choices you’re effectively running into the same problem Data did on Star Trek: The Next Generation when he tried to get the computer to give him a novel Sherlock Holmes mystery on the holodeck. The computer created a nonsense mishmash of characters, scenes, and plot points from stories in its data bank.
Good writing uses a common box of metaphorical & rhetorical tools in novel ways to communicate novel ideas. By design, LLMs are trying to avoid true (unpredictable) novelty! Thus they’ll inevitably use these tools to do the reverse of what an author should be attempting.
> Ironically, some day we might employ LLMs to re-humanize texts
I heard high school and college students are doing this routinely so their papers don't get flagged as AI
This applies whether they used an LLM for the whole assignment or wrote it themselves; it has to get passed through a "re-humanizing" LLM either way, just to avoid drama.
> Zero-copy deserialization. uv uses rkyv to deserialize cached data without copying it. The data format is the in-memory format. This is a Rust-specific technique.
This (zero-copy deserialization) is not a Rust-specific technique, so I'm not entirely sure why the author describes it as one. Any good low-level language (C/C++ included) can do this, in my experience.
I think the framing in the post is that it's specific to Rust, relative to what Python packaging tools are otherwise written in (Python). It's not very easy to do zero-copy deserialization in pure Python, from experience.
(But also, I think Rust can fairly claim that it's made zero-copy deserialization a lot easier and safer.)
I suppose it can fairly claim that now every other library and blog post invokes "zero-copy" this and that, even in the most nonsensical scenarios. It's a technique for when you literally cannot afford the memory bandwidth (because you are trying to saturate a 100 Gbps NIC or handle 8K 60 Hz video), not for compromising your data serialization scheme's portability for marketing purposes while the application hits the network first, disk second, and memory bandwidth never.
You've got this backward. Due to spatial and temporal locality, in practice for any application you're actually usually hitting CPU registers first, cache second, memory third, disk fourth, a network cache fifth, and the network origin sixth. So this stuff does actually matter for performance.
Also, aside from memory bandwidth, there's a latency cost inherent in traversing object graphs. Zero-copy techniques ensure you traverse that graph minimally, touching just what actually needs to be accessed, which is huge when you scale up. There's a difference between one network request fetching 1 MB and 100 requests fetching 10 KiB each, and this difference also appears in memory access patterns unless it is absorbed by your cache (not guaranteed for the kind of object graph traversal a package manager would be doing).
Many of the hot paths in uv involve an entirely locally cached set of distributions that need to be loaded into memory, very lightly touched/filtered, and then sunk to disk somewhere else. In those contexts, there are measurable benefits to not transforming your representation.
(I'm agnostic on whether zero-copy "matters" in every single context. If there's no complexity cost, which is what Rust's abstractions often provide, then it doesn't really hurt.)
I can't even imagine what "safety" issue you have in mind. Given that "zero-copy" apparently means "in-memory" (a deserialized version of the data necessarily cannot be the same object as the original data), that's not even difficult to do with the Python standard library. For example, `zipfile.ZipFile` has a convenience method to write to file, but writing to in-memory data is as easy as
import io, zipfile

def read_member(archive_name, file_name):
    with zipfile.ZipFile(archive_name) as a:
        with a.open(file_name) as f, io.BytesIO() as b:
            b.write(f.read())
            return b.getvalue()
(That does, of course, copy data around within memory, but.)
> Given that "zero-copy" apparently means "in-memory" (a deserialized version of the data necessarily cannot be the same object as the original data), that's not even difficult to do with the Python standard library
This is not what zero-copy means. Here's a working definition[1].
Specifically, it's not just about keeping things in memory; copying in memory is normal. The goal is to not make copies (or more precisely, what Rust would call "clones"), but to instead convey the original representation/views of that representation through the program's lifecycle where feasible.
> a deserialized version of the data necessarily cannot be the same object as the original data
rust-asn1 would be an example of a Rust library that doesn't make any copies of data unless you explicitly ask it to. When you load e.g. a Utf8String[2] in rust-asn1, you get a view into the original input buffer, not an intermediate owning object created from that buffer.
> (That does, of course, copy data around within memory, but.)
Yeah, so you'd have to pass around the `BytesIO` instead.
I know that zero-copy doesn't ordinarily mean what I described, but that seemed to be how TFA was using it, based on the logic in the rest of the sentence.
> Yeah, so you'd have to pass around the `BytesIO` instead.
That wouldn’t be zero-copy either: BytesIO is an I/O abstraction over a buffer, so it intentionally masks the “lifetime” of the original buffer. In effect, reading from the BytesIO creates new copies of the underlying data by design, in new `bytes` objects.
(This is actually a great capsule example of why zero-copy design is difficult in Python: the Pythonic thing to do is to make lots of bytes/string/rich objects as you parse, each of which owns its data, which in turn means copies everywhere.)
I’m just a casual observer of this thread, but I think you’d find it worthwhile to read up a bit on zero-copy stuff.
It’s ~impossible in Python (because you don’t control memory) and hard in C/similar (because of use-after-free).
Rust’s borrow checker makes it easier, but it’s still tricky (for non-trivial applications). You have to do all your transformations and data movements while only referencing the original data.
As a quick and kind of oversimplified example of what zero copy means, imagine you read the following json string from a file/the network/whatever:
json = '{"user":"nugget"}' // from somewhere
A simple way to extract json["user"] into a new variable would be to copy the bytes. In Python-y/C pseudocode:
let user = allocate_string(6 characters)
for i in range(0, 6)
user[i] = json["user"][i]
// user is now the string "nugget"
Instead, a zero-copy strategy would be to create a string pointer to the address of json, offset by 9, with a length of 6:
{"user":"nugget"}
         ^^^^^^
         offset 9, length 6
The reason this can be tricky in C is that when you call free(json), since user is a pointer into the same string that was json, you have effectively done free(user) as well.
So if you use user after calling free(json), you have written a classic _memory safety_ bug called a "use after free" or UAF. Search around a bit for the insane number of use-after-free bugs there have been in popular software and the havoc they have wreaked.
In rust, when you create a variable referencing the memory of another (user pointing into json) it keeps track of that (as a "borrow", so that's what the borrow checker does if you have read about that) and won't compile if json is freed while you still have access to user. That's the main memory safety issue involved with zero-copy deserialization techniques.
> pip could implement parallel downloads, global caching, and metadata-only resolution tomorrow. It doesn’t, largely because backwards compatibility with fifteen years of edge cases takes precedence.
pip is simply difficult to maintain. Backward compatibility concerns surely contribute to that but also there are other factors, like an older project having to satisfy the needs of modern times.
For example, my employer (Datadog) allowed me and two other engineers to improve various aspects of Python packaging for nearly an entire quarter. One of the items was to satisfy a few long-standing pip feature requests. I discovered that the cross-platform resolution feature I considered most important is basically incompatible [1] with the current code base. Maintainers would have to decide which path they prefer.
> pip is simply difficult to maintain. Backward compatibility concerns surely contribute to that but also there are other factors, like an older project having to satisfy the needs of modern times.
Backwards compatibility is the one thing that prevents the code in an older project from being replaced with a better approach in situ. It cannot be more difficult than a rewrite, except that rewrites (arguably including my project) may hold themselves free to skip hard legacy cases, at least initially (they might not be relevant by the time other code is ready).
(I would be interested in hearing from you about UX designs for cross-platform resolution, though. Are you just imagining passing command-line flags that describe the desired target environment? What's the use case exactly — just making a .pylock file? It's hard to imagine cross-platform installation....)
My favorite speed up trick: “ HTTP range requests for metadata. Wheel files are zip archives, and zip archives put their file listing at the end. uv tries PEP 658 metadata first, falls back to HTTP range requests for the zip central directory, then full wheel download, then building from source. Each step is slower and riskier. The design makes the fast path cover 99% of cases. None of this requires Rust.”
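For illustration, the range-request fallback can be sketched in a few lines of Python (a simplification, not uv's actual code; it assumes the index serves byte ranges and that the zip central directory and the METADATA entry both sit in the last chunk of the wheel, which is the common case since .dist-info is written last; a real tool falls back to a full download when any of this fails):

import io
import urllib.request
import zipfile

def wheel_metadata(url, tail_bytes=128 * 1024):
    # Ask the server for just the end of the wheel.
    req = urllib.request.Request(url, headers={"Range": f"bytes=-{tail_bytes}"})
    with urllib.request.urlopen(req) as resp:
        tail = resp.read()
    # zipfile tolerates "prepended data" (self-extracting archives), so a
    # buffer holding only the tail still lets us walk the central directory.
    with zipfile.ZipFile(io.BytesIO(tail)) as zf:
        name = next(n for n in zf.namelist() if n.endswith(".dist-info/METADATA"))
        return zf.read(name).decode()

PEP 658 removes even this step, since the index can serve the METADATA file directly alongside the wheel.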
> Ignoring requires-python upper bounds. When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive.
This is clearly LLM-generated and the other bullet points have the same smell. Please use your own words.
> Ignoring requires-python upper bounds. When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive.
Yes, but it's (probably) the least bad thing they can do given how the PyPI ecosystem behaves. Since PyPI does not allow replacement of artefacts (sdists, wheels, and older formats), and because there is no way to update/correct metadata for the artefacts, unless the uploader knew at upload time of incompatibilities between their package and the upper-bounded reference (whether that is the Python interpreter or a Python package), the upper bound does not reflect a known incompatibility. In addition, certain tools (e.g. poetry) added the upper bounds automatically, increasing the number of spurious bounds. https://iscinumpy.dev/post/bound-version-constraints/ provides more details.
The general lesson from this is when you do not allow changes/replacement of invalid data (which is a legitimate thing to do), then you get stuck with handling the bad data in every system which uses it (and then you need to worry about different components handling the badness in different ways, see e.g. browsers).
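For the curious, the behaviour being described boils down to something like this (a sketch using the packaging library, not uv's actual implementation; `~=` is ignored here for simplicity even though it also implies an upper bound):

from packaging.specifiers import SpecifierSet

def strip_upper_bounds(requires_python):
    # Keep only the clauses that aren't '<' / '<=' caps.
    kept = [s for s in SpecifierSet(requires_python) if s.operator not in ("<", "<=")]
    return SpecifierSet(",".join(str(s) for s in kept))

print("4.0" in SpecifierSet(">=3.9,<4.0"))        # False: the defensive cap rejects it
print("4.0" in strip_upper_bounds(">=3.9,<4.0"))  # True: only the lower bound is checked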
No. When such upper bounds are respected, they contaminate other packages, because you have to add them yourself to be compatible with your dependencies. Then your dependents must add them too, etc. This brings only pain. Python 4 is not even a thing; core developers say there won't ever be a Python 4.
> you have to add them yourself to be compatible with your dependencies
This is no more true for version upper bounds than it is for version lower bounds, assuming that package installers ensure all package version constraints are satisfied.
I presume you think version lower bounds should still be honoured?
> PEP 658 went live on PyPI in May 2023. uv launched in February 2024. The timing isn’t coincidental. uv could be fast because the ecosystem finally had the infrastructure to support it. A tool like uv couldn’t have shipped in 2020. The standards weren’t there yet.
How/why did the package maintainers start using all these improvements? Some of them sound like a bunch of work, and getting a package ecosystem to move is hard. Was there motivation to speed up installs across the ecosystem? If setup.py was working okay for folks, what incentivized them to start using pyproject.toml?
> If setup.py was working okay for folks, what incentivized them to start using pyproject.toml?
It wasn't working okay for many people, and many others haven't started using pyproject.toml.
For what I consider the most egregious example: Requests is one of the most popular libraries, under the PSF's official umbrella, which uses only Python code and thus doesn't even need to be "built" in a meaningful sense. It has a pyproject.toml file as of the last release. But that file isn't specifying the build setup following PEP 517/518/621 standards. That's supposed to appear in the next minor release, but they've only done patch releases this year and the relevant code is not at the head of the repo, even though it already caused problems for them this year. It's been more than a year and a half since the last minor release.
I should have mentioned one of the main reasons setup.py turns out not okay for people (aside from the general unpleasantness of running code to determine what should be, and mostly is, static metadata): in the legacy approach, Setuptools has to get `import`ed from the `setup.py` code before it can run, but running that code is the way to find out the dependencies. Including build-time dependencies. Specifically Setuptools itself. Good luck if the user's installed version is incompatible with what you've written.
I remain baffled about these posts getting excited about uv's speed. I'd like to see a real poll, but I personally can't imagine people listing speed as one of their top ten concerns about Python package managers. What are the common use cases where the delay due to package installation is at all material?
At a previous job, I recall updating a dependency via poetry would take on the order of ~5-30m. God forbid after 30 minutes something didn’t resolve and you had to wait another 30 minutes to see if the change you made fixed the problem. Was not an enjoyable experience.
> updating a dependency via poetry would take on the order of ~5-30m. God forbid after 30 minutes something didn’t resolve and you had to wait another 30 minutes to see if the change you made fixed the problem
Working heavily in Python for the last 20 years, it absolutely was a big deal. `pip install` has been a significant percentage of the deploy time on pretty much every app I've ever deployed and I've spent countless hours setting up various caching techniques trying to speed it up.
I can run `uvx sometool` without fear because I know that it'll take a few seconds to create a venv, download all the dependencies, and run the tool. uv's speed has literally changed how I work with Python.
I generally tend to let the shell autocomplete, so I don't type it out every time, but I see your point. If I use a program more than once or twice, I install it.
`poetry install` on my dayjob’s monolith took about 2 minutes, `uv sync` takes a few seconds. Getting 2 minutes back on every CI job adds up to a lot of time saved
As a multi decade Python user, uv's speed is "life changing". It's a huge devx improvement. We lived with what came before, but now that I have it, I would never want to go back and it's really annoying to work on projects now that aren't using it.
Docker builds are a big one, at least at my company. Any tool that reduces wait time is worth using, and uv is an amazing tool that removes that wait time. I take it you might not use python much as it solves almost every pain point, and is fast which feels rare.
CI: I changed a pipeline at work from pip and pipx to uv, it saved 3 minutes on a 7 minute pipeline. Given how oversubscribed our runners are, anything saving time is a big help.
It is also really nice when working interactively to have snappy tools that don't take you out of the flow more than absolutely necessary. But then I'm quite sensitive to this; I'm one of those people who turn off all GUI animations because they waste my time and make the system feel slow.
Probably 90% of ppl commenting here are focused on managing their own Python installs and mostly don’t care about speed. uv seems to be designed for enterprise, for IT management of company wide systems, and this post is, I’m guessing, a little promotional astroturfing. For most of us, uv solves a low priority problem.
It's not just about delays being "material"; waiting on the order of seconds for a venv creation (and knowing that this is because of pip bootstrapping itself, when it should just be able to install cross-environment instead of having to wait until 2022 for an ugly, limited hack to support that) is annoying.
I avoided Python for years, especially because of package and environment management. Python is now my go to for projects since discovering uv, PEP 723 metadata, and LLMs’ ability to write Python.
The speed is nice, but I switched because uv supports "pip compile" from pip-tools, and it is better at resolving dependencies. Also pip-tools uses (used?) internal pip methods and breaks frequently because of that, uv doesn't.
Setting up a new dev instance took 2+ hours with pip at my work. Switching to uv dropped the Python portion down to <1 minute, and the overall setup to 20 minutes.
A similar, but less drastic speedup applied to docker images.
Speed is one of the main reasons why I keep recommending uv to people I work with, and why I initially adopted it: Setting up a venv and installing requirements became so much faster. Replacing pipx and `uv run` for single-file scripts with external dependencies, were additional reasons. With nox adding uv support, it also became much easier and much faster to test across multiple versions of Python
One weird case where this mattered to me: I wanted pip to backtrack to find compatible versions of a set of deps, and it wasn't done after waiting a whole hour. uv did the same thing in 5 minutes. This might be kinda common because of how many Python repos out there don't have pinned versions in their dependency files.
For me speed was irrelevant; however, uv was the first Python project manager with a tolerable UI that I encountered. I had never before done any serious development in Python because I just refused to deal with venvs, requirements.txt and whatever. When a script used a dependency or another Python version, I installed it system-wide. uv is perfectly usable, borderline pleasant. But I'm sure the speed helps.
You can also use pixi[1] if you want conda with uv's solver, which does appear to be faster than the mamba solver. Though the main reasons I recommend pixi are that it doesn't have a tendency to break random stuff due to polluting your environment by default, and that it does a much better job of making your environments reproducible, among other benefits.
I like the implication that we can have an alternative to uv speed-wise, but I think reliability and understandability are more important in this context (so this comment is a bit off-topic).
What I want from a package manager is that it just works.
That's what I mostly like about uv.
Many of the changes that made speed possible were to reduce the complexity and thus the likelihood of things not working.
What I don't like about uv (or pip or many other package managers) is that the programmer isn't given a clear mental model of what's happening and thus how to fix the inevitable problems. Better (pubgrub) error messages are good, but it's rare that they can provide specific fixes. So even if you get 99% speed, you end up with 1% perplexity and diagnostic black boxes.
To me the time that matters most is time to fix problems that arise.
> the programmer isn't given a clear mental model of what's happening and thus how to fix the inevitable problems.
This is a priority for PAPER; it's built on a lower-level API so that programmers can work within a clear mental model, and I will be trying my best to communicate well in error messages.
> No bytecode compilation by default. pip compiles .py files to .pyc during installation. uv skips this step, shaving time off every install. You can opt in if you want it.
Are we losing out on performance of the actual installed thing, then? (I'm not 100% clear on .pyc files TBH; I'm guessing they speed up start time?)
No, because Python itself will generate bytecode for packages once you actually import them. uv just defers that to first-import time, but the cost is amortized in any setting where imports are performed over multiple executions.
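(For the curious, the deferred artifact is just the __pycache__ entry CPython computes per source file; a tiny sketch with a hypothetical path:)

import importlib.util

# cache_from_source only does path math, so nothing needs to exist on disk.
src = "venv/lib/python3.12/site-packages/some_package/module.py"
print(importlib.util.cache_from_source(src))
# e.g. venv/lib/python3.12/site-packages/some_package/__pycache__/module.cpython-312.pyc
# (the tag follows the running interpreter)
#
# pip writes these .pyc files at install time; with uv's default they are
# written the first time each module is imported.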
That sounds like yes? Instead of doing it once at install time, it's done once at first use. It's only once so it's not persistently slower, but that is a perf hit.
My first cynical instinct is to say that this is uv making itself look better by deferring the costs to the application, but it's probably a good trade-off if any significant percentage of the files being compiled might not be used ever so the overall cost is lower if you defer to run time.
I think they are making the bet that most modules won't be imported. For example if I install scipy, numpy, Pillow or such: what are the chances that I use a subset of the modules vs literally all of them?
I would bet on a subset for pretty much any non-trivial package (i.e. larger than one or two user-facing modules). And for those trivial packages? Well, they are usually small, so the cost is small as well. I'm sure there are exceptions: maybe a single gargantuan module that consists of autogenerated FFI bindings for some C library or such, but that is likely the minority.
> It's only once so it's not persistently slower, but that is a perf hit.
Sure, but you pay that hit either way. Real-world performance is always usage based: the assumption that uv makes is that people run (i.e. import) packages more often than they install them, so amortizing at the point of the import machinery is better for the mean user.
Which part? The assumption is that when you `$TOOL install $PACKAGE`, you run (i.e. import) `$PACKAGE` more than you re-install it. So there's no point in slowing down (relatively less common) installation events when you can pay the cost once on import.
(The key part being that 'less common' doesn't mean a non-trivial amount of time.)
Why would you want to slow down the more common thing instead of the less common thing? I'm not following that at all. That's why I asked if that's backwards.
Probably for any case where an actual human is doing it. On an image you obviously want to do it at bake time, so I feel default off with a flag would have been a better design decision for pip.
I just read the thread and use Python, I can't comment on the % speedup attributed to uv that comes from this optimization.
Images are a good example where doing it at install-time is probably the best yeah, since every run of the image starts 'fresh', losing the compilation which happened last time the image got started.
If it were an optional toggle, it would probably become best practice to activate compilation in Dockerfiles.
> On an image you obviously want to do it at bake time
It seems like tons of people are creating container images with an installer tool and having it do a bunch of installations, rather than creating the image with the relevant Python packages already in place. Hard to understand why.
For that matter, a pre-baked Python install could do much more interesting things to improve import times than just leaving a forest of `.pyc` files in `__pycache__` folders all over the place.
You can change it to compile the bytecode on install with a simple environment variable (which you should do when building docker containers if you want to sacrifice some disk space to decrease initial startup time for your app).
You are right. It depends on how often this first start happens and whether it's bad or not. For most use cases, I'd guess (total guess, I have limited experience with Python projects professionally) it's not an issue.
My Docker build generating the byte code saves it to the image, sharing the cost at build time across all image deployments — whereas, building at first execution means that each deployed image instance has to generate its own bytecode!
That’s a massive amplification, on the order of 10-100x.
“Well just tell it to generate bytecode!”
Sure — but when is the default supposed to be better?
Because this sounds like a massive footgun for a system where requests >> deploys >> builds. That is, every service I’ve written in Python for the last decade.
Yes, uv skipping this step is a one time significant hit to start up time. E.g. if you're building a Dockerfile I'd recommend setting `--compile-bytecode` / `UV_COMPILE_BYTECODE`
Historically, the practice of producing .pyc files on install started with system-wide installed packages, I believe, when the user running the program might lack privileges to write them.
If the installer can write the .py files, it can also write the .pyc, while the user running them might not be able to in that location.
This optimization hits serverless Python the worst. At Modal we ensure users of uv are setting UV_COMPILE_BYTECODE to avoid the cold start penalty. For large projects .pyc compilation can take hundreds of milliseconds.
> I'm not 100% clear on .pyc files TBH; I'm guessing they speed up start time?
They do.
> Are we losing out on performance of the actual installed thing, then?
When you consciously precompile Python source files, you can parallelize that process. When you `import` from a `.py` file, you only get that benefit if you somehow coincidentally were already set up for `multiprocessing` and happened to have your workers trying to `import` different files at the same time.
If you have a dependency graph large enough for this to be relevant, it almost certainly includes a large number of files which are never actually imported. At worst the hit to startup time will be equal to the install time saved, and in most cases it'll be a lot smaller.
> a large number of files which are never actually imported
Unfortunately, it typically doesn't work out as well as you might expect, especially given the expectation of putting `import` statements at the top of the file.
There's an interesting psychology at play here as well: if you are a programmer who chooses a "fast language", that's indicative of your priorities already. It's often not so much the language as the fact that the programmer has decided to optimize for performance from the get-go.
> When a package says it requires python<4.0, uv ignores the upper bound and only checks the lower. This reduces resolver backtracking dramatically since upper bounds are almost always wrong. Packages declare python<4.0 because they haven’t tested on Python 4, not because they’ll actually break. The constraint is defensive, not predictive.
This is kind of fascinating. I've never considered runtime upper bound requirements. I can think of compelling reasons for lower bounds (dropping version support) or exact runtime version requirements (each version works for exact, specific CPython versions). But now that I think about it, it seems like upper bounds solve a hypothetical problem that you'd never run into in practice.
If PSF announced v4 and declared a set of specific changes, I think this would be reasonable. In the 2/3 era it was definitely reasonable (even necessary). Today though, it doesn't actually save you any trouble.
I think the article is being careful not to say uv ignores _all_ upper bound checks, but specifically 4.0 upper bound checks. If a package says it requires python < 3.0, that's still super relevant, and I'd hope for uv to still notice and prevent you from trying to import code that won't work on python 3. Not sure what it actually does.
I read the article as saying it ignores all upper-bounds, and 4.0 is just an example. I could be wrong though - it seems ambiguous to me.
But if we accept that it currently ignores any upper-bounds checks greater than v3, that's interesting. Does that imply that once Python 4 is available, uv will slow down due to needing to actually run those checks?
That would deliver a blow to the integrity of the rest of that section because those sorts of upper bound constraints immediately reducible to “true” cannot cause backtracking of any kind.
Are there any plans to actually make a 4.0 ever? I remember hearing a few years ago that after the transition to 3.0, the core devs kind of didn't want to repeat that mess ever again.
That said, even if it does happen, I highly doubt that is the main part of the speed up compared to pip.
I think there's a future where we get a 4.0, but it's not any time soon. I think they'd want an incredibly compelling backwards-incompatible feature before ripping that band-aid off. It would be setting up for a decade of transition, which shouldn't be taken lightly.
At Plotly we did a decent amount of benchmarking to see how much the different defaults `uv` uses contribute to its performance. This was necessary so we could advise our enterprise customers on the transition. We found you lost almost all of the speed gains if you configured uv to behave as much like pip as you could. A trivial example is the precompile flag, which can easily be 50% of pip's install time for a typical data science venv.
The precompilation thing was brought up to the uv team several months ago IIRC. It doesn't make as much of a difference for uv as for pip, because when uv is told to pre-compile it can parallelize that process. This is easily done in Python (the standard library even provides rudimentary support, which Python's own Makefile uses); it just isn't in pip yet (I understand it will be soon).
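The stdlib hook referred to is compileall, which can fan the work out across cores; a minimal sketch (point it at whatever environment you are building; this is roughly the work that `--compile-bytecode` / `UV_COMPILE_BYTECODE` opts back into at install time):

import compileall
import sysconfig

# Pre-compile everything under site-packages. workers=0 means "use one
# worker per CPU"; quiet=1 suppresses per-file output; force=False skips
# files whose cached bytecode is already up to date.
site_packages = sysconfig.get_paths()["purelib"]
compileall.compile_dir(site_packages, workers=0, quiet=1, force=False)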
> This reduces resolver backtracking dramatically since upper bounds are almost always wrong.
I am surprised by this because Python minor versions break backwards compatibility all the time. Our company for example is doing a painful upgrade from py39 to py311
Could you explain what major pain points you've encountered? I can't think of any common breakages cited in 3.10 or 3.11 offhand. 3.12 had a lot more standard library removals, and the `match` statement introduced in 3.10 uses a soft keyword and won't break code that uses `match` as an identifier.
This post is excellent. I really like reading deep dives like this that take a complex system like uv and highlight the unique design decisions that make it work so well.
I also appreciate how much credit this gives the many previous years of Python standards processes that enabled it.
> Every code path you don’t have is a code path you don’t wait for.
No, every code path you don't execute is that. Like
> No .egg support.
How does that explain anything if the egg format is obsolete and not used?
Similar with spec strictness fallback logic - it's only slow if the packages you're installing are malformed, otherwise the logic will not run and not slow you down.
And in general, instead of a list of irrelevant and potentially relevant things, it would be great to understand the actual time savings per item (at least for those that deliver the most speedup)!
But otherwise great and seemingly comprehensive list!
Even in compiled languages, binaries have to get loaded into memory. For Python it's much worse. On my machine:
$ time python -c 'pass'
real 0m0.019s
user 0m0.013s
sys 0m0.006s
$ time pip --version > /dev/null
real 0m0.202s
user 0m0.182s
sys 0m0.021s
Almost all of that extra time is either the module import process or garbage collection at the end. Even with cached bytecode, the former requires finding and reading from literally hundreds of files, deserializing via `marshal.loads` and then running top-level code, which includes creating objects to represent the functions and classes.
It used to be even worse than this; in recent versions, imports related to Requests are deferred to the first time that an HTTPS request is needed.
> Unless memory mapped by the OS with no impact on runtime for unused parts?
Yeah, this is presumably why a no-op `uv` invocation on my system takes ~50 ms the first time and ~10 ms each other time.
> Exactly, so again have no impact?
Only if your invocation of pip manages to avoid an Internet request. Note: pip will make an Internet request if you try to install a package by symbolic name even if it already has the version it wants in cache, because its cache is an HTTP cache rather than a proper download cache.
But even then, there will be hundreds of imports mainly related to Rich and its dependencies.
I got confused by the direction of the discussion.
My original point was that Requests imports in pip used to not be deferred like that, so you would pay for them up front, even if they turned out to be irrelevant. (But also they are relevant more often than they should be, i.e. the deferral system doesn't work as well as it should.)
Part of the reason you pay for them is to run top-level code (to create function and class objects) that are irrelevant to what the program is actually doing. But another big part is the cost of actually locating the files, reading them, and deserializing bytecode from them. This happens at import time even if you don't invoke any of the functionality.
rtld does a lot of work even in “static” binaries to rewrite relocations even in “unused parts” of any PIE (which should be all of them today) and most binaries need full dyld anyway.
So... will uv make Python a viable cross-platform utility solution?
I was going to learn Python for just that (file-conversion utilities and the like), but everybody was so down on the messy ecosystem that I never bothered.
Yes, uv basically solves the terrible Python tooling situation.
In my view that was by far the biggest issue with Python - a complete deal-breaker really. But uv solves it pretty well.
The remaining big issues are a) performance, and b) the import system. uv doesn't do anything about those.
Performance may not be an issue in some cases, and the import system is ... tolerable if you're writing "a Python project". If you're writing some other project and considering using Python for its scripting system, e.g. to wrangle multiple build systems or whatever, then the import mess is a bigger issue and I would think long and hard before picking it over Deno.
Thanks! I don't really think about importing stuff (which maybe I should), because I assume I'll have to write any specialized logic myself. So... your outlook is encouraging.
I have to say it's just lovely seeing such a nicely crafted and written technical essay. It's so obvious that this is crafted by hand, and reading it just reemphasises how much we've lost because technical bloggers are too ready to hand the keys over to LLMs.
I've talked about this many times on HN this year but got beaten to the punch on blogging it seems. Curses.
... Okay, after a brief look, there's still lots of room for me to comment. In particular:
> pip’s slowness isn’t a failure of implementation. For years, Python packaging required executing code to find out what a package needed.
This is largely refuted by the fact that pip is still slow, even when installing from wheels (and getting PEP 658 metadata for them). Pip is actually still slow even when doing nothing. (And when you create a venv and allow pip to be bootstrapped in it, that bootstrap process takes in the high 90s percent of the total time used.)
The article info is great, but why do people put up with LLM tics and slop in their writing? These sentences add no value and treat the reader as stupid.
> This is concurrency, not language magic.
> This is filesystem ops, not language-dependent.
Duh, you literally told me that in the previous sentence and 50 million other times.
This kind of writing goes deeper than LLMs, and reflects a decline in reading ability, patience, and attention. Without passing judgement, there are just more people now who benefit from repetition and summarization embedded directly in the article. The reader isn't 'stupid', just burdened.
Indeed, I have come around in the past few weeks to the realization and acceptance that the LLM editorial voice is a benefit to an order of magnitude more HN readers than those (like us) for whom it is ice pick in the nostril stuff.
Why? They are pretty compatible. Just set the venv in the project's mise.toml and you are good to go. Mise will activate it automatically when you change into the project directory.
I believe I was trying it the other way around. I installed uv and Python with mise, but uv still created a .python-version file and used the Python installed on the system instead of what was in mise.
If it's really not doing any upper-bound checks, I could see it blowing up under more mundane conditions; Python includes breaking changes on .x releases, so I've had e.g. packages require (say) Python 3.10 when 3.11/3.12 was current.
> Some of uv’s speed comes from Rust. But not as much as you’d think. Several key optimizations could be implemented in pip today: […] Python-free resolution
As for the 20 ms of startup: if you spawn a process per dependency to deal with 20 dependencies in parallel, that's 400 ms of startup cost just to start working.
Shaving half a second on many things makes things fast.
Although, as we saw with zeeek in the other comment, you likely don't need multiprocessing, since the network stack and unzip in the stdlib release the GIL.
Threads are cheaper.
Maybe if you bundled pubgrub as a compiled extension, you could get pretty close to uv's perf.
Parallel downloads don't need multiprocessing, since this is an I/O-bound use case; asyncio or GIL threads (which unblock on I/O) would be perfectly fine. Native threads will eventually be the default also.
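A minimal sketch of that approach (the wheel URLs are made up, and a real tool would also verify hashes and write into a cache rather than straight into site-packages):

import concurrent.futures
import io
import urllib.request
import zipfile

WHEELS = [  # hypothetical, already-resolved wheel URLs
    "https://example.invalid/wheels/packaging-24.1-py3-none-any.whl",
    "https://example.invalid/wheels/idna-3.7-py3-none-any.whl",
]

def fetch_and_unpack(url, dest="site-packages"):
    # Network I/O and zlib decompression both release the GIL, so plain
    # threads overlap nicely here without multiprocessing.
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(dest)
    return url

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for done in pool.map(fetch_and_unpack, WHEELS):
        print("installed", done)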
Indeed, but unzipping while downloading does. So do analysing multiple metadata files and exporting lock data.
Now, I believe unzip releases the GIL already, so we could already benefit from that, and the rest likely doesn't dominate perf.
But still, Rust software is faster on average than Python software.
After all, all those things are possible in Python, and yet we hadn't seen them all in one package manager before uv.
Maybe the strongest advantage of Rust, on top of very clean and fast default behaviors, is that it attracts people who care about speed, safety and correctness. And those devs are more likely to spend time implementing fast software.
Though the main benefit of uv is not that it's fast. That's very nice, and it opens up more use cases, but it's not the killer feature.
The killer feature is that, being a standalone executable, it bypasses all the Python bootstrapping problems.
Again, that could technically be achieved in Python, but friction is a strong force.
I guess you mean doing the things in Python that are supposedly doable from Python.
Yeah, to a zeroth approximation that's my current main project (https://github.com/zahlman/paper). Of course, I'm just some rando with apparently serious issues convincing myself to put in regular unpaid work on it, but I can see in broad strokes how everything is going to work. (I'm not sure I would have thought about, for example, hard-linking files when installing them from cache, without uv existing.)
> pip could implement parallel downloads, global caching, and metadata-only resolution tomorrow. It doesn’t, largely because backwards compatibility with fifteen years of edge cases takes precedence. But it means pip will always be slower than a tool that starts fresh with modern assumptions.
What does backwards compatibility have to do with parallel downloads? Or global caching? The metadata-only resolution is the only backwards-compatibility issue in there, and pip can run without a setup.py file being present if pyproject.toml is there.
The short answer is that most, or at least a whole lot, of the improvements in uv could be integrated into pip as well (especially parallelizing downloads). But they're not, because there is uv instead, which is also maintained by a for-profit startup. So pip is the loser.
Very nice article; always good to get a review of what a "simple"-looking tool does behind the scenes.
About Rust, though:
Some say a nicer language helps with finding the right architecture (I heard that about a C++ veteran dropping it for OCaml: any attempted idea would take weeks in C++ but only a few days in OCaml, so they could explore more).
Also, the parallelism might be a benefit of the language's orientation.
> Not many projects use setup.py now anyway and pip is still super slow.
Yes, but that's still largely not because of being written in Python. The architecture is really just that bad. Any run of pip that touches the network will end up importing more than 500 modules and a lot of that code will simply not be used.
For example, one of the major dependencies is Rich, which includes things like a 3600-entry mapping of string names to emoji; Rich in turn depends on Pygments, which normally includes a bunch of rules for syntax highlighting in dozens of programming languages (though this year pip's maintainers finished trimming those parts out of the vendored Pygments).
Another thing is that pip's cache is an HTTP cache. It literally doesn't know how to access its own package download cache without hitting the network, and it does that access through wrappers that rely on cachecontrol and Requests.
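If you want to sanity-check the module count yourself, one rough way (the exact number varies with pip version and platform, and `--dry-run` still hits the index) is to count the lines CPython's `-X importtime` prints for a small pip command:

    # Rough measurement: how many modules does a network-touching pip run import?
    import subprocess
    import sys

    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-m", "pip", "install", "--dry-run", "packaging"],
        capture_output=True,
        text=True,
    )
    # -X importtime writes one "import time:" line per imported module to stderr
    # (plus a header line, hence the -1).
    modules = [line for line in proc.stderr.splitlines() if line.startswith("import time:")]
    print(f"roughly {len(modules) - 1} modules imported")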
Mine either. Choosing Rust by no means guarantees your tool will be fast—you can of course still screw it up with poor algorithms. But I think most people who choose Rust do so in part because they aspire for their tool to be "blazing fast". Memory safety is a big factor of course, but if you didn't care about performance, you might have gotten that via a GCed (and likely also interpreted or JITed or at least non-LLVM-backend) language.
Yeah sometimes you get surprisingly fast Python programs or surprisingly slow Rust programs, but if you put in a normal amount of effort then in the vast majority of cases Rust is going to be 10-200x faster.
I actually rewrote a non-trivial Python program in Rust once because it was so slow (among other reasons), and got a 50x speedup. It was mostly just running regexes over logs too, which is the sort of thing Python people say is an ideal case (because it's mostly IO or implemented in C).
I too use pipenv unless there's a reason not to. I hope people use whatever works best for them.
I feel that sometimes there's a desire on the part of those who use tool X that everyone should use tool X. For some types of technology (car seat belts, antibiotics...) that might be reasonable but otherwise it seems more like a desire for validation of the advocate's own choice.
My biggest complaint with pipenv is (or was?) that its lockfile format only kept the platform identifiers of the platform you locked on, so if you created the lockfile on a Mac and then tried to install from it on a Linux box, you'd be building from source, because it had only locked in wheels for macOS.
Came here to ask about pipenv. As someone who does not use Python for anything other than scripting, but who appreciates the reproducibility that pipenv provides, should I be using uv? My understanding is that pipenv is the better successor to venv and pip (combined), but now everyone is talking about uv, so to be honest it's quite confusing.
Edit: to expand on that, my understanding is that pipenv is the "standard/approved" method of package management in the Python community, but is that not the case in practice? Is it now uv?
This shit is ChatGPT-written and I'm really tired of it. If I wanted to read ChatGPT output I would have asked it myself. Half of the article is nonsensical repeated buzzwords thrown in for absolutely no reason.
This is great to read because it validates my impression that Python packaging has always been a tremendously overengineered mess. Glad to see someone finally realized you just need a simple standard metadata file per package.
It has been realized in the Python community for a very long time. But there have been years of debate over the contents and formatting, and years of trying to figure out how to convince authors and maintainers to do the necessary work on their end, and years of trying to make sure the ecosystem doesn't explode from trying to remove legacy support.
There are still separate forms of metadata for source packages and pre-compiled distributions. This is necessary because of all the weird idiosyncratic conditional logic that might be necessary in the metadata for platform-specific dependencies. Some projects are reduced to figuring out the final metadata at build time, while building on the user's machine, because that's the only way to find out enough about the user's machine to make everything work.
It really isn't as straightforward as you'd expect, largely because Python code commonly interfaces to compiled code in several different languages, and end users expect this to "just work", including on Windows where they don't have a compiler and might not know what that is.
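To make the "conditional logic" point concrete, here's a small illustration using the `packaging` library (the requirement itself is made up) of an environment marker, the declarative way to say a dependency only applies on one platform:

    # A platform-conditional dependency expressed as an environment marker,
    # rather than computed in a setup.py at build time.
    from packaging.markers import default_environment
    from packaging.requirements import Requirement

    req = Requirement('pywin32>=306; sys_platform == "win32"')
    print(req.name, req.specifier)                      # pywin32 >=306
    print(req.marker.evaluate(default_environment()))   # True on Windows, False elsewhere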