
> This is interesting in that I wouldn't expect that the typical resolution involves a particularly large quantity of TOML.

I don't know the details of Python's resolution algorithm, but for Cargo (which is where epage is coming from) a lockfile (which is encoded in TOML) can be somewhat large-ish, maybe pushing 100 kilobytes (to the point where I'm curious if epage has benchmarked to see if lockfile parsing is noticeable in the flamegraph).





But once you have a lock file there is no resolution needed, is there? It lists all needed libs and their versions. Given how TOML is written, I imagine you can read it incrementally - once a lib's section is parsed, you can start downloading it in parallel, even if you haven't parsed the whole file yet.

(not sure how uv does it, just guessing what can be done)


For Cargo,

- synchronization operations are implicit, so we need to re-resolve to confirm the lockfile is still valid. We could take some shortcuts, but it would require re-implementing some logic

- dependency resolution only uses `Cargo.toml` for local and git dependencies. Registry dependencies have a JSON summary of just the content relevant for dependency resolution (roughly sketched below). Cargo parses nearly every locked package's `Cargo.toml` to know how to build it.
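
For a rough idea of what that summary contains, here is a hedged sketch in Rust: the struct fields follow my reading of the documented registry-index format (one JSON object per published version) but are illustrative and not exhaustive, the example line is made up, and these are not Cargo's actual types:

    // Needs serde (with the "derive" feature) and serde_json.
    use std::collections::BTreeMap;

    use serde::Deserialize;

    // Hedged sketch of one line of a registry index file.
    #[derive(Debug, Deserialize)]
    struct IndexSummary {
        name: String,                            // package name
        vers: String,                            // semver version string
        deps: Vec<IndexDep>,                     // what the resolver needs per dependency
        features: BTreeMap<String, Vec<String>>, // feature -> features it enables
        cksum: String,                           // checksum of the .crate file
        yanked: bool,
    }

    #[derive(Debug, Deserialize)]
    struct IndexDep {
        name: String,
        req: String,                             // version requirement, e.g. "^0.6"
        features: Vec<String>,
        optional: bool,
        default_features: bool,
        kind: Option<String>,                    // "normal", "dev", or "build"
    }

    fn main() {
        // Made-up example line, for illustration only.
        let line = r#"{"name":"foo","vers":"0.1.0","deps":[{"name":"rand","req":"^0.6","features":[],"optional":false,"default_features":true,"kind":"normal"}],"features":{},"cksum":"deadbeef","yanked":false}"#;
        let summary: IndexSummary = serde_json::from_str(line).expect("valid index line");
        println!("{summary:#?}");
    }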


For whatever it's worth, the toml library uv uses doesn't support streaming parsing: https://github.com/toml-rs/toml/issues/326

I'm not sure it even makes sense for a TOML file to be "read incrementally", because of the weird feature of TOML (inherited from INI conventions) that allows tables to be defined in a piecemeal, out-of-order fashion. Here's an example that the TOML spec calls "valid, but discouraged":

    [fruit.apple]
    [animal]
    [fruit.orange]

So the only way to know that you have all the keys in a given table is to literally read the entire file. This is one of those unfortunate things in TOML that I would honestly ignore if I were writing my own TOML parser, even if it meant I wasn't "compliant".

I don't think that's worse than having to search an arbitrary distance for a matching closing bracket. There are tasks where you can start working while knowing that a given array in the data might be appended to later (and similarly for objects).

It's worse than having to search for a matching bracket, because any context where an item is defined via nested brackets is going to be a subset of this use case. That said, you could do some eager processing in theory, but it's going to be context-dependent.

For example, consider a Cargo.toml file where we've processed the `features` key for a given dependency. Is it safe to begin compiling that dependency with the given set of features before we finish parsing the file? No, because a `default-features=false` key that applies to this dependency might still appear later in the file. In a format where tables weren't allowed to be split, the mere act of parsing a single, self-contained dependency entry would be enough to know for certain that no such `default-features` key exists. Not every key requires this sort of consideration, but it can be a footgun depending on the semantics of your schema.
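
To make that concrete, here's a small sketch using the `toml` crate (the dependency names `foo` and `bar` are made up for illustration). One way a later line can legally add to an already-seen dependency is via dotted keys, so `foo.default-features` can show up after `bar` has already been declared:

    // Needs the `toml` crate (e.g. toml = "0.8") in Cargo.toml.
    fn main() {
        // Hypothetical Cargo.toml fragment: dotted keys let
        // `foo.default-features` appear long after `foo.features`.
        let doc = r#"
    [dependencies]
    foo.version = "1.0"
    foo.features = ["extra"]
    bar = "2.0"
    foo.default-features = false
    "#;

        let value: toml::Value = toml::from_str(doc).expect("valid TOML");
        // The parsed `foo` table contains version, features, *and*
        // default-features, even though they weren't contiguous in the source.
        println!("{:#?}", value["dependencies"]["foo"]);
    }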

TOML as a format doesn't make sense for streaming:

- Tables can be in any order, independent of hierarchy

- Keys can be dotted, creating subtables in any order

On top of that, most use cases for the format don't benefit from streaming.
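
To illustrate the first point with the `toml` crate, the "valid, but discouraged" layout from upthread parses into a single `fruit` table even though `[animal]` sits between its halves, so a streaming parser couldn't emit a finished `fruit` before reaching the end of the document:

    // Needs the `toml` crate (e.g. toml = "0.8") in Cargo.toml.
    fn main() {
        // Same shape as the spec's "valid, but discouraged" example upthread,
        // with a couple of keys added so the tables aren't empty.
        let doc = r#"
    [fruit.apple]
    color = "red"

    [animal]
    kind = "dog"

    [fruit.orange]
    color = "orange"
    "#;

        let value: toml::Value = toml::from_str(doc).expect("valid TOML");
        // `fruit` holds both `apple` and `orange`; its contents were only
        // known once the entire document had been read.
        println!("{:#?}", value["fruit"]);
    }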


Lockfiles aren't the issue; it's all the dependencies themselves.


