Pg_GraphQL: A GraphQL Extension for PostgreSQL (supabase.com)
431 points by samwillis on Dec 3, 2021 | 99 comments


As co-founder of another database company in the PostgreSQL ecosystem, I have to say that I'm really impressed with the quality and velocity of launches from the Supabase team. Nothing else to add, except please keep up the great work!


thanks Ajay, we're also big fans of yours here at Supabase :)


Hey, it's unrelated, but what's the relationship between Supabase and PostgREST?

My understanding is that PostgREST is a separate project that began development independently of Supabase, and Supabase is just a user. Is that correct?

But Supabase is also a contributor to PostgREST, right? Does Supabase employ engineers dedicated to it?


> What's the relationship between Supabase and PostgREST?

Supabase uses PostgREST for its automatic/reflected REST API

> Supabase is just a user. Is that correct?

yep!

> Does Supabase employ engineers dedicated to it?

https://supabase.com/blog/2020/06/15/supabase-steve-chavez


Cool, thanks!


What's your company?


Timescale. Another Postgres adjacent startup that's doing amazing work.


Click his name if you want to find out :)


I've explored multiple GraphQL solutions for PostgreSQL in the past and was never convinced by the resolver-based approach; I always felt that an extension would be the right solution. The closest I've come across and used so far is Graphjin https://github.com/dosco/graphjin . Thanks Supabase for the fantastic extension!


I enjoyed Sam Newman's take on this capability [1]

> Great to see AWS providing direct data coupling as a service. /s

> This service allows you to directly map a GraphQL endpoint to a database table. It's like putting getters and setters on an object and claiming you're encapsulating private variables. The end result is coupling between GraphQL clients and the underlying datasource.

> Information hiding is a key concept in independent change. Can I change the provider (of the GraphQL) endpoint independent of the clients? Directly exposing internal data structure makes this very difficult.

And [2]:

> So a few people have asked why I have this snarky response. What is my problem with this service? Well, to be clear, it’s not an issue with GraphQL, it’s an issue with direct coupling with underlying datasources #thread

> The service as advertised makes it simple to map a GraphQL definition against a database. Now, what’s the problem with this? Well, the devil here is in the detail. But fundamentally it comes down to how important information hiding is to you.

> ... see the thread for more ...

[1] https://twitter.com/samnewman/status/1346541251617828877 [2] https://twitter.com/samnewman/status/1346749556583780352


I don't think Sam is wrong, exactly -- and that second thread does add some of the missing nuance -- but I do think there's a key missing piece here.

He's absolutely right that you shouldn't couple directly to the underlying representation. But Postgres lets you transparently define views that can be queried (and, with a little more elbow grease, updated) just like any other table. You can provide decoupling from within the database, and do so on-demand as your domain and your data model evolve.
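
For example, a minimal sketch (the schema and column names are made up); simple views like this are even auto-updatable since Postgres 9.3, so writes through the view reach the underlying table:

    create schema internal;

    create table internal.customer_accounts (
        id bigint generated always as identity primary key,
        full_name text not null
    );

    -- clients only ever see this contract; the table behind it can be
    -- renamed or refactored without touching them
    create view public.customers as
        select id, full_name as name
        from internal.customer_accounts;

    insert into public.customers (name) values ('Ada');  -- hits the real table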

I don't enjoy planting a separate bespoke API server on top of the database. Usually you end up lifting many of the same capabilities the database already has (auth, batching, ...) to your custom API, so a lot of the server is just boilerplate. Many API operations are natural consequences of your data model; there's little business or engineering value-add once you've settled on the latter, you're just writing glorified FFI bindings.

Lazy engineering will cause problems no matter what architectural stack you end up using. But a state-first architecture doesn't have to mean a complete loss of loose coupling -- it just means different techniques for achieving it.


Totally agree.

I consider views and functions to be my database’s API, and when using a wrapping tool (I’ve used Hasura) I get it to use only the views in a dedicated API schema rather than the tables directly.

In addition to coupling, you want control over what tables and fields you publish.

A simple view is literally one line of SQL; it’s hardly a burden.
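
A sketch of that setup (the private.products table and the web_user role are hypothetical):

    create schema api;

    -- the one-line view: publish only the columns you want
    create view api.products as
        select id, name, price from private.products;

    -- the wrapping tool connects as web_user and can only reach the api schema
    grant usage on schema api to web_user;
    grant select on api.products to web_user;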


We could use PG views or functions to make our own abstraction layer inside the database and handle security better, etc. But putting that complicated database rebuild aside, just because we can publish GraphQL APIs from the database doesn’t mean we have to: it would be interesting to use GraphQL as a communications layer from backend to database too. Just as frontend-to-backend could use GraphQL, so too could backend-to-database communication, but with different schemas (data models) between frontend and backend. Having a backend GraphQL server between the frontend and database could allow for connection pooling, schema versioning and abstraction, for proxying multiple databases, handling business logic and perhaps validating inputs, caching or managing compute tasks.


I think that (on paper at least) it makes perfect sense to allow GraphQL to be your only hammer (or turtle). I don't think federating GraphQL over bespoke swagger/JSON, SQL, Redis, SOAP is inherently better - expose GraphQL and federated GraphQL:

https://www.apollographql.com/docs/federation/v2/

Smack GraphQL on your Postgres, and on your LDAP, and your graph DB - and federate them as GraphQL?


For front-end development where you're already using GQL as a layer on top of content, I can see the use case for this.

A lot of old systems have this tight coupling between databases and back-end code. I would advise against going down that path.

GraphQL was designed to solve the flaws in the utterly poor service design at Facebook. Do not forget that.


There is no way every cloud provider won't jump on this as soon as it's stable for their Postgres offerings.


Hopefully so. It is awesome that supabase is releasing this as open source, as they could have easily kept this proprietary. I probably won't ever be able to use supabase for my primary job (the company is obnoxiously invested in an Azure/AWS hybrid cloud). But I do have some profitable side projects with the ability to choose my own service providers, and this sort of developer goodwill really goes a long way.


Supabase is open source. I believe the hosted version is on AWS, but there's no reason you couldn't run it on your company's Azure/AWS directly? :)


As someone that has used Hasura very heavily in the past, I can see the operational benefit of not needing to manage one more service.

The talk about memory constraints seems a bit odd though. The extension is still going to require memory, and without profiling the alternatives it seems odd to say "we won't use any more memory with an extension", especially when you say "these established/stable alternatives fulfill all our feature requirements". In practice I always found Hasura to be reasonably lightweight (I can't speak for PostGraphile).

I think there's an advantage to running this API layer as an additional service in front of the DB though. Then it can act as an API gateway to more than just your database.

Also, the generation of a "Relay style" schema is off-putting for me. I'm really not a fan of the style, and avoiding it was one of the biggest advantages for me in using Hasura.


It is nice that Hasura offers it as well, via the /v1beta1/relay endpoint [1].

[1] https://hasura.io/docs/latest/graphql/core/databases/postgre...


Curious how it compares to https://www.graphile.org/postgraphile/


I had to make sure someone had plugged postgraphile here. It's a great system. What impresses me most about it is how it grows with the complexity of your application. There are so many thoughtful points of extensibility built in, I always have confidence that I can go into the docs and find a way to do what I need to.

That reminds me, I need to figure out how to get my employer to sponsor the project...


You could hire one of the maintainers to consult on your project.


It's very similar in goals to PostGraphile.

As samwillis mentioned, the memory footprint is tiny, which is a big perk for Supabase's platform (or if you're self-hosting). But it's also fully language agnostic, which opens up lots of options for extensibility:

For simple use-cases you can expose the GraphQL functionality over HTTP using PostgREST, as described here: https://supabase.github.io/pg_graphql/quickstart/

But if you want more configuration, like adding in middleware or wiring it up to an existing backend application, you can do that from any programming language that can connect to Postgres, rather than only JavaScript.
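
Under the hood it's a single SQL function call, so e.g. from psql (following the quickstart above; the account table and its reflected accountCollection are illustrative):

    select graphql.resolve($$
        query {
          accountCollection(first: 2) {
            edges {
              node { id email }
            }
          }
        }
    $$);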


The article suggests they built it so that by running it within the DB it would have less compute overhead and they could match GraphQL requests to SQL transactions 1:1.


Just noting that Graphile also uses transactions - that’s a given for this sort of tool I’d say.


I use Dgraph, a "native" GraphQL DB. But I still transform the incoming/outgoing JSON to Java objects, and transform the objects again to/from the DB. Why? Two things: business logic and security/access control. Yes, I can put (some) business logic in the Dgraph DB, but it feels "leaky", like stored procedures (maybe because I come from Java). I feel it's tiring and stupid, but I don't know a better solution.

I reckon for this extension, business logic uses stored procedures and access control uses PG's user roles? Many apps I know simply have one user "myappuser" (or even the default user) to access their DB.


> Many apps I know simply have one user "myappuser" (or even the default user) to access their DB.

Sure; either those apps don't need to differentiate access between their users, in which case one role is sufficient, or they reimplement their own auth system, in which case you'd use Postgres' own rather robust auth system instead. It comes down to the needs of the domain; you'll solve the problems differently depending on what approach you take, but you need to solve the same problems.

Yes -- I've found it very tiring, as you put it, to keep reimplementing the same boilerplate in every API server just to lift the operations my database can already support out to an HTTP frontend. Postgres' auth means I don't have to make or press into service a separate auth system, and there are multiple ways to handle business logic orthogonally.

Stored procedures and triggers work well, but are synchronous within the current transaction and sometimes simply don't map well to the domain needs. You can also use the LISTEN and NOTIFY statements to set up asynchronous external workers. I find this has a positive effect on the data model, as you're forced to consider what states a system will pass through during an asynchronous flow.
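
A minimal sketch of that worker pattern (the orders table and channel name are made up; note that NOTIFY payloads are only delivered when the transaction commits):

    create function orders_notify() returns trigger
    language plpgsql as $$
    begin
        perform pg_notify('order_created', new.id::text);
        return new;
    end;
    $$;

    create trigger orders_notify_trg
        after insert on orders
        for each row execute function orders_notify();

    -- an external worker runs "LISTEN order_created;" and reacts to each payload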


Whilst I don't like the paradigm of generating a GraphQL schema from your database (or vice-versa!), I appreciate this one for having a specific goal of being able to run as part of a small database VM. So congrats on the release!


Ideally you don't expose the schema directly; you make views in another schema and expose that instead of your real tables.


Even so, the patterns I really enjoy using with GraphQL make heavy use of interfaces and unions; and I’ve yet to see any automatic “data model to schema” tool make use of them.


author here! happy to answer questions


We use Postgraphile heavily; it drives the entirety of our API. A few feature questions: is there any plan to support, or has there been any conversation around:

- custom directives that run custom code

- custom routes on the server that do custom things? Or perhaps proxying to another app to handle alternate/custom routes

- schema injection, custom resolver logic


> After tallying the resources reserved for PostgreSQL, PostgREST, Kong, GoTrue, and a handful of smaller services, we were left with a total memory budget of ... 0 MB

This was a really interesting bit to me, could you give detail on how the memory usage gets split up across these? Thank you.


sure, if we're talking free-tier:

Postgres runs in a 1 GB VM all by itself, with some optimizations to get the most out of that limited hardware.

All other services are in a second 1 GB VM.

You can see a list of those services here https://github.com/supabase/supabase/blob/master/docker/dock...

The memory use per process can differ by use-case, so it's important to leave a bit of headroom. For example, PostgREST's memory consumption can grow if the amount of data being returned from its queries is large.

By the time you include:

- supabase studio

- kong

- auth

- storage

- meta

if they each take only 100 MB, it's pretty snug on a 1 GB VM!


Congrats on the release. We use Hasura but haven't been happy with their speed of iteration, lack of communication about various large bugs, and lack of M1 support / communication.

But we do use a lot of their more advanced features like being able to use aggregates in sorts, aggregates in results, custom functions, etc. What are your plans there and will you have a public roadmap?


I joined Hasura recently to lead the Product team because I love the tool and the people are nice.

Thanks for the feedback on what you're finding valuable and sharing your dissatisfaction with the lack of communication around bugs and M1 support.

M1 support is very close to ready; we've been waiting on a couple of dependencies to support M1.

For what it's worth, I agree with you that our communication needs to improve and it is something we're working on.

Also, congratulations to Oliver and the Supabase team on pg_graphql! PostgreSQL extensions are slick and it'll help even more people embrace APIs from databases.


There is a high-level public roadmap that covers the minimum features required before we'd consider going GA here:

https://supabase.github.io/pg_graphql/roadmap/

It's early days, so the conversations around aggregates haven't happened yet, but I'm optimistic that they'll make an appearance in a future release.


Am I correct in saying that for read queries, you essentially translate the GraphQL query into nested jsonb_agg calls?


yep!
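
i.e. something like { blogCollection { name, postCollection { title } } } compiles to a single statement roughly of this shape (a sketch with made-up blog/post tables, not pg_graphql's literal output):

    select jsonb_agg(
        jsonb_build_object(
            'name', b.name,
            'postCollection', (
                select coalesce(jsonb_agg(jsonb_build_object('title', p.title)), '[]'::jsonb)
                from post p
                where p.blog_id = b.id
            )
        )
    )
    from blog b;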


Very excited about this!

Do you have performance comparisons for the same datasets with Hasura / Graphile?


We don't have an apples-to-apples comparison for similar hardware and queries at this point, but here are the links to performance figures for pg_graphql and Graphile. Hasura does not publish throughput numbers afaik.

pg_graphql: https://supabase.github.io/pg_graphql/performance/

graphile: https://www.graphile.org/postgraphile/performance/

We'll certainly be keeping an eye on performance as it gets closer to GA.


Hi Oliver, Supabase team,

Love what you are doing with Supabase!

Quick question: have you considered building any kind of local mirroring system for offline mobile apps/PWAs, say on top of SQLite? Something like Realm for MongoDB or PouchDB/Couchbase Lite for CouchDB/Couchbase.

It would obviously need devs to add some extra columns to tables for tracking, and a way to define the merge/overwrite characteristics for each column, but it would be awesome to have something like that!

It’s something I have thought about building for a while but never found the time. (I want to combine it with Yjs for collaborative offline rich text editing)


There have been a few discussions about offline-sync support for the PostgREST clients, but nothing concrete to announce yet.

On the GraphQL side, we'll be targeting making mutations offline-capable, but we're still in the brainstorming phase. If you have any suggestions for how you think it could work, please open an issue on supabase/pg_graphql and we can discuss over there.


Just want to give a big shout out to Oli - his implementation is really ingenious, leveraging multiple parts of the Supabase stack while also being agnostic enough to work natively inside Postgres (or with other tools).

{supabase team}


So you're calling GraphQL via PostgREST, but I didn't read about why you're including GraphQL in the first place. Isn't it orthogonal to PostgREST, which is already used in your stack?


PostgREST does solve a few of the core challenges that GraphQL is intended to address, like over/under-selection and resource embedding (relationships).

Even so, there has still been a lot of interest in GraphQL so users can leverage the growing ecosystem for things like reflecting the data model/types for client usage, offline caching, etc.


Thanks, this was missing for me in the Motivation section of the blog post. It was a genuine question, not sure why it's downvoted.


Does this work in YugabyteDB/CockroachDB?


Like Timescale (below), I'm not aware of anything in these implementations that would be incompatible with pg_graphql, but it has not been tested.

As long as the pg_catalog schema is visible & the base Postgres version is >= 13, it should be good to go.


I've been looking for something like this for a long time, so cool! Great work!

Does it work with TimescaleDB?


It has not been tested, but I'm not aware of any differences Timescale introduces that would be a compatibility issue.


How do you implement search/sort against something like this?


Thanks for giving this to the community. Will definitely be looking at you for a SaaS app I'm building.


They just released a free course you might like: https://egghead.io/courses/build-a-saas-product-with-next-js...


This is awesome! Really creative approach to the problem. Does this also support custom datatypes that can be marshalled to json? For example, PostGIS geometries?


Currently, any datatypes that are not in the allow list

https://supabase.github.io/pg_graphql/reflection/#type-conve...

are cast as strings, but prioritizing a JSON conversion if one is available is a great idea that we'll look into.


Personally I like the idea of allowing the user to create a cast:

https://www.postgresql.org/docs/current/sql-createcast.html

There are some weird permissions issues to work out IIRC, but this allows the user to specify how the JSON should be marshalled, and it can then be implicitly or explicitly used by any SQL statements. This is how PostgREST solves this problem, and I currently have a few custom casts created for range and multirange datatypes through my PostgREST server.
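
For example, a sketch for a range type (the JSON shape here is my own choice):

    create function tsrange_to_json(tsrange) returns json
    language sql immutable as $$
        select json_build_object('lower', lower($1), 'upper', upper($1));
    $$;

    -- "as assignment" lets the cast be applied implicitly in assignment contexts
    create cast (tsrange as json)
        with function tsrange_to_json(tsrange)
        as assignment;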


Don't know how your JSON generation works, but when using `row_to_json()`, all PostGIS geometries are automatically cast to GeoJSON objects by default, which is very convenient.
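
If I remember right, that works because PostGIS 3+ registers a geometry -> json cast that the JSON functions pick up:

    create extension if not exists postgis;

    select row_to_json(t)
    from (select 1 as id, st_makepoint(1, 2) as geom) t;
    -- => {"id":1,"geom":{"type":"Point","coordinates":[1,2]}}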


That's interesting. How does it deal with authentication and authorization? If it handles those well, it could be a real "serverless" solution for most CRUD apps.


As the sibling comments point out, it uses Postgres Row Level Security (RLS). For an approachable introduction to how Supabase's auth works:

https://supabase.com/docs/guides/auth

> 1. A user signs up. Supabase creates a new user in the auth.users table.

> 2. Supabase returns a new JWT, which contains the user's UUID.

> 3. Every request to your database also sends the JWT.

> 4. Postgres inspects the JWT to determine the user making the request.

> 5. The user's UID can be used in policies to restrict access to rows.

> Supabase provides a special function in Postgres, auth.uid(), which extracts the user's UID from the JWT. This is especially useful when creating policies.

For instance, say you have a `todos` table and want to make it so users can only read their own todos - you could have an RLS policy `todos.user_id = auth.uid()`. Afterward, `SELECT * FROM todos` will only return the authenticated user's todos. (Equivalent to manually issuing `SELECT * FROM todos WHERE todos.user_id = auth.uid()`.)

There's also `auth.role()` so you can easily restrict access by role: `auth.role() = 'admin'`
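
Concretely, that todos policy would look something like:

    alter table todos enable row level security;

    -- assumes Supabase's auth.uid() helper and a user_id column on todos
    create policy "owners can read their todos"
        on todos for select
        using (user_id = auth.uid());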


Is there an integrated way to achieve a claim-based approach, where access to a given resource is governed by both a uid (let's say an org uid, and all users belong to a particular org) and a set of claims (let's say there are at least CRUD claims for all tables, and I can set any combination of them for any user, or create a role containing them and then assign that role to a user)?


The extension is compatible with your existing row level security policies. If you connect to the database as a role like `authenticated` then those policies will be applied.

The columns and tables that are visible are also controlled by the role.

One cool thing about that approach is you could run e.g. an admin API and a user-facing API from that same endpoint by executing as different Postgres roles!
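
A sketch of that request-scoped pattern (todoCollection is a made-up reflected collection):

    begin;
    -- drop to a restricted role for the duration of this request;
    -- grants and RLS policies for "authenticated" now apply
    set local role authenticated;
    select graphql.resolve($$ { todoCollection { edges { node { id } } } } $$);
    commit;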


They mention in the article that auth is handled by reusing the existing row level security you'd already use to secure the DB (i.e. `CREATE POLICY`).


While this is super neat, this and the amplify thread can't help but make me wonder:

Is everyone really this okay with their API always being 1-to-1 with their DB models?

In my experience, that kind of setup is only viable for the smaller, simpler projects and otherwise you always run into something where you'd really prefer to have a layer between you and your database.

I am currently using GraphQL at work and this is a very hard requirement for us. Our database schema is not translated literally into GraphQL (or the other way around), and this is very intentional. The whole idea of the API layer is to be able to make changes to your internals without breaking all your consumers.

This was a problem with Amplify (though the least of the problems we had), and it seems to me this is also a problem here, as it is with Hasura and PostgREST.


> The whole idea of the API layer is to be able to make changes to your internals without breaking all your consumers.

You can do so by not exposing tables and instead use views and stored functions/procedures.

See https://postgrest.org/en/v9.0/schema_structure.html#schema-i...
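
For example, a hypothetical function that PostgREST would expose at POST /rpc/search_todos (assuming an api.todos view); its body can be rewritten later without changing the contract clients see:

    create function api.search_todos(term text)
    returns setof api.todos
    language sql stable as $$
        select *
        from api.todos
        where title ilike '%' || term || '%';
    $$;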


How does it compare with https://github.com/solidsnack/GraphpostgresQL? It's like 7 years older and it takes the same approach, it seems.


GraphpostgresQL is no longer maintained and hasn't been updated for many years.

I am the person who wrote it. It was a proof-of-concept, written in PL/pgSQL. This made it fairly easy to set up and test but made maintenance and contributions very difficult.


Never ceases to amaze me when someone pops up on HN with a "yeah, I made that, ask me anything".

Nice work, looks like an awesome project.


Very cool! I was not familiar with this project. It does take a similar approach to query building. It is described in the README as an alpha POC.


I've been using Hasura extensively and love it. I'm curious what the delta is between the two. It's so simple to run Hasura on dokku and instantly get an amazing UI for postgres with the bonus of graphql. It would take a lot for me to switch.


> It would take a lot for me to switch.

We were pretty happy with Hasura but had to switch to Postgraphile due to poor multi-database support, bummer.

(Postgraphile is not as polished as Hasura in some ways, but since it can be used as a library, it's easy to dynamically create N instances of it with different configurations at runtime. Hasura required our ops team to define a new instance of the service in the docker-compose.yml file for each database.)


Pretty recently, as of 2.0, Hasura has added support for multiple databases.


It's a VERY fragile support:

https://github.com/hasura/graphql-engine/issues/6648

Basically, if two entities in the databases have the same names, Hasura fails unless you manually define a unique `custom_name` for each such entity.

Given that the most common multiple database scenario involves different databases with either the exact same schema (one-db-per-tenant) or similar schemas (staging vs. production database), it forces you to painstakingly set a custom_name for basically every single entity in your db.

Thankfully there is an API so in theory you could set this programmatically, but it still means that your client code needs to be manually kept in-sync with whatever custom name generator rule you used.


For what it's worth, this is addressed in 2.1, which is only out in beta. Maybe it should have been there from the start.


Hey, I'd be curious to hear your thoughts on our solution. It allows you to combine any number of databases; PostgreSQL, MySQL, SQLite and SQLServer are supported. Here's the full list of supported DataSources: [0]

Our solution comes with a feature called Namespacing [1], which means every API has its own namespace, so there are 0 collisions between the different types and fields. It even goes so far that we also namespace directives, so you can have a combined schema of multiple GraphQL APIs and can still use the namespaced directives on fields from that particular upstream.

Disclaimer, I'm the founder of WunderGraph.

[0]: https://wundergraph.com/docs/overview/datasources/overview#o...

[1]: https://wundergraph.com/docs/overview/features/api_namespaci...


Hadn't heard of Wundergraph, I will keep it in mind if we encounter issues with Postgraphile. Thanks!

Regarding your solution, it seems to be the same thing that Hasura is working towards. It's a perfectly fine solution if you have a few types that happen to clash in their basic names (we have that too, e.g. for "products.suppliers" vs "services.suppliers").

For multi-tenant solutions, i.e. where all data sources are identical but they refer to different customers' data, separated by user/schema/database/instance, namespacing works but it's not _great_.

It means that the client code needs to get the tenant's namespace from somewhere (probably a claim in the authn system), and then manually interpolate it in all the graphql queries. It's not a security flaw (if you screw up and query spacex_users from a different tenant, you'll just get a 404 - I hope!), but it's going to play awkwardly with most developer tooling, having to always work with interpolated strings.

More importantly, if you use namespaces for tenants, now you can't also use them to solve simple name overlaps, unless you split each tenant over multiple APIs.

If you want to improve your multitenant story, here's what we did with Postgraphile (which is a straight adaptation of what our .NET backend does): when the backend starts up, it initializes one identical GraphQL source per tenant, then looks up each tenant's authorization URL. When a request comes in, the backend validates the auth token and uses the signed authorization URL to determine which GraphQL source should run the query.

In this way the clients don't need to worry about tenancy or namespaces at all, there might be a single tenant for all they know. The same request that a developer tests with his login under TestCompany will also run for every other customer, but the auth token and the auth token alone determines which data source it gets run against.

As you can see, this approach isn't a replacement for ad-hoc namespacing. It's meant specifically for the scenario where the same identical schema exists in multiple data sources.


Hey, thanks for your feedback. This sounds like a great feature to add. Would you be available for a chat? I'd like to learn more about how you solved this problem. Your solution sounds very well thought out. You can find me on this discord: https://wundergraph.com/discord


No problem. Not right now, but I'll see if I can drop by in the next couple of days.


I see this as a neat way of exposing a GraphQL interface, but a pattern with a number of limitations that are traded off against the goal of reducing network latency. Definitely useful for users who want a single datasource and only want to expose access through GraphQL (you cannot scale DB resources independently of the API access layer). Wondering if the GraphQL engine could make use of the Postgres stats collector for optimizing GraphQL resolver execution, e.g. resolver dispatch scheduling or pre-execution query optimization?


Heck ya! I can’t wait to try it in whatever my next project is.

I'd like to learn more about the security of the functionality. I'm not talking about how to apply pg roles/privs/RLS, but rather my perceived risk of how hard it is to write stuff safely in C. That is, who is handling inbound HTTP requests, pg or the extension? What confidence do I have that my DB memory is safe from imperfect extension logic?


What are the use cases where we could use a direct plugin to the database for GraphQL? I mean, where do we write business logic before data becomes GraphQL?


Security definer functions are a powerful feature of Postgres that allow a lot of business logic to live in the database. With an extension like this, I reckon Postgres can theoretically become a first-class public API for a surprising number of use cases.
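
A minimal sketch (the accounts table and transfer rules are made up):

    -- executes with its owner's privileges rather than the caller's
    create function transfer(from_id bigint, to_id bigint, amount numeric)
    returns void
    language plpgsql
    security definer as $$
    begin
        if amount <= 0 then
            raise exception 'amount must be positive';
        end if;
        update accounts set balance = balance - amount where id = from_id;
        update accounts set balance = balance + amount where id = to_id;
    end;
    $$;

    -- callers never get direct write access to the table
    revoke all on accounts from public;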


How long before someone just goes into the Postgres source code and creates true native support for GraphQL?


An extension is the next best thing to native support, and in my opinion a better place for such functionality.


Naive question: If your entire DB schema is automatically reflected into a GraphQL schema, then how is this better than just having the client send SQL queries directly to the backend? Does GraphQL make it easier to do this securely?


This runs within the database engine? Doesn't that adversely affect scalability?


On the contrary, this could scale much better.

If the GraphQL server is a separate component, then a GraphQL query needs to do its own query planning, then the resolvers turn that into SQL queries to the database, then the database does its own query planning and execution for each query.

Since one GraphQL query will often turn into multiple SQL queries, there's likely to be duplicated work on the database side across those queries since they relate to the same data.

By integrating the GraphQL server into the DBMS, it can do query planning once for the whole GraphQL query, which means that it can reuse parts of the plan and prevent duplicated work and N+1 queries.

Either way it's going to have to query the database to get all the data, so the essential work is the same. But this way you have opportunities to reduce wasted work to process the whole GraphQL query.


Most likely the actual work done by this extension is little compared to the actual work required to perform the queries…but…

If you want to scale horizontally you can run multiple read replicas with something which distributes connections evenly to the replicas.


I use Hasura heavily, and the thing I wish it had out of the box is group-by aggregates.

This doesn't have aggregates yet, but when that conversation happens, hopefully it's considered.


Looks excellent, and I'm pleased to see you've provided the relay connection style pagination, using keyset pagination under the hood. That's awesome!


Wow, with Relay built-in. Looks great!


Can I also install such an extension in a Google Cloud SQL instance?


Supabase looks like such a helpful tool. Has anyone used it in production? Does it have a Hasura-like event trigger mechanism?


Yes, event triggers are built right into Postgres, so they're supported out of the box.

Event triggers are the mechanism pg_graphql uses to keep the GraphQL schema up to date with the SQL schema.


How does it compare to Apache Age?


Apache Age is based around the openCypher graph query language rather than GraphQL


PostgreSQL has all these great shiny features, but still doesn't support zero-downtime upgrades?


Wow, interesting - many extensions are coming up allowing different interfaces to PostgreSQL! Other "protocol converters" launched recently:

MS SQL - https://babelfishpg.org/

MongoDB - https://www.ferretdb.io/



