Stop trying to categorize things. Period. Recognize categories as a sometimes-useful shortcut for talking about somewhat-average (within that category) things, and that they're absolutely useless for everything else. When you stop worrying about whether something is or is not a fish or planet or alive, you will not miss it. You will be surprised at how rarely you actually need to worry about what category something is in.
This applies directly to database designs and why almost every database is awful unmaintainable garbage. If you have a table per category of thing, your database is terrible; but know you're in good company, as 99.9% of database developers make this simple and completely avoidable mistake.
Unfortunately our laws also like to categorize things, which does sometimes make it unavoidable in software dealing with those laws.
> Fish generally live in the water, breathe through gills, have tails and fins, possess a certain hydrodynamic shape, lay eggs, and are in a certain part of the phylogenetic tree.
This is not true (somewhat saved by "generally"). Either you're describing physical characteristics, or you're describing phylogeny; not both. Phylogenetically, whales are fish, and so are you. (This rests on the idea that only monophyletic groups are at all meaningful in genetics; nothing ever stops being in the "fish" tree, if such a tree even exists. Cats are felines and mammals and lizards and fish. Unless you believe there is no such thing as a fish at all, which is also a solution.)
> A fish is a creature phylogenetically related to various other fish, and with certain defining anatomical features.
Take a shark, pretty universally recognized as a kind of fish. If the argument is "whales are closer, phylogenetically, to humans and cats than to sharks, therefore they are mammals", then you should know: so are salmon. Salmon are closer, phylogenetically, to humans and cats than to sharks.
The point not being that you should call whales fish, but rather being: fuck categories.
> When terms are not defined directly by God, we need our own methods of dividing them into categories.
No, we really, genuinely, honestly, do not need any methods of dividing things into categories. We really don't.
I realize that the author does kinda get to a weaker version of this with talk about fuzzy borders and the goals of the category. The transgender discussion is pretty good, with his criticism of the fallacious argument:
> If a man thinks he’s a woman, then we might (empathetically) wish he were a woman, other people might demand we call him a woman, and we might be much more popular if we say he’s a woman. But if we’re going to be rationalists who focus on believing what’s actually true, then we’ve got to call him a man and take the consequences.
I go much harder anti-category and say: this argument is fallacious on its face because it asserts that belonging to the category "man" is even theoretically a factual thing. It's not, just as belonging to any category is not. Any argument that it is either True or False that object X belongs to category Y is immediately fallacious and should be completely discarded.
Maybe I'm misunderstanding the use of "category" or "thing" here, but it would be difficult for me to make useful business software for my clients if I didn't have one table for "sales orders" and another for "purchase orders".
No, it wouldn't. They have a lot in common. What various statements do you want to make about "sales orders"? What various statements do you want to make about "purchase orders"? My guess is that a lot of those statements are the same. In fact, a lot of the statements, e.g. "when was this record last modified?", are statements you want to make about all sorts of things. If you list out everything you might ever want to know about a PurchaseOrder, and everything you might ever want to know about a SalesOrder, and you just fucking JAM all that shit in each table (or a combinatorial explosion of child and association tables), you end up with two giant tables that are kinda-the-same-but-kinda different; you end up with a bad database. A database with a table called SalesOrder(s) and another called PurchaseOrder(s) is a bad database. That is exactly my point. Everybody makes this mistake. They only get away with it because they're tackling simple, easy problems (badly).
Everyone seems to think that one record is one noun. No! One record is one statement. One table is not a list of instances of one distinct noun; it's a list of instances of one kind of statement you want to make. This is much closer to a "pure" relational model (e.g. as espoused by Codd, as hinted at in Out of the Tar Pit) than constantly making the exact categories-are-hard-boundaries mistake discussed in TFA. Most people have started recognizing the problems with "class Cat : class Animal", but then they do this exact thing in their databases, reintroducing all the same problems.
Yes, Out of the Tar Pit is wonderful, but I don't believe it actually argues what you appear to be claiming it argues. It is arguing a different (and more important) point.
Or put more specifically, one could argue that a SalesOrder and a PurchaseOrder are the same thing, because fundamentally they are with a different perspective. But one cannot argue that a Teacher and a Student are the same thing. As a result having a table for Teachers and a separate table for Students is a good thing. And that would be categorization.
Are you arguing that Students and Teachers should be stored in the same table when modeling a DB? If so, then you are arguing for categorization. But even then, that's not the heart of the issue.
In more detail, there are 3 arguments that in my opinion you are muddling together:
1. Some things are the same thing but viewed from the perspective of two different users, these should be modeled as the same thing (PurchaseOrder vs SalesOrder). [I agree with this specific case, but there aren't a ton of these]
2. You should model events in a system and not nouns/entities [The Out of the Tar Pit argument, which I also agree with and is the direction of ideal coding if performance were no object, and you can easily do queries / derived models on top of it.]
3. You shouldn't categorize things. [This argument I don't understand as I see generalization and categorization as the essence of software modeling, but am open to an explanation. A StudentEnrolled event table should be categorized separately from a ClassLectureGiven event table, and those are categories.]
> one could argue that a SalesOrder and a PurchaseOrder are the same thing,
I am not arguing that.
> As a result having a table for Teachers and a separate table for Students is a good thing. And that would be categorization.
I disagree that that's a good thing.
> Are you arguing that Students and Teachers should be stored in the same table when modeling a DB? If so, then you are arguing for categorization.
No. I am arguing that you do not know beforehand how many tables Students and Teachers (nor Purchase/Sales Orders) are going to take up, because you are going to come up with more things to say about them. The more design attention you give it, the more tables they're likely to take up. And those tables are very likely to be shared with other entities. Because once you start making statements like "This record was modified by User X" you are no longer even speaking the language of Students and Teachers.
> 1. Some things are the same thing but viewed from the perspective of two different users, these should be modeled as the same thing
No, I am not claiming that. I never said "same thing".
> 2. You should model events in a system and not nouns/entities
No, I didn't say "events", I said "statements". An event is "X happened at Y time" -- which also happens to be a statement. But another statement is "Carbon is a chemical element with symbol C, atomic number 6, and standard atomic weight of 12.011". You could argue that's actually 3 statements, and that decision is a complex engineering decision you make based on the tradeoffs in front of you. I'd actually probably break it down into "Carbon is a chemical element with symbol C and atomic number 6", since those are relatively unique to Chemical Elements, and take its atomic weight and some other properties to a different table since those properties are shared more widely.
Modeling events accidentally tends to get databases closer to the design I advocate for, which is "one table per set of data fields you need to make a clear statement". But that's just a happy coincidence. I claim the success of event-based architectures is mostly coincidental to their "eventness" and actually more to do with the fact that they're closer to a "statement-driven" rather than "noun-driven" architecture. You can have a statement-driven architecture without events at all.
In fact it would be silly to try to model the statement that Carbon's Atomic Number is 6 as an event. Plenty of things are just "facts" (for purposes of your system) that you want to state without having to say when this fact became true.
> 3. You shouldn't categorize things.
Correct.
In your defense, I am trying to jam a big argument with a decade of theory behind it into an HN comment, so I don't really fault you for not getting my point. But I'm saying a good database design is about 90 degrees rotated to the standard "one table per noun" design. You don't put your Students and Teachers in the same table, nor in different tables: you don't put them in any table. You only put statements in tables. Some of those statements are about Students. Some of them are about Teachers. Some of your Students are also Teachers! (Ever seen a class TA'd by a Ph.D. student?) Some of your statements about Students will still be true when they stop being Students, and some won't! You make a new table when you need to make a new kind of statement about anything, regardless of what noun or category they're about. You basically never need to worry about what an entity "is" or "is not". You literally never need to categorize anything.
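A minimal sketch of the "statements, not nouns" layout described above, using Python's built-in sqlite3. The table names (display_name, enrolled_in, teaches), the course codes, and the people are all made up for illustration; none of them come from the thread. The point it tries to show is that there is no Student or Teacher table, and one entity key can appear in both kinds of statement.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per kind of statement. 'entity' is only a key, not a category.
    CREATE TABLE display_name (entity TEXT, name TEXT);
    CREATE TABLE enrolled_in  (entity TEXT, course TEXT);
    CREATE TABLE teaches      (entity TEXT, course TEXT);
""")

ada = str(uuid.uuid4())  # hypothetical Ph.D. student who also TAs a class
bob = str(uuid.uuid4())  # hypothetical undergraduate

conn.execute("INSERT INTO display_name VALUES (?, ?)", (ada, "Ada"))
conn.execute("INSERT INTO display_name VALUES (?, ?)", (bob, "Bob"))
conn.execute("INSERT INTO enrolled_in VALUES (?, ?)", (ada, "CS 900"))
conn.execute("INSERT INTO enrolled_in VALUES (?, ?)", (bob, "CS 101"))
conn.execute("INSERT INTO teaches VALUES (?, ?)", (ada, "CS 101"))


def names_in(table):
    """Names of every entity that has at least one statement in `table`.

    `table` must be one of the fixed table names above, never user input.
    """
    rows = conn.execute(
        f"SELECT DISTINCT n.name FROM {table} t "
        "JOIN display_name n ON n.entity = t.entity ORDER BY n.name"
    ).fetchall()
    return [r[0] for r in rows]


enrolled = names_in("enrolled_in")  # Ada and Bob
teaching = names_in("teaches")      # Ada only
```

Nothing here ever asks whether Ada "is" a student or a teacher. If her enrolled_in rows go away, her display_name and teaches rows are unaffected.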
So let's continue with your example. I'm still trying to understand your position. The phrase "Carbon is a chemical element with symbol C and atomic number 6" specifies a "chemical element". That to me is a category, do you disagree? Have you not just categorized Carbon as being a chemical element?
Or moving away from things you disagree with, how would you specifically model the statement above in a physical table / DB. Or the simpler "Carbon is a chemical element with symbol C"?
Do you have a table with a single column that stores the string above? Do you have a table with 3 columns: one that stores the string "Carbon", one that stores "Chemical Element", and one that stores "C"? Do you instead have a table with 2 foreign keys, one pointing to a table tracking names of elements (ElementId, Name) that would store (1, "Carbon"); (2, "Nitrogen") depending on the order you created your statements, and another pointing to a table storing element abbreviations with entity Ids and fields of "C", "Fe", "Ni"? And do you name your statement table chemical elements? Or do you add a third column and call it relations and have a list of relation types where "IsAChemicalElement" is one type of relation?
The word "statement" seems to have a very specific meaning to you, in both an everyday English sense and a modeling sense. We both agree (or mostly agree) on what statement means in an everyday use sense, but it can mean many different things in a modeling sense. Can you specify how, in a physical table, you would choose to model the statement "Carbon is a chemical element with symbol C"? And the presumably related statement "Carbon has Atomic Number 6", if helpful in explaining.
What I'm getting from you now is you're envisioning something closer to an Entity Component System but for general data modeling. Or more accurately something like prolog style relations; with all entities, regardless of "type", tracked in a single table of entity ids, and having each relation between entities referred to as a statement, and tracked in their own table. But it's still not clear to me until you give an example of how you would model it.
> The phrase "Carbon is a chemical element with symbol C and atomic number 6" specifies a "chemical element". Have you not just categorized Carbon as being a chemical element?
A fair point -- except it's a worldwide consortium of scientists that made that categorization, so it's not like I introduced it here. But you're right: the categorization "carbon is a chemical element" is not actually what I care about; I care about the fact that "Carbon is one of the things you might see in a Chemical Formula, where it will be represented by a C." It's possible that things that aren't chemical elements, like Amino Acids, could also have these same statements made about them. So as you see, I've not actually restricted members of my table to only Chemical Elements (with this minor change).
> Do you have a table with a single column that stores the string above? Do you have a table with 2 columns ...
Everything you listed is a possibility, except you need GUIDs or a similar scheme since the same entity is represented in multiple tables. In general it's just (key, statement) where "statement" is all the fields you need to exactly make your statement. I'd probably put the string "Carbon" in its own table, possibly with other text used to canonically identify the entity in English (or I'd explicitly store the language if I was going full internationalization). I also sometimes track singular vs. plural, although for chemical elements that's
I'd put atomic number and the chemical-formula abbreviation in the same table, possibly with other "periodic table" type information if it were important. I probably would stop there: I wouldn't actually say it's a solid, for instance -- that's just not an actually useful statement in the real world. This is just too simple an example to get much meat out of it. But that's OK - small tables are fine.
There are no foreign key constraints because there is no single table which is a canonical list of "chemical elements", by design. Foreign keys are secretly business rules that just happen to be easily representable in a database. Business rules go in the application layer.
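Here is one way the Carbon example might look, as a rough sketch with Python's sqlite3. The table names english_name and periodic_entry, the helper state_periodic_entry, and the "atomic number must be at least 1" rule are assumptions made for illustration. The thread only says that names go in one table, that symbol and atomic number share another, that keys are GUID-like, and that there are no foreign keys.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- (key, statement) tables. No FOREIGN KEY clauses, by design.
    CREATE TABLE english_name   (entity TEXT, name TEXT);
    CREATE TABLE periodic_entry (entity TEXT, symbol TEXT, atomic_number INTEGER);
""")


def state_periodic_entry(entity, symbol, atomic_number):
    """Application-layer rule (hypothetical) standing in for a DB constraint."""
    if atomic_number < 1:
        raise ValueError("atomic number must be at least 1")
    conn.execute(
        "INSERT INTO periodic_entry VALUES (?, ?, ?)",
        (entity, symbol, atomic_number),
    )


carbon = str(uuid.uuid4())  # the same key is reused in every table
conn.execute("INSERT INTO english_name VALUES (?, ?)", (carbon, "Carbon"))
state_periodic_entry(carbon, "C", 6)


def describe(entity):
    """Join the separate statements about one entity back together."""
    return conn.execute(
        "SELECT n.name, p.symbol, p.atomic_number "
        "FROM english_name n JOIN periodic_entry p ON p.entity = n.entity "
        "WHERE n.entity = ?",
        (entity,),
    ).fetchone()


carbon_row = describe(carbon)  # ("Carbon", "C", 6)
```

Because there is no canonical list of chemical elements, nothing stops a later statement about, say, an amino acid from going into a similar formula-symbol table under its own key.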
> [statement] can mean many different things in a modeling sense
A statement (or component, spoiler that you got that right) is an immutable ("honorarily" immutable in the DB, but truly immutable in the application layer) ordered tuple of related data (read: struct) that can be associated with an entity. Of course to be stored in the DB it must be associated with an entity, but in the application layer it's useful that they don't have to be.
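In Python terms, that definition could be sketched with frozen dataclasses. The class names PeriodicEntry and Stated are invented for this example; the thread does not name any types.

```python
from dataclasses import FrozenInstanceError, astuple, dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class PeriodicEntry:
    """One kind of statement: an immutable, ordered tuple of related fields."""
    symbol: str
    atomic_number: int


@dataclass(frozen=True)
class Stated:
    """A (key, statement) pair, which is the shape a database row would take."""
    entity: UUID
    statement: PeriodicEntry


entry = PeriodicEntry("C", 6)  # usable in the application with no entity attached
row = Stated(uuid4(), entry)   # attached to an entity only when it is stored

try:
    entry.atomic_number = 7    # frozen dataclasses reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

fields = astuple(entry)        # ("C", 6), in declaration order
```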
> What I'm getting from you now is you're envisioning something closer to a Entity Component System but for general data modeling.
Yes, that is the term I use for it: "Entity-Component Architecture". I've taken to calling them "statements" instead of "components" when introducing the topic because it's more obvious to people who don't know ECS, and people who do know ECS assume I'm just talking about games. The word "system" shows up a lot but does not have a specific meaning; it's more like "service". Basically systems are code elements that layer on successively more rules about what operations are or are not valid to perform on sets of these (key, statement) objects -- but that's just one of many useful words to describe such code elements.
> Or more accurately something like prolog style relations; with all entities, regardless of "type", tracked in a single table of entity ids, and having each relation between entities referred to as a statement, and tracked in their own table.
You don't actually need a master list of every entity, and there are even reasons to avoid it. But you could do it that way if you wanted; it has its pros and cons. I personally don't.
Keep in mind not all statements are "relations between entities". Some are; some are relations between 3+ entities. Some are just plain-old facts that you want to remember. But yes, each different kind of statement you want to make gets its own table.
> But it's still not clear to me until you give an example of how you would model it.
You have the right idea. Frankly the differences don't show up until a pretty reasonable level of complexity, and everything is dependent on the exact business cases you want to support. I really should just write a blog post on it.
So it sounds like you are looking at an Entity Component System as applied to a DB for data modeling. Thanks for taking the time to explain. I do agree it's a valid approach that has many benefits; however, I do disagree that it is a strictly better approach.
I've been around a decent while and have seen a lot of tech, and it's a trade-off that has come about in many different forms. It's static typing vs dynamic typing, it's fixed schema vs flexible schema, it's compile-time checks vs run-time checks. It's SQL vs NoSQL. Overall it gives a lot of flexibility in that any entity can take on any behavior with minimal changes; however it also comes with the flip side of it being harder to catch modeling errors. It's the same argument of "make impossible states unrepresentable" vs "constraints should be enforced in the app along with business logic". It's also Typescript vs Javascript. But there isn't a true winner in approach. Typescript beats JS in my book, but dynamically typed Clojure is also a great language. It's "is-a" [1] vs "has-a" [2] relationships and it's squarely in the "has-a" approach which I (and most people) feel is the better approach. So again, no clear winner in approach.
I do think it's going to continue to gain steam and more people will adopt it simply because ECS is popular. I also think down the line people will realize it has similar / the same short-comings as NoSQL and will re-adopt relational for its benefits.
If you are thinking of writing a blog, it never hurts. That said, there was a post to HN just a week or two ago of someone discussing the approach. They were making a different point, but their underlying assumption was the same. Namely: "You can do ECS in a DB and benefit from it." Note though that they also called out the trade-off of "normal" ECS and how you lose mode enforced / constraint checking. (Their improvement was to go to a DB approach, although that doesn't solve it). If you hadn't seen it, you might want to check it out [3]. And if you squint, ECS (or ECA) is, I believe, the non-OOP version of mixins. If you haven't done any reading about that it might be of interest [4].
But thanks for sharing your thinking and yes I do agree it's a valid approach for design - just as with anything make sure you know the tradeoffs (what you gain and what it costs).
I have so much to say about all of this, but the indentation on this comment thread is getting extreme. Everything you said is correct in some general sense of understanding software engineering, and yet ... somehow too fence-sitting. Static Typing vs. Dynamic Typing is not some 50/50 argument: evidence and the experience of people who use both is strongly in favor of Static Typing being close to "strictly better" for most teams. SQL vs. NoSQL is less clear, but you can extrapolate known "good practices" to show that the relational model is "strictly better" for the kinds of applications that most people actually want to build.
There's No Free Lunch, sure, but that's defined over all problems, which is an unbelievably vast set of things that we mostly never care about. The problems that human beings ever actually care about round to 0.000% of "all problems", and you could keep going with 100 more 0s after that. When you look at the industry and the problems people are actually trying to solve, sometimes "strictly better" starts looking like the right term to use for some of these tradeoffs; or at least "almost always better".
Ultimately Software Engineering is still a juvenile discipline compared to other kinds of Engineering, and I think we can look to our big siblings for some insights. It's obviously true that in Aeronautical Engineering, everything is a tradeoff and nothing is "strictly better", yet we don't see many biplanes flying around anymore. It is possible for basically the whole field to advance. (Although old ideas have a way of resurrecting, too -- maybe biplanes will suddenly become very useful for certain drones, or in the Martian atmosphere or something!)
> dynamically typed Clojure is also a great language
I'm sure it is! And I'm also sure that statically typed Clojure, with a sufficiently strong type system to represent the kind of code you want to write with it, would be an even better language. Although perhaps "sufficiently strong" is not yet achievable.
> I also think down the line people will realize [ECS] has similar / the same short-comings as NoSQL and will re-adopt relational for its benefits.
Entity-Component is more relational than standard table-per-noun architecture (at least the way I do it), but I understand why this isn't obvious and that it's something I'd need to prove.
> Note though that they also called out the trade-off of "normal" ECS and how you lose mode enforced / constraint checking.
I encourage you to sit down and think through all of the constraints on your domain logic, and then see how many are enforced at the database level. Let's look at the old classic https://wiki.c2.com/?WhyIsPayrollHard -- how many of these constraints are enforced with ACID guarantees at the database level? Something like this:
> After being sick for 3 days (when your pay comes from your normal account), you go on disability, where your pay continues but comes from another account. After N days you only get 60% of your pay, unless you come back and go out on a different disability.
This is a complex domain-logic rule that you could never implement in the database (unless you're using T-SQL as your application development language, in which case your Triggers are simply your application and you're still doing this in the application layer). All foreign key constraints can be phrased in terms of (often volatile) business validation rules -- there is nothing special about them, except the fact that they happen to be easy to enforce by the database. Look at that list of Payroll requirements and then ask again: how do you prevent invalid data states? There's no reason to single out database-enforced foreign key constraints as the only data validation check that you absolutely must enforce at the database layer. It's completely fine to drop foreign key constraints if you have other ways to validate your model -- and you must! (Yes, at the dreaded "application layer"! That's exactly what your application layer is!)
> any entity can take on any behavior with minimal changes;
Unfortunately we haven't even started talking about how behavior and domain logic work their way in here. It's not really like mixins -- mixins is like trying to add the concept of addition onto the number 7. 7 is just a number, and addition exists outside of it. Think more of the "Arithmetic System" as defining a set of operations that make sense to do on (collections of) numbers, and checking/enforcing constraints on that (no division by 0). (Obviously this doesn't explicitly exist since programming languages and databases already give us arithmetic for free.) Then the "Unit Conversion" system relies on the Arithmetic System to do its jobs, defining more complex aggregate data types, further narrowing the scope of valid operations, further enforcing checks. Then the "Chemical Reaction" system relies on the Unit Conversion System (and others), defines larger, stricter aggregates, further narrows the scope of valid operations on those aggregates, etc. It's built in successively strict layers working on successively more complicated "aggregate components" -- not really anything like mixins.
Thanks for the links. SpaceTimeDB is not really what I'm talking about but it's sorta close-ish.
I will say, based on what you've shared, you and I appear to have fairly similar approaches / philosophies to design - which I don't often find. I also agree this thread is way deeper than I planned, but it's a topic I really enjoy so here's another reply. :)
Regarding the choices I listed above I'm firmly in the Static Typing / SQL / TypeScript / Fixed Schema / "has-a" / make-the-impossible-unrepresentable camps. Which, excluding the last one, I believe aligns with your views of the world. (I do also agree that a statically / optionally typed clojure would probably be my preference. And I haven't spent much/any time yet with clojure's spec). I mainly listed them with alternates as those alternates are valid choices at times even if they aren't my choice and I think I can defend why those I listed are better (and it seems you agree). But the other reason I listed them is that ECS clearly falls in with the dynamically typed / NoSQL / ... camps. It is structural typing (or duck typing). But if I go that route how do I enforce that my modeling is correct? Or in a computer game example, if I model my chat messages as entities, how do I ensure that a chat message can never "hold a grenade launcher" via a component? Clearly an error in modeling. And if the answer is "write good code", then that's fine, but it's definitely the dynamically typed approach to solving the problem which isn't my usual preference.
For me that's an important part of modeling - making invalid state unrepresentable. So if I were actually going with ECS (or ECA), I'd actually go ahead and build a type system on top of it. I'd build a meta-component which is the name of a type, and it would specify via statements (using your terminology) that this type named X must have these components and can optionally have these other components. Any other components it can't have by default. A type (specified by the type meta-component) would be a collection of components. An entity must have an entry in the type component specifying which other components it can and must have. But then it brings the next question: can a type refer to other types? If I declare a type A that has 30 components and a type B that has 20, and I now want to declare a type C, do I have to list out all 50 components? Or can I say type C has all of Type A's and Type B's components? Once I've decided to include these types that reference types (or higher order / higher-kinded types), and I include the two most basic ways of combining them (sum/product, or enum/struct, or discriminatedUnion/tuple, or whichever names you want to call them by), and I have full support for ADTs, I'd be happy. But then the pragmatist in me would ask: now that you've added a fully structured type system on top of your ECS/ECA, have you lost all of the flexibility that ECA gave you? I don't know the answer to that. At this point, we've reached thought exercise for me so I'd have to do it and see. But I would be very tempted to explore it to answer those questions and decide whether I stick with ECS or include typing on top to better ensure my system adheres to my business domain constraints.
So jumping back up the stack a bit, the C2 Payroll list is a great real-world list of the style of things that people often ignore. However, I think things go deeper than that. That list is a collection of condition action pairs. If an entity has this situation then treat it like the following. That is a big part of business logic. But it ignores the other part. Which entities are allowed to have which situations. For example I want to ensure that in my system a non-union worker can never have the UPGWA old-timer pay. Funnily enough that's actually one of the things I was playing around with in clojure from concepts of Out of the Tar Pit.
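The type meta-component idea could be sketched roughly like this in Python. EntityType, combine, and the component names (Text, Sender, HeldWeapon, and so on) are hypothetical; this shows only the "must have / may have / everything else forbidden" check plus product-style composition, not sum types or full ADT support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """A named set of required and optional component kinds."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()

    def violations(self, components):
        """List what is wrong with an entity carrying `components`."""
        present = set(components)
        missing = self.required - present
        forbidden = present - self.required - self.optional
        return (sorted(f"missing {c}" for c in missing)
                + sorted(f"forbidden {c}" for c in forbidden))


def combine(name, a, b):
    """Type C 'has all of A's and B's components' without relisting them."""
    required = a.required | b.required
    optional = (a.optional | b.optional) - required
    return EntityType(name, required, optional)


chat_message = EntityType(
    "ChatMessage", frozenset({"Text", "Sender"}), frozenset({"Timestamp"})
)
positioned = EntityType("Positioned", frozenset({"Position"}))
armed = EntityType("Armed", frozenset({"HeldWeapon"}))
soldier = combine("Soldier", positioned, armed)

# A chat message "holding a grenade launcher" is reported as a modeling error.
bad_chat = chat_message.violations({"Text", "Sender", "HeldWeapon"})
ok_chat = chat_message.violations({"Text", "Sender", "Timestamp"})
bad_soldier = soldier.violations({"Position"})
```

Whether a check like this costs the flexibility that made the component approach attractive is exactly the open question raised above; the sketch does not settle it.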
> After being sick for 3 days (when your pay comes from your normal account), you go on disability, where your pay continues but comes from another account. After N days you only get 60% of your pay, unless you come back and go out on a different disability.
You see, for this hard to describe constraint I see the same problem as you do with it. But my approach was the exact opposite. Defining constraints in the data layer is great - but only a subset of constraints can be easily defined in a relational data layer. Your approach was push everything to the application layer. My approach was add support for way more complex constraints in the data layer.
I'd model the payroll constraint above as: have your only base table be statements of "employee X was sick on Y day". Have a derived table / sql view that takes the last 3 days of entries and sees if it was 3 days ago; if so, in an 'out of the tar pit' derived-data sense or a Functional Reactive Programming sense, enable a field on this derived table that says they are now on disability. Another set of code or table says 'if that field is enabled then reference the alternate account'. To do this you need support for so many more constraints than basic SQL supports now. So that was my approach: build in support for all of these complex row-level, multi-row, and full-table constraints into your data system (or just above it) so that your domain language could be expressed as constraints on either your base statement table, or constraints on your derived data / materialized view tables. And yup, with each insert, modify, or delete, in an FRP/trigger style, all of your relevant derived tables would be updated and their constraints checked. It turned out that once you start deriving data the constraints aren't too complicated: lots of map, filter, reduce and the rest across your table rows (yes you could express the same things in SQL but I was playing with clojure and the functional approach here seemed cleaner). But this project never really made it past 1 weekend of playing around. It was promising, but I will also say it was solving a slightly different problem which is "how do you get state out of code?" - and that was via dynamically updating derived data (similar enough to FRP).
So all in all, if I'm understanding you correctly, you and I may both be seeing the same problem and approached it slightly differently. But I would agree it is an area where there is a lot of potential for improving software quality and modeling. But maybe you went a different way with your ECA and building relational types on top of it? I don't quite follow that final paragraph you wrote about arithmetic, but will think it over some more.
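A small Python sketch of that derived-data idea, with heavy assumptions: "sick for 3 days" is read as three consecutive calendar days immediately before the day in question, the employee names and dates are invented, and the derived facts are plain functions recomputed on demand rather than incrementally maintained views.

```python
from datetime import date, timedelta

# Base table: one row per statement "employee X was sick on day Y".
sick_days = {
    ("alice", date(2024, 3, 4)),
    ("alice", date(2024, 3, 5)),
    ("alice", date(2024, 3, 6)),
    ("alice", date(2024, 3, 7)),
    ("bob", date(2024, 3, 6)),
    ("bob", date(2024, 3, 7)),
}

WAITING_DAYS = 3  # assumed reading of "after being sick for 3 days"


def on_disability(employee, day, facts=sick_days):
    """Derived fact: sick on `day` and on each of the WAITING_DAYS days before it."""
    return all(
        (employee, day - timedelta(days=n)) in facts
        for n in range(WAITING_DAYS + 1)
    )


def pay_account(employee, day, facts=sick_days):
    """Second derived fact, computed only from the first one."""
    return "disability" if on_disability(employee, day, facts) else "normal"


alice_day3 = pay_account("alice", date(2024, 3, 6))  # still in the waiting period
alice_day4 = pay_account("alice", date(2024, 3, 7))  # fourth consecutive sick day
bob_day2 = pay_account("bob", date(2024, 3, 7))
```

Nothing mutable tracks "is on disability"; it falls out of the base statements, which is the state-out-of-code property described above.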
(I loved this conversation by the way, so thanks!)
Just a couple last major things:
> make-the-impossible-unrepresentable ... ECS clearly falls in with the dynamically typed / NoSQL camps
I see why you think that, but I disagree: you still make the impossible unrepresentable. The difference is that "impossible" depends on your perspective -- on the "domain layer" you're working in. Lower-level domain layers are less strict than higher-level ones. Negative numbers are really useful -- your Number System should allow it; it's up to the higher-level consumers of your Number System to decide whether negative numbers should be possible in their ruleset. So type InventoryReading { date, product, amount } can verify that amount is not negative. It can even define amount to be a PositiveInteger if it wants; that's fine. But what is or is not "impossible" turns out to be extremely dependent on your perspective.
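As a sketch of that layering in Python: plain integers (the lower layer) happily go negative, while the InventoryReading statement refuses a negative amount at construction time. The field is called day here instead of date only to avoid shadowing the imported type; the product name and error message are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class InventoryReading:
    """Higher-level statement that narrows what counts as a valid amount."""
    day: date
    product: str
    amount: int

    def __post_init__(self):
        # This layer's rule. The number system underneath has no such rule.
        if self.amount < 0:
            raise ValueError("inventory amount cannot be negative")


delta = 3 - 5  # fine at the number layer: negative numbers are useful there

reading = InventoryReading(date(2024, 1, 1), "widget", 12)

try:
    InventoryReading(date(2024, 1, 2), "widget", delta)
    rejected = False
except ValueError:
    rejected = True
```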
> So if I were actually going with ECS (or ECA), I'd actually go ahead and build a type system on top of it...
You're actually overthinking a bit here. No need for higher-kinded types. Just make structs-of-structs-of-structs. If you have a "Facility System" that works with "Facility Models" like { USPostalAddress address, IEnumerable<NaicsCode> naicsCodes, ContactInfo operatorContact } (all strongly typed), then the Facility System can define a bunch of operations by relying on the USPostalSystem, NaicsSystem, and ContactSystem. It can do its own checks and enforce its own rules, do its own translations, etc., but it does that on top of passing each smaller object to its (generally dependency-injected) subsystems.
Those subsystems can in turn rely on other subsystems. The lower level subsystems are less strict -- maybe the USPostalSystem allows PO Boxes, but your Facility System does not consider that a valid address for its uses. So USPostalAddress could even have a method like IsPOBox() -- although even better might be something like IsPhysicalBuilding() or something, so nobody else has to even know what a "PO Box" is. Or to go further into making invalid states unrepresentable, you could have "PhysicalAddress" as the restrictive type here, and your AddressSystem implementation(s) would know how to translate that into their less-strict type after their own checks (can't be a PO Box).
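Roughly what I mean, as a sketch in Python rather than a real implementation. The class names mirror the ones above, but every field, method, and rule here is an assumption made up for the example, and the PO Box check is a deliberately crude heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class USPostalAddress:
    line1: str
    city: str
    state: str
    zip_code: str


class USPostalSystem:
    """Lower-level subsystem (hypothetical): permissive, PO Boxes are fine here."""

    def is_valid(self, address: USPostalAddress) -> bool:
        return (
            bool(address.line1.strip())
            and len(address.zip_code) == 5
            and address.zip_code.isdigit()
        )

    def is_physical_building(self, address: USPostalAddress) -> bool:
        # Crude illustrative check: treat anything starting with "PO BOX" as non-physical.
        normalized = address.line1.upper().replace(".", "").replace(" ", "")
        return not normalized.startswith("POBOX")


@dataclass(frozen=True)
class FacilityModel:
    address: USPostalAddress
    naics_codes: tuple[str, ...]


class FacilitySystem:
    """Higher-level system: delegates to its injected subsystem, then adds stricter rules."""

    def __init__(self, postal: USPostalSystem) -> None:
        self._postal = postal

    def validate(self, facility: FacilityModel) -> list[str]:
        errors: list[str] = []
        if not self._postal.is_valid(facility.address):
            errors.append("address is not a valid US postal address")
        elif not self._postal.is_physical_building(facility.address):
            errors.append("a facility must be at a physical building")
        if not facility.naics_codes:
            errors.append("a facility needs at least one NAICS code")
        return errors


postal = USPostalSystem()
facilities = FacilitySystem(postal)
po_box = USPostalAddress("P.O. Box 12", "Springfield", "IL", "62701")
street = USPostalAddress("100 Main St", "Springfield", "IL", "62701")
```

Here `postal.is_valid(po_box)` is true, but `facilities.validate(...)` rejects the same address. I put `is_physical_building` on the subsystem rather than on the address type only to keep the sketch short; either placement keeps "PO Box" knowledge out of the Facility System.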
Of course there's an obvious question if we loop way back to what started this whole thing: "but aren't you categorizing things as a Facility right now?"
Naming things is hard. Really hard. The important thing is how I think about what "Facility" represents -- a (rather complicated!) statement that can be made about an entity. Deep-rooted in the philosophy of this design is the idea that I never consider the list of entities with a "Facility" statement to be the comprehensive list of all things anyone might call a "Facility", nor must everything in that list meet all the requirements that anyone might have on a "Facility". It's really just a collection of statements I want to make about some entities. I'm very aware that this definition of "Facility" is highly opinionated and volatile, and that many other people will have a different definition. In fact, that's largely the point of the architecture. So I still have not defined hard boundaries for a "Facility" category anywhere. Nobody familiar with this design should be surprised if they encounter a competing definition of "Facility" -- we can handle both! It's just the names that are hard.
Thanks for sharing thoughts, it's been a great discussion. I'm gonna call it an end here (otherwise I could go on forever), but if you do ever end up putting up a blog post on it, please do share!
This is an interesting idea, because I haven't run across this paradigm before. Can you please flesh it out a little bit further, so I can better understand it?
So, I build a table (call it "transactions"?) that contains all of these, without repeating any of the similar fields. (I'll note that that's still categorizing data; I'm not sure what that implies about your original point.) There will, of necessity, be fields that are different across the two transaction, um, types. How do you handle those?
What happens when your unified table gets so big that it affects performance? It seems to me that separating the two types of transaction into separate tables at least postpones that day.
What about presenting different types of data to end-users? Business logic dictates that accountants and analysts will want to treat these types of transaction differently.
How easy is this design to work with and reason about? I'd fear it comes at the cost of monster queries which align poorly with business logic. I'd be worried that makes said analysts dependent on the db architect to construct their queries for them, which isn't a situation I'd prefer. Is that, in practice, not a concern?
Which of those real-world concerns am I over-valuing?
I probably should sit down and write a comprehensive blog post about it sometime instead of dropping pieces of it in HN comments.
> So, I build a table (call it "transactions"?) that contains all of these, without repeating any of the similar fields.
That's still kinda just re-categorizing things. It's more like: identify the groups of fields that PurchaseOrders and SalesOrders have in common, and make that group a table (or possibly more than one).
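To make that a bit more concrete, here is one possible reading of it as a toy SQLite schema driven from Python. All table, column, and view names are invented for this sketch and the real design would have more field groups; the point is only that the shared groups get their own tables, and a "purchase order" is whatever the view says it is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (id INTEGER PRIMARY KEY);

-- Field groups that purchase orders and sales orders have in common.
CREATE TABLE order_header (
    entity_id    INTEGER PRIMARY KEY REFERENCES entity(id),
    order_date   TEXT NOT NULL,
    counterparty TEXT NOT NULL
);
CREATE TABLE order_line (
    entity_id INTEGER NOT NULL REFERENCES entity(id),
    product   TEXT NOT NULL,
    quantity  INTEGER NOT NULL
);

-- Field groups that only apply to some entities.
CREATE TABLE receives_goods (
    entity_id   INTEGER PRIMARY KEY REFERENCES entity(id),
    expected_on TEXT NOT NULL
);
CREATE TABLE ships_goods (
    entity_id INTEGER PRIMARY KEY REFERENCES entity(id),
    ship_to   TEXT NOT NULL
);

-- A view tailored to people who think in terms of "purchase orders".
CREATE VIEW purchase_orders AS
SELECT h.entity_id, h.order_date, h.counterparty AS supplier, r.expected_on
FROM order_header h
JOIN receives_goods r ON r.entity_id = h.entity_id;

INSERT INTO entity (id) VALUES (1), (2);
INSERT INTO order_header VALUES (1, '2024-01-05', 'Acme Supply'), (2, '2024-01-06', 'Jane Doe');
INSERT INTO order_line VALUES (1, 'widget', 100), (2, 'widget', 3);
INSERT INTO receives_goods VALUES (1, '2024-01-12');
INSERT INTO ships_goods VALUES (2, '42 Elm St');
""")

purchase_rows = conn.execute(
    "SELECT entity_id, supplier, expected_on FROM purchase_orders"
).fetchall()
```

Nothing in the schema says an entity can't have both a `receives_goods` and a `ships_goods` row; whether that's allowed is a rule for a higher layer to decide.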
> How easy is this design to work with and reason about? I'd be worried that makes said analysts dependent on the db architect to construct their queries for them, which isn't a situation I'd prefer. Is that, in practice, not a concern?
It's much easier to work with and reason about long-term for software developers, in my opinion. It's much harder to work directly in the database, which is a deal-breaker for some. Analysts are not constructing their own queries without help from a lot of views; but you have more opportunity to custom-tailor those views to their specific needs. Frankly, I don't think business analysts should be directly hitting the database -- it is a persistence layer, not an application layer. It'd be like demanding that analysts be able to access specific memory addresses -- it's just the wrong layer of abstraction for them. "Canonical" functionality is added piecemeal in layers by various domain "library"-like layers. Analysts can pick the libraries that they need (more likely, devs set this up for them).
> I'd fear it comes at the cost of monster queries which align poorly with business logic.
Yes, queries are bigger and include more JOINs, but they are more aligned with the business logic, not less. I've done it with table sizes on the order of 100 million records (SQL Server), and it's just fine, but I haven't gotten into the real "big data" sizes. Databases are very good at joins with proper indexes nowadays. If you're in "big data" size, I don't see a reason why table-per-noun is any better of a starting point for Apache Spark or whatever you're doing.
> Which of those real-world concerns am I over-valuing?
In my opinion, you are over-valuing direct access to the database tables with no business logic applied on top at the application layer. That's what the application layer is for.
> I probably should sit down and write a comprehensive blog post about it sometime
Please do! I enjoy reading new approaches to apparently "settled" fields. I'm not sure I agree with you yet, or (properly speaking) quite understand what you're doing, but I'd be fascinated to hear more.
Positive quantities on sales orders decrease inventory. Positive quantities on purchase orders increase inventory.
Closed sales orders increase cost of goods sold and accounts receivable. Closed purchase orders increase assets and accounts payable.
You could create one uber-record that could represent all of these different assets, but it wouldn't necessarily be a good idea, because my clients and their accountants conceive of them as separate entities, and I need to make them legible to my clients as separate entities.
I create schemas that store/make legible data beyond a double-entry general ledger.
Would you say that saying that Argument X belongs to category (True/False) is immediately fallacious? And that therefore your last sentence is self-contradictory?
I think I see where you're going, but I don't think I've quite made an argument of the form that I am claiming is fallacious. Reality is complicated and it's hard to even speak English without relying on categories at all. The complete rejection of categories is enough of a departure from standard reasoning that I'd have a lot of work to do to show that it actually leads to a non-self-contradictory theory. But such an exercise is kind of the same kind of pedantry that I'm arguing against: an attempt to apply math to the English language and then demand everybody follow your math. It's just not useful to do that. I can't really use philosophical arguments to prove that philosophy is not useful.
Person A: "I am highly critical of society for the following nine reasons..."
Person B: "Curious! You are critical of society, and yet you live in it! Hypocrite much?"
Person A: "Yes I do also need to buy groceries to live."
Person B feels like they gotcha'd Person A, but really, they haven't. Our conversation went like this:
Me: "Categories are stupid and you basically never need them. Stop trying to categorize things."
You: "Curious! You hate categories and yet you used them implicitly in your statement about how bad they are! Hypocrite much?"
Me: "Yes, I do need to speak English for anyone to understand me, and I suppose one could argue that English words are themselves categories."
Almost every exercise in categorizing things or arguing about what category things belong to is pointless. Saying "aha, but you yourself are categorizing arguments as being about or not about categories!" feels like a weak nitpicky gotcha to me.
In fact I think the pointlessness of this very exchange is evidence in my favor.
It's not necessary. Nouns-as-categories is a misunderstanding of language; an extremely common mistake that people make when studying language, but never when fluently using language. Half of philosophy is based on (and rendered completely pointless by) this mistake. It's just a case where people's own intuition about how they're using their own language is faulty. Perhaps I should have said "it's difficult to use language without being accused of relying on categories". Ontology came way way later in human thought than natural language did.
Rationalists really need a course in mid-century sociology/metaphysics, specifically grounding[1], typification, and reification.
It's incredibly frustrating to watch them go on about being rational while being only obliquely aware[2] of the patterns of behavior that sociologists identified decades ago. When Scott illustrates the Solomon conversation in section I, he's clearly talking about two largely overlapping categories which have different groundings and those groundings are precisely the cause of the disagreement.
Typification in particular is the subject of section II. And grounding again is the central focus of the rest of the sections on maps, gender, &c.
Scott, in the unlikely event you read HN comments, I'm literally begging you to pick up Categories We Live By, Ásta[3]. It's 154 pages of non-dense explanation of grounding.
2. Inferred by proxy. No mention of these terms by name, but oblique references to the shadows of their ideas. I'm sure there is some limited knowledge diffusion process taking place
> When Scott illustrates the Solomon conversation in section I, he's clearly talking about two largely overlapping categories which have different groundings and those groundings are precisely the cause of the disagreement.
But the author is clearly aware of this; he just doesn't call it grounding. So your argument seems to be "rationalists aren't thinking about this correctly because they're not using the correct words that other people invented". Which I find deeply ironic given the subject of the post.
A useful take might be "the difference between Solomon and the biologist is sometimes called grounding; here's some more info on it". But instead your take was "Rationalists are going on about something they don't understand and need to read more philosophy", which is arrogant and wrong. Nobody needs to read more philosophy; it does not actually help you think about the world in any way.
Taking the first two sentences of your first link:
> Consider the following claim: the truck drivers are engaged in a labor strike in virtue of picketing. What sort of claim is this?
Who cares what sort of claim it is? Seriously? Why is that at all a useful question to ask? Here we are talking about categories and whether they're useful, and that link is saying "it's very very important that we categorize this claim correctly". No, it's not. That question does not help anyone come to the correct conclusion about life in any way. And then you come along with an addendum "and if you can't correctly categorize this claim, then you should not be discussing rationality." Come on.
Sorry, but I don't have time to wade through idiosyncratic terminology. Scott's a big boy, he can use big boy words. Besides, pawning off cognitive load on the reader is such a lowball move.
> So your argument seems to be "rationalists aren't thinking about this correctly because they're not using the correct words that other people invented".
You could always ask what my argument is.
My claim is that we can cut to more interesting discussions by saying, "We're talking about the rationality of groundings. The utility/consistency/feelings/whatever the biologist gets out of scientific taxonomy grounding is X, and Solomon's utility/consistency/feelings/whatever in his groundings are Y." Saying basically, "groundings exist" isn't very insightful. Saying, "groundings are a product of Z" or "here's a framework for rational grounding creation" is actually interesting.
Using the full palette of words allows us to get to these interesting discussions more quickly. If you disagree then I'll challenge you to reply in hulk-speak.
You're not asking him to use "big boy" words, you're asking him to use your preferred set of terminology from a particular branch of philosophy that you happen to be familiar with, and fuck other philistines who haven't read exactly the same philosophy articles as you have. But of course, he should not use any words that you specifically are not already familiar with, otherwise he'd be a poser acting needlessly erudite. You want him to start from exactly where your specific mental model is and go from there, skipping all the parts you're already familiar with. Basically, you're the main character?
I don't have much faith that you would be able to rephrase my argument. You're welcome to try, but so far every retort is against an argument I haven't made.
There are so many wild, interesting, contradictory and overlapping concepts in the world. This article covers a few of them using normal people language.
Arguing about niche academic word definitions is just so, so boring. An act of supreme mediocrity.