> Most of the time you don't need a different Python version from the system one.
Except for literally any time you're collaborating with anyone, ever? I can't even begin to imagine working on a project where folks just use whatever Python version their OS happens to ship with. Do you also just ship the latest version of whatever container you're using, because most of the time nothing has changed?
If you're writing Python tools to support OS operations in prod, you need to target the system Python. It's wildly impractical to deploy venvs for more than one or two apps, especially if they're relatively small. Developing in a local venv can help with that targeting, but there's no substitute for doing that directly on the OS you're deploying to.
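To make that concrete, here's a minimal sketch of what "target the system Python" tends to look like in practice (assuming a systemd host; the systemctl example is illustrative). The point is pinning the distro interpreter and sticking to the stdlib, so there's no venv to deploy:

```python
#!/usr/bin/python3
"""Ops helper that deliberately targets the distro interpreter.

Sticking to the stdlib is what makes deploying without a venv viable;
everything imported below ships with any recent system Python.
"""
import json
import subprocess

def active_units() -> list[dict]:
    # systemctl can emit JSON directly, so no third-party parser is needed.
    out = subprocess.run(
        ["systemctl", "list-units", "--output=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

if __name__ == "__main__":
    print(len(active_units()), "active units")
```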
I don't really understand the point around error handling. Sure, with structured outputs you need to be explicit about what errors you're handling and how you're handling them. But if you ask the model to return pure text, you now have a universe of possible errors that you still need to handle explicitly (your LLM response is presumably being consumed programmatically, or you wouldn't be considering structured outputs at all), including a whole bunch of new errors that structured outputs help you avoid.
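To illustrate: with a schema, the failure modes are enumerable and cheap to handle, whereas with free text you're writing ad-hoc parsers for an open-ended set of outputs. A minimal sketch assuming Pydantic (the schema and field names are made up):

```python
from pydantic import BaseModel, ValidationError

class Extraction(BaseModel):
    title: str
    year: int

def parse_reply(raw: str) -> Extraction | None:
    # With structured outputs there are exactly two failure modes:
    # the reply isn't valid JSON, or it's JSON that doesn't match the
    # schema. Both surface as ValidationError, and the caller decides
    # what to do with them.
    try:
        return Extraction.model_validate_json(raw)
    except ValidationError:
        return None  # e.g. retry, fall back, or raise
```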
Also, meta gripe: this article felt like a total bait-and-switch in that it only became clear that it was promoting a product right at the end.
In my experience the semantic/lexical search problem is better understood as a precision/recall tradeoff. Lexical search (along with boolean operators, exact phrase matching, etc.) has very high precision at the expense of lower recall, whereas semantic search sits at a higher recall/lower precision point on the curve.
Yeah, that sounds about right to me. The most effective approach does appear to be a hybrid of embeddings and BM25, which is worth exploring if you have the capacity to do so.
For most cases though sticking with BM25 is likely to be "good enough" and a whole lot cheaper to build and run.
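If you do go hybrid, a common and simple way to combine the two result lists is reciprocal rank fusion. A sketch (the two retriever functions at the bottom are hypothetical stand-ins for your BM25 and vector backends):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# bm25_search and vector_search are placeholders for your two retrievers:
# fused = rrf([bm25_search(query), vector_search(query)])
```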
Depends on the app and how often you need to change your embeddings, but I run my own hybrid semantic/BM25 search on my MacBook Pro across millions of documents without too much trouble.
Doesn't this depend on your data to a large extent? In a very dense graph "far" results (in terms of the effort spent searching) that match the filters might actually be quite similar?
The "far" here means "with vectors having a very low cosine similarity / very high distance". So in vector use cases where you want near vectors matching a given set of filters, far vectors matching a set of filters are useless. So in Redis Vector Sets you have another "EF" (effort) parameter just for filters, and you can decide in case not enough results are collected so far how much efforts you want to do. If you want to scan all the graph, that's fine, but Redis by default will do the sane thing and early stop when the vectors anyway are already far.
HNSW indices are big. Let's suppose I have an HNSW index which fits in a few hundred gigabytes of memory, or perhaps a few terabytes. How do I reasonably rebuild this using maintenance_work_mem? Double the size of my database for a week? And what about the knock-on performance impact on the rest of my database, given that I'm presumably relying on that memory for shared_buffers and caching? This seems like the type of workload being discussed here, not a toy 20GB index.
> You use REINDEX CONCURRENTLY.
Even with a bunch of worker processes, how do I do this within a reasonable timeframe?
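For concreteness, the operation in question looks roughly like this (a sketch via psycopg; the index name and settings are hypothetical, and the rebuild still has to write a complete new index alongside the old one):

```python
import psycopg

# REINDEX CONCURRENTLY can't run inside a transaction block,
# so the connection needs autocommit.
with psycopg.connect("dbname=app", autocommit=True) as conn:
    # Session-local settings: give the rebuild memory and parallel
    # workers without touching the global configuration.
    conn.execute("SET maintenance_work_mem = '32GB'")
    conn.execute("SET max_parallel_maintenance_workers = 8")
    conn.execute("REINDEX INDEX CONCURRENTLY items_embedding_idx")
```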
> How do you think a B+tree gets updated?
Sure, the computational complexity of insertion into an HNSW index is sublinear, but the constant factors are significant and do actually add up. That said, I do find this the weakest of the author's arguments.
Interested to hear more about your experience here. At Halcyon, we have trillions of embeddings and found Postgres to be unsuitable at several orders of magnitude less than we currently have.
On the iterative scan side, how do you prevent this from becoming too computationally intensive with a restrictive pre-filter, or simply not working at all? We use Vespa, which means effectively doing a map-reduce across all of our nodes; the number of graph traversals required is smaller, and the computational burden mostly involves scanning posting lists on a per-node basis. I imagine that to do something similar in Postgres, you'd need sharded tables and complicated application logic to control what you're actually searching.
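My understanding of the pgvector knobs here is roughly the following (a sketch assuming pgvector >= 0.8; the table, column, and values are made up), and it's not obvious to me how they hold up under a really restrictive filter:

```python
import psycopg

with psycopg.connect("dbname=app") as conn:
    # pgvector >= 0.8 iterative scans: keep traversing the HNSW graph
    # until enough rows survive the filter, but bound the total work.
    conn.execute("SET hnsw.iterative_scan = 'relaxed_order'")
    conn.execute("SET hnsw.max_scan_tuples = 20000")  # cap on candidates scanned
    rows = conn.execute(
        "SELECT id FROM items WHERE category = %s "
        "ORDER BY embedding <=> %s::vector LIMIT 10",
        ("books", "[0.1, 0.2, 0.3]"),
    ).fetchall()
```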
How do you deal with re-indexing and/or denormalizing metadata for filtering? Do you simply accept that it'll take hours or days?
I agree with you, however, that vector databases are not a panacea (although they do remove a huge amount of devops work, which is worth a lot!). Vespa supports filtering across parent-child relationships (like a relational database) which means we don't have to reindex a trillion things every time we want to add a new type of filter, which with a previous vector database vendor we used took us almost a week.
We host thousands of forums but each one has its own database, which means we get a sort of free sharding of the data where each instance has less than a million topics on average.
I can totally see that at trillion scale for a single shard you want a specialized, dedicated service, but that is also true for most things in tech when you get to extreme scale.
Thanks for the reply! This makes much more sense now. To preface, I think pgvector is incredibly awesome software, and I have to give huge kudos to the folks working on it. Super cool. That being said, I do think the author isn't being unreasonable: the limitations of pgvector are very real when you're talking about indices that grow beyond millions of items, and the "just use pgvector" crowd in general doesn't have a lot of experience scaling things beyond toy examples. Folks should take a hard look at what size they expect their indices to grow to in the near-to-medium-term future.
Ok, but what was the cost of the labor put into curating the training dataset and performing the fine-tuning? Hasn't the paper's conclusion been demonstrated repeatedly: that it is possible to get really good task-specific performance out of fine-tuned smaller models? There just remains the massive caveat that closed-source models are pretty cheap, so the ROI isn't there in a lot of cases.
No offense, but this software engineering elitism does no favors to perceptions of the field. In reality, most other fields are complex and the phenomenon of believing something is simple because you don't understand it is widespread across fields. Dan Luu expounded on this at much greater length/with greater eloquence: https://danluu.com/cocktail-ideas/
There are also international tourists whose local parking rules may differ from the ones in SF. Having clear demarcations between allowed and disallowed parking areas makes it easier for everyone to follow the correct rules.
Do you have an RSS feed of road rules piped into Anki cards or what?
Or just maybe "driver's license is a privilege that requires you to study and know the rules of the road" is a fallacious claim that rests on pedantic legal formalism and an impoverished sense of human psychology.
No, I don't; there are plenty of places you can't legally park that do not have painted curbs or "No Parking" signage. Do we also need curbs and signage near every fire hydrant? How about every driveway? Can drivers double-park anywhere they want? Should they yield to pedestrians in crosswalks? Etc. etc.
Reviewing the thread, the context is newly enforcing something that might not be illegal in all jurisdictions. Citing different contexts where signage is not always used doesn't change the fact that the discussion focuses on a change in common practice in a specific context. In fact, I observe plenty of signage for fire hydrants and driveways in places where people commonly make parking errors.
The question still stands: how do you ensure you detect changes to the rules of the road in order to maintain your privilege?
>>>>>>> Where I live, many people park at intersections right up to the curb
>>>>>> This is now illegal in some states
>>>>> It's illegal in California but in San Francisco official policy is to not enforce this law.
>>>>> If there's no red paint on the curb, they won't ticket you.
>>>> It was ridiculous that they were originally proposing ticketing people without there being signage that it was illegal to park there.
>>> Why? Having a driver's license is a privilege that requires you to study and know the rules of the road.
>> Do you have an RSS feed of road rules piped into Anki cards or what?
> No, I don't; there are plenty of places you can't legally park that do not have painted curbs or "No Parking" signage.
It isn't bizarre that rules and practice vary widely in different cultural contexts. Even your claim is caveated as "most of EU," recognizing that it might not be the same in all places.
In many places in the US, there is a culture of legibility, whereby informational affordances are relevantly and generously provisioned. This allows for more certainty for both facility users and rule enforcers. On the flip side, there are a lot of signs all over the place.
I couldn't agree with this more. I don't think the majority of problems with vector search at scale are vector search problems (although filtering + ANN is definitely interesting), they're search-problems-at-scale problems.