Gitlab Duo

remram · on April 22, 2024

From that page:

> Will my code be used for training AI models?

> GitLab does not train generative AI models based on private (non-public) data. The vendors we work with also do not train models based on private data.

So they will steal your code if it is public, ignoring license. Understood.

pryelluw · on April 22, 2024

I’ve been feeling more and more that we should reconsider open source licenses to protect our rights as authors from LLM vendors. I’m not against them basing their models on my work. I do have reservations on them generating their models on my work and creating derivative works without respecting the license. It’s why I’ve started to use the MIT license less and less and have adopted various from the GPL family. Models trained in OSS should be open sourced IMO.

remram · on April 22, 2024

AI companies believe copyright doesn't apply and all this is fair use or transformative enough. If copyright doesn't apply they are not bound by license terms either.

yencabulator · on April 22, 2024

If they're not respecting MIT's attribution requirement for derived works, why would they respect GPL?

pas · on April 22, 2024

They argue their use is transformative and thus the works are not derivatives.

pryelluw · on April 22, 2024

Do you mind explaining further ?

AnarchismIsCool · on April 22, 2024

Not much to explain, that's just how Fair Use doctrine is currently interpreted. It's not really morally right in this context ("greater good" rationalists will argue with me on that) but so far the courts haven't spanked anyone too hard for using that loophole.

radarsat1 · on April 22, 2024

They take publicly accessible code and learn to predict good token strings from it to solve related or unrelated problems. I am not convinced this is not morally right.

What they are not doing, as far as I can tell, at least not intentionally, is just copying code and removing the license and attribution. Even if you don't see the AI output as legitimately "creative", it's certainly transforming and remixing existing solutions in ways that are transformative enough not to be just copies of anything except the most bog standard boilerplate, which is usually not a copyright issue.

People seem upset because there is money to be made, if there was no money involved I don't think anyone would see any issue here.

It's not about the greater good, and I don't mean to be an apologist, but I just legitimately don't see a problem with doing it, or how it's not fair use. It really does seem like fair use to me.

xigoi · on April 22, 2024

They take code with a license that prohibits using it to create preprietary software and use it to create proprietary software. It’s not about money.

radarsat1 · on April 22, 2024

But it's not forking or linking to any of that software. It's "using it" only in the sense of reading the code as examples of how to do things after mixing it with a million other sources. Sorry but I just don't see the problem with that. Very different, categorically, in my mind from what is intended by software licenses when they talk about derivative works.

oooyay · on April 23, 2024

One simple solution is just don't open source. A lot of folks forget that many open source licenses were produced to be "business friendly" to encourage participation by the private sector. The private sector, however, has largely done the bare minimum unless profit was involved. Then they'd either co-opt the project or fork it to their own ends in pursuit of profit with minimal returns to the original project WRT their gains.

That's a long way of saying, for what open source is and was, a lot of projects are mislicensed when juxtaposed to their goals and needs. It's okay to sell your software, it's okay to be "source available", it's okay to be entirely proprietary.

toastal · on April 22, 2024

https://github.com/non-ai-licenses/non-ai-licenses

These aren’t official (& hosted on a different, proprietary data siphoner), but there are attempts to add anti-AI clauses.

Zambyte · on April 22, 2024

Assuming copyright licenses are actually meaningful for this "anti-AI" clause, the only license that this actually changes an AIs ability to comply with is the CC0 license. All the rest are already violated by an AI agent (they all require attribution that an AI does not comply with).

And the CC0 license seems extremely contradictory with this added clause. I would definitely stay away from all of these licenses.

roywiggins · on April 22, 2024

Companies are busy training their LLMs on totally unlicensed material. I'd be curious to know how much weight, legally speaking, anti-AI clauses can have. Naively I would expect no license should be maximally restrictive on what you can do with something.

bastardoperator · on April 22, 2024

The license in your repo is entirely meaningless when you grant an exclusive license to places like GitLab and GitHub to "display" your code.

remram · on April 23, 2024

That's a wild thing to say. Displaying is a very specific thing, having code displayed to you does not grant you any copyright or ownership over it (imagine if movies worked that way...)

bastardoperator · on April 29, 2024

It's so wild nobody has been able to successfully challenge it in court.

arwineap · on April 22, 2024

Given that it's public, yes, gitlab as well as every other model on earth will be trained from it.

Is it right? I don't know. Can we stop it? Absolutely not.

Honestly, I find it comforting that gitlab is at least being straightforward about it

bondarchuk · on April 22, 2024

I would not call that straightforward. Being straightforward would mean including something like "we & our vendors might (or will) train generative AI models based on public data", instead of letting us infer that for ourselves.

freedomben · on April 22, 2024

Straightforward? I very much disagree.

You could maybe convince me that it is straightforward when measured against only corporate speak, but if not looked at as corporate speak it is very much not straightforward. Straightforward would be an explicit "Yes, we train from public code regardless of the license"

danielmarkbruce · on April 22, 2024

The only statement they can make which is accurate is that they will not train on private data.

They might train on your data if it's public. If your code is public, and it doesn't get pulled into their dataset, they won't train on it.

cbsmith · on April 22, 2024

> Is it right? I don't know. Can we stop it? Absolutely not.

Depends on what you mean by "stop it". Patents are public, but it's hard to employ them for commercial purpose without an agreement from the owner.

jacooper · on April 22, 2024

This is for the supreme Court to decide, till then, its assumed to be fair use by these companies.

rickydroll · on April 22, 2024

I don't think licensing issues are important in this context. I have never gotten code from an LLM system that wasn't something I would create myself if I taken the time to write it manually. You should be thinking of LLM coding assistance as a junior programmer working for you with all of the pluses and minuses. If someone claims they found code from open source project in the output of an LLM, it is most likely the result of over constraining the specification i.e. I want you to make me painting that looks just like the Mona Lisa.

golergka · on April 22, 2024

If a person can read it and learn from it, AI should be able to do the same.

jayd16 · on April 22, 2024

If it's patented they can't use it...

If it's a copy left license I would say that at that point anything made by the ai should be copy left as well.

Just because you think a machine should have the right to use open source code doesn't mean the author or the license agree.

SomeHacker44 · on April 22, 2024

You can still read and learn from something patented. That is, after all, the whole point of a patent!

You can also read and learn from copyleft and use that learning elsewhere.

Remember, copyright does not apply to ideas. It applies to a specific individual instantiation. In theory.

oblio · on April 22, 2024

There's a valid moral question here, in short: why?

I care about helping other people.

Helping some type of amoral entity that's usually controlled by another amoral for-profit entity: why would I help those?

thiht · on April 22, 2024

Then use a license that say your code can't be used to train AI models.

Honestly I agree that in the current state of things, it's fair for them to train on public data as long as the license doesn't forbid it. I learned to code this way, so it makes sense to allow AIs to do the same, for now. If it's a problem, just use another license.

oblio · on April 22, 2024

The problem is that there are software evolutions not foreseen by older licenses. Cloud providers, AI. Relicensing can be non trivial. Choosing the right license is non trivial.

Who can say for sure which license is good in this new age?

thiht · on April 23, 2024

Oh easy, there’s not appropriate license yet. Some organizations need to step up and create a license that forbids AI training. But it will probably not happen because of "but is it libre open source" useless debates.

xigoi · on April 22, 2024

Atmost all open source code has a licence requiring attribution. Where is the attribution?

thiht · on April 23, 2024

When you read some source code to learn how to do anything, do you attribute it? It’s the same thing, attribution doesn’t apply to training material.

You want to explicitly forbid AI training, not require attribution.

radarsat1 · on April 22, 2024

If my code is included in a training set which then helps someone else solve a problem, then I am helping other people.

Why should I care if some 3rd party is the middleman that builds the system that performs the transformation? I put out my code so that people can use it. The existence of said 3rd party doesn't restrict that in the slightest, in fact it enhances the reusability of my work which seems like a win-win.

golergka · on April 22, 2024

Why would you call these entities amoral? Through vcs and pension funds they are owned exactly by the people you want to help.

oblio · on April 22, 2024

VCs?!? :-)))

Pension funds I'd kind of understand, on the other hand there is a huge generational conflict, the pensioners that rely on this in many places are actually the comparatively rich older people.

golergka · on April 22, 2024

Who do you think gives money to VCs, private equity and other finance boogeymen? It’s pension funds, banks and other institutions that hold regular people's money. Of course, older people have more savings, but its still the same people that you wanted to help in the first place, aren't they?

EVyesnoyesnoyes · on April 22, 2024

Because you get to use it too?

Llama, mistral and co.

Zambyte · on April 22, 2024

Not all models have Free weights and not all models that are Free can run on your hardware.

Normally the "can't run on your hardware" is a little dubious, but given how insanely high the hardware requirements can get to run these I think it's meaningful here.

EVyesnoyesnoyes · on April 22, 2024

I'm still here to create a product/tool/script/solve something and not just to write code.

If anything i do helps to make the ecosystem around me better i'm for it.

And if you look at anything LLM right now, all the research happening, its not opensource doing the research. Its high paid google, ms, etc. people and academy replicates it fast for opensource.

We already have a give and take on both sides.

Without them, we would not have that at all.

Zambyte · on April 22, 2024

> And if you look at anything LLM right now, all the research happening, its not opensource doing the research.

I assume you mean open weight rather than open source, but: https://ai.meta.com/research/

remram · on April 22, 2024

That's a claim not an argument. Why do you believe this? Why should we?

freedomben · on April 22, 2024

I agree. Given the effort that has been put in to ensure code isn't copied verbatim, I don't see much of a philosophic difference between a human doing it or AI doing it.

That said I think it's weasely that they would just explicitly say it. It's obviously a question a lot of people have if it's in the FAQs, and a non-answer like, "we never train on private code" is ridiculous.

xigoi · on April 22, 2024

People are not software.

ikari_pl · on April 22, 2024

i would prefer it to learn from better verified solutions, reviewed and battle tested.

Instead I get AI companions writing comments like "This code is unintelligible and batshit complicated, kill it with fire" when I was hoping for a doc string

calvinmorrison · on April 22, 2024

arghwhat · on April 22, 2024

Just in case people didn't know, the phrase "all rights reserved" does not have any legal consequence (in the early days of copyright people felt it was necessary, but all rights are automatically granted), and uploading content to Github means that you must also license it to Github and other Github users as per the ToS. It is your responsibility to ensure that you can grant that license.

https://docs.github.com/en/site-policy/github-terms/github-t...

In addition, one has to consider whether the act of studying a project whose sources you have legally obtained to gain experiences applied in other contexts where the original license may not apply or be upheld is an issue. This applies to humans as much as to ML.

calvinmorrison · on April 22, 2024

> You grant us and our legal successors the right to store, archive, parse, and display Your Content, and make incidental copies, as necessary to provide the Service, including improving the Service over time

A couple things - you agree github can do it, ostensbly taking archives for backups sure, but in this case it's to distribute out to some 3rd party and explicitly to be locked in a vault with no business purpose.

My personal uploading of say - a GPL licensed code, or hell, microsofts leaked code DOES NOT magically grant github to do whatever they want with it. There's already a license, no?

from the github archive page: > Archiving software across multiple organizations and forms of storage helps to ensure its long-term preservation. "The Arctic Code Vault"

> On 02/02/2020 GitHub captured a snapshot of every active public repository. Those millions of repos were then archived to hardened film designed to last for 1,000 years, and stored in the GitHub Arctic Code Vault in a decommissioned coal mine deep beneath an Arctic mountain in Svalbard, Norway. Our partners include the Long Now Foundation, Software Heritage, the Internet Archive, Microsoft Research’s Project Silica, the Arctic World Archive, GHTorrent, and GHArchive. Our advisors include both technological visionaries and world-renowned experts in the humanities.

so, like, I am sure it was vetted by lawyers, but my plain english understanding of that is github may need to archive it, whats not clear is that github may send it to any number of non profits to do with it what it pleases.

Then you've got software heritage who literally just scrapes the web for cgit repos and copies them without any agreement or license between you as the host and software heritage.

Anyway, it's all very Uber-model. Fuck the legality, we're just gonna do it and fuck it because the worst that can happen is a slap on the wrist.

arghwhat · on April 22, 2024

I agree that Github has rights to parse any uploaded code, and any code with a license that would allow a human to study and learn from it does, in my opinion, also allow a machine to study and learn from it.

Granted, the machine has to learn to not directly plagiarize in the same manner humans are not allowed to when they use acquired knowledge - unless the license allows it of course - but the act of studying material that is allowed to be read cannot be considered harmful.

ff317 · on April 23, 2024

There's two issues here still, IMHO:

1) The LLM owners really can't guarantee that it won't directly plagiarize without attribution or licensing. Your code may contain a unique algorithm or method for solving something, and when someone asks the right question, your code may simply be the only answer it knows to give.

2) While the code being used as training input was open source and visible to the public to learn from, the models being built often aren't. It seems unethical to train from public data yet keep the resulting weights private and charge for access to use the trained weights.

arghwhat · on April 23, 2024

For the first aspect, neither can a human, and it's incredibly hard to decide if something is plagiarized or fair-use/inspiration. There are several things to consider:

1) These tools are generally used in a pair-programming fashion, and in that function, the output can be considered similar to when you ask a coworker on Slack and they paste you a snippet, or if you browse github and read someone else's implementation (without having the LICENSE text within your field of view at all times). A possible violation would then only occurs once the snippets in question are included into your code-base and distributed in ways that violate the original license.

2) One could argue that sharing the snippet with you was a form of redistribution, but I would not consider this to apply if a human did it and would therefore not apply it to machines either, and I do not think that is what people generally consider redistribution of an open source project. GPL technically has a clause second-hand violations, but I do not think that one holds.

It should also be noted that licenses like MIT only require the copyright and permissions notice included in substantial portions of the program, and so smaller snippets are always fine. Humans also do not bother attributing smaller copy-paste blocks - we'd run out of storage linking to all the stackoverflow answers!

3) The issue gets a bit hairier when the machine reproduces large/important portions of projects with no hint as to its source, license or ways to do proper attribution, but even then I'd consider the violation to occur only if included verbatim into a project which is then redistributed under incompatible terms.

4) Even when code is largely identical, it generally only an issue if the code is a unique invention, not if the code trivially follows for a skilled individual of the trade. That's a principle in the practice of many laws, including patent law.

For the second aspect, I do not see any importance to the fact that the trained model is not public. A person studying open-source projects do not upload a brain dump afterwards, and others only directly benefit from their experience (their "weights") if they decide to teach the subject. Nor is every project they write afterwards with their knowledge necessarily open-source, only being public if they want to make them public. Licenses generally do not restrict private or internal usage, including modification and derivative works. It is redistribution they trigger on (with some catches for things like AGPL).

(I would of course like the model to be public for the betterment of mankind, but that's different from the legal aspect of it.)

remram · on April 22, 2024

I don't understand, software heritage doesn't make derivatives or remove the attached license.

dustyharddrive · on April 23, 2024

They can't be bothered to allow opt-outs (https://news.ycombinator.com/item?id=39771318) or even attempt to check if code they're archiving is freely licensed (https://www.softwareheritage.org/faq/#25_Is_the_code_checked...)

It's like Common Crawl, another non-commerical mass scraping project just benevolently stealing creations for "AI" companies.

remram · on April 23, 2024

Their content policy says you can:

> To request the removal of a content from the Software Heritage archive, you must file a formal request containing all of the following informations:

> (...)

> Please send your request by e-mail to takedown@softwareheritage.org

https://www.softwareheritage.org/legal/content-policy/

The HN thread you posted even shows that they contacted the forge's admin week in advance to check for possible concerns.

dustyharddrive · on April 24, 2024

Those are instructions to send a copyright infringement notice with tons of PII. We should have higher standards for "opt-out" than that, even for non-consensual data vacuums.

ccgreg · on April 23, 2024

We like calling Common Crawl a crawl, not a scraper. Our 17 year old dataset predates the current AI explosion.

calvinmorrison · on April 24, 2024

it's just piracy bro. My git repo is a portfolio. If U2 puts a music video on their website, it doesnt mean they grant you unending access to download it, back it up, redistribute it, etc. Of course not. If you take my art portfolio on your site and download and reproduce it on the wall, that's

renewiltord · on April 22, 2024

[flagged]

remram · on April 22, 2024

You're obviously not posting in good faith, but their docs are CC-BY-SA, and I comply with the terms. Bro.

renewiltord · on April 22, 2024

No link to license: non-compliant. Less theft, less lying about compliance, more compliance please.

wizzwizz4 · on April 22, 2024

In the US, there's a notion of "fair use": quoting small passages with attribution for the purposes of criticism does not require reproduction of the relevant Creative Commons license. Most of the other former British colonies have the concept of "fair dealing", which is similar.

https://www.documentary.org/feature/negotiating-copyright-ex... makes interesting reading.

remram · on April 22, 2024

The link at the top of this page has the license. In what way can comments in this page, when they clearly state that content comes from the article, not be clearly linking to the article?

xyst · on April 22, 2024

Back to self hosting my code.

Side note: Gitlab is one of the few companies to provide a self hosted version of their VCS.

https://docs.gitlab.com/ee/install/requirements.html

Also gitea is a much more lightweight option and can use sqlite for db. Much easier to deploy on a rpi

petepete · on April 22, 2024

Soft serve is lighter still.

https://github.com/charmbracelet/soft-serve

rigid · on April 22, 2024

not sure how lightweight any of these are, but https://gitolite.com/gitolite/ just needs git and ssh deployed. And it works like a charm.

q0uaur · on April 22, 2024

running all my homelab / private repos off of a tiny gitolite3 server, and i'm always amazed that it all runs on an alpine linux VM that uses 75mb of RAM.

for anything even just sharing with friends or whatnot i won't recommend it though, people do love their web-UI and having to explicitly give someone access can turn off people already.

remram · on April 23, 2024

For a single author, if you have a server with SSH, you don't need Gitolite at all.

The lack of anonymous public access is often a deal-breaker though.

rigid · on April 23, 2024

For a single author you don't necessarily need any server at all. A cloud directory or zip files work well.

But gitolite is so easy to setup & maintain, it's not a big difference and for r/w-access management within teams, it's priceless.

I guess one could even hack anonymous access with "PermitEmptyPasswords yes" and "AuthenticationMethods none"

remram · on April 23, 2024

No Gitolite requires keys.

rigid · on April 24, 2024

Hence "hack". It needs keys for administration but at a first glance, I see no reason why a git-anon user couldn't be part of gitolite's git user.

remram · on April 24, 2024

You might need to use another user if you want to set its shell to `gitolite-shell username` (no command= for password authentication) but then you'd need to chain sudo or something to have Gitolite run under its own user again... Seems very tricky.

Or maybe you can write a shell that runs a gitolite-shell command is its arguments are not already gitolite-shell?

xyst · on April 22, 2024

Ooh this is very nice. Might use this for private projects.

Problem with this is that it’s likely very unfamiliar with most folks. At least with gitea or gl-ce, there’s a familiar interface 99% of developers can use to browse and search code.

Like if I want to share a quick nixOS config, I doubt most people would even be willing to pull the code and only browse via web interface.

Zambyte · on April 22, 2024

I am of the school of thought that you should be using multiple remotes anyways. I don't think there is a good reason not to. Codeburg and, as another mentioned, Sourcehut may be good places to host your code in addition to your self hosted remote.

Radicle[0] is also a really interesting option.

[0] https://radicle.xyz/

remram · on April 23, 2024

I can't wait to check out Radicle again, 1.0 seems imminent. I wonder how usable the average developer will find it.

danirod · on April 22, 2024

> Also gitea is a much more lightweight option and can use sqlite for db. Much easier to deploy on a rpi

Worth mentioning Forgejo as well (https://forgejo.org), which is Codeberg's fork of Gitea. Same features and as lightweight as Gitea. Hard-forked a couple of months ago after some transparency concerns from the new parent company that owns Gitea now.

remram · on April 23, 2024

Isn't it just the same thing with a different name? Why is it worth mentioning at this point in time?

hnarn · on April 23, 2024

Because functionality isn’t the only thing everyone cares about, I use Forgejo for ideological reasons and so do probably most people since it’s mainly an ideological fork.

Why would it not be worth mentioning? The question is quite strange.

remram · on April 23, 2024

The project and community might be worthwhile, but as long as the product is the exact same, I don't see what there is to mention. I'm excited to see what the future holds though, more attention in this space sure sounds good.

Unless my information is out of date?

thiht · on April 22, 2024

If you're concerned, you should switch the license of your code, not really where you host it. As long as your license allows LLM/AI models to train on your code, why would you be mad that it does?

xigoi · on April 22, 2024

My code has a license that disallows using it to create proprietary software, but it seems that proprietary LLMs are still being trained on it.

thiht · on April 23, 2024

Training on your code and using your code are very different things. As a human, if your code is open, I can read it, learn how it works and reuse my knowledge to create proprietary source code without infringing on your license.

xigoi · on April 27, 2024

I am talking about the model itself, not the code it produces. You are not proprietary software.

remram · on April 23, 2024

Almost none of the common licenses allow this, because at the very least they have an attribution requirement. Licenses are being ignored.

thiht · on April 23, 2024

Attribution doesn’t apply on training. We’d all be screwed otherwise, it would mean you’d have to attribute everything you learned while reading open source codes to every codebase you read every time you write code.

remram · on April 23, 2024

Either attribute or don't use it, that's what the license says.

"Doesn't apply" is just opinion right now until settled in court, hopefully the other way.

I'm curious to hear why you think so though? Any other license terms you routinely ignore?

thiht · on April 23, 2024

> I'm curious to hear why you think so though?

As I said: it would mean you’d have to attribute everything you learned while reading open source codes to every codebase you read every time you write code.

itherseed · on April 22, 2024

OneDev is really slim and have all I need to start. https://github.com/theonedev/onedev

strogonoff · on April 22, 2024

Sourcehut is great, and I doubt Drew would take the hosted version in that direction.

Zambyte · on April 22, 2024

Yeah, seems unlikely[0]

[0] https://drewdevault.com/2023/08/29/2023-08-29-AI-crap.html

pjot · on April 22, 2024

Are you concerned at all about the raspi’s sd card corrupting/failing? I’d feel like this would always be at the back of my mind

jsheard · on April 22, 2024

You don't have to run a Pi from an SD card nowadays, and shouldn't do if you can help it. They've supported USB boot for a while and the Pi5 can also boot from an NVMe SSD hooked up to the exposed PCIe lane.

xnyan · on April 22, 2024

As others have said, Pi now supports USB boot. An inexpensive USB to sata adaptor and a 128GB SSD (~$15) is the most inexpensive option to get started and performs very well.

seanmceligot · on April 22, 2024

This happened to me Tuesday September 28th 2001 at 3:30pm. I know because I never rewrote the complicated e-paper code that I lost when my SD card died without a backup :(

xyst · on April 22, 2024

I run my rp5 off of nvme

AnthOlei · on April 22, 2024

Gerrit for really fine grained control

olejorgenb · on April 22, 2024

We've tried this out for some weeks at my company and concluded it's not ready yet.

As someone else mentioned they don't use state of the art models: https://docs.gitlab.com/ee/user/ai_features.html#language-mo... (though they are working on using Claude 3)

The completion plugin for the JetBrain IDEs (pycharm tested) appear to be very barebones, and judging by the results - does not seem to take advantage of the context beyond the current file in a good way. One of my colleges say the 2.x version is better though.

We were excited about the possibilities for MR and issue integration, but using it for this is IMO totally useless at the moment. Last I checked some of this will also only be available to Ultimate customers.

olejorgenb · on April 23, 2024

Upgraded the plugin today. Using the new chat sidepanel just work forever without any results when trying the "explain" and "generate test" commands.

bufferoverflow · on April 22, 2024

Claude isn't open source. Doesn't that, by definition, mean that they leak your code to Anthropic?

carimura · on April 22, 2024

Wow those code comments the AI generated were so helpful!

  // Assign the number 5 to the integer x
  int x = 5;

syndicatedjelly · on April 22, 2024

Examples like this make it clear who GitLab Duo’s target audience really is.

cal85 · on April 22, 2024

oblio · on April 22, 2024

Probably many IBM developers circa 2010+.

nikola52 · on April 22, 2024

did you get that in your workspace?

misnome · on April 22, 2024

I'm guessing this has come up (again?) now because there has been a "General Availability" in the new 16.11 where it's been apparently automatically turned on in our internal Premium instance (and individual users getting nagged about using it after the upgrade). For "Free":

> The GitLab Duo Chat is part of GitLab Duo Pro. To ease the transition for Chat beta users who have yet to purchase GitLab Duo Pro, Duo Chat will remain available to existing Premium and Ultimate customers (without the add-on) for a short period of time. We will announce when access will be restricted to Duo Pro subscribers at a later date.

At least one of our admins interpreted their pinky-promise policy page as something binding; at least from the communications we were sent it seems very hard to verify exactly what they are sending over or protections they've put in place.

It's 2024, not putting that clearly front-and-centre is a deliberate choice.

skywhopper · on April 22, 2024

Digging around in the demo videos and assuming they chose some ideal examples for their marketing material, this doesn't really live up to the hype. The "generate a code review summary" adds stuff like "I estimate my comments will not take much time to implement" when the comments are stuff like "consider using a stored procedure for this calculation". And the "explain this code to me" feature misinterprets things, even in their demo, where it "explained" that the highlighted code was in a method with an entirely different name. The explanation also goes overboard in discussing where the code is located in the project, and skips over stuff like "where does this feature flag come from, exactly?" and is clearly making (reasonable with these names, but method names can often be misleading without context) assumptions about what methods named "listProducts()" and "listSorted()" do.

gregoriol · on April 22, 2024

Gitlab really should stop adding features: the thing is bloated with tons of menus and options that most users don't use or understand. PMs seem to need to do PM, stop them, listen to tech people.

solatic · on April 23, 2024

They have product managers doing product management; you're no longer the audience.

Mature Enterprise products will have so many obscure and random features that you practically need certification to understand all of them; but most end-users are not supposed to understand all of them. They just want to understand the ones that affect them. The fact that the features offered are maybe 50% as good (if that) as the ones offered by competing products is completely besides the Actual Value Proposition available to users who get hired to work inside BigEnterprise, which is the following choice:

  a) Use a 50%-as-good product that is already installed and supported and by internal team;
  b) Take the hard road to try to get a new product in: find the budget for a new product, convince the internal (IT?) team to install and maintain it, get them to integrate it with the rest of the BigEnterprise's systems (at least stuff like user management and sending emails), get the new product through Procurement processes.

Is it any wonder that 99% of the time, BigEnterprise employees choose (a) ?

Product strategy when selling to these customers is: be the default choice, not the best choice. Get internal enterprise users to keep choosing (a). Eventually, so much internal enterprise usage depends on 50%-utility features, that anyway the system becomes nearly impossible to rip out, and then you can start to put in yearly price increases that they can't stop paying.

As long as you have a mainline feature that is best-in-class to get your foot in the door with new customers (so that their users can start using the 50%-utility features), you have a coherent product strategy. GitLab's best-in-class mainline feature is the combination of Git repository hosting + GitLab CI; their closest competitor is GitHub Enterprise which is more expensive (also GitHub Actions sucks), and Bitbucket is a sad joke.

q0uaur · on April 22, 2024

for private use or even small teams it's way too late for that, but honestly it seems they're just focusing on enterprise use and i'm kinda glad something open source and self-hostable for that market exists. we're already wayy too dependent on microsoft, but at least my workplace's code can stay local for now.

firemelt · on April 22, 2024

please raise the price coz I need to pay for features that I don't need and don't use :)

remram · on April 23, 2024

They increased it more than 50% last year [1]

[1]: https://about.gitlab.com/blog/2023/03/02/gitlab-premium-upda...

nikola52 · on April 23, 2024

"Listen to tech people" reminded me of this video https://www.youtube.com/watch?v=oeqPrUmVz-o

a2128 · on April 22, 2024

The only AI I'll ever use is CodeBERG AI: https://codeberg.org/Codeberg-Infrastructure/forgejo/pulls/8...

:)

ametrau · on April 22, 2024

h4ch1 · on April 22, 2024

This PR seems to be an April Fool's joke mimicking how GitHub advertises copilot :)

Hamuko · on April 22, 2024

Is this the thing that keeps sending me pointless AI summaries of code reviews that people have made in our project?

Meanwhile all of the fancy security features, for which we pay a massive premium as they're only included in the Ultimate tier, are completely broken and I have to keep explaining to my team members why GitLab keeps falsely insisting their merge requests are full of security vulnerabilities.

Kelteseth · on April 22, 2024

Anyone using this? We use a self-hosted Gitlab for 5 years now and had to upgrade to premium after they killed the basic version. Now I would essentially have to double my spending (again) for this new feature.

slindsey · on April 22, 2024

My team recently paid for this on self-hosted GitLab ultimate. I would not suggest this for self-hosted. We've been having issues that seem to be related to self-hosting that are requiring a lot of effort from our System Admins to work through with GitLab support.

Separate from that, there don't appear to be any benefits to having it "integrated" into GitLab. I'm assuming we'll switch to another tool shortly.

fuzzy2 · on April 22, 2024

Just curious: Why not use GitLab Community Edition? At least for me, it had everything I wanted. (Due to feature creep, I no longer use it.)

palata · on April 22, 2024

> Now I would essentially have to double my spending (again) for this new feature.

I guess either you believe that this AI feature is worth the new prices, or you don't. I don't.

> We use a self-hosted Gitlab for 5 years now and had to upgrade to premium after they killed the basic version.

Why not self-hosting an instance of Forgejo?

mdaniel · on April 22, 2024

> Why not self-hosting an instance of Forgejo?

Have they fixed their pseudo-implementation of GitHub Actions? Last I saw they were trying to use nektos/act instead of actually, no kidding, implementing Actions and a real runner. It's the uncanny valley of CICD

Kelteseth · on April 22, 2024

Yes, Forgejo would be an alternative if we really would need to switch. GitLab pricing is ok for us, not great, not terrible for a small team of 6 devs...

nicce · on April 22, 2024

The pricing of the GitLab has changed so many times in the past years and free tier has changed as well. It has made GitLab so unpredictable that I had to switch away.

Someone could say that enshittification is on the way.

misnome · on April 22, 2024

It changed to match Github, and then it changed to exceed Github.

Probably the biggest sales advantage Gitlab has is that nobody seems to know/remember that it's possible to run Github on-prem. That's exactly the reason our organisation chose it.

jamesponddotco · on April 22, 2024

Considering they still use Claude 2 as the model for many features, I'd pass. Now, if they upgraded to Claude 3, on the other hand... I would still go with Codeium[1] and the Claude 3 API.

[1]: Which, to be fair, has fewer features.

dnsmichi · on April 22, 2024

GitLab team member here. Thanks for your feedback :)

You can follow the migration to Claude 3 in this epic https://gitlab.com/groups/gitlab-org/-/epics/13297

(also shared in this thread https://news.ycombinator.com/item?id=40114647 )

dnsmichi · on April 22, 2024

GitLab team member here. Thanks for your feedback.

GitLab Duo provides AI-assisted features in the DevSecOps lifecycle. GitLab Duo Chat was released as GA last week [0] Code Suggestions are GA, too, and help with code completion and generation [1] The documentation provides more insights on availability and usage [2]

Helpful learning resources:

If you are looking for practical examples and prompt tips for Duo Chat, bookmark this longer tutorial "10 best practices for using AI-powered GitLab Duo Chat" in [3]

We have also updated the documentation to add hands-in GitLab Duo examples for different programming languages and environments, helping with how to integrate AI into your workflows efficiently: [4]

At QCon London 2024 two weeks ago, I spoke about "Efficient DevSecOps workflows with a little help from AI." The slides are publicly available at [5], talk summary in [6]

I also recommend the blog post "How to put generative AI to work in your DevSecOps environment" [7] to tackle important questions such as workflow assessments, AI guardrails and how to measure AI impact in your organization.

Last but not least, I started a learning series called "GitLab Duo Coffee Chat" on YouTube [8], showing AI and GitLab Duo in action. I plan to host more sessions in the coming weeks. [9]

Happy to help with more adoption questions, best practices, and development workflows :-)

[0] https://about.gitlab.com/blog/2024/04/18/gitlab-duo-chat-now...

[1] https://about.gitlab.com/blog/2023/12/22/gitlab-duo-code-sug...

[2] https://docs.gitlab.com/ee/user/ai_features.html

[3] https://about.gitlab.com/blog/2024/04/02/10-best-practices-f...

[4] https://docs.gitlab.com/ee/user/gitlab_duo_examples.html

[5] https://go.gitlab.com/ZDYNXQ

[6] https://qconlondon.com/presentation/apr2024/efficient-devsec...

[7] https://about.gitlab.com/blog/2024/03/07/how-to-put-generati...

[8] https://www.youtube.com/playlist?list=PL05JrBw4t0Kp5uj_JgQiS...

[9] https://gitlab.com/groups/gitlab-com/marketing/developer-rel...

Etheryte · on April 22, 2024

Did you use Gitlab Duo to write this comment?

dnsmichi · on April 22, 2024

Great idea ;-)

But actually, no. I had these helpful tips in mind, and wanted to share them with everyone quickly. There is always an opportunity to learn together, and get inspired from feedback :-)

joshstrange · on April 22, 2024

Here are the models they use [0]. They best they do is Claude 2, hard pass.

[0] https://docs.gitlab.com/ee/user/ai_features.html#language-mo...

dnsmichi · on April 22, 2024

GitLab team member here. Thanks for your feedback.

You can follow the migration to Claude 3 in this public epic: https://gitlab.com/groups/gitlab-org/-/epics/13297

bogwog · on April 22, 2024

A half-assed feature nobody asked for? In my Gitlab?! Unbelievable!!

On a serious note, wtf Gitlab? Every once in a while I get an email notification from a feature request I subscribed to (that has been open for years) because yet another large premium customer requests that it be implemented. Gitlabs response is always that they don't have capacity/manpower to implement it, yet they're able to implement crap like this?

keybored · on April 22, 2024

I don’t know if this is half-assed. But the presentation certainly is.

    Why Gitlab Duo : AI :sparkles:

yjftsjthsd-h · on April 22, 2024

I don't know, isn't that exactly explanation of why they did it? (Joking but also 100% dead serious)

candiddevmike · on April 22, 2024

I wrote a (satirical?) blog post about this behavior: https://candid.dev/blog/becoming-an-ai-company/

Every company seems to be throwing their backlog away and focusing on adding AI features no one asked for. Could be a lot of opportunity for startups that actually listen to their end users to gain some traction instead of chasing GenAI.

bogwog · on April 22, 2024

> Could be a lot of opportunity for startups that actually listen to their end users to gain some traction instead of chasing GenAI.

This made me actually look into it instead of just complaining on the internet. The feature I was referring to was the Conan package repository[1]. Apparently, Forgejo implemented support for that already[2], as well as other features that originally drew me to Gitlab in the first place.

I think it may be time to consider switching.

1: https://gitlab.com/groups/gitlab-org/-/epics/6816

2: https://forgejo.org/docs/latest/user/packages/conan/

andybak · on April 22, 2024

Are your features potentially new revenue streams? This is. Or at least they believe it can be.

pid-1 · on April 22, 2024

~5 years ago GitLab had many features that distinguished itself from competition, chiefly their CI/CD stuff. It was leagues above of what everyone offered.

They've spent the last years building stuff I don't care while not improving the stuff I care. During this period, GitHub built Actions.

Nowadays GitLab is more expensive than GitHub (GitLab Premium vs GitHub Enterprise) while also not having any particular feature that makes me want to use it over GitHub. If I was starting a software team today, there is no reason I'd be using GitLab.

In the long term their strategy of having too many half assed features might bite back instead of generating revenue streams.

mdaniel · on April 22, 2024

I'll also observe that several years ago they weren't a public company, and if my mental model of public companies is correct then they have a fiduciary duty to maximize shareholder value, not software engineering teams value from their product

pid-1 · on April 22, 2024

For most companies maximizing shareholder value and making customers happy are highly correlated. No amount of AI features will save their stock performance if their customer base stalls.

Broadcom-like companies (monopoly) and Google-like companies (you are the product bla bla bla) are exeptions though.

mdaniel · on April 22, 2024

My further life experience has been that markets are highly irrational, and as the other comments have said: if the shareholders believe AI is the bandwagon, and implementing Conan repository support is "well, churn gonna happen" then that's how one ends up with this bullshit

richbell · on April 22, 2024

I'm sure their investors believe announcing something with "AI" will increase the value.

sdesol · on April 22, 2024

> A half-assed feature nobody asked for?

I'll put money on it that Enterprise customers asked for this. The half-assed part, I can't comment on.

lxe · on April 22, 2024

I just want a copilot-like autocomplete and chat plugin that uses the standard oai api so I can hook it up to any local model.

yellow_lead · on April 22, 2024

> Ship more secure software faster withAI throughout the entire software development lifecycle

Is this a typo or some sophisticated marketing?

edit- seems to only be a bug in Firefox.

john_cogs · on April 22, 2024

GitLab team member here. This is a typo. Will be fixed shortly. Thanks.

frizlab · on April 22, 2024

And Safari

bckr · on April 22, 2024

And mobile chrome

frizlab · on April 22, 2024

They used the UTF-8 char `U+2028` to separate `with` and `AI`. This is the LINE SEPARATOR character. How it should be rendered is a good question.

TechTechTech · on April 22, 2024

I run a self hosted Gitlab CE instance for many years now and I am very happy with it. I also experiment a lot with local LLMs. Will there be a CE release which allows for usage combined with a locally running LLM in the future?

mdaniel · on April 22, 2024

Since the relevant code appears to be in the "ee" directory <https://gitlab.com/gitlab-org/gitlab/-/blob/v16.11.0-ee/ee/l...> and is not present in the foss repo, I'm guessing the answer is no, at least for now. They do have a history of "releasing" features from EE back to CE but my suspicion is not for LLM stuff

harshitaneja · on April 22, 2024

No one has mentioned this yet, but their pricing is twice of what I pay for github copilot. What does it offer that's substantially better than copilot?

Hamuko · on April 23, 2024

If the past is any indication, no. GitLab Ultimate is $100/user/month whereas GitHub Enterprise with the Advanced Security addon is about $70/user/month. Well, at least Ultimate used to be $100 before they hid the prices from the pricing page. Could be even more for all I know. And yet, the base product you get isn't really significantly better than GitHub, and using it day-to-day professionally, has quite a lot of bugs too.

firemelt · on April 22, 2024

clippy died for this

TrianguloY · on April 22, 2024

What's with the fading on each element while scrolling? Make the page impossible to fast-read or scan for important points. ...maybe that's the reason?

bdcravens · on April 22, 2024

Now I suppose it's the chance for Github to publish a "we did it first!" blog post (as is common for Gitlab to publish)