What Chafes My Groin #8



Municipal Erections

Yesterday was the day of Israel's municipal elections. What a fuckin' joke. We're supposed to cast two votes: one for city council (where you choose one of the submitted "lists", often affiliated with political parties), and one for mayor (my city really only had one candidate). The problem is I had no idea who any of the people running were. They're just lists of names.

So, as a good, honest, socially and politically conscious citizen, I tried to learn about my city's candidates. What is their relevant history? What do they plan to do if elected? What are their credentials? What is their platform? Turns out, this information doesn't exist. There are no platforms, there are no information booklets, and the candidates' names don't even show up on any search engine. So how the hell am I supposed to choose? Who do I vote for? The lists seem to be suspiciously divided by ethnic origin: a list of Mizrahi names, a list of Ashkenazi names, a list of Russian/Ukrainian names, a list of Ethiopian names.

Am I really supposed to just go and randomly choose a paper slip? Should I have just said "fuck it" and voted for the list with last names most similar to mine? I'm pretty sure these people are all just businesspeople trying to advance their businesses on the taxpayer's dime anyway. What a fucking joke.

You try getting DALL-E to get this picture right.

Code-Generating AIs

Code-generating LLMs are gaining traction, and organizations are integrating them into their day-to-day development work, for example through GitHub Copilot. I'm very conflicted about this, or let's say skeptical that this technology will be properly utilized by organizations.

First of all, there's the issue of what these models tell us about the companies we work for: everything we're doing has already been done before many times over. We are not innovating anything, it's all just an endless circle jerk of legal Ponzi schemes.

Second, most organizations suffer from the same issues:

  1. Money-hungry management enforcing impossible deadlines and requirements, while providing little to no continuous training for employees.
  2. Inexperienced developers writing shit code.
  3. Complete lack of knowledge- and work-sharing within the organization.

So code-generating AI can have the benefit of making it easier for inexperienced or apathetic developers to do basic things like validating input, writing tests, and other tasks that are, ironically, non-existent in many organizations. But I don't think that's the real game changer here. I think the most important thing code-generating AIs should do is enforce conventions and make it easier to share knowledge. As in: "this useless fucking function has already been written 73 times by the 4 developers on your team, who never seem to remember that they've already written this same fucking function before, all with very subtle but different bugs."
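To make that concrete, here's a hypothetical illustration (all function names and bugs invented, not taken from any real codebase): the same "validate an email address" helper, written independently by three developers who didn't know the others had already written it, each subtly broken in a different way.

```python
# Hypothetical illustration (all names invented): three independently
# written versions of the same helper, each with a different subtle bug.
# This is exactly the kind of drift a model trained on the team's own
# code could flag before a fourth copy gets written.
import re

def is_valid_email_a(s: str) -> bool:
    # Developer A: "an email has an @ in it".
    # Bug: accepts "user@localhost" and even a bare "@".
    return "@" in s

def is_valid_email_b(s: str) -> bool:
    # Developer B: require exactly one "@" and a dot in the domain.
    # Bug: accepts "user@." and an empty local part like "@example.com".
    parts = s.split("@")
    return len(parts) == 2 and "." in parts[1]

def is_valid_email_c(s: str) -> bool:
    # Developer C: a quick regex.
    # Bug: rejects valid addresses like "first.last@example.com",
    # because \w doesn't match the dot in the local part.
    return re.match(r"\w+@\w+\.\w+", s) is not None
```

All three agree that `alice@example.com` is valid, but they silently disagree on `user@localhost` and `first.last@example.com` — and no reviewer ever sees the three functions side by side.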

Have you ever used the AWS SDK? Have you noticed how inconsistent it is between the different services? In S3, errors are returned one way; in EC2, a different way. In ECS, input is provided one way; in Lambda, another. In one service the output is XML in one format, in another it's XML in a different format, and in yet another it's JSON. I can easily see how this happened: different teams writing code for the product in whatever the hell way they wanted, with zero knowledge sharing, no conventions, and under great pressure from management. This is such a big problem for AWS customers that AWS decided to create an entirely new service (the Cloud Control API) whose only purpose is to provide a consistent API to all the other services.
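As a sketch of what that inconsistency looks like in practice — with invented client classes, not the real AWS SDK — consider two clients that disagree on both parameter naming and error signaling, and the thin unifying wrapper that a "consistent API" service essentially amounts to:

```python
# Hypothetical sketch (invented clients, NOT the real AWS SDK) of
# per-team inconsistency: one client reports errors inside the response
# dict and takes CamelCase parameters; the other raises exceptions and
# takes snake_case parameters.

class StorageClient:
    """Team A's client: errors come back inside the response dict."""
    def get_object(self, Bucket, Key):
        if Bucket != "known-bucket":
            return {"Error": {"Code": "NoSuchBucket"}}
        return {"Body": b"data"}

class ComputeClient:
    """Team B's client: errors are raised as exceptions."""
    class InstanceNotFound(Exception):
        pass

    def describe_instance(self, instance_id):
        if instance_id != "i-123":
            raise ComputeClient.InstanceNotFound(instance_id)
        return {"state": "running"}

def fetch(resource, **params):
    """A unifying wrapper: one calling convention, one error convention
    (every failure becomes a LookupError), regardless of which team's
    client sits underneath."""
    try:
        if resource == "object":
            resp = StorageClient().get_object(
                Bucket=params["bucket"], Key=params["key"])
            if "Error" in resp:
                raise LookupError(resp["Error"]["Code"])
            return resp["Body"]
        if resource == "instance":
            return ComputeClient().describe_instance(params["instance_id"])
    except ComputeClient.InstanceNotFound as e:
        raise LookupError(str(e))
```

Every caller of `fetch` now handles exactly one error type and one parameter style — which is the convention-enforcement job I'd rather hand to an organization-trained model than write by hand for every pair of teams.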

That's why I think code-generating LLMs should first and foremost be trained on the organization's own code, and be used to maintain organizational conventions and prevent unnecessary code duplication. Using a model trained on the hard work of open-source maintainers, work that the LLM providers are cashing in on, should come second.

I know that some code-generating LLMs can be used for this purpose, in one way or another, but I also know organizations. I know they won't be used that way, because that would take time: training the models, integrating them, and actually starting to utilize them. Hooking your IDE up to a cloud-based LLM trained on GitHub repositories takes 3 seconds, so that's what organizations will do, and the three issues I mentioned before will largely remain.

Input validation? What is this, the middle ages?!