Generative AI is going to make us all extinction-level stupid

Ddseddse

Google absolutely anything. How to get my kid to eat vegetables? Why is my dishwasher making that noise? What are the wines of the Loire Valley like? At least five of the top ten results you get are going to be generated by AI. You can tell by how stupid the articles are. A big giveaway is that you’ll see the same subject repeated multiple times under different headings.

OK. That’s stupid. But survivable. The real existential danger is that generative AI models are trained using data from the Internet. What happens when more and more of the content on the Internet is itself created by generative AI? Well, if you’ve ever seen the movie Multiplicity with Michael Keaton, you’ll know exactly what happens. It’s like taking a photocopy of a photocopy of a photocopy of a photocopy.

Doesn’t matter how good the models get. This isn’t a problem with the models themselves. It’s a problem with the feedback loop in the training data.

I don’t have a link for this. It’s my own original insight, but by now I’m basically the guy from the meme sitting at the table with the “prove me wrong” sign.
 
Actually, an amplifier squeal is a better example of the problem with this kind of feedback. The model is going to pick up on some random noise, and it’s going to keep getting more and more distorted.
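
To make the photocopy analogy concrete, here’s a toy simulation (just numpy, every number invented for illustration, obviously nothing like a real LLM training run): fit a simple model, a Gaussian, to some data, generate the next “training set” purely from the fitted model, refit, and repeat. With no fresh real-world data entering the loop, the estimation errors compound and the fitted distribution drifts.

```python
# Toy sketch of training on your own output: fit a Gaussian, sample from
# the fit, refit to those samples, repeat. Generation 0 is "real" data;
# every later generation is model output only.
import numpy as np

rng = np.random.default_rng(0)

N = 100           # size of each generation's training set
GENERATIONS = 30  # how many times we retrain on our own output

data = rng.normal(loc=0.0, scale=1.0, size=N)   # real data: mean 0, std 1

for gen in range(1, GENERATIONS + 1):
    mu, sigma = data.mean(), data.std()          # "train" on the current data
    data = rng.normal(mu, sigma, N)              # next generation: pure model output
    print(f"generation {gen:2d}: mean={mu:+.3f}  std={sigma:.3f}")

# Typical run: the mean wanders away from 0 and the std drifts away from 1 --
# each generation compounds the previous generation's estimation error.
```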
 
I'm hoping it can turn me into a savvier version of Michael Biehn who can outsmart/kill the Terminator, and live happily ever after with Linda Hamilton
 
I think it's all a scam. Businesses telling everyone how awesome it is so they can make a buck, when in reality it has so many flaws and errors.
 
Not how I understood SkyNet was going to get us, but pretty diabolical.

 
So, I have considered that possibility, and it really comes down to how well future AI can distinguish between AI-created and non-AI-created content.
 
I don’t necessarily think AI is a bad thing when it’s used to supplement certain technology products. Over the last few years my company has baked it into the software we sell, and it has undoubtedly made the product better.

We have also started using AI more for everyday work tasks, and I have mixed feelings about it. For example, we have a tool that automatically takes notes during a call, summarizes them into neat bullet points, and has options to automatically generate detailed follow-up emails that are scarily accurate. Idk, on the one hand stuff like that does eliminate a lot of tedious work, but I try not to lean on it so much that I completely forget how to take notes, write emails, etc.
 
I agree, in the sense that whether a model can distinguish AI-created from human-created content is an epistemological problem. But if you assume we are now, or will shortly be, at the point where these models are passing the Turing test, then we’re ******. I think you might be proposing some sort of “super Turing test” (can a model identify another model), which is terrifying and depressing by turns.

But I still don’t think that gets at the root of the epistemological problem. In the real world we have a kind of evolutionary selection process that (often enough) identifies and kills off incorrect knowledge. There isn’t one baked into the generative AI loop.
 
I can't prove you wrong unless I have a better sense of what you're trying to say. I suspect that you don't really know, or you would have been less vague about it.

1. When you are talking about the "models," do you mean large language models like GPT? That matters because language models are not built to contain accurate information. When Google or MS talk about integrating AI into their search engines, they aren't talking about turning browsers into elaborate ChatGPT interfaces. The fact-lookup components of AI-powered search engines aren't going to be trained on scraped data. They might not even be "trained" at all, since fact lookup is more like a database query than generative AI.
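
A rough sketch of the distinction I mean, with made-up data and function names (not any real search engine's internals): the lookup path is a query against a curated index, while the generative path just produces text.

```python
# Made-up illustration of "lookup vs. generation" -- the index and the
# functions here are hypothetical, not any real product's API.
FACT_INDEX = {
    "boiling point of water": "100 °C at 1 atm",
    "capital of france": "Paris",
}

def fact_lookup(query):
    """Database-style query: normalized key match, stored answer returned verbatim."""
    return FACT_INDEX.get(query.strip().lower())

def generate_text(prompt):
    """Stand-in for a language model: produces new text with no guarantee
    that it matches anything in the curated index."""
    return f"Some fluent but unverified prose about {prompt!r}..."

print(fact_lookup("Capital of France"))   # 'Paris' -- retrieved, not generated
print(generate_text("capital of France")) # made-up text, could be wrong
```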

2. If you are talking about LLMs, then the feedback loop you describe is very unlikely to be an issue. All language is a photocopy of a photocopy of a photocopy. How did you learn to talk? Primarily from your parents. And how did they? From their parents. Over time, this process creates linguistic evolution, aka the reason Shakespeare is hard to read today. I have a feeling you think your language is just as good as Old English or Middle French or Xhosa, meaning that this feedback-driven language system isn't actually destructive.

So long as the LLM can produce grammatically sound text that replicates the English language, there is nothing wrong with using that text for training the next model.

3. You're ignoring a big part of the feedback process here, which is that nobody is going to want to use a shitty language model. If a language model does take in some noise and propagate it through the system so that its output becomes terrible, nobody will use it and thus its influence will disappear.

4. On a technical level, what you describe is unlikely -- at least with the training technology we currently use. Let's hypothesize about what would happen if we introduced some error into scraped data and models then got trained on that data. Will it propagate through the whole model? No, not if the model is any good. One of the central challenges of the machine learning discipline is the avoidance of "overfitting," which is basically what you are talking about. Overfitting is when a model overemphasizes certain idiosyncrasies of a data set in its internal weights, and thus does an amazing job of predicting the training data and a bad job of predicting non-training data. So in order for this error in the training data to propagate, the model would have to overfit the data. But we know that usable models don't overfit, because models that overfit suck.
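
Here is overfitting in miniature, on synthetic data (numpy only, numbers purely illustrative): a high-degree polynomial nails a dozen noisy training points and then does poorly on fresh data from the same source, while a low-degree fit generalizes.

```python
# Miniature overfitting demo: fit polynomials of two degrees to a small,
# noisy training set drawn from a cubic, then score them on fresh data.
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = x**3 - x + rng.normal(0, 0.2, n)   # true signal is a cubic, plus noise
    return x, y

x_train, y_train = make_data(12)   # small training set
x_test, y_test = make_data(500)    # held-out data from the same source

for degree in (3, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")

# Typical run: the degree-10 fit has near-zero training error but a much
# worse test error -- it memorized the noise (the "idiosyncrasies") of the
# training set instead of the underlying pattern.
```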

The genius of the transformer architecture is that it has multiple ways of avoiding overfitting. One way, which isn't unique to the transformer, is controlling the speed of learning. No neural network can "instantly" learn, and that's especially true of huge models with hundreds of millions of densely interconnected nodes. It takes huge numbers of trials for the system to calibrate all of those interconnected weightings. The error we introduced would have to be everywhere in the training data before it could seep into the model. But how would it get everywhere before some model had already baked it in? Even if we posit this huge amount of AI-generated training data (and note: there's no law that says every bit of scraped data has to be equally represented in the training set), what you're actually referring to would be a set of sibling errors -- i.e. different errors produced by separate executions of the same process, not one error repeated everywhere.

In order for the feedback loop to be effective in the way you fear, the same errors would have to permeate the training data to a high degree. But if the error was so prevalent, it would be noticeable and could be weeded out.
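
A quick numerical version of that prevalence point (again synthetic and only illustrative): corrupt different fractions of a training set with the same systematic error and see how far the fitted parameter actually moves.

```python
# Toy sketch of the "prevalence" point: an error in a tiny fraction of the
# training data barely moves the learned parameter, but an error that
# permeates the data visibly distorts it. Synthetic data, numpy only.
import numpy as np

rng = np.random.default_rng(2)

def fit_slope(frac_corrupted):
    x = rng.uniform(-1, 1, 10_000)
    y = 2.0 * x + rng.normal(0, 0.1, x.size)   # true relationship: y = 2x
    n_bad = int(frac_corrupted * x.size)
    y[:n_bad] = -2.0 * x[:n_bad]                # the "error": a flipped relationship
    return np.sum(x * y) / np.sum(x * x)        # least-squares slope through origin

for frac in (0.0, 0.01, 0.10, 0.50):
    print(f"{frac:4.0%} corrupted -> learned slope {fit_slope(frac):+.3f}")

# Typical run: 1% corruption leaves the slope around +1.96 (vs. +2.00 clean);
# only corruption at the tens-of-percent level visibly distorts the fit --
# and at that prevalence the damage is also easy to notice and weed out.
```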

5. And even if our pet error gets propagated into the model, user feedback would eliminate most of it. One reason OpenAI makes ChatGPT widely available is that the feedback the app periodically requests gets used in training. And there are ways to flag objectionable content, which also feeds back into training.
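
For illustration only, with hypothetical data structures (not OpenAI's actual pipeline), the filtering side of that feedback can be as simple as keeping flagged or down-voted outputs out of the next training pool:

```python
# Hypothetical sketch of feedback-based filtering: logged model outputs
# carry user ratings and flags, and only well-received, unflagged outputs
# are eligible to re-enter the training data.
from dataclasses import dataclass

@dataclass
class LoggedResponse:
    prompt: str
    text: str
    thumbs_up: bool
    flagged: bool

logged = [
    LoggedResponse("loire wines", "Sancerre is a Loire sauvignon blanc.", True, False),
    LoggedResponse("loire wines", "The Loire is famous for its rioja.", False, False),
    LoggedResponse("vegetables", "Just let kids eat candy instead.", False, True),
]

next_training_pool = [r for r in logged if r.thumbs_up and not r.flagged]
print([r.text for r in next_training_pool])  # only the well-rated answer survives
```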

*****
In short, I really don't think this is a significant concern. The bigger issue, by far, is the permanence of systemic bias in our language and discourse. That's not an AI problem; that's a human problem. Arguably AI gives us a better way to eliminate those systemic biases than we've ever had. Expunging a racist image from the collective consciousness of humankind is really, really hard. There are millions or billions of minds that have to be reached, and then those minds have to accept the characterization, and also care about the characterization, and they have to be malleable enough to let go of the received wisdom. With an AI, we just have to click the "is this racist" checkbox and the model will know. Well, it's harder than a checkbox, but the difficulty is on that order of magnitude compared to the difficulty of changing human language as it is diffused throughout the human population.
 