I'm sure you already know that. They differ between the companies that make them available, and they differ between models from the same company.
But I'm not here to tell you all that; rather, I want to share a funny thing (funny in the end, at least) that just happened to me, involving AIs.
So... I was tracking down a nasty bug that breaks my charts when the window is resized. I'll likely nail it down in the end, but since I'm not particularly skilled in this area, I used an AI to investigate what was going on.
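(For context, here's a minimal sketch of the kind of resize handling involved. The container id, the aspect ratio, and the draw function are my assumptions for illustration; this is not the actual code in question.)

```js
// Hypothetical sketch: redrawing a D3 chart on window resize.
// Forgetting to remove the old SVG, or redrawing with stale
// dimensions, is a classic source of resize glitches.
const container = d3.select("#chart"); // assumed container id

function draw() {
  const width = container.node().clientWidth;
  const height = Math.round(width * 0.6); // assumed fixed aspect ratio

  container.select("svg").remove(); // drop the stale chart first
  const svg = container
    .append("svg")
    .attr("width", width)
    .attr("height", height);

  // ... render the actual chart into svg here ...
}

draw();
window.addEventListener("resize", draw); // redraw on every resize
```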
I wasn't paying attention to which AI model was selected. Usually ChatGPT is pretty decent at finding little bugs and explaining how to fix them without being too verbose about it, since verbosity consumes tokens. And this seemed like that kind of situation...
Something I'm not sure I like about ChatGPT, in general, is that when you upload a code file, it doesn't seem to process it unless it really has to. Maybe it's trained to be lazy about it; maybe the researchers know that most people upload files that aren't needed, or that are too big to really matter. And processing them takes computing power and, therefore, tokens.
But that also makes ChatGPT guess a lot before actually taking a look at the code.
That happened today too. Its first guess turned out to be wrong (not that it actually admitted it). It seemed to be on the right track, though... I would have guessed the same way at first.
Then it actually looked at the code and said something like: "Oh wait, you use this stuff, so we need to approach the problem differently."
So far, it sounds almost scientific, apart from the guesswork.
But here's where the funny part starts. It began talking about things as if they were in my code, and... they weren't. I mean... function names, calls, everything.
Although I didn't recognize what it was talking about, I thought I might have forgotten about some old, now-unused piece of code left in by mistake. So I searched for the function names. Nowhere to be found. I told it that. It insisted I had them and showed me more "evidence", none of it from my code.
Ok, I said, if I don't have it, maybe it's in the D3 code I load in my HTML. Very unlikely, since that file is minified (so readable function names wouldn't survive there), but hey, let's ask ChatGPT. On this, it agreed with me. Yet it still insisted that what it was quoting was the code I had uploaded minutes earlier.
The only logical explanation I could find, other than ChatGPT simply having a really bad day, was one I then discovered: I had been using the conversational model, not the one specialized in coding. I promptly switched, but I haven't started a new session yet; I wanted to write this down first, while the events are fresh in my mind.
I wonder whether these models should declare themselves out of their depth when a better (free) model from the same company is available to serve a given query, rather than attempt an answer to the best of their ability and make fools of themselves.
In my already fairly extensive experience using AI models to help with coding, I've run into... many situations where they start "breaking". A fresh session usually helps (kind of like rebooting a computer), but it also means providing the context all over again (the equivalent of relaunching programs and reopening the documents you were working on). And you can't do that many times a day, due to token restrictions, at least not for free.