I was mentoring at our local fortnightly Coder Dojo session yesterday and heard a couple of students talking about getting help from ChatGPT. My ears pricked up because these are mostly younger kids (mid-to-late primary, a few in early secondary) who, as much as I try to encourage them to broaden their horizons, are usually firmly in Scratch-land, with a few who have been working in MakeCode Arcade; how were they using LLMs here?

A bit later on I plonked myself down next to one of them and asked what his experience had been like with ChatGPT. He said it had been terrible: for his MakeCode project it sometimes gave him JavaScript, sometimes TypeScript or Python, and nothing was working. He showed me a couple of the interactions he had been through, pasting the resulting code into Arcade, which reported errors all the way through. To be somewhat fair to the tool, the problem specification wasn’t great, being along the lines of “Make me a maze generation game in MakeCode Arcade”, but one of the real weaknesses of tools like ChatGPT is that they just run with it rather than tightening up the spec first. But hey, these things are supposed to be magic, right?

Because the student wasn’t familiar with JavaScript, or with debugging techniques in general, he was pretty lost. He told me that some previous prompts had produced working code that could be turned back into blocks, and, being a fairly competent visual programmer (although lacking much understanding of things like functions, which I’m trying to work with him on 😀), he at least had a base to work from, but this time around it was a mess. He felt like his only option was generating a whole new “solution” from ChatGPT.

It’s a bit rough having a conversation with a tween about how LLMs work, and why the suggested code for a predominantly visual language is probably going to turn out badly (even for a framework like MakeCode that has equivalent visual, JS, and Python representations you can switch between). Overall, it was a bad experience for him, and he ended up writing some Python code instead, making a start on a Wordle clone and feeling much more successful.

What would have turned this into a more positive experience - the sort that the AI boosters would have held up as a success?

  • ChatGPT acting like a more effective “helpful assistant” that defines the problem before jumping in and providing “answers”?
  • “Prompt engineering” skills that give a higher likelihood of a more correct response being generated?
  • More of a debugging mindset in students earlier on, where interactions with the LLM are more iterative rather than single shot?
  • Problem decomposition skills that lead to smaller and more tightly focused interactions with the LLM?

Or would it be shoving Copilot into yet another tool so it’s closer to the context of the programming problem? I’d actually be interested to see how that turned out and, from a UI perspective, how much of a disaster putting ghostly suggestion blocks into a visual programming project, à la GitHub Copilot, would be. Lots of kids are programming by following someone on YouTube step by step, so it isn’t a huge leap, but at least there’s intentionality there, not some exquisite-corpse programming.

I actually like Copilot sitting in VSCode for me. When I know what I’m doing, I turn on code completion and it makes boilerplate less annoying. When I’m learning new things, I turn off code completion and use the chat mode to explicitly poke at things I don’t quite understand. It’s frequently imperfect, flat-out wrong, or just unhelpful. Yesterday I asked a question about the effect of a future change to a default parameter value in a pandas method and it provided example code – twice in a row – that demonstrated identical behavior for both modes of the boolean parameter 🤣. I have a lot of experience breaking down problems, challenging my assumptions of correctness (so many challenges), debugging code, and generally having to figure out and fix things myself. Where are kids going to get the experience to help them navigate all of the Helpful AI Assistant fails?