I haven’t talked about it a lot, but I’ve been working as a contractor for the past couple of months on “GenAI-powered” college CS course materials. I ended up presenting my work at a summit in December to a group of CS professors from various institutions and wanted to write up some of my personal thoughts and observations about the event! These opinions are my own and don’t represent the views of my employer, etc. etc.
For folks not familiar, “GenAI” refers to generative artificial intelligence – computer programs that are trained on large amounts of data with the intent of producing novel output. For example: chatbots, image generation, video generation, and most relevant to this topic, code generation. A tool like GitHub Copilot is trained on many millions of lines of code so that you can ask it a question in plain text and it will spit out (often) correct code.
To add some context on my own GenAI usage, I’ve been experimenting with adding GenAI to my programming for a few years now – first with internal tools when I worked at Google, and now with tools like Copilot and writing my own LLM-powered text completion plugins for fun. I find them really useful, and very often wrong – requiring deep knowledge of the system to get things actually working. This is interesting since it means that without my 10+ years of development experience, using GenAI might slow me down instead of speed me up! Others are having this experience as well.
Anyway, work on the curriculum is continuing, now focused on incorporating student and instructor feedback. The material is currently in a trial phase, but I personally hope that after it’s been run a few times and the kinks ironed out, it will be publicly released!
I ended up taking away a handful of themes from the summit, which I think are pretty revolutionary to the field of CS education. Folks have been talking about a lot of this for a while, but it seems like CS as a whole is undergoing a radical change which is worth writing some words about!
Total Rewrite
When ChatGPT (then powered by GPT-3.5) was publicly released in November of 2022, people realized it was something really special – the first publicly available chatbot that genuinely impressed people, and one that could solve students’ homework problems in a way that was undetectable. This, of course, was a huge problem for academia.
I remember seeing a short video years ago of someone pointing their phone at their math homework (a sheet of algebra problems) and the phone working for just a few seconds before displaying the answers to every single problem on the page. AI has only gotten exponentially more powerful since then.
One thing covered a lot during the summit was that professors currently take one of three approaches to GenAI: ignore it, fight it, or embrace it. (Well, or quit.) The “ignore” people have mostly gone away – GenAI is obviously here to stay. The “fight” folks are doing things like strictly proctored testing, handwritten tests and assignments, etc. And the “embrace” folks are starting to integrate GenAI into their curriculum and classwork.
The thing it took me a while to realize at the summit was that there’s actually a fourth camp out beyond “embrace”: the “rewrite everything” camp. This isn’t just integrating GenAI into an existing curriculum – it’s recognizing that rather than retrofitting existing CS courses with GenAI, we need to change the learning goals of CS in general.
As one professor put it, it’s an opportunity to refresh CS classes that are still stuck in the ’90s, teaching students to solve problems from the ’80s. (For the millennials like myself who think that the ’80s are 20 years ago, I will point out that they are actually 45 years ago.)
In this approach, students trained on programming with GenAI no longer need to really learn syntax, for example. In fact, they no longer need to learn how to write code. They need to learn how to prompt GenAI, and they need to learn how to evaluate and debug the generated code. If you’re a CS person, this might raise your hackles – but if you are working in a corporate space, think about how much of your job is writing code versus understanding, evaluating, and debugging code written by others. Now it’ll just be a GenAI writing the code instead of your coworker (or yourself from 10 years ago).
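To make that concrete, here’s a minimal, hypothetical sketch of the “evaluate and debug” skill in action – the `median_generated` function below stands in for plausible-but-subtly-wrong model output (it’s my own illustration, not an example from the summit or the curriculum):

```python
# A minimal, hypothetical sketch of the "evaluate and debug" skill.
# Suppose a student prompts: "Write a Python function that returns the
# median of a list of numbers," and the model produces something like:

def median_generated(nums):
    # Looks plausible, but it never sorts the input, so it is only
    # correct when the list happens to arrive already sorted.
    mid = len(nums) // 2
    if len(nums) % 2 == 1:
        return nums[mid]
    return (nums[mid - 1] + nums[mid]) / 2

# The student's job shifts from writing `median` to interrogating it.
# A couple of quick probes expose the bug:
print(median_generated([1, 2, 3]))   # 2 -- correct, input already sorted
print(median_generated([3, 1, 2]))   # 1 -- wrong, the flaw shows up

# The fix is a single sorted() call, but *noticing* that it's needed is
# exactly the code-reading skill the new curriculum is after.
def median_fixed(nums):
    ordered = sorted(nums)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_fixed([3, 1, 2]))       # 2
```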
I ran a short exercise at the summit where I asked the attendees to pretend to be students in my class and work for just fifteen minutes on writing a design document, leaning on GenAI as a tool. Impressively, several teams produced a fairly detailed design doc in that time. The time was spent evaluating GenAI output and thinking about features, not writing.
For those of you who remain skeptical that students actually learn when assisted by GenAI, the data is starting to point the other way: in one study, students who used GitHub Copilot during exercises performed better (1.8x higher scores) on evaluations where they weren’t allowed to use GenAI (source, arXiv). It seems that the fear of students ceding the fundamentals entirely to the machines may be unfounded.
Industry / Interviews / Academia
Folks in the CS industry have known for years that interviewing is broken. Much digital ink has been spilled here already. Many companies still use stringent “whiteboard” interviews where the candidate is judged on their ability to answer tricky data structures questions. But this is comically misaligned with the actual work that the candidate ends up doing, which often has nothing to do with tricky data structures.
More and more companies are moving away from these, but I recently went through a job search and got several whiteboard interview rounds, so it’s not entirely dead. The incentives are totally off: Industry wants employees who are comfortable using every tool at their disposal, including GenAI, to produce value. But the interview process is more aligned with academia – focused on data structures and algorithms.
The system is slowly changing, and from both ends – companies are getting more comfortable with “build something and then explain it” style interviews, which allow applicants to show off their skills with tools at their side, and academia seems to be realizing that they need to embrace GenAI – or perhaps go farther and change the goals entirely.
The Google CEO recently announced publicly that 25% of code written at Google in a recent quarter was GenAI-created (source). 70% of recent grads reported that there should be more exposure to AI/GenAI in their college courses [citation needed]. And perhaps most importantly, a majority of new job postings had AI as part of the job description [also citation needed]. A university graduating CS students from 2025 onward without heavy exposure to – and discussion of – GenAI is doing those students a disservice.
But, what about the fundamentals?
Students are already bad at the fundamentals. Global CS1 failure rates are around 28-33% (source). Computer science is really hard! But what are the fundamentals? Several professors at the summit pointed out that students in the ’80s and ’90s could not have made it through a CS degree without writing a compiler, and yet that is often not required today. We’ve all kind of decided as an industry that learning a “high-level” language like Python or Java as your main language is actually OK, and you’ll still get your degree.
This isn’t to say students won’t need to understand, say, pointers! But they may not need to learn pointers by writing code. The new fundamentals – code reading, debugging, and problem decomposition – are larger and more interesting problems when GenAI can take care of the syntax. Do students need to understand pointers to write valuable code? No. Do students need to understand pointers to debug valuable code? Yes. But as we saw earlier, students assisted by GenAI did better on the GenAI-free assessments, so perhaps students taught under this new paradigm will actually understand pointers better, despite the worry about them losing sight of the fundamentals.
There was a lot of disagreement over whether having students write 100 for-loops, wax-on wax-off style, is useful. I personally am not sure. I think some of the really fundamental stuff (tracing code, understanding pointers, recursion, etc.) is Really Hard, Actually, and is perhaps harder to pick up when you don’t have someone forcing you to drill it until you truly understand it.
So, what do we drop, and what do we keep? I think research will tell us. The next 5-10 years of CS education are going to be a huge experiment, with some universities doing complete rewrites toward GenAI-centered education and some clinging to hand-written proctored exams. Papers are already being published about the outcomes for these students, but I think the fine-grained details – like what the new “fundamentals” are – are still undecided and will be for a while.
Spending a bunch of time teaching syntax is probably out, though.
Don’t put up artificial walls
A mentor of mine gave me a metaphor which has stuck with me as I’ve been writing various curricula throughout the years: if you want to know how far someone can throw a ball, don’t put a wall in front of them.
The idea is that if you have high performers and you artificially limit them, you’re going to “clip” your data. If you give your 100-student class a 10-question multiple-choice quiz, and 40 of them get all the answers correct, you have no idea what the skill distribution actually is among 40% of your class.
As a teacher you also need to address cheating: either you need to treat your students with distrust and police them, or some of them will just look up the answers / use GenAI. So, how do you correct for this without providing infinite content?
A few approaches that were discussed this weekend:
- Provide infinite content. Use GenAI or custom code to create drills / exercises on demand and allow students to go as far as they can within a certain time limit. (A toy sketch of what that drill generation could look like follows this list.)
- Have students create code using any tools, but then submit a recording of themselves explaining some aspects of their code along with their project.
- Focus on projects that have no artificial limits – students are welcome to go as far as they can with whatever tools they have, again within a time limit.
- Add friction to the problems (e.g. artificial barriers like having to use a GenAI which purposefully introduces minor errors) – but make sure that the friction is in service of student learning.
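For the first bullet, here’s a toy sketch (my own, hypothetical – not from any of the presented courses) of what “custom code to create drills on demand” could look like: a small generator that produces a fresh loop-tracing question and its answer key every time it’s called.

```python
import random

# Hypothetical "infinite content" generator: every call produces a new
# loop-tracing drill, so there's no fixed answer key to look up and no
# ceiling on how many a student can attempt.

def make_tracing_drill(rng: random.Random) -> tuple[str, int]:
    """Return (drill_text, correct_answer) for a small loop-tracing question."""
    start = rng.randint(1, 5)
    stop = rng.randint(6, 12)
    step = rng.choice([1, 2, 3])
    snippet = (
        f"total = 0\n"
        f"for i in range({start}, {stop}, {step}):\n"
        f"    total += i\n"
        f"print(total)"
    )
    answer = sum(range(start, stop, step))
    return f"What does this program print?\n\n{snippet}", answer

# Each student (or each attempt) gets a different but comparable drill.
rng = random.Random()  # could seed with a student ID for reproducible grading
question, answer = make_tracing_drill(rng)
print(question)
print("Expected answer:", answer)
```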
One of the big questions: If we’re teaching students to code with GenAI, why do we then evaluate them on their ability to code without GenAI?
This aligns with the previous section. The focus is no longer on having perfect syntax or memorizing the idiosyncrasies of a particular language. We need to teach management skills: delegation, coordination, task decomposition, flexibility, and the ability to pivot.
And importantly, the professors who had trialed GenAI-powered CS1 courses reported that the students were able to make much more impressive projects than traditional CS1 students. Many of them probably could not hand-code an algorithm to compute the average of the positive numbers in a list given 20 minutes, but then again, about 50% of 2023 CS1 grads couldn’t do it either. And the GenAI-empowered students would still be able to solve the problem if you gave them their tools back.
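For context, the hand-coding task in question amounts to just a few lines of Python (my own illustrative solution, not something presented at the summit):

```python
# The hand-coding task referenced above: average the positive numbers in a
# list. This is an illustrative solution, not course material.

def average_of_positives(nums):
    """Return the average of the strictly positive numbers in nums."""
    positives = [n for n in nums if n > 0]
    if not positives:
        return None          # no positive numbers -> no meaningful average
    return sum(positives) / len(positives)

print(average_of_positives([3, -1, 4, 0, 5]))   # (3 + 4 + 5) / 3 = 4.0
print(average_of_positives([-2, -7]))           # None
```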
Does it matter if you can’t reason about a nested loop if you can build a business-value-generating application? Anecdotally, I’ve seen posts on the orange site from nontechnical solo founders saying they’ve built applications with multi-thousand-dollar MRR. I really, really don’t know the answer to this. GenAI is getting better and not going away anytime soon. But many students – according to one of the university presenters – are truly motivated by becoming masters of their craft.
I think this depends on how you – and the students – define success. Is generating business value or craft mastery the goal? Perhaps both? Leading and coaching students on this, and the ethical and environmental considerations of their actions, GenAI or not, is perhaps more important than the technical skill, yet is often not considered in these curricula. It’s often at odds with industry – and student GPAs – to critically think about ethical considerations, unfortunately.
Another of the presenters pointed out that students mostly used the tools to help them understand the code/frameworks, not generate it, and most reported that it enhanced their learning and problem-solving. It also got them to write more tests, both because they didn’t trust the GenAI code and because it’s easier to write tests when GenAI takes care of the boilerplate.
The Junior Gap
There’s an upcoming “Junior Gap” in many fields. Ex-Googler Steve Yegge addressed this in a recent speculative blog post that got a lot of traction. Essentially: if GenAI handles all of the menial tasks, how will we train Juniors (whom we previously hired to handle those menial tasks)? Except that nobody really enjoyed, or learned much from, the menial tasks anyway.
There’s already a dearth of junior roles. As anecdata, a friend of mine was recently looking for a junior developer role. On popular job site BuiltInNYC.com he reported 5,000 senior openings, 3,000 mid-level openings, and 3 junior openings. Not 3,000, not 300, not 30: three.
The thing is, these junior developers are human. They will do what any of us would do: grind and network until they find something in the field, jump sideways to a different field, start something on their own, or become disillusioned and drop out entirely. I think the frustration comes from an unintentional “rug pull”, where four years ago when these students were choosing their majors, the industry was desperate for workers. They were told, “Get a CS degree and folks will be knocking down your door to hire you” – which was true at the time! And now there’s almost nothing.
It’s not the students’ fault that they followed this path and are finding themselves stuck – it’s a failure on our part, as educators, policymakers, and hiring managers, to push the envelope. My hope is that the curriculum I helped build fills in this gap just a little bit and pushes on the system just a little bit. But the ship is already turning, and it’s a BIG ship.
Convenience is morality’s most cunning foe
I have a lot of artist friends, and they are understandably concerned about GenAI. I make an effort to avoid AI art and to buy directly from artists, but I can only do that because I have the means to. I’m also not in a position to sway the decision makers who set (or remove) the budgets for art. I struggle with the idea of using a GenAI trained on data for which nothing was paid. But this comes back to the disservice mentioned earlier: if this is the future, ignoring it just means falling behind and losing the power to help steer the ship.
This brings me to the last theme at the forefront of the summit: if we don’t train students to use GenAI well, then they’ll use GenAI badly – by which I mean in ways that are harmful or unethical. (Although apparently many of them were pretty bad at prompting when starting out, too, but that’s beside the point.)
The professors and industry folks were pretty adamant about making sure that students understand deeply that it’s their reputation on the line when they use untrusted GenAI code – and that representing GenAI code as their own is plagiarism and academic dishonesty that can get them expelled. The lesson: use GenAI, but think critically about it.
This is the same lesson as before GenAI; it’s just … easier for students to generate untrusted or broken code now. Many CS courses already talk about Therac-25 or the Mars Climate Orbiter, for example. But GenAI lets us go really fast, and when you’re going really fast, you miss things. This is unsolved, but very smart folks are thinking about it.
In conclusion, we’re at an inflection point in education, and folks know it. But the changes aren’t limited to academia. Policy, interviewing, industry, academia, recruiting – everything is stressed to the point where things are beginning to break or change. I agree with the folks who say we need to rethink everything about what a CS degree means and what goals it accomplishes, but what it should become will only be revealed as we run more experiments and gather more data. So, let’s go do that.