Lecture 1: Introduction to Deep Learning
Using traditional machine learning algorithms, as you gave them more data, performance would plateau. It was as if older algorithms didn't know what to do with all the data. But if you train a very large neural network, the performance just keeps getting better and better.
My first GPU machine for training neural networks was built by a Stanford undergrad in his dorm room. His name was Ian Goodfellow. That compute server laid the early foundations of using CUDA to train large neural networks. Sometimes the work you do in a dorm room, looking back over some years, can really have a huge impact.
As you scale up neural networks, performance gains are actually quite predictable. You can forecast, if you buy this many GPUs and throw this much data at it, what the performance will be. That predictability drove a lot of the investments in data centers and building very large AI models.
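To make that concrete, here is a minimal sketch of what such a forecast might look like: fit a power law to a few (compute, loss) measurements and extrapolate. The numbers and the exact functional form below are illustrative assumptions, not data from any real training run.

```python
# Minimal sketch: fit a power-law scaling curve loss = a * compute^(-b)
# to a few (compute, loss) measurements, then extrapolate.
# All numbers here are illustrative placeholders, not real measurements.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])  # training FLOPs (hypothetical)
loss    = np.array([3.2, 2.7, 2.3, 2.0])      # validation loss (hypothetical)

# Fit log(loss) = log(a) - b * log(compute) with least squares.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope

# Forecast the loss at 10x more compute than the largest run so far.
predicted = a * (1e22) ** (-b)
print(f"fitted exponent b = {b:.3f}, forecast loss at 1e22 FLOPs = {predicted:.2f}")
```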
People who really understand computer science, rather than just vibe code, get things to work much better.
The term neural networks had been around for decades, but around 10 years ago we realized deep learning was just a much better brand. Who doesn't want learning that is really deep?
For a lot of applications, just prompting LLMs doesn't cut it. There are things I cannot get to work just by prompting. I often have to go a layer deeper into deep learning algorithms to get certain things to work.
GenAI is fantastic for text-based applications. But for audio, images, video, and structured data, I end up dipping down directly to use deep learning algorithms.
Use of GenAI tools is relatively inexpensive when prototyping. But when more users start using your product, your AI bill starts to skyrocket. Knowing how to fine-tune smaller models is the critical skill that bends the cost curve back and makes the whole thing affordable.
Every PhD student of mine that became great wound up at 2 AM tuning hyperparameters. Your skill at tuning hyperparameters makes the difference between going home at 3 AM versus 7 AM.
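As a flavor of what that tuning looks like in practice, here is a minimal sketch of a random hyperparameter search; the train_and_evaluate function is a hypothetical placeholder for your real training loop and metric.

```python
# Minimal sketch of random hyperparameter search.
# `train_and_evaluate` is a hypothetical placeholder for your real training loop;
# swap in your own model, data, and metric.
import random

def train_and_evaluate(learning_rate, hidden_size):
    # Placeholder: returns a fake validation accuracy so the sketch runs end to end.
    return random.random()

best = None
for trial in range(20):
    # Sample learning rate log-uniformly; pick hidden size from a small grid.
    lr = 10 ** random.uniform(-5, -2)
    hidden = random.choice([64, 128, 256, 512])
    score = train_and_evaluate(lr, hidden)
    if best is None or score > best[0]:
        best = (score, lr, hidden)

print(f"best val accuracy {best[0]:.3f} with lr={best[1]:.1e}, hidden={best[2]}")
```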
The biggest difference between a team that gets things done in days versus months is the ability to drive a disciplined development process. Less experienced teams pick things at random to work on. They read a newspaper saying AI needs lots of data, then spend six months collecting data. Often, collecting more data does not help your application. But sometimes it's a huge help. How do you decide?
A very large family business bought a lot of GPUs and the CTO pointed to his nephew, a college undergrad, and said, "My nephew knows AI. I'm giving him this budget and I think he'll do AI for me." Knowing how to make decisions and not buying into newspaper hype is really important.
I've seen teams spend six months pursuing an approach that experienced engineers would have told them from day one was not going to work.
When you have deep learning skills, you have the ability to play in a bewildering range of applications. I've worked on autonomous helicopters, ad placement, web search rankings, e-commerce, speech recognition, making ships more fuel efficient. Fighting financial fraud is one of the most exhilarating things I've worked on. You wake up to a new scam, build algorithms in real time, and for every hour you're slower, more dollars leak out.
Bizarrely, some of my PhD students work on climate science. What do I know about climate science? But using machine learning tools, we work on climate modeling and geoengineering.
I categorize my software work into two buckets: quick and dirty prototypes, and production-grade robust software. AI-assisted coding has made the biggest difference for prototypes. For production software, I'm more careful. One collaborator used a coding agent this morning and it wiped out all the database records. Fortunately, it was a test application with five users.
If you're building a prototype that only runs on your own laptop with no sensitive information, and you're not planning to maliciously hack into your own laptop, the security requirements can be lower. A sandbox environment lets you make decisions faster.
My teams will try 20 things and see what sticks. Some teams feel angst that many proof of concepts never make it into production. I have a different view. If the cost of a proof of concept is low enough, who cares if you do 20 of them to find the one or two that work really well.
The output of a machine learning algorithm depends both on your code and on your data. You control the code 100%, but you don't really know what's in the weird and wonderful data the world has given you.
Even now when I work on speech recognition, I'm surprised that data has certain accents more than I realized, or people speak faster, or there's background noise in cars. The data the world gives us is often weird and wonderful. Only by building a system do you discover these things.
You control your code 100%, but you don't control how users will react to your system. Building quick prototypes to discover what's in the data and what users like allows faster feedback loops than ever before.
"Move fast and break things" got a bad rep because it broke things. Some teams concluded they should not move fast. That's a mistake. Move fast and be responsible. The most responsible teams I know are some of the fastest moving teams. Speed of execution lets you implement, discover what's in your data, discover what users want, and figure out what could go wrong.
Very senior business leaders have advised others not to learn to code because AI will automate it. This will be remembered as some of the worst career advice ever given. When coding becomes easier, more people should do it, not fewer.
When humanity went from punch cards to keyboards, that made coding easier, so more people learned to code. When we went from assembly to modern languages, more people learned to code. I found articles from when COBOL was invented saying "Coding is so easy now. Who needs programmers?" Obviously, the opposite happened.
I know many businesses that would love to hire 1,000 people with GenAI and deep learning skills but are struggling to find them. Meanwhile, universities with curricula unchanged since 2002 are producing grads struggling to find jobs because that older skill set is not in demand.
Today I will not hire a software engineer who doesn't use AI for coding, for the same reason I wouldn't hire someone who still codes on punch cards. Eventually those jobs just went away. It just doesn't make sense anymore.
I interviewed two engineers back to back. One hadn't graduated college but was on top of GenAI coding, built programs, and shipped quickly. The other had 10 years of full stack experience but the same 2002 skill set. Never tried AI-assisted coding. I picked the fresh grad over someone with 10 years of experience.
I was in a coffee shop and someone next to me was coding by hand. It looked so strange. I asked what they were doing. They were doing homework from another university that required coding by hand.
My collaborator understood art history. He knew artistic genre, inspiration, palette, so he could prompt Midjourney with the language of art and generate beautiful pictures. I don't know art history. All I could type was "please make pretty pictures of robots for me." I could never get his level of control. We used all his pictures and none of mine.
The same thing is happening in computer science. One of the most important future skills is understanding how computers and AI work so you can use the language of AI to tell a computer exactly what you want. There's a huge performance difference between someone who just prompts an LLM without understanding how things work, versus someone who can analyze a problem and tell a computer how to take the next steps.
Let me rank productivity. Least productive: no experience, don't know AI. Next: decade of experience but don't know AI. Better: fresh grad who knows AI. Best: decade of experience and on top of AI. The best developers ship code like no one ever did even three years ago. They're very experienced and very on top of the latest AI.
A lot of employers haven't figured out how to hire appropriately. If a company has no one who knows GenAI, how do they even know how to interview for it?
How do you know if you have enough data for a neural network? It's really difficult to know. For applications others have worked on, you may have a sense. I worked on face recognition, so I know 50,000 unique faces is pretty good to start. But for greenfield projects no one has worked on, it's really hard to tell.
For completely new projects, get a little data and try training a model. Whether your initial model works or not will help you understand how much data you need. Sometimes 100 data points is all you need. Sometimes 100 billion data points later, we're still trying to get more.
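One way to make that judgment concrete is a learning curve: train on growing subsets of your data and watch whether validation performance is still improving. Here is a minimal sketch, using scikit-learn and synthetic data as stand-ins for a real project.

```python
# Minimal sketch of a learning curve: train on growing subsets of the data
# and see whether validation accuracy is still improving.
# Synthetic data and logistic regression stand in for your real dataset and model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for n in [100, 300, 1000, 3000]:
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[:n], y_train[:n])
    val_acc = accuracy_score(y_val, model.predict(X_val))
    print(f"{n:>5} training examples -> validation accuracy {val_acc:.3f}")

# If accuracy is still climbing at your largest subset, collecting more data
# is likely to help; if it has flattened, look elsewhere first.
```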
The number of people training LLMs from scratch is actually very small. Those jobs are incredibly well-paid, which is why we hear about high salaries. But the vast majority of builders work at the GenAI level or use deep learning tools, not training transformers from scratch.
What we do a lot is take a pre-trained transformer and engineer our own data to fine-tune it. That's important for getting products to work. Training the largest cutting-edge transformers is a very important but niche skill. The number of people doing that is small. The number of people building applications with deep learning is very large.
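As one illustration of that workflow, here is a minimal sketch of fine-tuning a small pre-trained transformer with the Hugging Face Trainer; the model name, the tiny in-memory dataset, and the training settings are placeholders, not a recipe from this lecture.

```python
# Minimal sketch: fine-tune a small pre-trained transformer on your own labeled text.
# Uses Hugging Face transformers; the model name and tiny in-memory dataset
# are placeholders to keep the example self-contained.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

texts = ["great product", "terrible support", "works as expected", "never again"]
labels = [1, 0, 1, 0]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

encodings = tokenizer(texts, truncation=True, padding=True)

class TinyDataset(torch.utils.data.Dataset):
    """Wraps tokenized text and labels so the Trainer can iterate over them."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="finetune-demo", num_train_epochs=3,
                         per_device_train_batch_size=2, logging_steps=1)
trainer = Trainer(model=model, args=args, train_dataset=TinyDataset(encodings, labels))
trainer.train()
```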
I was speaking with a mathematician who had stars in his eyes when he told me he chose his career to pursue truth and beauty in the universe. In this course, I'm not going to do any truth and beauty stuff. I want a very practical approach to building applications and software that works.
I encourage you to think of AI courses at Stanford like Pokemon. You've got to catch them all.