EP 31: Dig into the research breakthroughs behind ChatGPT – transformers, self-attention and more – with one of the inventors, Noam Shazeer

The best applications are things we have not thought of.

ChatGPT has been all the rage in recent weeks. But underlying it are several key developments in AI – “transformers”, “self-attention”, LLMs and more.

This week we dive deep with Noam Shazeer, founder of Character.AI, Google veteran, and a key contributor to several major developments in AI, including transformers, Mesh-TensorFlow, T5, and Google’s LaMDA dialog system.

We cover: the evolution of Google and AI, transformers, LLMs, neural networks, commercialization of Google’s research, future of ChatGPT and AI, engineering philosophy, his work at Character.AI, and much more!

Leave a comment, rating, or tag us on Twitter – we repost our favorite ones! 

Where to find Noam Shazeer:

• Character.AI: https://beta.character.ai 

• Google Scholar: Link  

Where to find Us:

• Aarthi and Sriram’s Good Time Show: YouTube, Substack, Twitter

• Aarthi Ramamurthy: Twitter, Instagram

• Sriram Krishnan: Twitter, Instagram, Blog

Notable Quotes: 

  1. “Paul Buchheit asked me how I do a spell corrector and then I ended up writing the first good spell corrector at Google.”

  2. “Larry Page one day decided that managers were bad and essentially made Google into a flat organization; two decades later, when you look at what Elon Musk is doing at Twitter, there are some shades of similarities, which is, let’s cut out a lot of middle management and let’s have the engineers do the things that they do best.” – Sriram

  3. “I like to think of [the prediction process in neural networks] as a really talented improvisational actor.”

  4. “The best applications are things we have not thought of.”

Show Notes:

  1. Character.AI – AI-powered chatbot built by Noam Shazeer and Daniel De Freitas

  2. Transformer research papers co-authored by Noam Shazeer

  3. Google’s LaMDA: Language Models for Dialog Applications

  4. Paul Buchheit, creator of Gmail

  5. Larry Page firing managers at Google to make it a flat organization

  6. Large Language Models (LLMs)

  7. Unsupervised methods: topic modeling and clustering

  8. Latent Dirichlet Allocation – generative statistical model

  9. Google AdSense

  10. Bayesian Networks

  11. “A Plan for Spam” by Paul Graham

  12. Jeff Dean and the Google Brain team

  13. What are neural networks? – article by IBM

  14. Noam Shazeer’s talk at WeCNLP 2018 on NLP

  15. Attention Is All You Need, research paper

  16. Recurrent Neural Networks

  17. Long Short-Term Memory, by Sepp Hochreiter and Jürgen Schmidhuber

  18. Parallel Computing

  19. What’s the difference between Attention and Self-Attention? By Angelina Yang

  20. DALL·E by OpenAI

  21. Training on TPU Pods

  22. Mesh TensorFlow

  23. Google’s 20% rule

  24. Parasocial Interactions

  25. GitHub Copilot – AI pair programmer

In this episode, we cover:

[01:05] Breaking into Google

[05:00] Evolution of Google with Noam Shazeer

[09:43] History and Evolution of AI

[16:15] ELI5: What is a neural network?

[18:50] ELI5: What are LLMs?

[28:16] Engineering Philosophy

[31:19] Attention Is All You Need

[39:01] Why hasn’t Google productized much of their research?

[44:34] Character.AI

[50:38] Are tech giants slow?

[53:48] Future of ChatGPT & Character.AI

[01:00:16] Advice for AI startups

[01:03:06] What do humans want from AI?