AI/ML/NLP Resources
Timeline
Maintaining a ml_timeline, inspired by @osanseviero
Resources
- Large Language Models Explained - Timothy Lee (journalist) spent a couple of months writing up this explainer of LLMs
- What are Embeddings - Vicki Boykis's deep dive into embeddings, starting from the fundamentals; PDF here
- LLM Bootcamp - from Full Stack Deep Learning, a 2-day bootcamp on LLMs run during Spring 2023
- Dive into Deep Learning - Comprehensive guide to deep learning
- Fast.ai
- Google’s Deep learning tuning playbook
- Course Notes from Andrew Ng’s Deep Learning Specialization
- Jay Alammar’s personal site - Visualizing machine learning one concept at a time
- Practical Deep Learning for Coders
- What is Prompt Engineering - just as Googling became a skill in its own right (aka “Google-fu”), I think prompt engineering is an important skill to develop
- awesome-chatgpt-prompts - A curated list of awesome ChatGPT prompts. I like the “Act as a Linux Terminal” prompt (a minimal sketch of that pattern follows this list).
- Prompt Engineering Guide - “Motivated by the high interest in developing with LLMs, we have created this new prompt engineering guide that contains all the latest papers, learning guides, lectures, references, and tools related to prompt engineering.” Code: repo
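Prompt engineering is mostly about how you structure the instructions. As a toy example of the “Act as a Linux Terminal” pattern mentioned above, here is a minimal sketch, assuming the official `openai` Python package (v1+ client interface), an `OPENAI_API_KEY` environment variable, and an illustrative model name:

```python
# Minimal role-prompting sketch (assumes `pip install openai` and OPENAI_API_KEY set).
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[
        {
            # The system message sets the role the model should play.
            "role": "system",
            "content": (
                "Act as a Linux terminal. Reply only with what the terminal "
                "would print, inside one code block, with no explanations."
            ),
        },
        # The user message is treated as terminal input.
        {"role": "user", "content": "pwd && ls"},
    ],
)

print(response.choices[0].message.content)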
Podcasts
- DeepPapers - Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research
YouTube
- @lexfridman - and associated transcripts
- @AndrejKarpathy
- @jamesbriggs
- @ai-explained-
Libraries / Tools
- Github Copilot - I use Copilot in my IDE (VS Code) and it has dramatically improved my productivity (10-20%?). More than that, it makes coding less tedious and lowers the activation energy for coding tasks. For example, generating docstrings is trivial (and happens much more frequently!). And because the recommendations are inline, the developer’s ‘flow’ is not broken. I also moved from Jupyter Notebooks in a browser to using Jupyter in VS Code; Radek Osmulski has a blog post on how to set this up.
- LangChain - Building applications with LLMs through composability
- llama_index - LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.
- streamlit - Python framework for building UIs. I’ve used this a lot for data science demos (a minimal demo sketch follows this list). Resources to inspire you: awesome-streamlit and Streamlit’s gallery
- gradio - similar to Streamlit but geared more toward ML/NLP model demos.
- marvin - Meet Marvin: a batteries-included library for building AI-powered software. Marvin’s job is to integrate AI directly into your codebase by making it look and feel like any other function.
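To give a feel for how little code a Streamlit demo takes, here is a minimal sketch; the sentiment function is a hypothetical stand-in rather than a real model call so the example stays self-contained:

```python
# app.py - minimal Streamlit demo sketch (run with: streamlit run app.py).
import streamlit as st

st.title("Toy sentiment demo")

# A text box for user input and a button to trigger the "model".
text = st.text_area("Enter some text to analyze")

def fake_sentiment(s: str) -> str:
    """Stand-in for a real model call (e.g. a transformers pipeline)."""
    return "positive" if "good" in s.lower() else "negative / unknown"

if st.button("Analyze") and text:
    st.write("Prediction:", fake_sentiment(text))
```

Swapping `fake_sentiment` for a real model is usually the only change needed to turn a sketch like this into a shareable demo.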
Ethical AI
Transformers
- transformers
- 2023-04-24 - Brandon Rohrer’s transformers from scratch tutorial
- 2023-04-16 - Sebastian Raschka explains the historical evolution of transformers here
- 2023-01-27 - The Transformer Family Version 2.0
- 2023-01-17 - Let’s build GPT: from scratch, in code, spelled out - fantastic, in-depth tutorial by Andrej Karpathy building up transformers from scratch (a minimal attention sketch follows this list)
- 2022-01-03 - Illustrated Retrieval Transformer
- 2020-12-17 - Explaining Transformers
- 2020-07-27 - How GPT3 Works Visually
- 2020-04-07 - The Transformer Family
- 2019-08-12 - Illustrated GPT2
- 2018-06-27 - Illustrated Transformer
- 2018-04-03 - Annotated Transformer
- 2017-06-12 - Attention is All You Need
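All of the tutorials above build up to the same core operation. As a quick reference, here is a minimal NumPy sketch of single-head scaled dot-product attention (no masking, no learned projections), following the formula from “Attention is All You Need”:

```python
# Scaled dot-product attention, single head, no masking (NumPy sketch).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_q, seq_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of values

# Toy example: 4 tokens, d_model = 8, Q = K = V (self-attention on raw embeddings).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)
```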
GPT
Model | Parameters | Number of Transformer Layers | Tokens Trained On (Estimated) | GPU Hours / Cost to Train | Release Date | Changes | Link to Paper |
---|---|---|---|---|---|---|---|
GPT-1 | 117M | 12 | 8.3 billion | Not publicly disclosed | June 2018 | Initial release of the GPT architecture | paper |
GPT-2 | 1.5B | 48 (largest model) | 40 billion | ~256 GPU hours (estimated) | February 2019 | Increased model size, dataset, and training compute | paper |
GPT-3 | 175B | 96 (largest model) | 45 terabytes (raw text) | ~355,000 GPU hours (estimated) | June 2020 | Alternates dense and sparse self-attention layers. Further increase in model size, dataset, and training compute | papers |
GPT-3.5 / ChatGPT | 355B | ? | ? | ? | 2022-11-30 | RLHF Alignment | post |
GPT-4 | 1 trillion (?) | 196 (largest model) | 100 terabytes (raw text) | Not publicly disclosed | 2023-03-14 | Context windows of 8,192 and 32,768 tokens, introduction of the System Message, multi-modal (images and text) | GPT-4 Technical Report |
The above table was generated by GPT-4 using the prompt “For each of the models GPT-1, GPT-2, GPT-3, and GPT-4, can you provide a table with the following fields? | Model | Parameters | Number of Transformer Layers | Tokens Trained On (Estimated) | GPU Hours / Cost to Train | Release Date | Changes | Link to Paper | Can you provide the table in markdown code but wrap it in ``` so it doesn’t get rendered by the browser”, and then edited.
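The context windows above are measured in tokens rather than characters. Here is a minimal sketch of counting tokens with the `tiktoken` library; the model name passed to `encoding_for_model` is just an example:

```python
# Count how many tokens a prompt uses (assumes `pip install tiktoken`).
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")  # pick the tokenizer for a given model

prompt = "Attention is all you need."
tokens = enc.encode(prompt)

print(len(tokens), "tokens")   # token count to compare against a context window
print(enc.decode(tokens))      # round-trips back to the original string
```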
Predictions
- 2023-04-29 - Jason Calacanis - The cost of knowledge work will drop by 90% in the next 5-10 quarters. David Friedberg - Not so much worried about jobs as excited about all the new products that will be built on Generative AI tech: “what’s going to come from that is a whole set of new products and ideas and things that we are certainly not thinking about today, but in six months, is going to become almost mainstay. And many new categories of products, many new industries, many new businesses are going to emerge that we’re not even thinking about. So the Luddite argument of, oh, this is going to destroy jobs and destroy the economy and drop costs by 90%, lawyers are going to get cheaper, etc, etc. I think that doesn’t even matter. It’s the tip of the iceberg. What’s more exciting is all the new evolutionary stuff that’s going to hit the market that’s really going to transform the things that we can do, and that we didn’t realize we could do.”
- 2023-04-01 - @AllenDowney predicts “The great majority of coding will be LLM-assisted, starting now.” (tweet, blog)
- 2020-05-08
- Ilya Sutskever - Doesn’t think backpropagation will be replaced.
- Ilya Sutskever - It is no longer possible for one person with one GPU to make significant breakthroughs in deep learning research. The deep learning “stack” is too deep. Ilya describes the stack as: ideas, systems to build datasets, distributed programming, building the actual cluster, GPU programming, and putting it all together. “It can be quite hard for a single person to become world class in every single layer of the stack.” See OpenAI’s technical papers, which enumerate the number of people contributing to each of these areas.
- Ilya Sutskever - I think that the neural networks that will produce the reasoning breakthroughs of the future will be very similar to the architectures that exist today… Humans can reason. So why can’t neural networks?