AI/ML/NLP Resources

Timeline

Maintaining an ml_timeline, inspired by @osanseviero

Resources

Podcasts

  • DeepPapers - a podcast series featuring deep dives into today’s seminal AI papers and research

YouTube

Twitter

Newsletters

Libraries / Tools

  • Github Copilot - I use Copilot in my IDE (VS Code) and it has dramatically improved my productivity (10-20%?). More than that, it makes coding less tedious and lowers the activation energy for coding tasks. For example, generating docstrings is trivial (and so happens much more frequently!). And because the recommendations appear inline, the developer’s ‘flow’ is not broken. I also moved from Jupyter notebooks in a browser to using Jupyter in VS Code; Radek Osmulski has a blog post on how to set this up.
  • LangChain - Building applications with LLMs through composability
  • llama_index - LlamaIndex (GPT Index) is a project that provides a central interface to connect your LLMs with external data.
  • streamlit - Python framework for building UIs. I’ve used this a lot for data science demos (see the minimal sketch after this list). Resources to inspire you: awesome-streamlit and Streamlit’s gallery
  • gradio - similar to Streamlit, but geared more toward demoing ML/NLP models (a Gradio version of the same sketch is shown below).
  • marvin - Meet Marvin: a batteries-included library for building AI-powered software. Marvin’s job is to integrate AI directly into your codebase by making it look and feel like any other function.
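To make the Streamlit/Gradio comparison above concrete, here is a minimal sketch of the same toy demo in both frameworks. The `reverse_text` function is purely illustrative (not part of either library); the calls shown (`st.title`, `st.text_input`, `st.write`, `gr.Interface`) are the basic building blocks, but check each project’s current docs since APIs evolve.

```python
# streamlit_app.py - run with: streamlit run streamlit_app.py
import streamlit as st

st.title("Text Reverser")                # page title
text = st.text_input("Enter some text")  # input widget
if text:
    st.write(text[::-1])                 # display the result
```

```python
# gradio_app.py - run with: python gradio_app.py (then open the printed local URL)
import gradio as gr

def reverse_text(text: str) -> str:
    """Toy stand-in for a model prediction function."""
    return text[::-1]

# gr.Interface wraps a plain function with auto-generated input/output widgets
gr.Interface(fn=reverse_text, inputs="text", outputs="text").launch()
```

The difference in shape is the main design distinction: a Streamlit script is re-run top to bottom on every interaction, while Gradio wraps a single function behind an auto-generated interface.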

Ethical AI

Transformers

GPT

| Model | Parameters | Number of Transformer Layers | Tokens Trained On (Estimated) | GPU Hours / Cost to Train | Release Date | Changes | Link to Paper |
|---|---|---|---|---|---|---|---|
| GPT-1 | 117M | 12 | 8.3 billion | Not publicly disclosed | June 2018 | Initial release of the GPT architecture | paper |
| GPT-2 | 1.5B | 48 (largest model) | 40 billion | ~256 GPU hours (estimated) | February 2019 | Increased model size, dataset, and training compute | paper |
| GPT-3 | 175B | 96 (largest model) | 45 terabytes (raw text) | ~355,000 GPU hours (estimated) | June 2020 | Alternates dense and sparse self-attention layers; further increase in model size, dataset, and training compute | papers |
| GPT-3.5 / ChatGPT | 355B | ? | ? | ? | 2022-11-30 | RLHF alignment | post |
| GPT-4 | 1 trillion (?) | 196 (largest model) | 100 terabytes (raw text) | Not publicly disclosed | 2023-03-14 | Context windows of 8,192 and 32,768 tokens; introduction of the System Message; multi-modal (images and text) | GPT-4 Technical Report |

The above table was generated by GPT-4 (and subsequently edited) using the prompt: “For each of the models GPT-1, GPT-2, GPT-3, and GPT-4, can you provide a table with the following fields? | Model | Parameters | Number of Transformer Layers | Tokens Trained On (Estimated) | GPU Hours / Cost to Train | Release Date | Changes | Link to Paper | Can you provide the table in markdown code but wrap it in ``` so it doesn’t get rendered by the browser”

Predictions

  • 2023-04-29 - Jason Calacanis - The cost of knowledge work will drop by 90% in the next 5-10 quarters. David Friedberg - Not so worried about jobs as about all the new products that will be built on Generative AI tech: “what’s going to come from that is a whole set of new products and ideas and things that we are certainly not thinking about today, but in six months, is going to become almost mainstay. And many new categories of products, many new industries, many new businesses are going to emerge that we’re not even thinking about. So the Luddite argument of, oh, this is going to destroy jobs and destroy the economy and drop costs by 90%, lawyers are going to get cheaper, etc., etc. I think that doesn’t even matter. It’s the tip of the iceberg. What’s more exciting is all the new evolutionary stuff that’s going to hit the market that’s really going to transform the things that we can do, and that we didn’t realize we could do.”
  • 2023-04-01 - @AllenDowney predicts “The great majority of coding will be LLM-assisted, starting now.” (tweet, blog)
  • 2020-05-08
    • Ilya Sutskever - Doesn’t think backpropagation will be replaced.
    • Ilya Sutskever - It is no longer possible for one person with one GPU to make significant breakthroughs in deep learning research. The deep learning “stack” is too deep. Ilya describes the stack as: ideas, systems to build datasets, distributed programming, building the actual cluster, GPU programming, and putting it all together. “It can be quite hard for a single person to become world class in every single layer of the stack.” OpenAI’s technical papers enumerate the number of people who worked on each of these layers.
    • Ilya Sutskever - I think that the neural networks that will produce the reasoning breakthroughs of the future will be very similar to the architectures that exist today… Humans can reason. So why can’t neural networks?