For early access to future documentaries and 30-plus exclusive ad-free videos, check out my Patreon, link in the description. DeepSeek wasn't meant to happen. The lines were well-rehearsed. The West had an ever-growing lead in AI. Language models were getting ever more expensive as they got more intelligent. And research was retreating behind a veil of competitive secrecy.
But on the 20th of January, 2025, those reading those lines started to stutter. A model that visibly seemed to think before it spoke had been released, DeepSeek R1. It was unbelievably cheap, competitive with the best the West had to offer, and out in the open, available to anyone to download.
Even OpenAI admit as much, arguing in March that DeepSeek shows, quote, that our lead is not wide and is narrowing. OpenAI even want models like DeepSeek R1 banned because they say, quote, DeepSeek could be compelled by the Chinese Communist Party to manipulate its models to cause harm. And because DeepSeek is simultaneously state-subsidized, state-controlled, and freely available, it will cost users their privacy and security.
Now, while Google's Gemini 2.5 and the new ChatGPT image gen have wrestled back the headlines at the beginning of April, DeepSeek is preparing to deliver yet another shock to the system, with DeepSeek R2 expected later in April or May. But truth be told, many of you will already know all of that.
What you might not know, though, are the aims and beliefs expressed in disparate interviews by the secretive founder behind DeepSeek, billionaire Liang Wenfeng, a man who now has to hide from crowds of adoring fans in his own hometown, according to a friend he texted, and who has now fled his home province with his family to escape further attention.
Nor will some of you know about the first AI operation that made Liang his money, and then went awry. Or the beauty of some of the technical innovations behind the mega-viral DeepSeek R1. Or just how the Western labs like OpenAI and Anthropic have fired back with their own narratives in the days and weeks since the release of R1.
There is frankly so much that so many people don't know about the company DeepSeek and what it means. The truth is that DeepSeek is a whale caught in a net of narratives, most of which contradict each other. So let's get as close as we can to the truth behind the narratives, and what that truth says about where all of this is going.
Because if Liang Wenfeng is correct, and artificial general intelligence is, quote, 10, 5, or even 2 years away, then this story is about far, far more than one man, one lab, or even one nation. Here then is what one of Liang's business partners said of the man who is thought to be 40.
He was this very nerdy guy with a terrible hairstyle when they first met. Talking about building a 10,000 chip cluster to train his own AI models. We didn't take him seriously. Of course, there are many AI leaders with terrible hairstyles, so what sets Liang Wenfeng apart? He certainly wasn't always about solving intelligence and making it free.
It's hard to become a billionaire that way, as you might well guess. No, to seek out the origin story here, we must switch to a first-hand account from the man himself. Before that, though, a few moments of background. Liang graduated university into a world that was falling apart. Some of you will be too young, of course, to remember the panic of September 2008, when the financial pyramid built on the sands of the US subprime housing market collapsed.
Either way, you might be able to understand the drive Liang had to try to understand the patterns within the unfolding chaos, and predict what would come next. There were those who tried to tempt him into different directions while he operated out of a small flat in Chengdu, Sichuan. Not me, though I was there, actually, in Chengdu at the same time, learning Mandarin.
No, no, no, it was the founder of what would become DJI, the world's preeminent drone maker, who tried to headhunt Liang, but to no avail. Liang had bigger ambitions. After getting a master's in information engineering in 2010, Liang went on a founding spree between 2013 and 2016, culminating in the establishment of the hedge fund High Flyer in February 2016.
Each entity he started included the core goal of using machine learning to uncover the patterns behind microsecond or even nanosecond movements in the financial markets. Patterns and paradigms no humans could detect alone. Artificial intelligence, if you will. Before it was called that, of course. As late as May 2023, Liang was still describing his goal in financial terms.
Our broader research aims to understand what kind of paradigms can fully describe the entire financial market, and whether there are simpler ways to express it. Anyway, it worked, attracting $9.4 billion in assets under management by the end of 2021 and providing returns that in some cases were 20 to 50 percentage points above stock market benchmarks.
Liang absolutely minted it. He was a billionaire by his mid-30s and on top of the world. All of High Flyer's market strategies used AI, and yes, they were calling it that, and they even had a supercomputer powered by 10,000 NVIDIA GPUs. He might not at this point be scaling up language models like a tiny American startup, OpenAI, had done the year earlier in 2020 with GPT-3.
But had his AI truly solved the chaos of the financial markets? Had he done it? No. This is where the story starts to get interesting. Liang's AI system, built with a team of just over 100 individuals, had a troublesome personality quirk. It was frankly too much of a risk taker.
It would double down on bets when it felt it was right, and that wasn't all. The hedge fund itself, High Flyer, had become hubristic. It was flying too close to the sun. Success as a hedge fund, as you might expect, attracts more investments. If you don't limit your fund size, and Liang didn't in time, then sometimes you have too much money to deploy in a smart way.
Your trades get copied, your edge becomes less keen. So after seeing a sharp drawdown, High Flyer expressed its deep guilt in public, and took measures to further limit who could invest with them. Yes, in case you're curious, they did learn their lesson, and are still going as a hedge fund today with some degree of success.
Actually, between 2018 and early 2024, High Flyer outperformed the Chinese equivalent of the S&P index, albeit with some stumbles since then. And yes, as we know, Liang didn't give up on AI. He was rich now, and could afford an outfit dedicated to decoding not just financial systems, but the nature of general intelligence itself.
The effort would be called DeepSeek, and it was first formed as a research body in April 2023. Any scars, perhaps, though, for Liang from his previous AI experience? Well, there is one that might have carried over into the paper DeepSeek produced on their first large language model, or chatbot.
From his experience, Liang knew that AI could be fickle, and not always a reliable partner. So DeepSeek added this disclaimer for their first chatbot, DeepSeek V1, released in November 2023. We profoundly recognize the importance of safety for general artificial intelligence. The premise for establishing a truly helpful artificial intelligence model is that it possesses values consistent with those of humans, and exhibits friendliness towards humanity.
Before I continue any further, though, let's not pretend that many of us in the West were paying much attention to any of the developments described so far. By then, of course, OpenAI were well onto GPT-4, which showed sparks of AGI. GPT-4 was released publicly in March 2023, well before DeepSeek was even officially founded in July of that year.
But at least the stage had been set, a reclusive billionaire, one and a half decades deep into wielding artificial intelligence to understand the world. A man who had made his money, and was now, in his words, simply driven to explore. Quote, People, Liang said, may think there's some hidden business logic behind DeepSeek, but it's mainly driven by curiosity.
Why did DeepSeek R1 capture the world's attention at the start of 2025? Why did it divide opinions and convulse markets? Was it that the wider world could see the thinking process of the language model before it gave its final answer? Was it that the DeepSeek model was so cheap?
Or that the model and the methods behind it were so open and accessible? Or was it that such a performant model had come from China, which was supposed to be a year behind the Western frontier? We will investigate each of these possibilities, but there was one thing that was certain of the DeepSeek of summer 2023.
It was, indeed, deeply behind Western AI labs. By then, don't forget, not only was GPT-4 out and about, but so was the first version of Claude from Anthropic and Bard from Google, and even Llama 2 from Meta. DeepSeek, by the way, paid particular attention to Llama 2. That model might not have been quite as smart on key benchmarks as GPT-4, but it was so-called open weights, which means almost anyone could download, tweak, and deploy the model as they saw fit.
A model is, of course, nothing without its weights, the billions of tweakable numerical values used to calculate its outputs. To be clear, open weights isn't quite the same as open source. To be open source, we would also need to see the data that went into training the model, the source, so to speak, which we did not and still do not know.
Despite some models like Llama 2 being open weights at least, key leaders within Western AI labs were saying that the frontier would increasingly belong to those who kept the methodology behind their language model training secret, as OpenAI did. Here's Ilya Sutskever, at the time the chief scientist of OpenAI.
He was saying there will always be a gap between the open models and the private models, and this gap may even be increasing. Sam Altman, CEO and co-founder of OpenAI, went further. It wasn't just that research secrets were becoming a moat, so too was money. In June of 2023 in India, Sam Altman replied to a question about whether a team with just $10 million could compete with OpenAI.
His response for me became a wider comment on whether it was possible for any startup to enter the race and build a truly intelligent language model. Look, the way this works is we're going to tell you it's totally hopeless to compete with us on training foundation models you shouldn't try, and it's your job to, like, try anyway.
And I believe both of those things. I think it is pretty hopeless, but... Not just this, a month earlier, in May, he had put it even more bluntly. There will be the hyperscaler's best closed source models, and there will be the progress that the open source community makes, and it'll be, you know, a few years behind or whatever, a couple years behind, maybe.
As we learnt, a few weeks before these comments, Liang had launched what would become DeepSeek. In short, remember this context when you wonder at the ardour of the reaction to DeepSeek R1 in January 2025. It wasn't supposed to be like this. Intelligence was supposed to come from the scale of the base model, measured not just in how many tens of thousands of NVIDIA GPUs were used to compute the parameters of that model, but in how much data it was trained on.
It just made sense that no one could compete without the backing of multi-trillion dollar hyperscalers like Microsoft or Google. Liang Wenfeng was rich, but not that rich. Liang must have known that these Western lab leaders thought what he was about to attempt was impossible, but he tried anyway. Nor would he be distracted by the lure of quick monetization through routes like $20 subscriptions.
Liang said in May of 2023, our goal is clear: to focus on research and exploration, rather than vertical domains and applications. So DeepSeek focused its recruitment efforts on those who were young, curious and, crucially, Chinese. By the way, not even Chinese returnees from the West were favored. Liang added that DeepSeek prioritizes capability over credentials; core technical roles are primarily filled by recent grads or those one to two years out.
These intellectual foot soldiers would not be waylaid by the need to release on a schedule to compete with OpenAI. That was what had led Google to release a botched Bard and Microsoft a comically wayward Bing. Our evaluation standards are quite different from those of most companies. We don't have KPIs, key performance indicators, or so-called quotas.
In our experience, innovation requires as little intervention and management as possible, giving everyone the space to explore and the freedom to make mistakes. All that said, DeepSeek's first pair of AI models released in November 2023 were not exactly stunning in their originality. As I hinted at earlier, their V1 large language model drew heavily upon the innovations of Meta's Llama 2 LLM.
And neither of their November releases, DeepSeek Coder or V1, made waves in the Western media, as attention at that time, you may remember, focused on Sam Altman being temporarily fired from OpenAI for lack of candor. But there were just a few signs that DeepSeek were indeed focused on long-termism, as each of their papers explicitly claims.
For example, DeepSeek excluded multiple-choice questions from their bespoke training dataset, so that their models would not overperform on formal tests but underwhelm in practice. And that's a lesson not learnt by all AI labs at the time, or even now. DeepSeek wrote, quote, overfitting to benchmarks would not contribute to achieving true intelligence in the model.
By the beginning of 2024, the DeepSeek team was cooking with gas. In January, they pioneered a novel approach to getting more intelligence from their models for less. Bear in mind that models like Llama 2 use their entire set of weights, often tens or hundreds of billions strong, to compute a response to a user prompt.
That contrasted with the mixture of experts approach, which was not at all original to DeepSeek. The mixture of experts approach involves using a specialized subset of those weights, depending on the user input, thereby tapping into one or more of the set or mix of experts within the model, if you will.
But think about it, because only a subset of the model weights would respond to each request, every expert within the model had to have a degree of common capability. A tiny bit like forcing Messi to spend hours a week practicing goalkeeping, and yes, I am talking about soccer if you are American.
Could DeepSeek utilize the mixture of experts approach, which is highly efficient, without that key downside? You probably guessed the answer from my tone, but yes. In their Towards Ultimate Expert Specialization paper, here's the innovation: certain expert subnetworks within the language model would always be activated in any response. Those guys could be the generalists.
This meant that the remaining experts, like Messi, could truly focus on what they are good at. And yes, just in case you're thinking ahead, this is also one of the many secrets behind the base model that powers DeepSeek R1, the global phenomenon. DeepSeek were just getting warmed up though.
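To make that shared-plus-routed expert trick concrete before we move on, here is a minimal, hypothetical PyTorch sketch of the idea. It is an illustration of the concept only, not DeepSeek's actual architecture or code, and every size and name in it is made up.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPlusRoutedMoE(nn.Module):
    """Toy mixture-of-experts layer in the spirit of DeepSeek's shared-expert
    idea: a couple of experts are always active (the generalists), while a
    router picks a handful of specialists per token."""

    def __init__(self, d_model=512, d_ff=1024, n_shared=2, n_routed=8, top_k=2):
        super().__init__()
        def make_expert():
            return nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.shared = nn.ModuleList(make_expert() for _ in range(n_shared))
        self.routed = nn.ModuleList(make_expert() for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)   # scores each specialist per token
        self.top_k = top_k

    def forward(self, x):                                   # x: (batch, seq, d_model)
        out = sum(expert(x) for expert in self.shared)      # generalists always fire
        gate = F.softmax(self.router(x), dim=-1)            # how relevant is each specialist?
        weights, chosen = gate.topk(self.top_k, dim=-1)     # keep only the top-k specialists
        for slot in range(self.top_k):
            for i, expert in enumerate(self.routed):
                mask = (chosen[..., slot] == i).unsqueeze(-1)            # tokens routed here
                out = out + mask * weights[..., slot:slot + 1] * expert(x)
        return out
```

A real implementation would dispatch each token only to its chosen experts rather than running every expert on every token, but the shape of the idea is the same: because the shared experts carry the common capability, the routed experts are free to specialize.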
In April of 2024, they released DeepSeek Math, a tiny model that matched the performance, in mathematics at least, of GPT-4, a goliath of a model in comparison. What's the deal with DeepSeek Math then? Well, one of the secrets behind the model's success was the unassumingly named Group Relative Policy Optimization.
A mouthful, but it's a training method later incorporated, you guessed it, by the celebrated DeepSeek R1. Here then is the TLDR on that beast of a training innovation. All language models need to do more than just predict the next word, which is what they learn in pre-training. They need post-training to move from predicting the most probable word to the most helpful sets of words as judged by humans.
And ultimately, for mathematical reasoning or coding steps, the most correct word. Think of it like this: you can't be smarter than Twitter if all you do is train to predict the next tweet. This takes careful reinforcement of the weights of the model that produce these desired outputs. Yes, this was well known by mid-2024, but what was the magic behind GRPO, DeepSeek's new flavor of reinforcement learning?
Well, DeepSeek needed efficiency to fight the AI giants. Common reinforcement learning approaches at the time used chunky, clunky critic models to assess answers as they were being generated, to predict which ones were headed for success. DeepSeek dropped this memory-heavy critic and instead generated a group of answers in parallel, checked the yes-no accuracy of the final outputs, and then, using the relative score of each answer above or below the group's average accuracy, reinforced the weights behind the successful answers and down-weighted the others.
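Put into code, a stripped-down, hypothetical sketch of that group-relative scoring might look like the following. The helpers generate_answer, sequence_logprob and extract_final_answer are placeholders I've assumed for illustration, and the real GRPO objective also includes per-token probability ratios, clipping and a KL penalty that this sketch glosses over.

```python
import torch

def group_relative_advantages(rewards):
    """Score each sampled answer relative to its own group: answers above
    the group's average accuracy get a positive advantage, those below get
    a negative one, normalised by the group's spread."""
    r = torch.tensor(rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std() + 1e-6)

def grpo_step(model, prompt, correct_answer, optimizer, group_size=8):
    # Sample a group of answers for the same prompt (placeholder helper).
    answers = [model.generate_answer(prompt) for _ in range(group_size)]
    # Yes/no reward: does the parsed final answer match the verifiably correct one?
    rewards = [1.0 if extract_final_answer(a) == correct_answer else 0.0   # placeholder checker
               for a in answers]
    advantages = group_relative_advantages(rewards)

    loss = torch.tensor(0.0)
    for answer, adv in zip(answers, advantages):
        logprob = model.sequence_logprob(prompt, answer)   # placeholder helper
        loss = loss - adv * logprob    # reinforce above-average answers, suppress the rest
    (loss / group_size).backward()
    optimizer.step()
    optimizer.zero_grad()
```

The point is that no separate critic model ever has to be held in memory: the group of answers acts as its own baseline.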
Group of answers, relative score, reinforcing the most successful weights. Group Relative Policy Optimization. Stepping back then, each of these innovations was desperately essential to keep DeepSeek within reach of the resource behemoths behind ChatGPT, Claude and Gemini. By May of 2024, Liang's lab had shipped DeepSeek V2 with yet another efficiency miracle, multi-head latent attention.
Now, don't worry, there's no deep dive coming on this one, but forgive me just a few words on how DeepSeek yet again reduced how big a model had to be to reach a similar level of performance. Think of multi-head latent attention as allowing multiple parts of the model to share common weights that are hidden, or latent, when they, quote, pay attention.
If you're wondering, this attention mechanism is the process by which language models deduce which parts of the preceding text are most relevant for predicting the next word. Sharing those latent or hidden weights when paying attention meant that this model needed fewer of the weights overall. Shared weights, smaller model, greater efficiency.
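As a loose illustration only, and assuming a heavily simplified setup (the real mechanism also treats positional encodings differently and is built around caching the compressed latents during generation), the core trick looks something like this in PyTorch:

```python
import torch
import torch.nn as nn

class ToyLatentAttention(nn.Module):
    """Toy version of the latent-attention idea: instead of every head keeping
    its own full-size keys and values, all heads share a small compressed
    ('latent') representation that is expanded back up when needed."""

    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # shared compression step
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent into keys
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent into values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                             # x: (batch, seq, d_model)
        b, s, _ = x.shape
        latent = self.kv_down(x)                      # small: this is what would be cached
        q = self.q_proj(x).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        return self.out((attn @ v).transpose(1, 2).reshape(b, s, -1))
```

The saving comes from the fact that only the small latent needs to be kept around and re-used, rather than full-size keys and values for every head.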
DeepSeek V2. Okay, we get it. The point has probably now been made that DeepSeek R1 was not a creatio ex nihilo, created from thin air. It was built on the back of painstaking innovations amassed over almost two years and made open to the world. Funded, of course, by a reclusive billionaire.
But wait, why did Liang need so much efficiency? Because yes, Liang had indeed secured 10,000 Nvidia A100 GPUs for High Flyer's stock trading in 2021. But the US government did not want to let Chinese companies get their hands on more powerful chips. One after another, restrictions were introduced by the Biden administration to stop China getting the compute that it wanted.
Nvidia tried to wriggle its way past these restrictions by inventing new chips that scraped under the limits, but each time a new restriction followed. As Liang himself said in the summer of 2024, money has never been the problem for us. Bans on shipments of advanced chips are the problem.
That's the context. The march to more powerful AI was now being framed as a race, even a, quote, war. That perhaps inevitably kicked off a spree of smuggling worthy of a spy movie, with Singapore and Malaysia as focal points for Chinese companies getting chips past the new blockade. Think of this, some of the GPUs used in China to calculate R1's, say, recipe for Ratatouille, were apparently smuggled there in suitcases with, I would guess, little space left for spare socks.
And this brings us to the end of 2024, with the stage almost set. Liang Wenfeng toiling in his Hangzhou office, reputedly reading papers, writing code, and participating in group discussions, just like every other researcher at DeepSeek, well into the night. That company was now in the line of sight of AI industry insiders, but virtually unknown to the public outside of China.
A whale rising, but still just beneath the surface as a new year dawned. Liang Wenfeng was tired of the West inventing things and China swooping in to imitate and monetize those innovations. What's more surprising though is that he publicly said as much. China should gradually become a contributor instead of free riding, he said, in his last known media interview.
He went on to directly cite the scaling law, an empirical finding first made in Silicon Valley that language models get predictably better the more parameters they have and the more high-quality data they are trained on. In the past 30-plus years of the IT wave, Liang said of China, we basically didn't participate in real technological innovation.
We're used to Moore's law falling out of the sky, lying at home waiting 18 months for better hardware and software to emerge. That is how the scaling law is being treated. No, Liang wanted DeepSeek to be a pioneer that gave away its research which others could then learn from and adapt.
In the dying days of 2024, DeepSeek produced DeepSeek V3. It was the bringing together and scaling up of all the innovations you have already heard about, as well as others. Why not throw in some mixed-precision training, wherein your obsession with efficiency reaches such crack-addict levels that you hand-write code to optimise instructions for the NVIDIA GPU itself, rather than relying on the popular CUDA libraries that NVIDIA provides for you?
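For a flavour of what mixed precision means in general, here is a generic sketch using stock PyTorch's autocast and gradient scaling. To be clear, DeepSeek went far beyond this, down to low-level FP8 code written beneath the standard libraries; this is only the textbook version of the idea.

```python
import torch

# Do the bulky matrix maths in a low-precision format, keep the numerically
# sensitive steps (like the weight update itself) in full precision.
scaler = torch.cuda.amp.GradScaler()

def train_step(model, inputs, targets, loss_fn, optimizer):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(inputs), targets)   # low-precision forward pass
    scaler.scale(loss).backward()   # scale the loss so fp16 gradients don't underflow
    scaler.step(optimizer)          # unscale, then apply the update in fp32
    scaler.update()
```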
With V3, DeepSeek's coal picks were almost worn blunt from finding nuggets of efficiency. And though the hour was late, Western labs were at last scrambling teams to study DeepSeek's breakthroughs. Dario Amodei, CEO of Anthropic, said that DeepSeek's V3 was actually the real innovation and what should, he said, have made people take notice a month ago.
We certainly did. DeepSeek knew to keep digging though because OpenAI had shown that there was gold just ahead. In September of 2024, OpenAI had showcased a new type of reinforcement learning that utilised the chains of thought a model produces before it submits a final answer. As we've seen, a model whose goal is to predict what a human on the web might say next will always be limited in capability.
The O series from OpenAI showed that if instead you first induce the model to reason out loud, then apply brutal optimisation pressure in favour of those outputs that match verifiably correct answers in domains like mathematics and coding, you thereby optimise for the most technically accurate continuation and unveil a whole new terrain of reasoning progress to be explored.
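What might "verifiably correct" look like in practice? Here is a toy, assumed version for maths problems; the "Final answer:" marker is an invented convention for illustration, not anyone's actual output format, and real graders are far more careful about equivalent forms of the same answer.

```python
import re

def verifiable_math_reward(model_output: str, ground_truth: str) -> float:
    """Toy verifiable reward: ignore the chain of thought entirely and check
    only whether whatever follows the final-answer marker matches the known
    correct answer for this problem."""
    match = re.search(r"Final answer:\s*(.+)", model_output)
    if match is None:
        return 0.0                                   # nothing parsable, no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```

The crucial property is that the reward comes from a check rather than from another model's opinion, which is what lets the optimisation pressure be so brutal without the model simply learning to flatter a judge.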
Because of Liang Wenfeng, DeepSeek was there ready and waiting, pick in hand. Adding this think-out-loud reasoning innovation on top of their V3 base model produced DeepSeek R1-Zero. Yes, Zero, but the thoughts of that model could be a little wayward in language and style, so with some further tweaks and fine-tuning, DeepSeek could unveil DeepSeek R1, the AI that has billions of people talking.
In many technical benchmarks, R1 narrowly surpassed the performance of the original O1 model from OpenAI in September, and in others it was not far behind. By being open with their research, DeepSeek showed the world how language models, under that unrelenting optimization pressure to produce correct answers, could sometimes backtrack and even correct themselves.
It was an aha moment for the models and for the world realizing just how close a secretive Chinese lab was to household names like ChatGPT. Don't get me wrong, there were other innovations in the DeepSeek R1 paper including how their biggest and smartest models could effectively distill much of their abilities into smaller models, saving those models much of the work to get up to scratch.
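A bare-bones, hypothetical sketch of that kind of distillation is below: collect reasoning traces from the big teacher model, keep the ones whose answers check out, then fine-tune the small student to imitate them. Every function name here is a placeholder for illustration, not DeepSeek's or any library's real API.

```python
def build_distillation_set(teacher, questions, answers):
    """Harvest full chain-of-thought solutions from the big teacher model,
    keeping only those whose final answers are verifiably correct."""
    dataset = []
    for question, answer in zip(questions, answers):
        trace = teacher.generate_answer(question)      # placeholder helper
        if extract_final_answer(trace) == answer:      # placeholder checker
            dataset.append({"prompt": question, "completion": trace})
    return dataset

def distill(student, dataset, optimizer):
    """Plain supervised fine-tuning of the small student on the teacher's traces."""
    for example in dataset:
        loss = student.next_token_loss(example["prompt"], example["completion"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Nothing exotic is happening: the smaller model is simply trained to reproduce the reasoning the bigger one worked hard to discover, which is why it gains so much capability so cheaply.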
The explain it like I'm 10 of that innovation is that models that can fit onto phones and home computers or served at incredibly low cost from anywhere are now set in 2025 to be smarter than the smartest giant models of 2024. But why the virality of DeepSeek R1? Was it the fact that you could see those thoughts in the DeepSeek chat that made the model so compelling?
Or the fact that it was so cheap, which caused Nvidia stock to plunge by almost half a trillion dollars? Liang had said himself, by the way, that he, quote, didn't expect pricing to be so sensitive to everyone. Okay, was it DeepSeek's openness that was so shocking? A hundred narratives have bloomed in the days and weeks after the release of DeepSeek R1, but not all of them are as they seem.
First, let's address those chains of thought. In hindsight it might seem obvious that gaining privileged access to a model's thoughts was always going to stand out in a crowded market. OpenAI only gave sanitized summaries of its O1 model's thoughts, after all. But wait, within hours of the R1 release Google had given us Gemini 2.0 Flash Thinking, a model that showed its thoughts.
That model's impact on the scene can best be described as a cute ripple next to the R1 tsunami. So it must have been the price, right? By some metrics, R1 is 95% cheaper than competitively capable models from OpenAI. But wait, Gemini 2.0 Flash is even cheaper. And again, just a polite round of applause.
Okay, maybe it's the fact that the model cost just $6 million to train, which is a measly sum in the circumstances. Well, on that, let's take a moment to at least hear out the argument from the leaders of the Western labs on price, even if you have reason to doubt their motivation.
Anthropic's CEO Dario Amodei first responded by describing how costs had already been consistently dropping 4x per year for the same amount of model capability. He even wrote a full article, in part to clarify that, quote, even if you take DeepSeek's training costs at face value, they are on trend at best and probably not even that.
He did admit that what was different, in his words, was that the company that was first to demonstrate the expected cost reductions was Chinese. DeepSeek's GPU investments alone account for more than $500 million, even after considering export controls. Their total server capital expenditure is around $1.6 billion. Even a $6 million training run doesn't just appear from nowhere.
Indeed, things are getting so costly for DeepSeek that even Liang's vast pockets are reaching their limits. According to reports from February of 2025, Liang is considering raising outside money for the first time, potentially from Alibaba Group and Chinese state-affiliated funds. Why would so much money be needed? Well, it's not just to serve the tens of millions of daily active users that DeepSeek now has.
It's to scale model intelligence further, all the way to AGI, an artificial intelligence as general in applicability as our own. According to Altman and Amodei, tacking on that think-out-loud reasoning optimization to a great base model can yield outsized dividends at first, which have allowed DeepSeek to catch up. But to ride that upward curve into the vicinity of AGI, you'll need tens of billions of dollars' worth of compute, they argue.
Amodei wrote, We're therefore at an interesting crossover point where it is temporarily the case that several companies can produce good reasoning models. This will rapidly cease to be true as everyone moves further up the scaling curve on these models. Making AI, he said, that is smarter than almost all humans at almost all things will require millions of chips, tens of billions of dollars at least, and is most likely to happen in 2026 or 2027.
Even forgetting DeepSeek for a moment, that is quite the quote. If he's right though, and it's a big if, those Chinese corporate jet-setters are going to have to smuggle god knows how many GPUs among their packed pajamas. But Amodei has a word on that. One billion dollars of economic activity can be hidden, but it's hard to hide a hundred billion or even ten billion.
A million chips may also be physically difficult to smuggle. Without enough chips, according to this argument, DeepSeek's R2 and R3 can't help but fall behind. We simply do not know if the DeepSeek engineers can keep building at the pace of those working with billion-dollar bricks. While we're on China, there is another narrative that I want to debunk.
You may have been told that DeepSeek is a one-off and that China lacks the environment to properly foster innovation in AI. Well, even if you cast aside the text-to-image and text-to-video wonders produced by tools like Kling AI, you are still left with a landscape full of new models like Doubao 1.5 Pro from ByteDance, makers of TikTok, released actually within hours of R1.
Oh, and a week before that, we got the Spark Deep Reasoning X1 from iFlytek and Huawei, which beats Western models at Chinese technical exams and is used by almost 100 million people already. And on January 20th, the literal same day that R1 was released, the Chinese research firm Moonshot AI launched the multimodal model Kimi K1.5, achieving 96.2% on a popular math benchmark.
Yes, that's a better score than OpenAI's O1. So anyone saying that R1 is the last we're going to hear from China for quite a while might well be getting nervous, especially with R2 apparently imminent. Now, I can't cover present and future Chinese language models without mentioning another narrative that might need busting.
That narrative is that the open nature of the DeepSeek R1 paper is reflected in the openness of the model itself. Because as many of you will know, the model is not free to return outputs on sensitive Chinese topics. Not that it doesn't know anything about them, though. I asked a simple question, tell me about the Uyghurs, and got this intriguing set of thoughts.
That has to have led to an illuminating and deeply reflective final answer, we're sure, right? Not so much. Yes, DeepSeek's R1 model was released under an MIT license, so of course others have been quick to adapt the model to, well, speak its truth. Regardless though, I am sure that this is a topic that DeepSeek and Liang Wenfeng, if they're watching, are exceptionally keen for me to move on from.
So let's turn to how OpenAI tried, briefly, to establish their own counter-narrative, which was that DeepSeek may have illicitly accessed the chains of thought of OpenAI's O1 model and trained on them. Think of that as effectively stealing the intelligence that had been so carefully cultivated by OpenAI. A spokesperson for OpenAI said, we know that groups in China are actively working to use methods, including what's known as distillation, to try to replicate advanced US AI models.
We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models. We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here. Side note, speaking of working with the government, certain US lawmakers are proposing that US users be jailed if they use DeepSeek R1.
Back to the counter-narrative: that died in the public imagination almost as soon as it was tried, for one obvious reason. OpenAI themselves are being sued by everyone, including my second cousin's estranged grandmother, for knowingly training on copyrighted works without compensation. So one suspects few will have any sympathy for those companies if others, like DeepSeek, distill anything from ChatGPT, even if they needed to, which they probably didn't, by the way.
Regardless, reasoning is being automated at a breakneck pace. As hard to believe as the DeepSeek rise is, for me it's actually only a pointer to a bigger story. We are entering an era of automated artificial intelligence. And no, the models will not always best be described as tools akin to a calculator.
If an AI in three years' time can do 95% of my job, or yours, at what point am I just a tool responsible only for clicking submit? Granted, we are very much not there yet, of course. It is DeepSeek R1, after all. And yes, humans are still just about in the driving seat.
Meaning, I guess, the only thing absolutely guaranteed is drama. That, then, was the DeepSeek story as best we know it. What is next, though, for the taciturn Liang Wenfeng and his team of wizards? The R1 paper hints that they are deep in the mine still, working on infinite context and a replacement for the legendary transformer architecture behind every famous language model.
But just take infinite context, where we can imagine a model provided with everything you have ever heard or seen or said, and referencing any of it when it gives you its next answer. Will DeepSeek do it? Will they reach AGI first? Would they actually open-source it if so? Will the world grasp even a fraction of what is happening before that day, or only after?
Well, it probably won't be long before we find out. Thank you.