9 New Gemini Leaks, Code Llama and A Major AI Consciousness Paper

Like buses, AI news can sometimes be slow and sometimes arrive all at once. In the last few days we have had dramatic new leaked insights into the sheer breadth of Google's Gemini. Just today we've had the release of Meta's Code Llama and, earlier, their impressive multilingual Seamless M4T model.

And last but definitely not least, this 88-page AI consciousness report. And yes, I read it all. It's juicy, so I'm saving that for the end. But let's start with two major paywalled articles, one from The Information and one from the New York Times, about Google's Gemini model. From both of them I counted a total of 9 new revelations, so let's get straight to it.

To give you a sense of timeline, by the way, Google's newly merged AI SWAT team, as they call it, is preparing for a big fall or autumn launch. The takeaway for me from both articles is that Gemini is going to be the everything model. Did you know it's going to be the rival to Midjourney and Stable Diffusion?

Midjourney only has 11 full time staff so it is more than plausible that Google's Gemini could outperform Midjourney version 5. Next we may be able to create graphics with just text descriptions and control software using only text or voice commands. These next two are speculation so I'm not even counting them in the list of leaks.

But I've already covered in a previous video that Gemini has been trained on YouTube video transcripts. And the speculation is that by integrating video and audio into Gemini, it could perhaps help a mechanic diagnose a problem with a car repair based on a video. Or be a rival to Runway ML by generating advanced text to video based on descriptions of what a user wants to see.

You can start to see why I'm beginning to think of it as the everything model. Another leak is that one of the co-founders of Google, Sergey Brin, is working on the front lines of Google Gemini. And lastly from this article, I found it really interesting that Google's lawyers have been closely evaluating the training, and they made researchers remove training data that had come from textbooks.

Even though those textbooks helped the model answer questions about subjects like astronomy or biology. And I do wonder if they privately benchmarked Gemini before removing that crucial data. But if that's not enough, prepare to also receive life advice. My theory here is that Google wants to compete directly for market share with Inflection's Pi.

What if you want scientific, creative or professional writing? Yep, they're working on that too. In fact, we already know that Google has software named Genesis that they're pitching to the New York Times, which can generate news articles, rewrite them, suggest headlines, etc. But some people will be more interested in this feature that Google DeepMind is working on.

The ability to draft critiques of an argument and generate quizzes, word and number puzzles. It's almost easier at this point to ask what might Google Gemini not be able to do. And yes, this is not Gemini, but Google DeepMind is also using AI to design the next generation of semiconductors.

But if the fall seems far away, how about today when we got Code Llama from Meta? I spent much of the last two hours reading most of the 47 page paper and you can see Code Llama in action on screen. Some highlights include that the Code Llama models provide stable generations with up to 100,000 tokens of context.

Obviously, that could be used for generating longer programs or providing the model with more context from your code base to make the generations more relevant. It comes in three versions, Code Llama, Code Llama Instruct, which can better understand natural language instructions, and Code Llama Python, better, of course, at Python.

It's available for commercial use. And as you can see, some of the versions rival GPT-3.5 on HumanEval. That top score of 53.7% on pass@1 puts it in the same ballpark as phi-1. I've actually done a full video on phi-1, so do check that out. But that got 50.6%.

But it is about 25 times smaller at 1.3 billion parameters. Interestingly, the Code Llama paper, which also came out about two hours ago, mentions phi-1 directly, saying that it follows in a similar spirit, but the difference is that phi-1 is closed source. Anyway, a couple more interesting things before we move on from Code Llama.
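As an aside on the pass@1 figures quoted above: pass@k is usually computed with the estimator introduced alongside the HumanEval benchmark, where n samples are drawn per problem and c of them pass the unit tests. The snippet below is a minimal sketch of that estimator. The function name and the example counts are made up for illustration, and the video does not say how many samples each paper drew.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples passes.

    n: samples generated for the problem
    c: samples that passed the unit tests
    k: the k in pass@k
    """
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative counts only: 10 samples for one problem, 3 of them correct.
single_problem_score = pass_at_k(n=10, c=3, k=1)
```

For k = 1 this reduces to c / n for each problem, and a benchmark score such as 53.7% is the average of that value over all problems.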

And the first one is the self-instruct method that they use. Let me know if you also find this fascinating, because step one was to generate 62,000 interview-style programming questions by prompting Llama 2, the 70 billion parameter model. Then they removed duplicates in step two. But here's where it gets interesting.

For each of those questions, they first generated a unit test, by prompting Code Llama 7 billion parameters. Then they generated 10 Python solutions by prompting Code Llama. Finally, they ran unit tests on those 10 solutions, and they added the first solution that passes those tests, along with the corresponding question and test, to the self-instruct dataset.
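The steps just described can be sketched roughly as follows. This is a toy illustration, not Meta's code: `build_self_instruct`, `passes`, and the two `toy_*` generators are hypothetical stand-ins for the model calls, exact-match deduplication is an assumption, and a real pipeline would run generated code in a sandbox rather than with a bare `exec`.

```python
from typing import Callable, Dict, List


def passes(solution_src: str, test_src: str) -> bool:
    """Return True if the candidate solution survives the unit test."""
    namespace: dict = {}
    try:
        exec(solution_src, namespace)  # defines the candidate function
        exec(test_src, namespace)      # a failed assert raises here
        return True
    except Exception:
        return False


def build_self_instruct(
    questions: List[str],
    gen_test: Callable[[str], str],
    gen_solutions: Callable[[str, int], List[str]],
    n_solutions: int = 10,
) -> List[Dict[str, str]]:
    dataset = []
    # Step two: drop duplicate questions (exact match, keeping order).
    for question in dict.fromkeys(questions):
        test = gen_test(question)  # one unit test per question
        for solution in gen_solutions(question, n_solutions):
            if passes(solution, test):
                # Keep only the first solution that passes its test.
                dataset.append(
                    {"question": question, "test": test, "solution": solution}
                )
                break
    return dataset


# Hypothetical stand-ins for prompting the models.
def toy_gen_test(question: str) -> str:
    return "assert add(2, 3) == 5"


def toy_gen_solutions(question: str, n: int) -> List[str]:
    candidates = [
        "def add(a, b):\n    return a - b",  # wrong, fails the test
        "def add(a, b):\n    return a + b",  # right, passes the test
    ]
    return candidates[:n]


questions = ["Write add(a, b).", "Write add(a, b)."]
data = build_self_instruct(questions, toy_gen_test, toy_gen_solutions)
```

With the toy generators, the duplicate question collapses to one entry and the second candidate is the one that gets stored, since the first fails its test.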

If that sounded a bit complicated, let me try to distill it a bit. They asked the big brother Llama 2 model to generate questions, then got the little brother Code Llama to generate tests for those questions.

Then they got the model to generate solutions to its own tests, found the good solutions (which, don't forget, it produced itself), and then used those to further train the model. To be honest, synthetic data and self-instruct seem to be the future of feedback. One final interesting quote from the paper on safety, and that was an argument advanced by one of their red teamers.

They made the point that various scripts and code are readily available on mainstream public websites, hacking forums, or the web, and that advanced malware development is beyond the current capabilities of available LLMs. And even an advanced LLM paired with an expert malware developer is not particularly useful at the moment, as the barrier is not typically writing the malware code itself.

Let me know what you think in the comments. But we must move on to Seamless M4T, released a couple of days ago from Meta, which frankly seems amazing for multilingual translation. That's speech-to-text, speech-to-speech, text-to-speech, text-to-text, and more. It has speech recognition for nearly 100 languages and can output in 36 languages.

But there's one feature I find particularly cool. Now, let's talk about code switching. Code switching happens when a multilingual speaker switches between languages while they're speaking. Our model Seamless M4T automatically recognizes and translates more than one language when mixed in the same sentence. As a multilingual speaker, this is a very exciting capability for me.

I often switch from Hindi to Telugu when I speak with my dad. Notice in the following example when I change languages. I can speak Hindi, Telugu, and English. Sometimes, I use all three languages in one sentence.

Sometimes, I use all three languages in one conversation. Speaking of cool though, we had this epic story out yesterday. AI gave a paralyzed woman her voice back. In a moment, you're going to see her being plugged in to the model. There we go. And the short version is that this woman suffered a stroke that left her unable to speak.

But now, for the first time, her speech and facial expressions can be synthesized from her brain signals. The system decodes these signals into text at nearly 80 words per minute, up from 14 words per minute. But let's now end on this, an 88-page report on consciousness in artificial intelligence, which counts as one of its co-authors Yoshua Bengio, the Turing Award winner.

It was dense and quite technical, but well worth the read. Look at this sentence in just the abstract. "Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to build AI systems which satisfy these indicators." These are the indicators and each one gets a few pages in the report.

And the reason that they're split up is because each one rests on a certain theory of consciousness. Obviously, the key problem is that we don't have a consensus theory on what consciousness is or how it comes about. So in a way, to hedge their bets, they group in different theories and look at the kind of indicators that would satisfy each one.

You might say that list seems so theoretical, why not just test the model or even ask the model? For more on that approach, see my theory of mind video. But the problem is, as they say on page four, the main alternative to a theory-heavy approach is to use behavioral tests for consciousness.

But as I talked about in the other video, that method is unreliable because AI systems can be trained, and of course they are, to mimic human behaviors while actually working in very different ways. Essentially, LLMs are the most efficient way to manipulate human behavior. They can be used to manipulate the behavior of other people.

