What's next for Semantic Router (v1 update)

Chapters
0:00 Semantic Router
1:13 Keeping Semantic Router Lightweight
3:32 Current State of SR v1
3:58 Modular Routers, Encoders, and Indexes
6:27 Semantic Router Synchronization
10:30 Full Async Support
12:07 HybridRouter Upgrades
12:46 New Semantic Router Integrations
13:55 Testing and Doc Upgrades
16:04 Getting Started with v1
00:00:00.000 |
Today I wanted to give you an update, mainly on the progress of Semantic Router, or the V1 major 00:00:09.780 |
Now I know I've been a little quieter on YouTube recently. 00:00:14.440 |
A lot of that is in part due to Semantic Router, this new release. 00:00:19.020 |
In part due to a lot of other projects that are coming, but I'm not quite ready to start 00:00:26.680 |
But the V1 of Semantic Router plus these other projects has made me pretty busy and just 00:00:32.920 |
unable to do all that much in terms of videos. 00:00:36.640 |
But when I do get to release V1, that will be huge and I'll explain why in a minute. 00:00:44.240 |
And also when I get to actually finally start talking about these other projects, they are 00:00:50.960 |
And altogether they're probably going to change a lot. 00:00:55.160 |
In fact, they've already changed a lot for me in terms of how I'm building AI stuff, 00:01:04.680 |
And so yeah, with that, I am pretty excited to actually start talking about that very 00:01:10.000 |
Anyhow, I want to talk about the V1 release of Semantic Router. 00:01:14.000 |
Since the inception of Semantic Router, a lot of what has been built into the library 00:01:22.360 |
has been driven by requirements, whether that's my requirements, my team's requirements, or 00:01:29.240 |
the community's requirements for particular features that we've needed to add. 00:01:34.680 |
That can be things like encoders, indexes, routers, but also a lot of other smaller things 00:01:41.920 |
like the dynamic routes, for example, where we trigger a route and then generate essentially 00:01:55.480 |
And that has been quite useful in building Semantic Router out into a library that does 00:02:01.200 |
things that you need it to do and not adding a load of fluff, which I think is important. 00:02:06.260 |
Even though I do think we have a little more fluff than I would like, and part of the V1 00:02:12.400 |
But anyhow, there has been a lot of organic growth. 00:02:16.520 |
And because of that, I find that there's a lot of things now that have needed to be refactored 00:02:23.040 |
and cleaned up so that Semantic Router continues to be this, you know, very concise code base 00:02:35.120 |
And I want us to really get to a point where if you want to go and add an index, a new 00:02:40.300 |
index to Semantic Router, you only have to modify a few methods, you know, maybe just 00:02:48.120 |
two call methods, your synchronous call method, your async call method. 00:02:52.320 |
You modify both of those, you probably set up the connection to whatever index that is. 00:02:57.960 |
And then you just add a line within testing scripts, say, okay, there's a new index point 00:03:04.880 |
Of course, add your new dependency groups within the poetry files. 00:03:11.520 |
That's all you should really need to do to be able to add a new index. 00:03:15.880 |
In many cases, of course, some cases are a bit more complicated, but that's the ideal. 00:03:22.320 |
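As a rough illustration of that ideal, here is a minimal sketch of what "one synchronous call method plus one async call method" could look like. The names `BaseIndexSketch`, `InMemoryIndex`, `query`, and `aquery` are hypothetical and chosen for illustration only; they are not taken from the library's actual interface.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import List, Tuple


class BaseIndexSketch(ABC):
    """Hypothetical base class: a new index only fills in two call methods."""

    @abstractmethod
    def query(self, vector: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
        """Synchronous call method."""

    @abstractmethod
    async def aquery(self, vector: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
        """Async call method."""


class InMemoryIndex(BaseIndexSketch):
    """Toy index that stores (route name, vector) pairs in a list."""

    def __init__(self) -> None:
        # "Setting up the connection" is trivial here; a remote index
        # would create its client in this method instead.
        self._records: List[Tuple[str, List[float]]] = []

    def add(self, route: str, vector: List[float]) -> None:
        self._records.append((route, vector))

    def query(self, vector: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
        # Score every stored vector with a dot product and keep the best matches.
        scored = [
            (route, sum(a * b for a, b in zip(vector, stored)))
            for route, stored in self._records
        ]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]

    async def aquery(self, vector: List[float], top_k: int = 1) -> List[Tuple[str, float]]:
        # Nothing to await for an in-memory store; a remote index would
        # await its network call here.
        return self.query(vector, top_k)


if __name__ == "__main__":
    index = InMemoryIndex()
    index.add("politics", [1.0, 0.0])
    index.add("chitchat", [0.0, 1.0])
    print(index.query([0.9, 0.1]))
    print(asyncio.run(index.aquery([0.1, 0.9])))
```

The point of the sketch is the shape, not the scoring: everything specific to a backend lives in the constructor and the two call methods.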
And a big part of V1 is kind of marching towards that and just making it much simpler to extend 00:03:32.120 |
So I'll share this, I'll share my screen, this article, which kind of just outlines 00:03:42.800 |
Like what are the main focus points, the objectives of this V1 release? 00:03:48.880 |
And where are we in achieving those objectives? 00:03:52.520 |
The majority of these that we're getting, again, like I said, we're getting pretty close 00:03:58.720 |
So the modular routers, encoders, and indexes. 00:04:09.520 |
New encoders are actually already relatively easy to add, but we want to make it easier. 00:04:19.940 |
So the routers were previously called route layers. 00:04:24.000 |
And the default route layer was the only router that was actually truly supported for everything 00:04:35.840 |
Everything was kind of built around that core. 00:04:40.520 |
But there are other routers, well, there's one other router, which is a hybrid router. 00:04:45.160 |
That's been in there for a long time, like really since very close to the beginning of 00:04:51.820 |
Because routers were just not abstracted enough. 00:04:57.440 |
So actually the PR that I'm working on now is to do that. 00:05:04.740 |
Actually they basically are already within that PR. 00:05:09.600 |
It's already abstracted, everything is there. 00:05:12.640 |
It's working through some testing issues, which are more to do with the test rather 00:05:23.240 |
And one also big part that I do mention here is that we do plan to have many other routers 00:05:29.160 |
Like, there's a lot of stuff we can do with routers. 00:05:33.000 |
Actually probably, I would say it's probably one of the parts of the library with the highest 00:05:40.040 |
And having this, these modular routers, that is going to unlock the ability for us to begin 00:05:48.000 |
adding many other routers very easily without, this is also another very important part, 00:05:56.560 |
we don't just add things for the sake of adding things, because you just get bloated, right? 00:06:02.520 |
A big part of this is keeping it as simple as possible. 00:06:06.040 |
My goal with this is to remove more code than I add, at least within the actual code base, 00:06:12.360 |
I know even within the tests, I would like to try and reduce the amount of code there. 00:06:27.940 |
Synchronization logic, this has been a big one, and this is something that came from 00:06:30.920 |
us implementing Semantic Router in production, right? 00:06:34.960 |
So in production, a lot of the time we're using, for example, the Pinecone index, which 00:06:39.360 |
is a remote index, and there is a problem with remote indexes in Semantic Router. 00:06:47.080 |
They're a bit harder to use, because as soon as you have a remote index, you have this 00:06:52.920 |
remote instance of your data, and some of the metadata for that data that you have remotely 00:07:00.680 |
And that can cause issues in synchronizing between those instances, particularly if you 00:07:09.000 |
are treating one of those instances as like a source of truth. 00:07:12.360 |
So for example, if you have your remote index, you put all your data in there, and then you 00:07:20.520 |
have multiple deployments where you have multiple local instances, they all need to be sharing 00:07:28.760 |
that same local metadata that aligns with that remote index. 00:07:34.980 |
And particularly, like, for example, the Pinecone index, that can be kind of hard to, not hard 00:07:41.080 |
to do, it's not hard, but it can be slow to, if you go through and look at every single 00:07:47.340 |
record and make sure it's all synchronized, that's slow. 00:07:54.400 |
And you can't, you don't want things to be slow. 00:07:57.400 |
You can deal with it, we've been dealing with it, and it will work, but it's not ideal. 00:08:02.200 |
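To make the record-by-record comparison concrete, here is a minimal sketch that treats the remote instance as the source of truth and works out what a local instance would need to add or remove. The function name `diff_routes` and the plain-dictionary data layout are assumptions for illustration, not the library's synchronization API.

```python
from typing import Dict, Set, Tuple

Routes = Dict[str, Set[str]]  # route name -> set of utterances


def diff_routes(local: Routes, remote: Routes) -> Tuple[Routes, Routes]:
    """Return (to_add, to_remove) so that local would match remote.

    Every route and utterance is visited, which is why this kind of
    check gets slow as the number of records grows.
    """
    to_add: Routes = {}
    to_remove: Routes = {}
    for route in local.keys() | remote.keys():
        local_utts = local.get(route, set())
        remote_utts = remote.get(route, set())
        missing_locally = remote_utts - local_utts
        extra_locally = local_utts - remote_utts
        if missing_locally:
            to_add[route] = missing_locally
        if extra_locally:
            to_remove[route] = extra_locally
    return to_add, to_remove


if __name__ == "__main__":
    local = {"greeting": {"hi", "hello"}, "weather": {"is it raining"}}
    remote = {"greeting": {"hi", "hey there"}}
    to_add, to_remove = diff_routes(local, remote)
    print("add locally:", to_add)
    print("remove locally:", to_remove)
```

With several local deployments, each one would run the same comparison against the same remote data, which is where the cost adds up.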
So actually a lot of the work I've been doing is on the synchronization component. 00:08:08.440 |
I would say the final thing that I really need to do specifically on synchronization 00:08:14.880 |
is one, I actually need to update this PR here, it links everything in here as well, 00:08:21.480 |
but then the final one is route level synchronization. 00:08:25.400 |
So synchronization now works, you can synchronize asynchronously. 00:08:29.600 |
We have locks between the remote and local instances, so that if, okay, synchronization 00:08:36.640 |
is happening in your remote index, remote instance, and let's say we have those multiple 00:08:43.280 |
local instances, that synchronization might be happening between, like, instance, local 00:08:51.800 |
And then local instance two is like, okay, now I need to sync, I need to sync this new 00:08:57.880 |
Right, it's not going to instantly do that, it's going to see, actually, your remote instance 00:09:03.360 |
is locked, so it's not going to be allowed to just jump into that synchronization immediately. 00:09:11.680 |
Which is actually important, especially as we begin to use Semantic Router in larger 00:09:18.600 |
projects, right, that's important with multiple instances all over the place. 00:09:23.400 |
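The locking behaviour described here can be sketched with a toy example: two local instances want to sync against the same remote, and the second one waits while the remote is locked. `RemoteStoreSketch`, `try_lock`, `unlock`, and `sync_instance` are hypothetical names for illustration, and the lock is a plain in-process flag; a real remote index would need the lock check to be atomic on the remote side.

```python
import asyncio
from typing import List


class RemoteStoreSketch:
    """Hypothetical stand-in for a remote index that carries a lock flag."""

    def __init__(self) -> None:
        self.locked = False
        self.log: List[str] = []

    def try_lock(self) -> bool:
        # Returns False if another instance already holds the lock.
        if self.locked:
            return False
        self.locked = True
        return True

    def unlock(self) -> None:
        self.locked = False


async def sync_instance(name: str, remote: RemoteStoreSketch, poll_seconds: float = 0.01) -> None:
    # Poll until no other instance is synchronizing.
    while not remote.try_lock():
        remote.log.append(f"{name} waiting")
        await asyncio.sleep(poll_seconds)
    try:
        remote.log.append(f"{name} start")
        await asyncio.sleep(0.03)  # simulated synchronization work
        remote.log.append(f"{name} done")
    finally:
        # Always release the lock, even if the sync fails.
        remote.unlock()


async def main() -> List[str]:
    remote = RemoteStoreSketch()
    await asyncio.gather(
        sync_instance("local-1", remote),
        sync_instance("local-2", remote),
    )
    return remote.log


if __name__ == "__main__":
    for entry in asyncio.run(main()):
        print(entry)
```

In this sketch the second instance never starts its own sync until the first has finished and released the lock, so the two never overlap.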
Async support for synchronizing, that's basically done, although there is, I think there was 00:09:28.800 |
one, there's probably one other method that I do need to, might actually already be done, 00:09:34.760 |
I think it is already done, sorry, but it's not ready for the hybrid router yet. 00:09:40.320 |
So that will actually be coming here, which is the current PR I'm working on, right? 00:09:44.720 |
We also have this as well, this can be probably pretty useful if you want to go through it, 00:09:50.280 |
so we have this syncing routes notebook, which explains everything. 00:09:56.880 |
I will say all of this as well, it's in a dev version of the library, so if you do want 00:10:02.000 |
to install this and start using it, right, you would use this. 00:10:05.720 |
Right now we're on dev five, very soon once I get that hybrid router PR out, it will be 00:10:14.800 |
If you do like a direct pip install, because these are dev branches, pip will not install 00:10:19.560 |
this for you, right, pip will go ahead and install this 0.0.72, which is already kind 00:10:27.720 |
Okay, so that is synchronization, full async support, this is just super important for 00:10:35.800 |
With AI in particular, if you're using state-of-the-art language models, you spend a lot of your time 00:10:41.360 |
waiting for API responses, and if you are writing your code, if you're writing synchronous 00:10:46.840 |
code, your code is just waiting, right, most of that time, so your Python code, okay, gets 00:10:53.600 |
to the point, it sends a request to your LLM, and then you're waiting like three seconds 00:10:57.560 |
to get everything back and start going again. 00:11:00.300 |
With async code, when done properly, you send your request to your LLM, and then your Python 00:11:07.980 |
code is basically free to do whatever it needs to do during those three seconds before it 00:11:12.620 |
gets a response again, right, so your Python code can go start doing other stuff during 00:11:18.180 |
that time, and of course, when you think about scaling AI applications, I think you need 00:11:26.500 |
async, like I don't, I mean, unless you're doing a ton of multi-processing for everything 00:11:32.700 |
all the time, which just isn't efficient, you do need async, I don't know how else you 00:11:42.380 |
would do it, so, yeah, full async support is one thing that we've been very keen to 00:11:49.060 |
get, and it's there, like, it is in there, it is, it's in the semantic router, it's in 00:11:55.520 |
the Pinecone index, one of the important ones, it's one of the main remote indexes, so that 00:12:02.240 |
is basically there, I'm just adding that support to the hybrid router as well now. 00:12:07.300 |
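The waiting argument above can be shown with a small standard-library example. `fake_llm_call` is a made-up stand-in that just sleeps to simulate API latency; it is not a real LLM client, and the 0.2 second latency is an arbitrary value chosen to keep the demo short.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_llm_call(prompt: str, latency: float = 0.2) -> str:
    # Simulated network wait; the event loop can run other tasks meanwhile.
    await asyncio.sleep(latency)
    return f"response to {prompt!r}"


async def run_concurrently(prompts: List[str]) -> List[str]:
    # All requests are in flight at once, so the total wait is roughly
    # one latency period rather than one per prompt.
    return await asyncio.gather(*(fake_llm_call(p) for p in prompts))


def timed(prompts: List[str]) -> Tuple[List[str], float]:
    start = time.perf_counter()
    results = asyncio.run(run_concurrently(prompts))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed(["a", "b", "c"])
    print(results)
    print(f"elapsed: {elapsed:.2f}s")
```

Run one after another, the three calls would take about three times the latency; run concurrently, they take about one.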
Then we have upgrading the hybrid router, I kind of mentioned this, yeah, in progress, 00:12:13.100 |
that's been a fairly, fairly big one, and aligning routers, right, so there were a lot 00:12:18.740 |
of methods that were specific to either hybrid router or semantic router, they've mostly 00:12:24.820 |
been cleaned up a lot now. There is also one, I think basically the final thing that I have 00:12:31.460 |
on this PR is just getting the fit and the eval methods working for the hybrid router, 00:12:36.780 |
everything else is basically there. Again, there's test things I'm working through, they're 00:12:43.340 |
absolute nuisance, but we're getting there. And then one of the things that has suffered, 00:12:49.360 |
I will say, with the focus on V1 is that there are a few PRs out there that have just kind 00:12:57.140 |
of been out there for a while, and basically we just need to get to those as soon as V1 00:13:03.660 |
is ready. That does mean, for example, especially with the Milvus index, there's just going to 00:13:09.500 |
be a bit of work on my side to get all of that kind of up to date with the new V1 setup, 00:13:18.700 |
but that is the priority, as soon as V1 is basically ready, I may even release V1 before 00:13:25.940 |
I integrate this, so it may just be that this is going to be part of like a V1.01 or something, 00:13:32.580 |
I'm not 100% sure, we'll see. So just getting these into the library, there's a few others 00:13:38.720 |
as well. There's like a Yandex index, I think, or Yandex GPT. I honestly have never used 00:13:47.140 |
those services, I never even heard of them before I saw that PR. Yeah, if anyone is using 00:13:51.740 |
those, let me know, just so I understand how popular those actually are. Then testing and 00:13:57.300 |
docs, right? This is actually such a boring one, but I think just so important, like incredibly 00:14:06.220 |
important, so obviously docs, that's self-explanatory, right? People need to know how 00:14:11.980 |
to use your library, right? And this is something that we've been relatively poor on, to be 00:14:16.380 |
fair, other than maybe these videos, and we do have a lot of like notebook examples within 00:14:21.700 |
the repo, which I think are pretty useful, like if someone wants to know how to, if someone 00:14:25.460 |
asks me how to do something, I usually send them a notebook and then they know how to 00:14:29.660 |
do it, which I think is great. But I think you need more structure in finding that 00:14:36.060 |
information and also a doc site so that you can just go into Google and say, okay, how 00:14:39.580 |
do I do this? So that is something that we've been doing as well. So we do actually have 00:14:45.780 |
a doc site that is there, it just needs more work, that's all, right? So yeah, you have 00:14:53.780 |
a few sort of almost like article type things here, we have synchronizing the route layer 00:14:59.540 |
with indexes, which kind of covers a lot of stuff I mentioned, and then you have like 00:15:04.580 |
the API reference, which goes through everything. Again, more work is needed there, but it's 00:15:10.380 |
definitely a step in the right direction. Then yes, merge your routes, that's what I'm 00:15:16.300 |
doing now in that PR. Update all Jupyter notebooks for V1, also in progress. And then full mock 00:15:24.460 |
of Pinecone index. Pinecone index is just such a nightmare in the tests. And it's also 00:15:30.340 |
just hard to, it's hard to fully mock Pinecone. So that's just, that's its own thing that 00:15:37.460 |
needs doing. That is going to make a big difference as well. So I would say a lot of the testing 00:15:43.500 |
stuff has been done to some degree already. A lot of the modularity of tests, a lot of 00:15:48.420 |
the cleanup of tests has been done. Still more to do. I would say also some stuff that 00:15:53.380 |
I'm missing here is probably mocking other encoders and indexes. I think that needs to 00:15:57.900 |
be done a little bit, but for the most part they're ready. And yeah, I mean that is it 00:16:03.500 |
for the current state of V1 of Semantic Router. As I mentioned, and as you can probably see, 00:16:10.220 |
there's a lot of stuff in there that's coming. You can already test it out, and there are 00:16:15.020 |
some docs that have been updated already for V1. You can see those in here. So for example, 00:16:24.060 |
this one, Pinecone Sync Routes, and all of this code here should work with the latest 00:16:28.980 |
version of the library or the V1 version of the library. So you can actually start going 00:16:32.420 |
through that and testing it as well, but it's still in the dev state. There's still work 00:16:38.780 |
being done. So be aware of that. There will be some things that are weird, but for the 00:16:44.020 |
most part, we're very close. I think my goal is to have this done within the next few weeks, 00:16:51.820 |
and then I can start talking about it and sharing some of the cool new stuff that we 00:16:56.020 |
So yeah, I mean that is it for the update. I won't go on any more about it. If anyone 00:17:04.100 |
has any input on any of this stuff, anything that you feel like we're missing, we should 00:17:08.940 |
cover in the V1 release or beyond, that's completely fine as well. Do let me know, but 00:17:17.860 |
for now I'll leave it there. So thank you very much for watching. I hope it's been useful, 00:17:22.100 |
especially for those of you that are using Semantic Router. And yeah, thank you. I'll