-
Notifications
You must be signed in to change notification settings - Fork 233
feat: filtering in hnsw #1718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: filtering in hnsw #1718
Conversation
Signed-off-by: jupyterjazz <[email protected]>
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1718 +/- ##
==========================================
- Coverage 85.54% 84.49% -1.06%
==========================================
Files 133 133
Lines 8608 8733 +125
==========================================
+ Hits 7364 7379 +15
- Misses 1244 1354 +110
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Signed-off-by: jupyterjazz <[email protected]>
Signed-off-by: jupyterjazz <[email protected]>
Signed-off-by: jupyterjazz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it in general. Do we have an idea on the amount of extra space it takes to index all these into SQLIte
? Should we have a DBConfig
entry abou tthe fields that are to be filterable?
}, | ||
# `None` is not a Type, but we allow it here anyway | ||
None: {}, # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really know why it was here, but why removing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's because python_type_to_db_type function was returning None for types other than vector, but I changed it with defaultdict which returns an empty dict by default
@@ -206,6 +217,15 @@ def python_type_to_db_type(self, python_type: Type) -> Any: | |||
if safe_issubclass(python_type, allowed_type): | |||
return np.ndarray | |||
|
|||
type_map = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we must be clear on what items are filterable and which are not. To be mindful when documenting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll update docs here #1678
Signed-off-by: jupyterjazz <[email protected]>
📝 Docs are deployed on https://ft-feat-hnsw-prefiltering--jina-docs.netlify.app 🎉 |
Support Filtering for
HnswDocumentIndex
, separately as well as inside the query builderMain changes:
id
,doc_blob
, and fields that can be filtered)filter
function which does not touch hnsw and executes filtering on the sqlite tableNotes:
query builder became quite complex even tho I tried decomposing it as much as possible.
i think codecov has some problems?