-
Notifications
You must be signed in to change notification settings - Fork 233
refactor: hnswlib performance #1727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Joan Fontanals Martinez <[email protected]>
…ctor-hnswlib-performance
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1727 +/- ##
==========================================
- Coverage 84.46% 84.45% -0.01%
==========================================
Files 134 134
Lines 8798 8834 +36
==========================================
+ Hits 7431 7461 +30
- Misses 1367 1373 +6
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Signed-off-by: Joan Fontanals Martinez <[email protected]>
f7bd4d0
to
3f56d44
Compare
Signed-off-by: Joan Fontanals Martinez <[email protected]>
4f6c270
to
f8bb5f9
Compare
Signed-off-by: Joan Fontanals Martinez <[email protected]>
…ctor-hnswlib-performance
…ay/docarray into refactor-hnswlib-performance Signed-off-by: Joan Fontanals Martinez <[email protected]>
3b0922a
to
79b0fea
Compare
Signed-off-by: Joan Fontanals Martinez <[email protected]>
…into refactor-hnswlib-performance
…ctor-hnswlib-performance
fdfcb56
to
74a187e
Compare
Signed-off-by: Joan Fontanals Martinez <[email protected]>
74a187e
to
8c0bf6c
Compare
📝 Docs are deployed on https://ft-refactor-hnswlib-performance--jina-docs.netlify.app 🎉 |
@@ -332,7 +340,19 @@ def _filter( | |||
limit: int, | |||
) -> DocList: | |||
rows = self._execute_filter(filter_query=filter_query, limit=limit) | |||
return DocList[self.out_schema](self._doc_from_bytes(blob) for _, blob in rows) # type: ignore[name-defined] | |||
hashed_ids = [doc_id for doc_id, _ in rows] | |||
embeddings: OrderedDict[str, list] = OrderedDict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why specifically OrderedDict? I think normal dict in python will already be ordered by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still like to be specific about the requirement for order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have any measure on how much faster hnsw works now?
@@ -127,7 +132,12 @@ def __init__(self, db_config=None, **kwargs): | |||
else: | |||
self._hnsw_indices[col_name] = self._create_index(col_name, col) | |||
self._logger.info(f'Created a new index for column `{col_name}`') | |||
if self._hnsw_indices[col_name].space == 'cosine': | |||
cosine_metric_index_exist = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we care about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because for cosine
, HNSWLib normalizes
the vectors, and then if we retrieve, they have chanded, so no consistent API can be provided
We have observed around 10% improvement |
No description provided.