-
Notifications
You must be signed in to change notification settings - Fork 233
refactor: count items from hnsw index #1765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
Get the number of documents using the HNSW method. | ||
""" | ||
# Access the first HNSW index from self._hnsw_indices | ||
first_hnsw_index = self._hnsw_indices[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a dictionary, not a list, so u should not access it like this
Also, please remember that you need to signofff every commit so that we csn merge the PR |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #1765 +/- ##
==========================================
- Coverage 83.74% 82.70% -1.05%
==========================================
Files 134 134
Lines 8845 8850 +5
==========================================
- Hits 7407 7319 -88
- Misses 1438 1531 +93
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
Hey, i'm so sorry joan. for the mistakes... I'll remember to sign off my every commit from now. and i'll try to solve this issue and commit again(signed off). Thanks Joan! |
Signed-off-by: Shobhit Kumar <[email protected]>
Nothing to be sorrry about. This is about Open Source and supporting each other |
docarray/index/backends/hnswlib.py
Outdated
first_hnsw_index = self._hnsw_indices[first_index_key] | ||
hnsw_num_docs = ( | ||
first_hnsw_index.get_document_count() | ||
) # Replace with actual method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment is not anymore needed
Signed-off-by: Shobhit Kumar <[email protected]>
Hey Joan, could you please check the commit, and if there are no issues in it, can you please merge it Joan. If there are any issues , please let me know Joan. I'll try to solve it. |
The tests are failing as seen here https://github.com/docarray/docarray/actions/runs/5978323747/job/16260526492?pr=1765 I believe the method you use does not exist, please check what is the appropiate method name |
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
@@ -435,7 +435,7 @@ def _get_num_docs_hnsw(self) -> int: | |||
""" | |||
first_index_key = next(iter(self._hnsw_indices)) # Get the first key | |||
first_hnsw_index = self._hnsw_indices[first_index_key] | |||
hnsw_num_docs = first_hnsw_index.get_document_count() | |||
hnsw_num_docs = first_hnsw_index.element_count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not a method, it is an attribute
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey joan I've fixed this....can you please say me what is the issue again...so that the tests are failing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check what is faling here https://github.com/docarray/docarray/actions/runs/6003026515/job/16306992708?pr=1765
And try to run the tests urself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okey, I see that the test asserts the number of docs after deletion, and this HNSWlib element_count may not count deleted
Signed-off-by: Shobhit Kumar <[email protected]>
I have realised about some challenges of this implementation, would u be able to join our discord to have better way to discuss how to tackle them? |
hey thanks joan, for the invitation of the discord, for sure I'll join joan. Thanks joan! let's fix this issue! |
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
@@ -438,6 +438,14 @@ def _get_num_docs_hnsw(self) -> int: | |||
hnsw_num_docs = first_hnsw_index.element_count | |||
return hnsw_num_docs | |||
|
|||
def _count_docs_in_hnswlib(self) -> int: | |||
# Use HNSWlib's method to count documents | |||
return self._hnsw_indices.element_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remember from last iteration, this is a dictionary, not an index itself, u need to get the first index in the dictionary
Signed-off-by: Shobhit Kumar <[email protected]>
Signed-off-by: Shobhit Kumar <[email protected]>
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
return self._num_docs | ||
|
||
def is_index_empty(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method needs to be defined at parent class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this method here as it is not needed now
This reverts commit f12bf20.
Signed-off-by: Shobhit Kumar <[email protected]>
docarray/index/backends/hnswlib.py
Outdated
return self._num_docs | ||
|
||
def is_index_empty(self) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this method here as it is not needed now
docarray/index/backends/hnswlib.py
Outdated
@@ -428,9 +426,17 @@ def num_docs(self) -> int: | |||
Get the number of documents. | |||
""" | |||
if self._num_docs == 0: | |||
self._num_docs = self._get_num_docs_sqlite() | |||
self._num_docs = self._get_num_docs_hnsw() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this change
docarray/index/backends/hnswlib.py
Outdated
@@ -428,9 +426,17 @@ def num_docs(self) -> int: | |||
Get the number of documents. | |||
""" | |||
if self._num_docs == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change this call to use the new method
Signed-off-by: Shobhit Kumar <[email protected]>
:return: True if the index is empty, False otherwise. | ||
""" | ||
return self.num_docs() == 0 | ||
if self.is_index_empty == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this does not make sense, is_index_empty should not be compared to 0, it already gives a boolean
Let's take over again when I merge the first step of this epic #1801 |
Okay!, so now should I commit on #1801 ? |
No, you should create a new PR with the next steps. |
Okay!, got it. |
Fix: #1764