Skip to content

refactor: count items from hnsw index #1765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

shobhit9957
Copy link

Fix: #1764

Get the number of documents using the HNSW method.
"""
# Access the first HNSW index from self._hnsw_indices
first_hnsw_index = self._hnsw_indices[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a dictionary, not a list, so u should not access it like this

@JoanFM
Copy link
Member

JoanFM commented Aug 25, 2023

Also, please remember that you need to signofff every commit so that we csn merge the PR

@codecov
Copy link

codecov bot commented Aug 25, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -1.05% ⚠️

Comparison is base (587ab5b) 83.74% compared to head (13a9ebf) 82.70%.
Report is 19 commits behind head on main.

❗ Current head 13a9ebf differs from pull request most recent head 436937b. Consider uploading reports for the commit 436937b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1765      +/-   ##
==========================================
- Coverage   83.74%   82.70%   -1.05%     
==========================================
  Files         134      134              
  Lines        8845     8850       +5     
==========================================
- Hits         7407     7319      -88     
- Misses       1438     1531      +93     
Flag Coverage Δ
docarray 82.70% <100.00%> (-1.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
docarray/index/backends/hnswlib.py 60.70% <100.00%> (-13.58%) ⬇️

... and 16 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shobhit9957
Copy link
Author

Hey, i'm so sorry joan. for the mistakes... I'll remember to sign off my every commit from now. and i'll try to solve this issue and commit again(signed off). Thanks Joan!

@JoanFM
Copy link
Member

JoanFM commented Aug 25, 2023

Hey, i'm so sorry joan. for the mistakes... I'll remember to sign off my every commit from now. and i'll try to solve this issue and commit again(signed off). Thanks Joan!

Nothing to be sorrry about. This is about Open Source and supporting each other

first_hnsw_index = self._hnsw_indices[first_index_key]
hnsw_num_docs = (
first_hnsw_index.get_document_count()
) # Replace with actual method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment is not anymore needed

@shobhit9957
Copy link
Author

Hey Joan, could you please check the commit, and if there are no issues in it, can you please merge it Joan. If there are any issues , please let me know Joan. I'll try to solve it.

@JoanFM JoanFM changed the title Fix: #1764 , assigned the first indices. refactor: count items from hnsw index Aug 28, 2023
@JoanFM
Copy link
Member

JoanFM commented Aug 28, 2023

Hey Joan, could you please check the commit, and if there are no issues in it, can you please merge it Joan. If there are any issues , please let me know Joan. I'll try to solve it.

The tests are failing as seen here https://github.com/docarray/docarray/actions/runs/5978323747/job/16260526492?pr=1765

I believe the method you use does not exist, please check what is the appropiate method name

@@ -435,7 +435,7 @@ def _get_num_docs_hnsw(self) -> int:
"""
first_index_key = next(iter(self._hnsw_indices)) # Get the first key
first_hnsw_index = self._hnsw_indices[first_index_key]
hnsw_num_docs = first_hnsw_index.get_document_count()
hnsw_num_docs = first_hnsw_index.element_count()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not a method, it is an attribute

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey joan I've fixed this....can you please say me what is the issue again...so that the tests are failing...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check what is faling here https://github.com/docarray/docarray/actions/runs/6003026515/job/16306992708?pr=1765

And try to run the tests urself

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okey, I see that the test asserts the number of docs after deletion, and this HNSWlib element_count may not count deleted

@JoanFM
Copy link
Member

JoanFM commented Aug 30, 2023

@shobhit9957 ,

I have realised about some challenges of this implementation, would u be able to join our discord to have better way to discuss how to tackle them?

https://discord.gg/gWEbQNcg

@shobhit9957
Copy link
Author

hey thanks joan, for the invitation of the discord, for sure I'll join joan. Thanks joan! let's fix this issue!

Signed-off-by: Shobhit Kumar <[email protected]>
@@ -438,6 +438,14 @@ def _get_num_docs_hnsw(self) -> int:
hnsw_num_docs = first_hnsw_index.element_count
return hnsw_num_docs

def _count_docs_in_hnswlib(self) -> int:
# Use HNSWlib's method to count documents
return self._hnsw_indices.element_count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remember from last iteration, this is a dictionary, not an index itself, u need to get the first index in the dictionary

Signed-off-by: Shobhit Kumar <[email protected]>
return self._num_docs

def is_index_empty(self) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method needs to be defined at parent class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this method here as it is not needed now

return self._num_docs

def is_index_empty(self) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this method here as it is not needed now

@@ -428,9 +426,17 @@ def num_docs(self) -> int:
Get the number of documents.
"""
if self._num_docs == 0:
self._num_docs = self._get_num_docs_sqlite()
self._num_docs = self._get_num_docs_hnsw()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this change

@@ -428,9 +426,17 @@ def num_docs(self) -> int:
Get the number of documents.
"""
if self._num_docs == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this call to use the new method

:return: True if the index is empty, False otherwise.
"""
return self.num_docs() == 0
if self.is_index_empty == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does not make sense, is_index_empty should not be compared to 0, it already gives a boolean

@JoanFM JoanFM closed this Sep 20, 2023
@JoanFM
Copy link
Member

JoanFM commented Sep 20, 2023

Let's take over again when I merge the first step of this epic #1801

@shobhit9957
Copy link
Author

Okay!, so now should I commit on #1801 ?

@JoanFM
Copy link
Member

JoanFM commented Sep 22, 2023

Okay!, so now should I commit on #1801 ?

No, you should create a new PR with the next steps.

@shobhit9957
Copy link
Author

Okay!, got it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants