Skip to content

feat: support redis #1550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 36 commits into from
Jul 10, 2023
Merged

feat: support redis #1550

merged 36 commits into from
Jul 10, 2023

Conversation

jupyterjazz
Copy link
Contributor

@jupyterjazz jupyterjazz commented May 17, 2023

Redis as a Document Index Backend

Simple usage:

class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[10]

docs = [MyDoc(text=f'text {i}', embedding=np.random.rand(10)) for i in range(10)]
query = np.random.rand(10)
db = RedisDocumentIndex[MyDoc](host='localhost')
db.index(docs)
results = index.find(query, search_field='embedding', limit=10)

Supported functionality:

  • Find (vector search)
  • Filter (filtering on textual and numeric data using redis syntax)
  • Text Search (text search using various approaches, e.g. BM25..)
  • Get/Del
  • Hybrid Search (currently, only combining find and filter)
  • Subindex

Signed-off-by: jupyterjazz <[email protected]>
@jupyterjazz jupyterjazz changed the title [wip ]feat: support redis feat: support redis May 17, 2023
@jupyterjazz jupyterjazz marked this pull request as draft May 17, 2023 14:25
jupyterjazz and others added 5 commits June 12, 2023 16:11
Signed-off-by: jupyterjazz <[email protected]>
Signed-off-by: jupyterjazz <[email protected]>
Signed-off-by: jupyterjazz <[email protected]>
@github-actions github-actions bot added size/xl and removed size/m labels Jun 28, 2023
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <[email protected]>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <[email protected]>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <[email protected]>
@github-actions
Copy link

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <[email protected]>
@github-actions
Copy link

github-actions bot commented Jul 5, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jul 5, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Copy link
Member

@JohannesMessner JohannesMessner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small comments, but looking good!


:return: Number of documents in the index.
"""
num_docs = self._client.ft(self._index_name).info()['num_docs']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? Maybe I'm wrong, but it seems like this 'num_docs' key is not something that is built into redis. Do we manually keep track of this somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, num_docs is included in the index info

but apparently there's a better way to get number of docs
return self._client.dbsize()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think we should stick with the old way because dbsize returns number of keys the client stores, combining all indices so it's inaccurate if you have more than one index


ids = [self._prefix + id for id in doc_ids]
docs = self._client.json().mget(ids, '$')
docs = [doc[0] for doc in docs if doc]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
docs = [doc[0] for doc in docs if doc]
docs = (doc[0] for doc in docs if doc)

I think this avoids doing an iteration over the data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but the abstract class expects a list and not a generator, we can refactor that in a separate pr maybe

Comment on lines +596 to +609
def __contains__(self, item: BaseDoc) -> bool:
"""
Checks if a given document exists in the index.

:param item: The document to check.
It must be an instance of BaseDoc or its subclass.
:return: True if the document exists in the index, False otherwise.
"""
if safe_issubclass(type(item), BaseDoc):
return self._doc_exists(item.id)
else:
raise TypeError(
f"item must be an instance of BaseDoc or its subclass, not '{type(item).__name__}'"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worthy of a separete PR, but I think we should move __contains__ to the abstract class and tell backends to implement _doc_exitsts().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signed-off-by: jupyterjazz <[email protected]>
@github-actions
Copy link

github-actions bot commented Jul 6, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jul 6, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

1 similar comment
@github-actions
Copy link

github-actions bot commented Jul 6, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

Signed-off-by: jupyterjazz <[email protected]>
@github-actions
Copy link

github-actions bot commented Jul 9, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jul 9, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@jupyterjazz jupyterjazz requested a review from JoanFM July 9, 2023 12:02
@github-actions
Copy link

github-actions bot commented Jul 9, 2023

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@github-actions
Copy link

github-actions bot commented Jul 9, 2023

📝 Docs are deployed on https://ft-feat-add-redis--jina-docs.netlify.app 🎉

@JoanFM JoanFM merged commit 069aa3a into main Jul 10, 2023
@JoanFM JoanFM deleted the feat-add-redis branch July 10, 2023 12:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add redis as a document index backend
3 participants