feat: update for inmemory index #1724


Merged: 2 commits, Jul 24, 2023

jupyterjazz (Contributor)

Implements Update for InMemoryExactNNIndex by creating a mapping between document IDs and their corresponding positions in the DocList.
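The approach described above can be sketched as follows. This is not docarray's actual code. `ToyInMemoryIndex`, `Doc`, and the method names are hypothetical, and a plain Python list stands in for the `DocList`. Indexing a doc whose id is already mapped overwrites it in place (an update). Deleting shifts positions, so the sketch rebuilds the map.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a docarray document.
    id: str
    text: str


@dataclass
class ToyInMemoryIndex:
    """Hypothetical toy index: a list of docs plus an id -> position map."""

    docs: List[Doc] = field(default_factory=list)
    id_to_pos: Dict[str, int] = field(default_factory=dict)

    def index(self, new_docs: List[Doc]) -> None:
        for doc in new_docs:
            pos = self.id_to_pos.get(doc.id)
            if pos is None:
                # Unseen id: append and remember where it lives.
                self.id_to_pos[doc.id] = len(self.docs)
                self.docs.append(doc)
            else:
                # Known id: overwrite in place, i.e. an update.
                self.docs[pos] = doc

    def get(self, doc_id: str) -> Optional[Doc]:
        # Average-case O(1) dict lookup instead of scanning the list.
        pos = self.id_to_pos.get(doc_id)
        return None if pos is None else self.docs[pos]

    def delete(self, ids: List[str]) -> None:
        to_drop = set(ids)
        self.docs = [d for d in self.docs if d.id not in to_drop]
        # Positions shift after removal, so rebuild the map.
        self.id_to_pos = {d.id: i for i, d in enumerate(self.docs)}


idx = ToyInMemoryIndex()
idx.index([Doc('a', 'first'), Doc('b', 'second')])
idx.index([Doc('a', 'first, updated')])  # same id, so this updates doc 'a'
print(len(idx.docs), idx.get('a').text)  # 2 first, updated
idx.delete(['a'])
print(idx.get('b').id, idx.id_to_pos)  # b {'b': 0}
```

Rebuilding the map on delete keeps the sketch simple. The trade-off is that each delete costs a full pass over the remaining docs.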

In terms of performance, `get` is ~130x faster (from 0.0266 s to 0.0002 s), while index time rises slightly (from approx. 3.50 s to 3.57 s). Other operation times stayed very similar. The experiment used 200,000 docs with vector dimension 128.
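The following sketch shows where a speedup of this size could plausibly come from. It assumes the old lookup had to scan documents until the id matched, and compares that O(n) scan with an average-case O(1) dict lookup. It uses plain lists of random hex ids rather than docarray, and `find_by_scan` is a hypothetical helper. The timings it prints depend on the machine and are not the numbers reported above.

```python
import timeit
import uuid

# Hypothetical setup: 200,000 random hex ids, matching the experiment size.
ids = [uuid.uuid4().hex for _ in range(200_000)]


def find_by_scan(doc_ids, target):
    # O(n): walk the list until the id matches (assumed baseline behaviour).
    for pos, doc_id in enumerate(doc_ids):
        if doc_id == target:
            return pos
    return None


# One O(n) pass at index time builds the id -> position map.
id_to_pos = {doc_id: pos for pos, doc_id in enumerate(ids)}

target = ids[150_000]
assert find_by_scan(ids, target) == id_to_pos[target]

scan_s = timeit.timeit(lambda: find_by_scan(ids, target), number=10) / 10
dict_s = timeit.timeit(lambda: id_to_pos[target], number=10) / 10
print(f"scan: {scan_s:.6f}s  dict: {dict_s:.8f}s")
```

The extra pass that builds the dict is consistent with the small index-time increase reported above, but the sketch does not measure docarray itself.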

codecov bot commented Jul 24, 2023

Codecov Report

Patch coverage: 46.15% and project coverage change: -0.04% ⚠️

Comparison is base (410665a) 85.57% compared to head (6ace117) 85.54%.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1724      +/-   ##
==========================================
- Coverage   85.57%   85.54%   -0.04%
==========================================
  Files         133      133
  Lines        8592     8608      +16
==========================================
+ Hits         7353     7364      +11
- Misses       1239     1244       +5
```

| Flag | Coverage Δ |
|---|---|
| docarray | 85.54% <46.15%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| docarray/index/backends/in_memory.py | 45.45% <46.15%> (+1.93%) ⬆️ |


JoanFM (Member) commented Jul 24, 2023

Can you also add how much more memory is used? I am not sure this is desired.

jupyterjazz (Contributor, Author) commented Jul 24, 2023

@JoanFM

Memory usage doesn't change much either.

Here's the experiment code:

```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc
from docarray.typing import NdArray
import numpy as np
import tracemalloc

# Trace Python memory allocations from here on.
tracemalloc.start()


class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[128]


# 200,000 docs with random 128-dim embeddings.
data = [MyDoc(text=f'text {i}', embedding=np.random.rand(128)) for i in range(200000)]
doc_index = InMemoryExactNNIndex[MyDoc]()

# index
doc_index.index(data)

# find: exact nearest-neighbour search on the embedding field
docs, scores = doc_index.find(data[0], search_field='embedding')

# contains
if data[10] in doc_index:
    print('wohoo')

# get by id
ids_to_get = [data[200].id, data[250].id, data[350].id]
docs = doc_index[ids_to_get]

# delete by id
ids_to_del = [data[200].id, data[250].id, data[350].id]
del doc_index[ids_to_del]

# (current, peak) traced memory, in bytes
print(tracemalloc.get_traced_memory())

tracemalloc.stop()
```

Before the change, the script prints (768773718, 1032141402), i.e. current and peak traced memory in bytes.
After the change, it prints (784741328, 1048109012).
So there's a ~2% increase in current memory usage (~1.5% in peak).
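The percentages can be checked directly from the two `(current, peak)` tuples above, which `tracemalloc.get_traced_memory()` reports in bytes:

```python
# (current, peak) bytes copied from the before/after runs above.
before = (768_773_718, 1_032_141_402)
after = (784_741_328, 1_048_109_012)

deltas = {}
for label, b, a in zip(("current", "peak"), before, after):
    deltas[label] = a - b
    print(f"{label}: +{deltas[label]:,} bytes ({deltas[label] / b:.2%})")
# current: +15,967,610 bytes (2.08%)
# peak: +15,967,610 bytes (1.55%)
```

The absolute growth is the same for current and peak (about 16 MB). That pattern fits one extra structure kept alive for the lifetime of the index, though this run does not attribute the bytes to any particular object.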

github-actions bot

📝 Docs are deployed on https://ft-feat-inmemory-update--jina-docs.netlify.app 🎉

JoanFM merged commit 7ad70bf into main Jul 24, 2023
JoanFM deleted the feat-inmemory-update branch July 24, 2023 09:10

Successfully merging this pull request may close these issues.

feat: support update for inmemory index