feat: update for inmemory index #1724

jupyterjazz · 2023-07-24T06:35:40Z

Implements update support for `InMemoryExactNNIndex` by maintaining a mapping from document IDs to their positions in the `DocList`.
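The idea can be illustrated with a minimal sketch. It assumes a plain Python list stands in for the `DocList` and that vectors are tuples of floats. `ToyExactIndex` and its methods are hypothetical and are not docarray's actual implementation. The ID → position dict lets an already-indexed ID be overwritten in place (the "update") and lets lookups skip a full scan.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


class ToyExactIndex:
    """Hypothetical in-memory store: rows kept in insertion order plus an
    id -> position dict, so indexing an existing id overwrites it (an update)."""

    def __init__(self) -> None:
        self._ids: List[str] = []
        self._vectors: List[Vector] = []
        self._id_to_pos: Dict[str, int] = {}

    def index(self, ids: Sequence[str], vectors: Sequence[Vector]) -> None:
        for doc_id, vec in zip(ids, vectors):
            pos = self._id_to_pos.get(doc_id)
            if pos is None:
                # Unseen id: append a new row and remember where it lives.
                self._id_to_pos[doc_id] = len(self._ids)
                self._ids.append(doc_id)
                self._vectors.append(vec)
            else:
                # Known id: replace the existing row in place instead of duplicating it.
                self._vectors[pos] = vec

    def get(self, doc_id: str) -> Optional[Vector]:
        # Average O(1) lookup through the mapping instead of scanning all rows.
        pos = self._id_to_pos.get(doc_id)
        return None if pos is None else self._vectors[pos]

    def delete(self, doc_ids: Sequence[str]) -> None:
        drop = {self._id_to_pos[d] for d in doc_ids if d in self._id_to_pos}
        self._ids = [d for i, d in enumerate(self._ids) if i not in drop]
        self._vectors = [v for i, v in enumerate(self._vectors) if i not in drop]
        # Rows after a deleted one shift left, so the mapping must be rebuilt.
        self._id_to_pos = {d: i for i, d in enumerate(self._ids)}

    def __len__(self) -> int:
        return len(self._ids)


idx = ToyExactIndex()
idx.index(['a', 'b'], [(0.0, 0.0), (1.0, 1.0)])
idx.index(['a'], [(5.0, 5.0)])  # same id again: update, not a new row
print(len(idx), idx.get('a'))   # 2 (5.0, 5.0)
```

This sketch simply rebuilds the mapping after a delete, which is O(n). The PR text does not say how the real implementation keeps positions in sync on deletion.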

In terms of performance, we see a significant ~130x improvement in `get` (from 0.0266 s to 0.0002 s) and a slight increase in indexing time, from approx. 3.50 s to 3.57 s. Other operation times stayed very similar. The experiment was run on 200,000 docs with vector dimension 128.
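The headline figures follow directly from the reported timings: 0.0266 s / 0.0002 s ≈ 133, and 3.57 s / 3.50 s ≈ 1.02, i.e. indexing is about 2% slower. The sketch below also contrasts the two lookup strategies. It assumes, which the PR does not spell out, that the pre-change `get` had to walk the stored documents to find the requested IDs. `get_by_scan`, `build_mapping` and `get_by_mapping` are hypothetical helpers, not docarray functions.

```python
from typing import Dict, List, Sequence

# Reported timings from the PR description (seconds).
GET_BEFORE, GET_AFTER = 0.0266, 0.0002
INDEX_BEFORE, INDEX_AFTER = 3.50, 3.57

get_speedup = GET_BEFORE / GET_AFTER             # ~133x
index_overhead = INDEX_AFTER / INDEX_BEFORE - 1  # ~0.02, i.e. ~2% slower


def get_by_scan(wanted_ids: Sequence[str], stored_ids: List[str]) -> List[int]:
    """Hypothetical pre-change lookup: walk every stored id, O(n) per call."""
    wanted = set(wanted_ids)
    return [pos for pos, doc_id in enumerate(stored_ids) if doc_id in wanted]


def build_mapping(stored_ids: List[str]) -> Dict[str, int]:
    """One-off O(n) pass at index time; this is where the extra indexing cost goes."""
    return {doc_id: pos for pos, doc_id in enumerate(stored_ids)}


def get_by_mapping(wanted_ids: Sequence[str], id_to_pos: Dict[str, int]) -> List[int]:
    """Hypothetical post-change lookup: average O(1) per requested id."""
    return sorted(id_to_pos[d] for d in wanted_ids if d in id_to_pos)


stored = [f'doc-{i}' for i in range(1000)]
mapping = build_mapping(stored)
request = ['doc-200', 'doc-250', 'doc-350', 'missing']
assert get_by_scan(request, stored) == get_by_mapping(request, mapping) == [200, 250, 350]
```

The trade-off is one extra pass over the IDs at index time in exchange for lookups whose cost no longer grows with the number of stored documents.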

Signed-off-by: jupyterjazz <[email protected]>

codecov · 2023-07-24T06:42:09Z

Codecov Report

Patch coverage: 46.15% and project coverage change: -0.04 ⚠️

Comparison is base (410665a) 85.57% compared to head (6ace117) 85.54%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1724      +/-   ##
==========================================
- Coverage   85.57%   85.54%   -0.04%
==========================================
  Files         133      133
  Lines        8592     8608      +16
==========================================
+ Hits         7353     7364      +11
- Misses       1239     1244       +5
```

| Flag | Coverage Δ | |
|---|---|---|
| docarray | `85.54% <46.15%> (-0.04%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| docarray/index/backends/in_memory.py | `45.45% <46.15%> (+1.93%)` | ⬆️ |


JoanFM · 2023-07-24T06:45:27Z

Can you also add how much more memory is used? I am not sure this is desired.

jupyterjazz · 2023-07-24T06:55:53Z

@JoanFM

Memory usage doesn't change much either.

Here's the experiment code:

```python
from docarray.index import InMemoryExactNNIndex
from docarray import BaseDoc
from docarray.typing import NdArray
import numpy as np
import tracemalloc

# Start tracing allocations so the index's memory footprint is captured.
tracemalloc.start()


class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[128]


# 200,000 documents with random 128-dim embeddings.
data = [MyDoc(text=f'text {i}', embedding=np.random.rand(128)) for i in range(200000)]
doc_index = InMemoryExactNNIndex[MyDoc]()

# Index all documents.
doc_index.index(data)

# Exact nearest-neighbour search on the embedding field.
docs, scores = doc_index.find(data[0], search_field='embedding')

# Membership check.
if data[10] in doc_index:
    print('wohoo')

# Retrieve documents by ID.
ids_to_get = [data[200].id, data[250].id, data[350].id]
docs = doc_index[ids_to_get]

# Delete documents by ID.
ids_to_del = [data[200].id, data[250].id, data[350].id]
del doc_index[ids_to_del]

# (current, peak) traced memory, in bytes.
print(tracemalloc.get_traced_memory())

tracemalloc.stop()
```

Before the change, it prints (768773718, 1032141402), i.e. current and peak traced memory in bytes.
After the change: (784741328, 1048109012).
So there's a ~2% increase.
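Because `tracemalloc.get_traced_memory()` returns `(current, peak)` in bytes, the two runs can be compared directly. The worked calculation below uses only the reported figures. Attributing the whole difference to the new ID → position mapping is an assumption, since the end-to-end run also includes every other allocation. `mapping_overhead_bytes` is a hypothetical helper, not part of docarray. It traces only the allocation of such a dict, so the mapping's own cost can be checked in isolation.

```python
import tracemalloc
from typing import List

# (current, peak) bytes printed by the experiment, before and after the change.
BEFORE = (768_773_718, 1_032_141_402)
AFTER = (784_741_328, 1_048_109_012)
N_DOCS = 200_000

extra_current = AFTER[0] - BEFORE[0]     # 15_967_610 bytes (~16 MB)
extra_peak = AFTER[1] - BEFORE[1]        # also 15_967_610 bytes
rel_current = extra_current / BEFORE[0]  # ~2.1% of current usage
rel_peak = extra_peak / BEFORE[1]        # ~1.5% of peak usage
per_doc = extra_current / N_DOCS         # ~80 bytes per doc, if all of it is the mapping


def mapping_overhead_bytes(n: int) -> int:
    """Hypothetical helper: trace only the allocation of an id -> position dict."""
    ids: List[str] = [f'id-{i}' for i in range(n)]  # created before tracing, so not counted
    tracemalloc.start()
    start, _ = tracemalloc.get_traced_memory()
    id_to_pos = {doc_id: pos for pos, doc_id in enumerate(ids)}
    end, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert len(id_to_pos) == n
    return end - start


print(extra_current, f'{rel_current:.1%}', f'{rel_peak:.1%}', round(per_doc))
print(mapping_overhead_bytes(10_000))
```

Current and peak growing by exactly the same amount is consistent with a structure that stays alive for the lifetime of the index rather than a transient spike during indexing.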

github-actions · 2023-07-24T06:59:43Z

📝 Docs are deployed on https://ft-feat-inmemory-update--jina-docs.netlify.app 🎉

feat: update for inmemory index

e327450

Signed-off-by: jupyterjazz <[email protected]>

github-actions bot added size/m area/core area/testing labels Jul 24, 2023

jupyterjazz linked an issue Jul 24, 2023 that may be closed by this pull request

feat: support update for inmemory index #1725

Closed

Merge branch 'main' into feat-inmemory-update

6ace117

jupyterjazz requested review from JoanFM, JohannesMessner and samsja July 24, 2023 06:59

JoanFM approved these changes Jul 24, 2023

View reviewed changes

JoanFM merged commit 7ad70bf into main Jul 24, 2023

JoanFM deleted the feat-inmemory-update branch July 24, 2023 09:10

jupyterjazz mentioned this pull request Aug 1, 2023

Release Notes v0.37.0 #1740

Closed
