LangChain’s Post


189,856 followers

1mo

🤝 Optimizing vector retrieval with advanced graph-based metadata techniques using LangChain and Neo4j 🤝

Text embeddings and vector similarity search help us find documents by capturing their meaning and how similar they are to each other. However, text embeddings are less effective at sorting information by specific criteria such as dates or categories, for example when you need to find all documents created in a particular year or tagged under a specific category like “science fiction.” This is where metadata filtering, also called filtered vector search, comes into play: it handles those structured filters and lets users narrow their search results according to specific attributes.

💡 Metadata filtering followed by vector similarity search can increase the accuracy and relevance of the search results.

Graph databases like Neo4j can store highly complex and connected structured data alongside unstructured data. Combining the two lets you greatly narrow down the relevant document subset using a structured graph-based metadata filter.

Read more in this blog post to learn how to implement graph-based metadata filtering using LangChain and Neo4j ➡ https://lnkd.in/gYUj4WEF
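The "filter first, then rank by similarity" idea can be sketched in a few lines of plain Python. This is only an illustration, not the LangChain or Neo4j API: the `DOCS` list, its hand-made 3-dimensional embeddings, and the `filtered_search` helper are all assumptions invented for the example.

```python
import math

# Toy corpus: structured metadata plus a hand-made 3-d embedding per document.
# Every value here is invented for illustration.
DOCS = [
    {"id": "a", "year": 2021, "category": "science fiction", "embedding": [0.9, 0.1, 0.0]},
    {"id": "b", "year": 2023, "category": "science fiction", "embedding": [0.8, 0.2, 0.1]},
    {"id": "c", "year": 2023, "category": "history", "embedding": [0.9, 0.1, 0.0]},
    {"id": "d", "year": 2023, "category": "science fiction", "embedding": [0.0, 0.1, 0.9]},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def filtered_search(query_embedding, docs, k=2, **filters):
    """Keep documents whose metadata equals every filter, then rank by similarity."""
    # Step 1: structured filter on exact metadata values.
    candidates = [
        d for d in docs
        if all(d.get(key) == value for key, value in filters.items())
    ]
    # Step 2: similarity ranking over the (smaller) candidate set only.
    ranked = sorted(
        candidates,
        key=lambda d: cosine(query_embedding, d["embedding"]),
        reverse=True,
    )
    return [d["id"] for d in ranked[:k]]


results = filtered_search([1.0, 0.0, 0.0], DOCS, k=2, year=2023, category="science fiction")
print(results)
```

Documents "a" and "c" are the closest vectors to the query, but both are excluded by the metadata filter before any similarity is computed, which is the point of filtering first.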

Graph-based metadata filtering for improving vector search in RAG applications

blog.langchain.dev

17 Comments

André Lindenberg

Fellow at Exxeta

1mo

Interesting. I would have stored the unstructured data as an attribute of the corresponding node. What benefits, if any, do you see in storing unstructured data as additional nodes rather than as attributes?

2 Reactions

Ninad Jagtap

Helping businesses deliver complex GenAI Solution | Oxford MBA

1mo

This is very useful and addresses a need in critical business use cases. However, is there a similar approach that works with a NetworkX graph object? I have been trying to integrate my use case with NetworkX by creating a separate library, and a similarly handy technique would help.

Rafael Aslanian

Mathematics & Data Science Student at City, University of London

1mo

You can do this simply in SurrealDB by storing metadata alongside your embeddings

Joaquin Bonifacino

Computer Systems Analyst | Advanced Bachelor of Systems Student at ORT | PONUS Founder | AI Developer at Promtior

1mo

So it would be similar to a self retriever?

Sascha Gering

The best way to predict the future is to create it.

1mo

Hannes Voigt ... spoke about this yesterday... ;-)

1 Reaction

Orçun Özdemir

MSc in Data Science and Society, Tilburg University | Data Scientist | Arctic Code Vault Contributor

1mo

Dr. Ben E. Kuzey

1 Reaction

Mehdi Roussaky

Manager Data Analytics & AI - Technology @PwC Luxembourg

1mo

Tudor Stoian … table of contents discussion.

Aniel Villegas

Artificial Intelligence Lead | Technical Lead at Cognitive Perception

1mo

María Elena Martínez-Manzanares

1 Reaction

Aniel Villegas

Artificial Intelligence Lead | Technical Lead at Cognitive Perception

1mo

Luis Ángel García Mata


More Relevant Posts

Chinmay Maganur

Data Science Intern@CCDS | Data Scientist | Data Analyst | Machine Learning Engineer | NLP | LLM | SQL | AWS
1mo
LLMs: Vector Databases vs. Traditional Databases

Vector databases store data as high-dimensional vector embeddings, which are dense numerical representations of unstructured data like text, images, or audio. These embeddings capture semantic relationships and allow similarity search based on vector distances (e.g., cosine similarity). This is fundamentally different from traditional databases, which store data in structured rows and tables.

In traditional relational databases, retrieval is based on exact keyword matches or specific filters on structured fields. In contrast, vector databases retrieve data based on semantic similarity between the query vector and the stored vectors. This enables more flexible and context-aware searches, especially for unstructured data types like text or images.

Consider an example.

Traditional relational database: To find the "Harry Potter" book in the fantasy genre, you search for the exact keywords "fantasy" and "Harry Potter" in the genre and title/summary fields. If those specific keywords are missing, you need to combine multiple criteria such as genre, year, and author name, which makes the query more complex and time-consuming.

Vector database: A vector database can convert the book's summary, title, and other relevant information into a vector embedding that captures the semantic meaning. Even if the keyword "Harry Potter" is missing, you can search for a "fantasy book" that contains the phrase "Expecto Patronum" (a famous spell from the Harry Potter series). Because the embedding of "Expecto Patronum" lies close to the embedding of the Harry Potter book, the book can be returned in the search results based on vector similarity.

#GenAI #LLMs #datascience #machinelearning #llmops
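The contrast in the example above can be sketched as follows. The embeddings are hand-assigned stand-ins for what a real embedding model would produce, and `BOOKS`, `keyword_search`, and `vector_search` are hypothetical names used only for this illustration.

```python
import math

# Two toy records. The 3-d embeddings are invented; a real system would
# compute them with an embedding model from the title and summary.
BOOKS = [
    {"title": "Harry Potter and the Prisoner of Azkaban", "genre": "fantasy",
     "embedding": [0.9, 0.4, 0.1]},
    {"title": "A Brief History of Time", "genre": "science",
     "embedding": [0.1, 0.2, 0.9]},
]

# Assumed embedding of the query text "Expecto Patronum" (also invented).
QUERY_EMBEDDING = [0.8, 0.5, 0.0]


def keyword_search(term):
    """Exact substring match on the title, as a keyword filter would do."""
    return [b["title"] for b in BOOKS if term.lower() in b["title"].lower()]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def vector_search(query_embedding, k=1):
    """Return the k titles whose embeddings are closest to the query."""
    ranked = sorted(BOOKS, key=lambda b: cosine(query_embedding, b["embedding"]),
                    reverse=True)
    return [b["title"] for b in ranked[:k]]


keyword_hits = keyword_search("Expecto Patronum")
vector_hits = vector_search(QUERY_EMBEDDING)
print(keyword_hits, vector_hits)
```

The keyword search finds nothing because the phrase never appears in a title, while the similarity search still ranks the Harry Potter record first. That outcome depends entirely on the embeddings placing the query near the book, which is assumed here rather than demonstrated.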

1 Comment
Damien Benveniste

The ML Tech Lead | Founder @ TheAiEdge.io | Follow me to learn about Machine Learning Engineering, Machine Learning System Design, MLOps, and the latest techniques and news about the field
3w
Graph Databases should be the better choice for Retrieval Augmented Generation (RAG)! We have seen the debate RAG vs fine-tuning, but what about Vector databases vs Graph databases?

In both cases, we maintain a database of information that an LLM can query to answer a specific question. In the case of vector databases, we partition the data into chunks, encode the chunks into vector representations using an LLM, and index the data by their vector representations. Once we have a question, we retrieve the nearest neighbors to the vector representation of the question. The advantage is the fuzzy matching of the question to chunks of data. We don't need to query a specific word or concept; we simply retrieve semantically similar vectors. The problem is that the retrieved data may contain a lot of irrelevant information, which might confuse the LLM.

In the context of graphs, we extract the relationships between the different entities in the text, and we construct a knowledge base of the information contained within the text. An LLM is good at extracting that kind of triplet information:

[ENTITY A] -> [RELATIONSHIP] -> [ENTITY B]

For example:
- A [cow] IS an [animal]
- A [cow] EATS [plants]
- An [animal] IS a [living thing]
- A [plant] IS a [living thing]

Once the information is parsed, we can store it in a graph database. The information stored is the knowledge base, not the original text. For information retrieval, the LLM needs to come up with an Entity query related to the question to retrieve the related entities and relationships. The retrieved information is much more concise and to the point than in the case of vector databases. This context should provide much more useful information for the LLM to answer the question. The problem is that the query matching needs to be exact, and if the entities captured in the database are slightly semantically or lexically different, the query will not return the right information.

I know that there are databases that now merge the advantages of vector and graph databases. We can parse the entities and relationships, but we index them by their vector representations in a graph database. This way, the information retrieval can be performed using approximate nearest neighbor search instead of exact matching.

-- 👉 Don't forget to subscribe to my ML newsletter: https://lnkd.in/g4iKyRmS --
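A rough sketch of the triplet store and the two lookup styles described in the post, under loud assumptions: character-bigram count vectors stand in for learned embeddings (they capture lexical closeness only, not semantics), the search is an exhaustive scan rather than a true approximate nearest neighbor index, and every function name is hypothetical.

```python
import math

# The triplets from the post, stored as (subject, relationship, object).
TRIPLES = [
    ("cow", "IS", "animal"),
    ("cow", "EATS", "plants"),
    ("animal", "IS", "living thing"),
    ("plant", "IS", "living thing"),
]


def exact_lookup(entity):
    """Graph-style retrieval: the entity name must match exactly."""
    return [t for t in TRIPLES if t[0] == entity]


def char_bigrams(text):
    """Toy 'embedding': counts of character bigrams (a stand-in for a real model)."""
    text = text.lower()
    counts = {}
    for i in range(len(text) - 1):
        bigram = text[i:i + 2]
        counts[bigram] = counts.get(bigram, 0) + 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_entity(query):
    """Pick the stored subject entity whose vector is closest to the query."""
    entities = sorted({t[0] for t in TRIPLES})
    query_vec = char_bigrams(query)
    return max(entities, key=lambda e: cosine(query_vec, char_bigrams(e)))


def fuzzy_lookup(query):
    """Hybrid retrieval: resolve the entity by similarity, then read the graph."""
    return exact_lookup(nearest_entity(query))


print(exact_lookup("cows"))   # exact matching misses the plural form
print(fuzzy_lookup("cows"))   # similarity resolves "cows" to "cow" first
```

The exact lookup returns nothing for "cows", which is the failure mode the post describes, while the similarity step maps the query onto the stored entity before the concise triplets are read back.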
12 Comments
Arnab Chattopadhyay

Exploring confluence of artificial intelligence, art, music, dancing, yoga and spirituality for wellness of one and all. YouTube Arnab Kumar, X, Instagram @arnabch01, Investor, Author, Dancer, Artist, AI, philanthropy.
2w
Very good post🙏. It is not always either this or that, imho. You need to analyze your dataset. Let's say you have a purely stochastic dataset versus a dataset where clusters of related data exist. Obviously the data structure changes. In some problems, given the nature of the data, one may need to partition the problem itself. This is what I think. What do you think? #artificialintelligence #ai #llm https://lnkd.in/g-HYKG-P
Mitch Haile

Predictive AI for developers - no data prep, no tool sprawl
2w
Great summary. As I like to say, anyone can write down a vector. Writing down a good vector (embedding) that carries representation and defines a useful metric space is a different story.
1 Comment
Basant Singh

Looking for Roles(open work permit)| LLMs | NLP | Data Science
2w Edited
Discussion ++ 1. As far as I know (not sure about it), vector DBs keep the approximate index in RAM and the underlying data on SSD/disk. 2. To speed up retrieval and inference, is it possible to keep some vector indexes in an in-memory cache? Suppose retrieval follows the 80-20 principle.
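One way to picture the caching question is a small least-recently-used (LRU) cache in front of a slower vector store. This is a generic sketch, not a description of how any particular vector database works: the `VectorCache` class is hypothetical and a plain dict stands in for the disk-backed store.

```python
from collections import OrderedDict


class VectorCache:
    """LRU cache in front of a slower backing store (a dict here, for illustration)."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store
        self.capacity = capacity
        self.cache = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the entry as most recently used.
            self.cache.move_to_end(key)
            self.hits += 1
            return self.cache[key]
        # Cache miss: read from the slow store and remember the result.
        self.misses += 1
        vector = self.backing_store[key]
        self.cache[key] = vector
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)
        return vector


# Simulated store of ten vectors and a skewed access pattern where "doc1" is hot.
store = {f"doc{i}": [float(i), 0.0] for i in range(10)}
cache = VectorCache(store, capacity=2)
for key in ["doc1", "doc2", "doc1", "doc1", "doc3", "doc1"]:
    cache.get(key)

print(cache.hits, cache.misses)
```

With a skewed access pattern like the 80-20 case in the comment, the hot vector stays resident while rarely used ones are evicted.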
Jesse H.

M.S. in AI and Machine Learning | Data Scientist | Generative AI Researcher | ML Engineer
3w
Great explanation of the different methods to keep data in a way ready for generative AI to use.
Sudip Das

Microservices Engineer | Product Engineer |Digital Transformation|GenerativeAI
2w
Customer experience agents will benefit the most from this approach; machine-to-machine communication will provide the best features for data collection and semantic understanding.
Kevin Doyle

Enterprise Sales
2w
This is a very informative post on #vector database and #graph database for Retrieval Augmented Generation #RAG. Couchbase 7.6, which was just launched in March 2024, checks both of these boxes with AI and machine learning integration, including Vector Search and LangChain integration, empowering developers to build more intelligent and responsive applications. Key to this version is the introduction of advanced Graph Traversal capabilities, opening new avenues for complex data relationships and network analysis. This enhancement improves developer efficiency and seamlessly integrates RDBMS use cases with the agility and scalability of NoSQL. Check us out https://lnkd.in/gmSKvgkJ #vector #graph #capella #nosql #RAG #mlops
1 Comment
Osama Hussein

Certified AI Engineer | Machine Learning | Google Certified
2w
Vector databases make it possible to search by similarity (approximate nearest neighbor search) over the partitioned data chunks, which gives a form of fuzzy matching. Graph databases, by contrast, are built to model relationships between data entities; their search is stronger and more concise, yet it can be disappointing because retrieval fails when there is a lexical difference. Now imagine a RAG application built on top of a database that utilizes both approaches. It would technically have the best of both. Various databases that do just that can be found online, such as FalkorDB.
Monika Kumar

Head of Data and Analytics | Creating Data Strategy Solutions for Smart AI Adoption || Driving Business Success Through Insightful Analytics
3w
Interesting insight into which database architecture is best suited to power innovative systems such as retrieval-augmented generation (RAG) systems: vector or graph databases? The choice between the two can significantly impact performance and scalability. To simplify it further, if speed and efficient retrieval are the top priorities, and the relationships between data points are relatively simple, vector databases are a strong choice. If the data contains intricate relationships that you need to leverage for complex retrieval within your RAG system, then graph databases might be a better fit. In many cases, a combination of database types might be beneficial: using a relational database to store the main corpus of documents and their metadata, and a graph database to store and query the relationships between entities. Going beyond the post, an alternative to both of the above can be a knowledge graph. Knowledge graphs are used for storing and managing interconnected information. They combine elements of both vector and graph databases, offering efficient retrieval and the ability to capture rich semantic relationships. Knowledge graphs are becoming increasingly popular for large-scale RAG applications. #RAG #Amazech #LLM #AI #ML #CDO
