diff --git a/README.md b/README.md index ab23e1946..5be2f0e90 100644 --- a/README.md +++ b/README.md @@ -97,10 +97,10 @@ SELECT pgml.transform( ``` ## Tabular data -- [47+ classification and regression algorithms](https://postgresml.org/docs/api/sql-extension/pgml.train/) +- [47+ classification and regression algorithms](https://postgresml.org/docs/open-source/pgml/api/pgml.train) - [8 - 40X faster inference than HTTP based model serving](https://postgresml.org/blog/postgresml-is-8x-faster-than-python-http-microservices) - [Millions of transactions per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second) -- [Horizontal scalability](https://github.com/postgresml/pgcat) +- [Horizontal scalability](https://postgresml.org/docs/open-source/pgcat/) **Training a classification model** @@ -142,7 +142,7 @@ docker run \ sudo -u postgresml psql -d postgresml ``` -For more details, take a look at our [Quick Start with Docker](https://postgresml.org/docs/resources/developer-docs/quick-start-with-docker) documentation. +For more details, take a look at our [Quick Start with Docker](https://postgresml.org/docs/open-source/pgml/developers/quick-start-with-docker) documentation. # Getting Started @@ -1105,7 +1105,7 @@ pgml: SELECT logs->>'epoch' AS epoch, logs->>'step' AS step, logs->>'loss' AS lo During training, model is periodically uploaded to Hugging Face Hub. You will find the model at `https://huggingface.co//`. An example model that was automatically pushed to Hugging Face Hub is [here](https://huggingface.co/santiadavani/imdb_review_sentiement). ### 6. Inference using fine-tuned model -Now, that we have fine-tuned model on Hugging Face Hub, we can use [`pgml.transform`](https://postgresml.org/docs/introduction/apis/sql-extensions/pgml.transform/text-classification) to perform real-time predictions as well as batch predictions. 
+Now, that we have fine-tuned model on Hugging Face Hub, we can use [`pgml.transform`](/docs/open-source/pgml/api/pgml.transform) to perform real-time predictions as well as batch predictions. **Real-time predictions** @@ -1506,7 +1506,7 @@ Configuring these dataset arguments ensures that the model is trained on the app Once the fine-tuning is completed, you will see the model in your Hugging Face repository (example: https://huggingface.co/santiadavani/fingpt-llama2-7b-chat). Since we are using LoRA to fine tune the model we only save the adapter weights (~2MB) instead of all the 7B weights (14GB) in Llama2-7b model. ## Inference -For inference, we will be utilizing the [OpenSourceAI](https://postgresml.org/docs/use-cases/opensourceai) class from the [pgml SDK](https://postgresml.org/docs/api/client-sdk/getting-started). Here's an example code snippet: +For inference, we will be utilizing the [OpenSourceAI](https://postgresml.org/docs/open-source/korvus/guides/opensourceai) class from the [pgml SDK](https://postgresml.org/docs/open-source/korvus/). Here's an example code snippet: ```python import pgml diff --git a/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md b/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md index 1e0b3ec5f..b24297452 100644 --- a/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md +++ b/pgml-cms/blog/semantic-search-in-postgres-in-15-minutes.md @@ -56,7 +56,7 @@ SELECT pgml.embed('mixedbread-ai/mxbai-embed-large-v1', 'Generating embeddings i !!! -We used the [pgml.embed](/docs/api/sql-extension/pgml.embed) PostresML function to generate an embedding of the sentence "Generating embeddings in Postgres is fun!" using the [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model from mixedbread.ai. +We used the [pgml.embed](/docs/open-source/pgml/api/pgml.embed) PostresML function to generate an embedding of the sentence "Generating embeddings in Postgres is fun!" 
using the [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model from mixedbread.ai. The output size of the vector varies per model, and in `mxbai-embed-large-v1` outputs vectors with 1024 dimensions: each vector contains 1024 floating point numbers. diff --git a/pgml-cms/blog/sentiment-analysis-using-express-js-and-postgresml.md b/pgml-cms/blog/sentiment-analysis-using-express-js-and-postgresml.md index 56f836db3..3cd127dd9 100644 --- a/pgml-cms/blog/sentiment-analysis-using-express-js-and-postgresml.md +++ b/pgml-cms/blog/sentiment-analysis-using-express-js-and-postgresml.md @@ -24,7 +24,7 @@ Express is a mature JS backend framework touted as being fast and flexible. It i Sentiment analysis is a valuable tool for understanding the emotional polarity of text. You can determine if the text is positive, negative, or neutral. Common use cases include understanding product reviews, survey questions, and social media posts. -In this application, we'll be applying sentiment analysis to note taking. Note taking and journaling can be an excellent practice for work efficiency and self improvement. However, if you are like me, it quickly becomes impossible to find and make use of anything I've written down. Notes that are useful must be easy to navigate. With this motivation, let's create a demo that can record notes throughout the day. Each day will have a summary and sentiment score. That way, if I'm looking for that time a few weeks ago when we were frustrated with our old MLOps platform — it will be easy to find. +In this application, we'll be applying sentiment analysis to note taking. Note taking and journaling can be an excellent practice for work efficiency and self improvement. However, if you are like me, it quickly becomes impossible to find and make use of anything I've written down. Notes that are useful must be easy to navigate. With this motivation, let's create a demo that can record notes throughout the day. 
Each day will have a summary and sentiment score. That way, if I'm looking for that time a few weeks ago when we were frustrated with our old MLOps platform — it will be easy to find. We will perform all the Machine Learning heavy lifting with the pgml extension function `pgml.transform()`. This brings Hugging Face Transformers into our data layer. @@ -36,7 +36,7 @@ You can see the full code on [GitHub](https://github.com/postgresml/example-expr This app is composed of three main parts, reading and writing to a database, performing sentiment analysis on entries, and creating a summary. -We are going to use [postgresql-client](https://www.npmjs.com/package/postgresql-client) to connect to our DB. +We are going to use [postgresql-client](https://www.npmjs.com/package/postgresql-client) to connect to our DB. When the application builds we ensure we have two tables, one for notes and one for the the daily summary and sentiment score. @@ -62,7 +62,7 @@ const day = await connection.execute(` We also have three endpoints to hit: -* `app.get(“/", async (req, res, next)` which returns all the notes for that day and the daily summary. +* `app.get(“/", async (req, res, next)` which returns all the notes for that day and the daily summary. * `app.post(“/add", async (req, res, next)` which accepts a new note entry and performs a sentiment analysis. We simplify the score by converting it to 1, 0, -1 for positive, neutral, negative and save it in our notes table. ```postgresql @@ -146,8 +146,8 @@ not bad for less than an hour of coding. ### Final Thoughts -This app is far from complete but does show an easy and scalable way to get started with ML in Express. From here I encourage you to head over to our [docs](https://postgresml.org/docs/api/sql-extension/) and see what other features could be added. +This app is far from complete but does show an easy and scalable way to get started with ML in Express. 
From here I encourage you to head over to our [docs](https://postgresml.org/docs) and see what other features could be added. -If SQL is not your thing, no worries. Check out or [JS SDK](https://postgresml.org/docs/api/client-sdk/getting-started) to streamline all our best practices with simple JavaScript. +If SQL is not your thing, no worries. Check out or [JS SDK](https://postgresml.org/docs/open-source/korvus/) to streamline all our best practices with simple JavaScript. -We love hearing from you — please reach out to us on [Discord ](https://discord.gg/DmyJP3qJ7U)or simply [Contact Us](https://postgresml.org/contact) here if you have any questions or feedback. +We love hearing from you — please reach out to us on [Discord ](https://discord.gg/DmyJP3qJ7U)or simply [Contact Us](https://postgresml.org/contact) here if you have any questions or feedback. diff --git a/pgml-cms/blog/using-postgresml-with-django-and-embedding-search.md b/pgml-cms/blog/using-postgresml-with-django-and-embedding-search.md index 0ad6d6820..d37a0230f 100644 --- a/pgml-cms/blog/using-postgresml-with-django-and-embedding-search.md +++ b/pgml-cms/blog/using-postgresml-with-django-and-embedding-search.md @@ -28,7 +28,7 @@ PostgresML allows anyone to integrate advanced AI capabilities into their applic Advanced search engines like Google use this technique to extract the meaning of search queries and rank the results based on what the user actually _wants_, unlike simple keyword matches which can easily give irrelevant results. -To accomplish this, for each document in our app, we include an embedding column stored as a vector. A vector is just an array of floating point numbers. For each item in our to-do list, we automatically generate the embedding using the PostgresML [`pgml.embed()`](https://postgresml.org/docs/introduction/apis/sql-extensions/pgml.embed) function. This function runs inside the database and doesn't require the Django app to install the model locally. 
+To accomplish this, for each document in our app, we include an embedding column stored as a vector. A vector is just an array of floating point numbers. For each item in our to-do list, we automatically generate the embedding using the PostgresML [`pgml.embed()`](/docs/open-source/pgml/api/pgml.embed) function. This function runs inside the database and doesn't require the Django app to install the model locally. An embedding model running inside PostgresML is able to extract the meaning of search queries & compare it to the meaning of the documents it stores, just like a human being would if they were able to search millions of documents in just a few milliseconds. diff --git a/pgml-cms/docs/README.md b/pgml-cms/docs/README.md index 37b7ac1e1..ff9a697d1 100644 --- a/pgml-cms/docs/README.md +++ b/pgml-cms/docs/README.md @@ -23,16 +23,14 @@ PostgresML allows you to take advantage of the fundamental relationship between These capabilities are primarily provided by two open-source software projects, that may be used independently, but are designed to be used together with the rest of the Postgres ecosystem: -* [**pgml**](/docs/api/sql-extension/) - an open source extension for PostgreSQL. It adds support for GPUs and the latest ML & AI algorithms _inside_ the database with a SQL API and no additional infrastructure, networking latency, or reliability costs. -* [**PgCat**](/docs/product/pgcat/) - an open source connection pooler for PostgreSQL. It abstracts the scalability and reliability concerns of managing a distributed cluster of Postgres databases. Client applications connect only to the pooler, which handles load balancing, sharding, and failover, outside of any single database server. +* [**pgml**](/docs/open-source/pgml/) - an open source extension for PostgreSQL. It adds support for GPUs and the latest ML & AI algorithms _inside_ the database with a SQL API and no additional infrastructure, networking latency, or reliability costs. 
+* [**PgCat**](/docs/open-source/pgcat/) - an open source connection pooler for PostgreSQL. It abstracts the scalability and reliability concerns of managing a distributed cluster of Postgres databases. Client applications connect only to the pooler, which handles load balancing, sharding, and failover, outside of any single database server.
PostgresML architectural diagram
-To learn more about how we designed PostgresML, take a look at our [architecture overview](/docs/resources/architecture/). - ## Client SDK -The PostgresML team also provides [native language SDKs](/docs/api/client-sdk/) which implement best practices for common ML & AI applications. The JavaScript and Python SDKs are generated from the a core Rust library, which provides a uniform API, correctness and efficiency across all environments. +The PostgresML team also provides [native language SDKs](/docs/open-source/korvus/) which implement best practices for common ML & AI applications. The JavaScript and Python SDKs are generated from a core Rust library, which provides a uniform API, correctness and efficiency across all environments. While using the SDK is completely optional, SDK clients can perform advanced machine learning tasks in a single SQL request, without having to transfer additional data, models, hardware or dependencies to the client application. diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md index 59687e3e7..780b05a32 100644 --- a/pgml-cms/docs/SUMMARY.md +++ b/pgml-cms/docs/SUMMARY.md @@ -23,16 +23,7 @@ * [PGML](open-source/pgml/README.md) * [API](open-source/pgml/api/README.md) * [pgml.embed()](open-source/pgml/api/pgml.embed.md) - * [pgml.transform()](open-source/pgml/api/pgml.transform/README.md) - * [Fill-Mask](open-source/pgml/api/pgml.transform/fill-mask.md) - * [Question answering](open-source/pgml/api/pgml.transform/question-answering.md) - * [Summarization](open-source/pgml/api/pgml.transform/summarization.md) - * [Text classification](open-source/pgml/api/pgml.transform/text-classification.md) - * [Text Generation](open-source/pgml/api/pgml.transform/text-generation.md) - * [Text-to-Text Generation](open-source/pgml/api/pgml.transform/text-to-text-generation.md) - * [Token Classification](open-source/pgml/api/pgml.transform/token-classification.md) - * [Translation](open-source/pgml/api/pgml.transform/translation.md) - 
* [Zero-shot Classification](open-source/pgml/api/pgml.transform/zero-shot-classification.md) + * [pgml.transform()](open-source/pgml/api/pgml.transform.md) * [pgml.transform_stream()](open-source/pgml/api/pgml.transform_stream.md) * [pgml.deploy()](open-source/pgml/api/pgml.deploy.md) * [pgml.decompose()](open-source/pgml/api/pgml.decompose.md) @@ -40,14 +31,7 @@ * [pgml.generate()](open-source/pgml/api/pgml.generate.md) * [pgml.predict()](open-source/pgml/api/pgml.predict/README.md) * [Batch Predictions](open-source/pgml/api/pgml.predict/batch-predictions.md) - * [pgml.train()](open-source/pgml/api/pgml.train/README.md) - * [Regression](open-source/pgml/api/pgml.train/regression.md) - * [Classification](open-source/pgml/api/pgml.train/classification.md) - * [Clustering](open-source/pgml/api/pgml.train/clustering.md) - * [Decomposition](open-source/pgml/api/pgml.train/decomposition.md) - * [Data Pre-processing](open-source/pgml/api/pgml.train/data-pre-processing.md) - * [Hyperparameter Search](open-source/pgml/api/pgml.train/hyperparameter-search.md) - * [Joint Optimization](open-source/pgml/api/pgml.train/joint-optimization.md) + * [pgml.train()](open-source/pgml/api/pgml.train.md) * [pgml.tune()](open-source/pgml/api/pgml.tune.md) * [Guides](open-source/pgml/guides/README.md) * [Embeddings](open-source/pgml/guides/embeddings/README.md) @@ -56,11 +40,27 @@ * [Aggregation](open-source/pgml/guides/embeddings/vector-aggregation.md) * [Similarity](open-source/pgml/guides/embeddings/vector-similarity.md) * [Normalization](open-source/pgml/guides/embeddings/vector-normalization.md) + * [LLMs](open-source/pgml/guides/llms/README.md) + * [Fill-Mask](open-source/pgml/guides/llms/fill-mask.md) + * [Question answering](open-source/pgml/guides/llms/question-answering.md) + * [Summarization](open-source/pgml/guides/llms/summarization.md) + * [Text classification](open-source/pgml/guides/llms/text-classification.md) + * [Text 
Generation](open-source/pgml/guides/llms/text-generation.md) + * [Text-to-Text Generation](open-source/pgml/guides/llms/text-to-text-generation.md) + * [Token Classification](open-source/pgml/guides/llms/token-classification.md) + * [Translation](open-source/pgml/guides/llms/translation.md) + * [Zero-shot Classification](open-source/pgml/guides/llms/zero-shot-classification.md) + * [Supervised Learning](open-source/pgml/guides/supervised-learning/README.md) + * [Regression](open-source/pgml/guides/supervised-learning/regression.md) + * [Classification](open-source/pgml/guides/supervised-learning/classification.md) + * [Clustering](open-source/pgml/guides/supervised-learning/clustering.md) + * [Decomposition](open-source/pgml/guides/supervised-learning/decomposition.md) + * [Data Pre-processing](open-source/pgml/guides/supervised-learning/data-pre-processing.md) + * [Hyperparameter Search](open-source/pgml/guides/supervised-learning/hyperparameter-search.md) + * [Joint Optimization](open-source/pgml/guides/supervised-learning/joint-optimization.md) * [Search](open-source/pgml/guides/improve-search-results-with-machine-learning.md) * [Chatbots](open-source/pgml/guides/chatbots/README.md) - * [Supervised Learning](open-source/pgml/guides/supervised-learning.md) * [Unified RAG](open-source/pgml/guides/unified-rag.md) - * [Natural Language Processing](open-source/pgml/guides/natural-language-processing.md) * [Vector database](open-source/pgml/guides/vector-database.md) ## Embeddings are vectors diff --git a/pgml-cms/docs/open-source/pgml/guides/embeddings/in-database-generation.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/in-database-generation.md index 98c32b299..9d46c3848 100644 --- a/pgml-cms/docs/open-source/pgml/guides/embeddings/in-database-generation.md +++ b/pgml-cms/docs/open-source/pgml/guides/embeddings/in-database-generation.md @@ -30,7 +30,7 @@ If you'd like to use a different model you can also provision dedicated resource ## Creating Embeddings 
-You can generate embeddings using [pgml.embed(model_name, text)](../../api/sql-extension/pgml.embed.md). For example: +You can generate embeddings using [pgml.embed(model_name, text)](/docs/open-source/pgml/api/pgml.embed). For example: !!! generic diff --git a/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-normalization.md b/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-normalization.md index 31cddab00..2b97b8363 100644 --- a/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-normalization.md +++ b/pgml-cms/docs/open-source/pgml/guides/embeddings/vector-normalization.md @@ -12,7 +12,7 @@ Vector normalization converts a vector into a unit vector — that is, a vector ## Storing and Normalizing Data -Assume you've created a table in your database that stores embeddings generated using [pgml.embed()](../../api/sql-extension/pgml.embed.md), although you can normalize any vector. +Assume you've created a table in your database that stores embeddings generated using [pgml.embed()](/docs/open-source/pgml/api/pgml.embed), although you can normalize any vector. ```postgresql CREATE TABLE documents ( diff --git a/pgml-cms/docs/open-source/pgml/guides/llms/README.md b/pgml-cms/docs/open-source/pgml/guides/llms/README.md new file mode 100644 index 000000000..e238eb905 --- /dev/null +++ b/pgml-cms/docs/open-source/pgml/guides/llms/README.md @@ -0,0 +1,37 @@ +# LLMs + +PostgresML integrates [🤗 Hugging Face Transformers](https://huggingface.co/transformers) to bring state-of-the-art models into the data layer. There are tens of thousands of pre-trained models with pipelines to turn raw inputs into useful results. Many state of the art deep learning architectures have been published and made available for download. You will want to browse all the [models](https://huggingface.co/models) available to find the perfect solution for your [dataset](https://huggingface.co/dataset) and [task](https://huggingface.co/tasks). 
For instance, with PostgresML you can: + +* Perform natural language processing (NLP) tasks like sentiment analysis, question and answering, translation, summarization and text generation +* Access 1000s of state-of-the-art language models like GPT-2, GPT-J, GPT-Neo from :hugs: HuggingFace model hub +* Fine tune large language models (LLMs) on your own text data for different tasks +* Use your existing PostgreSQL database as a vector database by generating embeddings from text stored in the database. + +See [pgml.transform](/docs/open-source/pgml/api/pgml.transform "mention") for examples of using transformers or [pgml.tune](/docs/open-source/pgml/api/pgml.tune "mention") for fine tuning. + +## Supported tasks + +PostgresML currently supports most LLM tasks for Natural Language Processing available on Hugging Face: + +| Task | Name | Description | +|---------------------------------------------------------|-------------|---------| +| [Fill mask](fill-mask.md) | `key-mask` | Fill in the blank in a sentence. | +| [Question answering](question-answering.md) | `question-answering` | Answer a question based on a context. | +| [Summarization](summarization.md) | `summarization` | Summarize a long text. | +| [Text classification](text-classification.md) | `text-classification` | Classify a text as positive or negative. | +| [Text generation](text-generation.md) | `text-generation` | Generate text based on a prompt. | +| [Text-to-text generation](text-to-text-generation.md) | `text-to-text-generation` | Generate text based on an instruction in the prompt. | +| [Token classification](token-classification.md) | `token-classification` | Classify tokens in a text. | +| [Translation](translation.md) | `translation` | Translate text from one language to another. | +| [Zero-shot classification](zero-shot-classification.md) | `zero-shot-classification` | Classify a text without training data. | +| Conversational | `conversational` | Engage in a conversation with the model, e.g. 
chatbot. | + +## Structured inputs + +Both versions of the `pgml.transform()` function also support structured inputs, formatted with JSON. Structured inputs are used with the conversational task, e.g. to differentiate between the system and user prompts. Simply replace the text array argument with an array of JSONB objects. + + +## Additional resources + +- [Hugging Face datasets](https://huggingface.co/datasets) +- [Hugging Face tasks](https://huggingface.co/tasks) diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/fill-mask.md b/pgml-cms/docs/open-source/pgml/guides/llms/fill-mask.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/fill-mask.md rename to pgml-cms/docs/open-source/pgml/guides/llms/fill-mask.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/question-answering.md b/pgml-cms/docs/open-source/pgml/guides/llms/question-answering.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/question-answering.md rename to pgml-cms/docs/open-source/pgml/guides/llms/question-answering.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/summarization.md b/pgml-cms/docs/open-source/pgml/guides/llms/summarization.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/summarization.md rename to pgml-cms/docs/open-source/pgml/guides/llms/summarization.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/text-classification.md b/pgml-cms/docs/open-source/pgml/guides/llms/text-classification.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/text-classification.md rename to pgml-cms/docs/open-source/pgml/guides/llms/text-classification.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/text-generation.md b/pgml-cms/docs/open-source/pgml/guides/llms/text-generation.md similarity index 100% rename from 
pgml-cms/docs/open-source/pgml/api/pgml.transform/text-generation.md rename to pgml-cms/docs/open-source/pgml/guides/llms/text-generation.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/text-to-text-generation.md b/pgml-cms/docs/open-source/pgml/guides/llms/text-to-text-generation.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/text-to-text-generation.md rename to pgml-cms/docs/open-source/pgml/guides/llms/text-to-text-generation.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/token-classification.md b/pgml-cms/docs/open-source/pgml/guides/llms/token-classification.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/token-classification.md rename to pgml-cms/docs/open-source/pgml/guides/llms/token-classification.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/translation.md b/pgml-cms/docs/open-source/pgml/guides/llms/translation.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/translation.md rename to pgml-cms/docs/open-source/pgml/guides/llms/translation.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.transform/zero-shot-classification.md b/pgml-cms/docs/open-source/pgml/guides/llms/zero-shot-classification.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.transform/zero-shot-classification.md rename to pgml-cms/docs/open-source/pgml/guides/llms/zero-shot-classification.md diff --git a/pgml-cms/docs/open-source/pgml/guides/natural-language-processing.md b/pgml-cms/docs/open-source/pgml/guides/natural-language-processing.md deleted file mode 100644 index 97d05e50d..000000000 --- a/pgml-cms/docs/open-source/pgml/guides/natural-language-processing.md +++ /dev/null @@ -1,10 +0,0 @@ -# Natural Language Processing - -PostgresML integrates [🤗 Hugging Face Transformers](https://huggingface.co/transformers) to bring state-of-the-art models into the data layer. 
There are tens of thousands of pre-trained models with pipelines to turn raw inputs into useful results. Many state of the art deep learning architectures have been published and made available for download. You will want to browse all the [models](https://huggingface.co/models) available to find the perfect solution for your [dataset](https://huggingface.co/dataset) and [task](https://huggingface.co/tasks). For instance, with PostgresML you can: - -* Perform natural language processing (NLP) tasks like sentiment analysis, question and answering, translation, summarization and text generation -* Access 1000s of state-of-the-art language models like GPT-2, GPT-J, GPT-Neo from :hugs: HuggingFace model hub -* Fine tune large language models (LLMs) on your own text data for different tasks -* Use your existing PostgreSQL database as a vector database by generating embeddings from text stored in the database. - -See [pgml.transform](../api/sql-extension/pgml.transform/ "mention") for examples of using transformers or [pgml.tune.md](../api/sql-extension/pgml.tune.md "mention") for fine tuning. diff --git a/pgml-cms/docs/open-source/pgml/guides/supervised-learning.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/README.md similarity index 97% rename from pgml-cms/docs/open-source/pgml/guides/supervised-learning.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/README.md index 786cfc330..342cd67c3 100644 --- a/pgml-cms/docs/open-source/pgml/guides/supervised-learning.md +++ b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/README.md @@ -46,7 +46,7 @@ target | ### Training a Model -Now that we've got data, we're ready to train a model using an algorithm. We'll start with a classification task to demonstrate the basics. See [pgml.train](/docs/api/sql-extension/pgml.train/) for a complete list of available algorithms and tasks. +Now that we've got data, we're ready to train a model using an algorithm. 
We'll start with a classification task to demonstrate the basics. See [pgml.train](/docs/open-source/pgml/api/pgml.train) for a complete list of available algorithms and tasks. ```postgresql SELECT * FROM pgml.train( @@ -106,7 +106,7 @@ The `pgml.predict()` function is the key value proposition of PostgresML. It pro The API for predictions is very simple and only requires two arguments: the project name and the features used for prediction. ```postgresql -select pgml.predict ( +select pgml.predict( project_name TEXT, features REAL[] ) @@ -195,7 +195,7 @@ SELECT * FROM pgml.deployed_models; PostgresML will automatically deploy a model only if it has better metrics than existing ones, so it's safe to experiment with different algorithms and hyperparameters. -Take a look at [pgml.deploy](/docs/api/sql-extension/pgml.deploy) documentation for more details. +Take a look at [pgml.deploy](/docs/open-source/pgml/api/pgml.deploy) documentation for more details. ### Specific Models diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/classification.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/classification.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/classification.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/classification.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/clustering.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/clustering.md similarity index 95% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/clustering.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/clustering.md index 5c0558dd7..0691b0059 100644 --- a/pgml-cms/docs/open-source/pgml/api/pgml.train/clustering.md +++ b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/clustering.md @@ -27,7 +27,7 @@ LIMIT 10; ## Algorithms -All clustering algorithms implemented by PostgresML are online versions. 
You may use the [pgml.predict](../../../api/sql-extension/pgml.predict/ "mention")function to cluster novel data points after the clustering model has been trained. +All clustering algorithms implemented by PostgresML are online versions. You may use the [pgml.predict](/docs/open-source/pgml/api/pgml.predict/ "mention") function to cluster novel data points after the clustering model has been trained. | Algorithm | Reference | | ---------------------- | ----------------------------------------------------------------------------------------------------------------- | diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/data-pre-processing.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/data-pre-processing.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/data-pre-processing.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/data-pre-processing.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/decomposition.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/decomposition.md similarity index 94% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/decomposition.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/decomposition.md index abe3b88ef..ab11d1ee3 100644 --- a/pgml-cms/docs/open-source/pgml/api/pgml.train/decomposition.md +++ b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/decomposition.md @@ -29,7 +29,7 @@ Note that the input vectors have been reduced from 64 dimensions to 3, which exp ## Algorithms -All decomposition algorithms implemented by PostgresML are online versions. You may use the [pgml.decompose](../../../api/sql-extension/pgml.decompose "mention") function to decompose novel data points after the model has been trained. +All decomposition algorithms implemented by PostgresML are online versions. 
You may use the [pgml.decompose](/docs/open-source/pgml/api/pgml.decompose "mention") function to decompose novel data points after the model has been trained. | Algorithm | Reference | |---------------------------|---------------------------------------------------------------------------------------------------------------------| diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/hyperparameter-search.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/hyperparameter-search.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/hyperparameter-search.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/hyperparameter-search.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/joint-optimization.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/joint-optimization.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/joint-optimization.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/joint-optimization.md diff --git a/pgml-cms/docs/open-source/pgml/api/pgml.train/regression.md b/pgml-cms/docs/open-source/pgml/guides/supervised-learning/regression.md similarity index 100% rename from pgml-cms/docs/open-source/pgml/api/pgml.train/regression.md rename to pgml-cms/docs/open-source/pgml/guides/supervised-learning/regression.md diff --git a/pgml-cms/docs/open-source/pgml/guides/vector-database.md b/pgml-cms/docs/open-source/pgml/guides/vector-database.md index bdc12a456..f53792480 100644 --- a/pgml-cms/docs/open-source/pgml/guides/vector-database.md +++ b/pgml-cms/docs/open-source/pgml/guides/vector-database.md @@ -10,7 +10,7 @@ In Postgres, a vector is just another data type that can be stored in regular ta ### Installing pgvector -If you're using our [cloud](https://postgresml.org/signup) or our Docker image, your database has _pgvector_ installed already. 
If you're self-hosting PostgresML, take a look at our [Self-hosting](../resources/developer-docs/self-hosting/) documentation. +If you're using our [cloud](https://postgresml.org/signup) or our Docker image, your database has _pgvector_ installed already. If you're self-hosting PostgresML, take a look at our [Self-hosting](/docs/open-source/pgml/developers/self-hosting/) documentation. ### Working with vectors @@ -24,10 +24,8 @@ Using the example from [Tabular data](../../../introduction/import-your-data/sto {% tab title="SQL" %} ```postgresql -ALTER TABLE - usa_house_prices -ADD COLUMN - embedding VECTOR(384); +ALTER TABLE usa_house_prices +ADD COLUMN embedding VECTOR(384); ``` {% endtab %} @@ -43,14 +41,13 @@ ALTER TABLE #### Generating embeddings -At first, the column is empty. To generate embeddings, we can use the PostgresML [pgml.embed()](/docs/api/sql-extension/pgml.embed) function and generate an embedding of another column in the same (or different) table. This is where machine learning inside the database really shines: +At first, the column is empty. To generate embeddings, we can use the PostgresML [pgml.embed()](/docs/open-source/pgml/api/pgml.embed) function and generate an embedding of another column in the same (or different) table. 
This is where machine learning inside the database really shines: {% tabs %} {% tab title="SQL" %} ```postgresql -UPDATE - usa_house_prices +UPDATE usa_house_prices SET embedding = pgml.embed( 'Alibaba-NLP/gte-base-en-v1.5', address @@ -77,8 +74,7 @@ SELECT address, (embedding::real[])[1:5] FROM usa_house_prices -WHERE - address = '1 Infinite Loop, Cupertino, California'; +WHERE address = '1 Infinite Loop, Cupertino, California'; ``` @@ -116,8 +112,7 @@ For example, if we wanted to find three closest matching addresses to `1 Infinit {% tab title="SQL" %} ```postgresql -SELECT - address +SELECT address FROM usa_house_prices ORDER BY embedding <=> pgml.embed( @@ -142,7 +137,7 @@ LIMIT 3; {% endtab %} {% endtabs %} -This query uses [pgml.embed()](/docs/api/sql-extension/pgml.embed) to generate an embedding on the fly and finds the exact closest neighbors to that embedding in the entire dataset. +This query uses [pgml.embed()](/docs/open-source/pgml/api/pgml.embed) to generate an embedding on the fly and finds the exact closest neighbors to that embedding in the entire dataset. 
### Approximate nearest neighbors @@ -185,8 +180,7 @@ You can create an IVFFlat index with just one query: {% tab title="SQL" %} ```postgresql -CREATE INDEX ON - usa_house_prices +CREATE INDEX ON usa_house_prices USING ivfflat(embedding vector_cosine_ops) WITH (lists = 71); ``` @@ -207,8 +201,8 @@ CREATE INDEX {% tab title="SQL" %} ```postgresql -EXPLAIN SELECT - address +EXPLAIN +SELECT address FROM usa_house_prices ORDER BY embedding <=> pgml.embed( @@ -242,8 +236,7 @@ On the other hand, because of the nature of centroids, if the dataset changes in {% tab title="SQL" %} ```postgresql -REINDEX INDEX CONCURRENTLY - usa_house_prices_embedding_idx; +REINDEX INDEX CONCURRENTLY usa_house_prices_embedding_idx; ``` {% endtab %} @@ -270,10 +263,8 @@ You can create an HNSW index with just one query: {% tab title="SQL" %} ```postgresql -CREATE INDEX ON - usa_house_prices -USING - hnsw(embedding vector_cosine_ops); +CREATE INDEX ON usa_house_prices +USING hnsw(embedding vector_cosine_ops); ``` {% endtab %} diff --git a/pgml-dashboard/src/api/cms.rs b/pgml-dashboard/src/api/cms.rs index 4fd1690bd..f22603cb7 100644 --- a/pgml-dashboard/src/api/cms.rs +++ b/pgml-dashboard/src/api/cms.rs @@ -56,17 +56,17 @@ lazy_static! 
{ "Docs", false, HashMap::from([ - ("sdks/tutorials/semantic-search-using-instructor-model", "api/client-sdk/tutorials/semantic-search-using-instructor-model"), - ("data-storage-and-retrieval/documents", "resources/data-storage-and-retrieval/documents"), - ("guides/setup/quick_start_with_docker", "resources/developer-docs/quick-start-with-docker"), - ("guides/transformers/setup", "resources/developer-docs/quick-start-with-docker"), - ("transformers/fine_tuning/", "api/sql-extension/pgml.tune"), - ("guides/predictions/overview", "api/sql-extension/pgml.predict/"), - ("machine-learning/supervised-learning/data-pre-processing", "api/sql-extension/pgml.train/data-pre-processing"), + ("sdks/tutorials/semantic-search-using-instructor-model", "open-source/korvus/example-apps/semantic-search"), + ("data-storage-and-retrieval/documents", "introduction/import-your-data/storage-and-retrieval/documents"), + ("guides/setup/quick_start_with_docker", "open-source/pgml/developers/quick-start-with-docker"), + ("guides/transformers/setup", "open-source/pgml/developers/quick-start-with-docker"), + ("transformers/fine_tuning/", "open-source/pgml/api/pgml.tune"), + ("guides/predictions/overview", "open-source/pgml/api/pgml.predict/"), + ("machine-learning/supervised-learning/data-pre-processing", "open-source/pgml/guides/supervised-learning/data-pre-processing"), ("introduction/getting-started/import-your-data/", "introduction/import-your-data/"), ("introduction/getting-started/import-your-data/foreign-data-wrapper", "introduction/import-your-data/foreign-data-wrappers"), ("use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml", "open-source/pgml/guides/embeddings/in-database-generation"), - ("use-cases/natural-language-processing", "open-source/pgml/guides/natural-language-processing"), + ("use-cases/natural-language-processing", "open-source/pgml/guides/llms/"), ]) ); } @@ -866,9 +866,7 @@ pub async fn careers_apply(title: PathBuf, cluster: &Cluster) -> 
Result Redirect { match path.to_str().unwrap() { "apis" => Redirect::permanent("/docs/open-source/korvus/"), - "client-sdk/search" => { - Redirect::permanent("/docs/open-source/korvus/guides/document-search") - } + "client-sdk/search" => Redirect::permanent("/docs/open-source/korvus/guides/document-search"), "client-sdk/getting-started" => Redirect::permanent("/docs/open-source/korvus/"), "sql-extensions/pgml.predict/" => Redirect::permanent("/docs/open-source/pgml/api/pgml.predict/"), "sql-extensions/pgml.deploy" => Redirect::permanent("/docs/open-source/pgml/api/pgml.deploy"), diff --git a/pgml-dashboard/src/components/cms/index_link/index_link.scss b/pgml-dashboard/src/components/cms/index_link/index_link.scss index c3f6a3dc6..72617f6e0 100644 --- a/pgml-dashboard/src/components/cms/index_link/index_link.scss +++ b/pgml-dashboard/src/components/cms/index_link/index_link.scss @@ -5,7 +5,7 @@ div[data-controller="cms-index-link"] { .level-2-list, .level-3-list { margin-left: 4px; - padding-left: 19px; + padding-left: 10px; border-left: 1px solid #{$gray-600}; } diff --git a/pgml-dashboard/src/components/navigation/left_nav/docs/docs.scss b/pgml-dashboard/src/components/navigation/left_nav/docs/docs.scss index ad3b22233..c27bf348c 100644 --- a/pgml-dashboard/src/components/navigation/left_nav/docs/docs.scss +++ b/pgml-dashboard/src/components/navigation/left_nav/docs/docs.scss @@ -52,7 +52,11 @@ div[data-controller="navigation-left-nav-docs"] { padding: 8px 0px 8px 8px; border-radius: 4px; } - + + .nav { + font-size: 16px; + } + .nav-link { padding: 8px; } diff --git a/pgml-dashboard/src/components/navigation/navbar/marketing/template.html b/pgml-dashboard/src/components/navigation/navbar/marketing/template.html index c35420b68..66468c869 100644 --- a/pgml-dashboard/src/components/navigation/navbar/marketing/template.html +++ b/pgml-dashboard/src/components/navigation/navbar/marketing/template.html @@ -17,16 +17,16 @@ ]; let solutions_use_cases_links = vec![ + 
StaticNavLink::new("RAG".to_string(), "/rag".to_string()).icon("manage_search"), StaticNavLink::new("Search".to_string(), "/docs/open-source/pgml/guides/improve-search-results-with-machine-learning".to_string()).icon("feature_search"), StaticNavLink::new("Chatbots".to_string(), "/chatbot".to_string()).icon("smart_toy"), ]; let solutions_tasks_links = vec![ - StaticNavLink::new("RAG".to_string(), "/rag".to_string()).icon("manage_search"), - StaticNavLink::new("NLP".to_string(), "/docs/open-source/pgml/guides/natural-language-processing".to_string()).icon("description"), - StaticNavLink::new("Supervised Learning".to_string(), "/docs/open-source/pgml/guides/supervised-learning".to_string()).icon("model_training"), + StaticNavLink::new("LLMs".to_string(), "/docs/open-source/pgml/guides/llms/".to_string()).icon("token"), StaticNavLink::new("Embeddings".to_string(), "/docs/open-source/pgml/guides/embeddings/".to_string()).icon("subtitles"), StaticNavLink::new("Vector Database".to_string(), "/docs/open-source/pgml/guides/vector-database".to_string()).icon("open_with"), + StaticNavLink::new("Supervised Learning".to_string(), "/docs/open-source/pgml/guides/supervised-learning/".to_string()).icon("model_training"), ]; let company_links = vec![ diff --git a/pgml-dashboard/src/components/sections/footers/marketing_footer/mod.rs b/pgml-dashboard/src/components/sections/footers/marketing_footer/mod.rs index 0e60e6535..85dd55d41 100644 --- a/pgml-dashboard/src/components/sections/footers/marketing_footer/mod.rs +++ b/pgml-dashboard/src/components/sections/footers/marketing_footer/mod.rs @@ -17,14 +17,18 @@ impl MarketingFooter { product: vec![ StaticNavLink::new("Korvus".into(), "https://github.com/postgresml/korvus".into()), StaticNavLink::new("PGML".into(), "https://github.com/postgresml/postgresml".into()), - StaticNavLink::new("PpCat Learning".into(), "https://github.com/postgresml/pgcat".into()), + StaticNavLink::new("PgCat".into(), 
"https://github.com/postgresml/pgcat".into()), StaticNavLink::new("PostgresML".into(), "/docs/cloud/overview".into()), StaticNavLink::new("VPC".into(), "/docs/cloud/enterprise/vpc".into()), ], solutions: vec![ StaticNavLink::new( - "NLP".into(), - "/docs/open-source/pgml/guides/natural-language-processing".into(), + "LLMs".into(), + "/docs/open-source/pgml/guides/llms/".into(), + ), + StaticNavLink::new( + "Embeddings".into(), + "/docs/open-source/pgml/guides/embeddings/".into(), ), StaticNavLink::new( "Supervised Learning".into(), diff --git a/pgml-dashboard/static/css/scss/abstracts/variables.scss b/pgml-dashboard/static/css/scss/abstracts/variables.scss index 4825500cb..220dba202 100644 --- a/pgml-dashboard/static/css/scss/abstracts/variables.scss +++ b/pgml-dashboard/static/css/scss/abstracts/variables.scss @@ -253,7 +253,7 @@ $left-nav-w: 17rem; $left-nav-w-collapsed: 88px; // Docs Left Nav -$docs-left-nav-w: 260px; +$docs-left-nav-w: 300px; // WebApp Content Container $webapp-content-max-width: 1224px; diff --git a/pgml-extension/README.md b/pgml-extension/README.md index 228f94546..263a98823 100644 --- a/pgml-extension/README.md +++ b/pgml-extension/README.md @@ -1 +1 @@ -Please see the [quick start instructions](https://postgresml.org/docs/resources/developer-docs/quick-start-with-docker) for general information on installing or deploying PostgresML. A [developer guide](https://postgresml.org/docs/resources/developer-docs/contributing) is also available for those who would like to contribute. +Please see the [quick start instructions](https://postgresml.org/docs/open-source/pgml/developers/quick-start-with-docker) for general information on installing or deploying PostgresML. A [developer guide](https://postgresml.org/docs/open-source/pgml/developers/contributing) is also available for those who would like to contribute.