Query foundation models

06/13/2024

This article describes how to format query requests for foundation models and send them to your model serving endpoint.

For query requests for traditional ML or Python models, see Query serving endpoints for custom models.

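Databricks Model Serving supports Foundation Model APIs and external models for accessing foundation models, and uses a unified OpenAI-compatible API and SDK for querying them. This makes it possible to experiment with and customize foundation models for production across supported clouds and providers.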

Databricks Model Serving provides the following options for sending scoring requests to foundation models:

Method	Details
OpenAI client	Query a model hosted by a Databricks Model Serving endpoint using the OpenAI client. Specify the serving endpoint name as the `model` input. Supported for chat, embeddings, and completions models made available by Foundation Model APIs or external models.
Serving UI	Select Query endpoint from the Serving endpoint page. Insert JSON-format model input data and click Send request. If the model has an input example logged, use Show example to load it.
REST API	Call and query the model using the REST API. See POST /serving-endpoints/{name}/invocations for details. For scoring requests to endpoints that serve multiple models, see Query individual models behind an endpoint.
MLflow Deployments SDK	Use the MLflow Deployments SDK's predict() function to query the model.
Databricks GenAI SDK	The Databricks GenAI SDK is a layer on top of the REST API. It handles low-level details, such as authentication and mapping model IDs to endpoint URLs, which makes it easier to interact with the models. The SDK is designed to be used from inside Databricks notebooks.
SQL function	Invoke model inference directly from SQL using the `ai_query` SQL function. See Query a served model with ai_query().
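Several of these options end up addressing the same endpoint by name. As a minimal sketch, assuming `https://example.cloud.databricks.com` stands in for your workspace URL and that the helper names `invocations_url` and `openai_base_url` are hypothetical, the following shows how an endpoint name maps to the REST invocation URL and to the base URL given to the OpenAI client (which takes the endpoint name as `model` instead):

```python
from urllib.parse import quote


def invocations_url(workspace_url: str, endpoint_name: str) -> str:
    """REST path used by POST /serving-endpoints/{name}/invocations."""
    base = workspace_url.rstrip("/")
    # Percent-encode the endpoint name so it is safe as a single URL path segment
    return f"{base}/serving-endpoints/{quote(endpoint_name, safe='')}/invocations"


def openai_base_url(workspace_url: str) -> str:
    """Base URL for the OpenAI client; the endpoint name is passed as `model` instead."""
    return workspace_url.rstrip("/") + "/serving-endpoints"


workspace = "https://example.cloud.databricks.com/"
print(invocations_url(workspace, "databricks-dbrx-instruct"))
print(openai_base_url(workspace))
```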

Requirements

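A model serving endpoint.
A Databricks workspace in a supported region:
- Foundation Model APIs regions
- External models regions
To send a scoring request through the OpenAI client, the REST API, or the MLflow Deployments SDK, you must have a Databricks API token.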

Important

As a security best practice for production scenarios, Databricks recommends that you use machine-to-machine OAuth tokens for authentication during production.

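For testing and development, Databricks recommends using a personal access token belonging to a service principal instead of a workspace user. To create tokens for service principals, see Manage tokens for a service principal.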
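Whichever token type you use, it is safer not to hard-code it. Several examples in this article rely on the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables; the following is a minimal sketch of a hypothetical helper, `load_databricks_credentials`, that reads them and fails early if either is missing. It does not obtain OAuth tokens itself; it only assumes the token has already been placed in the environment.

```python
import os
from typing import Mapping, Optional, Tuple


def load_databricks_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (host, token) from DATABRICKS_HOST and DATABRICKS_TOKEN.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST", "").rstrip("/")
    token = env.get("DATABRICKS_TOKEN", "")
    # Collect every missing variable so the error message lists all of them at once
    missing = [name for name, value in (("DATABRICKS_HOST", host), ("DATABRICKS_TOKEN", token)) if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return host, token
```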

Install packages

After you have selected a querying method, you must first install the appropriate package on your cluster.

OpenAI client

To use the OpenAI client, the openai package must be installed on your cluster. Run the following in your notebook or your local terminal:

!pip install openai

The following is required only when you install the package in a Databricks notebook:

dbutils.library.restartPython()

REST API

Access to the Serving REST API is available in Databricks Runtime for Machine Learning.

MLflow Deployments SDK

!pip install mlflow

The following is required only when you install the package in a Databricks notebook:

dbutils.library.restartPython()

Databricks GenAI SDK

To use databricks-genai-inference with provisioned throughput workloads, you must use version 0.2.2 or above.

!pip install databricks-genai-inference

The following is required only when you install the package in a Databricks notebook:

dbutils.library.restartPython()
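To confirm the 0.2.2 minimum before running provisioned throughput workloads, you can compare the installed version against it. This is a minimal sketch using only the standard library; `version_tuple` and `meets_minimum` are hypothetical helpers, and the comparison deliberately looks only at the numeric release part of the version string.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Numeric release part of a version string, e.g. "0.2.10.dev0" -> (0, 2, 10).

    Pre-release and dev suffixes are ignored by this simplified check.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets_minimum(installed: str, minimum: str = "0.2.2") -> bool:
    # Tuple comparison: (0, 2, 10) >= (0, 2, 2), while (0, 2) < (0, 2, 2)
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("databricks-genai-inference")
    print(installed, "OK" if meets_minimum(installed) else "upgrade to 0.2.2 or above")
except metadata.PackageNotFoundError:
    print("databricks-genai-inference is not installed")
```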

Query a chat completion model

The following are examples of querying a chat model.

For a batch inference example, see Batch inference using Foundation Model APIs.

OpenAI client

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.

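To use the OpenAI client, specify the model serving endpoint name as the model input. The following example assumes you have a Databricks API token and that openai is installed on your compute. You also need your Databricks workspace instance URL to connect the OpenAI client to Databricks.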


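import os
from openai import OpenAI

# Read the Databricks API token from an environment variable instead of hard-coding it.
# Replace base_url with your workspace instance URL followed by /serving-endpoints.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # the serving endpoint name
    messages=[
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is a mixture of experts model?",
      }
    ],
    max_tokens=256
)

# The assistant's reply is in the first choice
print(response.choices[0].message.content)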

REST API

Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.

curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ]
}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-dbrx-instruct/invocations
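If curl is not available, the same request can be assembled with the Python standard library. This is a minimal sketch: `build_chat_request` is a hypothetical helper, `https://example.cloud.databricks.com` is a placeholder workspace URL, and the request is only built here, not sent. The basic-auth header mirrors curl's `-u token:$DATABRICKS_TOKEN`, which uses the literal user name `token` and the API token as the password.

```python
import base64
import json
import urllib.request


def build_chat_request(workspace_url: str, endpoint: str, token: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    body = json.dumps({"messages": messages}).encode("utf-8")
    # Equivalent of curl -u token:<TOKEN>: HTTP basic auth with user name "token"
    credentials = base64.b64encode(f"token:{token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request(
    "https://example.cloud.databricks.com",
    "databricks-dbrx-instruct",
    "dapi-your-databricks-token",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a mixture of experts model?"},
    ],
)
# To send it against a real workspace: json.load(urllib.request.urlopen(req))
```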

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

chat_response = client.predict(
    endpoint="databricks-dbrx-instruct",
    inputs={
        "messages": [
            {"role": "user", "content": "Hello!"},
            {"role": "assistant", "content": "Hello! How can I assist you today?"},
            {"role": "user", "content": "What is a mixture of experts model?"}
        ],
        "temperature": 0.1,
        "max_tokens": 20
    }
)

print(chat_response)

Databricks GenAI SDK

The following is a chat request for the DBRX Instruct model made available by the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct, in your workspace.

import os
from databricks_genai_inference import ChatCompletion

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

response = ChatCompletion.create(
    model="databricks-dbrx-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is a mixture of experts model?"}
    ],
    max_tokens=128
)
print(f"response.message:{response.message}")

LangChain

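To query a foundation model endpoint using LangChain, you can do either of the following:

Import the Databricks LLM class and specify the endpoint_name and transform_input_fn.
Import the ChatDatabricks ChatModel class and specify the endpoint.

The following example uses the Databricks LLM class in LangChain to query the Foundation Model APIs pay-per-token endpoint, databricks-dbrx-instruct. Foundation Model APIs expect messages in the request dictionary, whereas the LangChain Databricks LLM provides prompt in the request dictionary by default. Use the transform_input function to convert the request dictionary into the expected format.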

from langchain.llms import Databricks

def transform_input(**request):
  # Foundation Model APIs expect "messages"; the LangChain Databricks LLM sends "prompt"
  request["messages"] = [
    {
      "role": "user",
      "content": request["prompt"]
    }
  ]
  del request["prompt"]
  return request

llm = Databricks(endpoint_name="databricks-dbrx-instruct", transform_input_fn=transform_input)
llm.invoke("What is a mixture of experts model?")
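The transform is an ordinary function of the request dictionary, so it can be checked on its own before wiring it into LangChain. As an illustrative variation (the factory `make_transform_input` and its `system_prompt` option are hypothetical additions, not part of the LangChain API), the following sketch also prepends an optional system message and leaves any other request keys, such as max_tokens, untouched:

```python
def make_transform_input(system_prompt=None):
    """Return a transform_input_fn that turns LangChain's "prompt" key into "messages"."""

    def transform_input(**request):
        messages = []
        if system_prompt:
            # Optional system message sent ahead of the user's prompt
            messages.append({"role": "system", "content": system_prompt})
        messages.append({"role": "user", "content": request.pop("prompt")})
        request["messages"] = messages
        return request

    return transform_input


transform = make_transform_input("You are a helpful assistant.")
print(transform(prompt="What is a mixture of experts model?", max_tokens=64))
```

It would be passed the same way as above, for example transform_input_fn=make_transform_input("You are a helpful assistant.").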

The following example uses the ChatDatabricks ChatModel class and specifies the endpoint.

from langchain.chat_models import ChatDatabricks
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is a mixture of experts model?"),
]
chat_model = ChatDatabricks(endpoint="databricks-dbrx-instruct", max_tokens=500)
chat_model.invoke(messages)

SQL

Important

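The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change. See Query a served model with ai_query().

The following is a chat request for llama-2-70b-chat made available by the Foundation Model APIs pay-per-token endpoint, databricks-llama-2-70b-chat, in your workspace.

Note

The ai_query() function does not support query endpoints that serve the DBRX or the DBRX Instruct model.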

SELECT ai_query(
    "databricks-llama-2-70b-chat",
    "Can you explain AI in ten words?"
  )

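The following is the expected request format for a chat model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.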

{
  "messages": [
    {
      "role": "user",
      "content": "What is a mixture of experts model?"
    }
  ],
  "max_tokens": 100,
  "temperature": 0.1
}
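When building this body programmatically, it can help to validate messages before sending. The following is a minimal sketch with a hypothetical `build_chat_payload` helper; the set of accepted roles (system, user, assistant) is an assumption based on the examples in this article, and the `extra` keyword arguments stand in for provider-specific parameters without checking whether any provider supports them.

```python
import json

# Assumed role set, taken from the roles used in this article's examples
VALID_ROLES = {"system", "user", "assistant"}


def build_chat_payload(messages, max_tokens=None, temperature=None, **extra):
    """Assemble a chat request body in the format shown above."""
    for message in messages:
        if message.get("role") not in VALID_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("each message needs string content")
    payload = {"messages": list(messages)}
    # Only include optional fields that were actually set
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if temperature is not None:
        payload["temperature"] = temperature
    payload.update(extra)
    return payload


print(json.dumps(build_chat_payload(
    [{"role": "user", "content": "What is a mixture of experts model?"}],
    max_tokens=100,
    temperature=0.1,
), indent=2))
```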

The following is the expected response format:

{
  "model": "databricks-dbrx-instruct",
  "choices": [
    {
      "message": {},
      "index": 0,
      "finish_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 74,
    "total_tokens": 81
  },
  "object": "chat.completion",
  "id": null,
  "created": 1698824353
}
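Client code usually needs only the assistant text and the token counts from this response. The following sketch uses hypothetical helpers, `first_reply` and `token_usage`, on an abbreviated sample shaped like the format above; the message content and finish_reason in the sample are invented for illustration.

```python
import json

# Abbreviated sample shaped like the response format above; message text is made up
SAMPLE = """{
  "model": "databricks-dbrx-instruct",
  "choices": [
    {"message": {"role": "assistant", "content": "Example reply text."},
     "index": 0, "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 7, "completion_tokens": 74, "total_tokens": 81},
  "object": "chat.completion"
}"""


def first_reply(response: dict) -> str:
    """Return the assistant text of the choice with the lowest `index`."""
    choice = min(response["choices"], key=lambda c: c["index"])
    return choice["message"].get("content", "")


def token_usage(response: dict) -> tuple:
    """Return (prompt_tokens, completion_tokens, total_tokens), defaulting to 0."""
    usage = response.get("usage", {})
    return usage.get("prompt_tokens", 0), usage.get("completion_tokens", 0), usage.get("total_tokens", 0)


resp = json.loads(SAMPLE)
print(first_reply(resp))  # Example reply text.
print(token_usage(resp))  # (7, 74, 81)
```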

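Chat session

Databricks GenAI SDK provides the ChatSession class to manage multi-round chat conversations. It provides the following functions:

Function	Return	Description
`reply (string)`		Takes a new user message
`last`	string	Last message from the assistant
`history`	list of dict	Messages in the chat history, including roles
`count`	int	Number of chat rounds conducted so far

To initialize ChatSession, use the same set of arguments as ChatCompletion; those arguments are used throughout the chat session.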


from databricks_genai_inference import ChatSession

chat = ChatSession(model="llama-2-70b-chat", system_message="You are a helpful assistant.", max_tokens=128)
chat.reply("Knock, knock!")
chat.last # returns "Hello! Who's there?"
chat.reply("Guess who!")
chat.last # returns "Okay, I'll play along! Is it a person, a place, or a thing?"

chat.history
# returns: [
#     {'role': 'system', 'content': 'You are a helpful assistant.'},
#     {'role': 'user', 'content': 'Knock, knock!'},
#     {'role': 'assistant', 'content': "Hello! Who's there?"},
#     {'role': 'user', 'content': 'Guess who!'},
#     {'role': 'assistant', 'content': "Okay, I'll play along! Is it a person, a place, or a thing?"}
# ]
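The history shown above suggests that each round builds on the accumulated conversation. The following is not the SDK's implementation; it is a minimal, self-contained sketch of that bookkeeping, assuming each round sends the full history to the model. `MiniChatSession` and `fake_complete` are hypothetical, and the fake model lets the example run without an endpoint.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class MiniChatSession:
    """Illustrative stand-in for the history tracking behind reply/last/history/count."""

    def __init__(self, complete: Callable[[List[Message]], str], system_message: str = ""):
        # `complete` receives the full message list and returns the assistant's text
        self._complete = complete
        self.history: List[Message] = []
        if system_message:
            self.history.append({"role": "system", "content": system_message})

    def reply(self, text: str) -> None:
        # Each round sends the whole history so the model sees earlier turns
        self.history.append({"role": "user", "content": text})
        answer = self._complete(list(self.history))
        self.history.append({"role": "assistant", "content": answer})

    @property
    def last(self) -> str:
        if self.history and self.history[-1]["role"] == "assistant":
            return self.history[-1]["content"]
        return ""

    @property
    def count(self) -> int:
        # One assistant message per completed round
        return sum(1 for m in self.history if m["role"] == "assistant")


def fake_complete(messages: List[Message]) -> str:
    """A fake model that reports how many user turns it has seen."""
    return f"user turns so far: {sum(m['role'] == 'user' for m in messages)}"


chat = MiniChatSession(fake_complete, system_message="You are a helpful assistant.")
chat.reply("Knock, knock!")
chat.reply("Guess who!")
print(chat.last)   # user turns so far: 2
print(chat.count)  # 2
```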

Query an embedding model

The following is an embeddings request for the bge-large-en model made available by Foundation Model APIs.

OpenAI client

To use the OpenAI client, specify the model serving endpoint name as the model input. The following example assumes you have a Databricks API token and that openai is installed on your cluster.


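import os
from openai import OpenAI

# Read the Databricks API token from an environment variable instead of hard-coding it
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

response = client.embeddings.create(
  model="databricks-bge-large-en",
  input="what is databricks"
)

# One embedding vector is returned per input string
print(len(response.data[0].embedding))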

REST API

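Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.
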
curl \
-u token:$DATABRICKS_TOKEN \
-X POST \
-H "Content-Type: application/json" \
-d  '{ "input": "Embed this sentence!"}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-bge-large-en/invocations

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

embeddings_response = client.predict(
    endpoint="databricks-bge-large-en",
    inputs={
        "input": "Here is some text to embed"
    }
)

Databricks GenAI SDK


import os
from databricks_genai_inference import Embedding

# Only required when running this example outside of a Databricks notebook
os.environ["DATABRICKS_HOST"] = "https://<workspace_host>.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "dapi-your-databricks-token"

response = Embedding.create(
    model="bge-large-en",
    input="3D ActionSLAM: wearable person tracking in multi-floor environments")
print(f'embeddings: {response.embeddings}')

LangChain

To use a Databricks Foundation Model APIs model in LangChain as an embedding model, import the DatabricksEmbeddings class and specify the endpoint parameter as follows:

from langchain.embeddings import DatabricksEmbeddings

embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
embeddings.embed_query("Can you explain AI in ten words?")

SQL

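Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change. See Query a served model with ai_query().
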
SELECT ai_query(
    "databricks-bge-large-en",
    "Can you explain AI in ten words?"
  )

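The following is the expected request format for an embeddings model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.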


{
  "input": [
    "embedding text"
  ]
}
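The request format wraps the input in a list, while several examples above pass a single string. The following minimal sketch, using a hypothetical `build_embeddings_payload` helper, normalizes either form into the list shape shown above and rejects empty or non-string input:

```python
import json


def build_embeddings_payload(texts):
    """Return {"input": [...]} for a single string or an iterable of strings."""
    if isinstance(texts, str):
        texts = [texts]
    texts = list(texts)
    if not texts or not all(isinstance(t, str) for t in texts):
        raise ValueError("input must be a non-empty string or list of strings")
    return {"input": texts}


print(json.dumps(build_embeddings_payload("embedding text")))  # {"input": ["embedding text"]}
```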

The following is the expected response format:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": []
    }
  ],
  "model": "text-embedding-ada-002-v2",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}
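Each item in data carries an index that ties it back to the corresponding input string. The following sketch, with a hypothetical `vectors_in_order` helper and a toy response containing made-up 3-dimensional vectors, returns the vectors in input order and checks that they all have the same length:

```python
def vectors_in_order(response: dict) -> list:
    """Return embedding vectors sorted by each item's `index`, checking they share one size."""
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    sizes = {len(v) for v in vectors}
    if len(sizes) > 1:
        raise ValueError(f"inconsistent embedding sizes: {sorted(sizes)}")
    return vectors


# Toy response with made-up 3-dimensional vectors; real embeddings are much longer
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 1.0, 0.0]},
    ],
    "usage": {"prompt_tokens": 2, "total_tokens": 2},
}
print(vectors_in_order(sample))  # [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```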

Query a text completion model

The following is a completions request for the databricks-mpt-30b-instruct model made available by Foundation Model APIs. For parameters and syntax, see Completion task.

OpenAI client


import os
from openai import OpenAI

# Read the Databricks API token from an environment variable instead of hard-coding it
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://example.staging.cloud.databricks.com/serving-endpoints"
)

completion = client.completions.create(
  model="databricks-mpt-30b-instruct",
  prompt="what is databricks",
  temperature=1.0
)

print(completion.choices[0].text)

REST API

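Important

The following example uses REST API parameters for querying serving endpoints that serve foundation models. These parameters are in Public Preview and the definition might change. See POST /serving-endpoints/{name}/invocations.
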
curl \
 -u token:$DATABRICKS_TOKEN \
 -X POST \
 -H "Content-Type: application/json" \
 -d '{"prompt": "What is a quoll?", "max_tokens": 64}' \
https://<workspace_host>.databricks.com/serving-endpoints/databricks-mpt-30b-instruct/invocations

MLflow Deployments SDK

Important

The following example uses the predict() API from the MLflow Deployments SDK.


import os
import mlflow.deployments

# Only required when running this example outside of a Databricks Notebook

os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

client = mlflow.deployments.get_deploy_client("databricks")

completions_response = client.predict(
    endpoint="databricks-mpt-30b-instruct",
    inputs={
        "prompt": "What is the capital of France?",
        "temperature": 0.1,
        "max_tokens": 10,
        "n": 2
    }
)

# Print the response
print(completions_response)

Databricks GenAI SDK

import os
from databricks_genai_inference import Completion

# Only required when running this example outside of a Databricks Notebook
os.environ['DATABRICKS_HOST'] = "https://<workspace_host>.databricks.com"
os.environ['DATABRICKS_TOKEN'] = "dapi-your-databricks-token"

response = Completion.create(
    model="databricks-mpt-30b-instruct",
    prompt="Write 3 reasons why you should train an AI model on domain specific data sets.",
    max_tokens=128)
print(f"response.text:{response.text}")

SQL

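Important

The following example uses the built-in SQL function, ai_query. This function is in Public Preview and the definition might change. See Query a served model with ai_query().
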
SELECT ai_query(
    "databricks-mpt-30b-instruct",
    "Can you explain AI in ten words?"
  )

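The following is the expected request format for a completions model. For external models, you can include additional parameters that are valid for a given provider and endpoint configuration. See Additional query parameters.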

{
  "prompt": "What is mlflow?",
  "max_tokens": 100,
  "temperature": 0.1,
  "stop": [
    "Human:"
  ],
  "n": 1,
  "stream": false,
  "extra_params":{
    "top_p": 0.9
  }
}
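The following sketch assembles this body with a hypothetical `build_completion_payload` helper. Its default values are copied from the sample above for convenience and are not documented service defaults; provider-specific options such as top_p go under extra_params, as in the sample.

```python
import json


def build_completion_payload(prompt, max_tokens=100, temperature=0.1, stop=None,
                             n=1, stream=False, extra_params=None):
    """Assemble a completions request body with the fields shown in the format above."""
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "n": n,
        "stream": stream,
    }
    if stop:
        # Accept a single stop string for convenience; the format above uses a list
        payload["stop"] = [stop] if isinstance(stop, str) else list(stop)
    if extra_params:
        # Provider-specific options are nested under "extra_params"
        payload["extra_params"] = dict(extra_params)
    return payload


print(json.dumps(build_completion_payload("What is mlflow?", stop="Human:", extra_params={"top_p": 0.9}), indent=2))
```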

The following is the expected response format:

{
  "id": "cmpl-8FwDGc22M13XMnRuessZ15dG622BH",
  "object": "text_completion",
  "created": 1698809382,
  "model": "gpt-3.5-turbo-instruct",
  "choices": [
    {
    "text": "MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for tracking experiments, managing and deploying models, and collaborating on projects. MLflow also supports various machine learning frameworks and languages, making it easier to work with different tools and environments. It is designed to help data scientists and machine learning engineers streamline their workflows and improve the reproducibility and scalability of their models.",
    "index": 0,
    "logprobs": null,
    "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 83,
    "total_tokens": 88
  }
}
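When n is greater than 1, the response contains several choices. The following sketch uses hypothetical helpers, `completion_texts` and `was_truncated`, on an abbreviated copy of the sample above. The "length" value checked by `was_truncated` is an assumption borrowed from OpenAI-style responses; the sample above only shows "stop".

```python
def completion_texts(response: dict) -> list:
    """Return (text, finish_reason) pairs ordered by choice index."""
    choices = sorted(response["choices"], key=lambda c: c["index"])
    return [(c["text"], c.get("finish_reason")) for c in choices]


def was_truncated(response: dict) -> bool:
    """True if any choice reports finish_reason "length" (assumed to mean max_tokens was hit)."""
    return any(reason == "length" for _, reason in completion_texts(response))


# Abbreviated copy of the sample response above
sample = {
    "object": "text_completion",
    "choices": [
        {"text": "MLflow is an open-source platform ...", "index": 0, "logprobs": None, "finish_reason": "stop"},
    ],
    "usage": {"prompt_tokens": 5, "completion_tokens": 83, "total_tokens": 88},
}
print(completion_texts(sample))
print(was_truncated(sample))  # False
```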

Chat with supported LLMs using AI Playground

You can interact with supported large language models using the AI Playground. The AI Playground is a chat-like environment where you can test, prompt, and compare LLMs from your Azure Databricks workspace.

