Hi,
I have a llama3-70b-001 model deployed to Vertex AI via the Model Garden. I want to get predictions via the REST API from a Node.js application.
Here's the request I am making:
const response = await fetch(`https://${region}-aiplatform.googleapis.com/v1/projects/${project}/locations/${region}/endpoints/${endpoint}:predict`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    instances: [
      {
        prompt: 'You are a career advisor. Give me 10 tips for a good CV.',
      },
    ],
    parameters: {
      max_output_tokens: maxTokens,
      temperature,
    },
  }),
  cache: 'no-store',
});
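For reference, token above is an OAuth access token; one minimal way to obtain it is with google-auth-library, assuming Application Default Credentials are configured (that part of my setup isn't shown above):

import { GoogleAuth } from 'google-auth-library';

// Obtain an access token via Application Default Credentials.
const auth = new GoogleAuth({
  scopes: 'https://www.googleapis.com/auth/cloud-platform',
});
const token = await auth.getAccessToken();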
Here's the response I'm getting:
{
  predictions: [
    'Prompt:\n' +
      'You are a career advisor. Give me 10 tips for a good CV.\n' +
      'Output:\n' +
      ' _Use the phrases in the box_.\n' +
      '\\begin{tabular}{l'
  ],
  deployedModelId: <redacted>,
  model: <redacted>,
  modelDisplayName: 'llama3-70b-001',
  modelVersionId: '1'
}
I have a couple of questions: why does the output look truncated and unrelated to my prompt, and what is the correct way to pass parameters to this model?
I have tried with llama-3-70b-chat-001 as well, with similar results. The documentation on how to pass parameters to specific models is lacking, or at least I couldn't find it.
Thanks!
Can you try using special tokens?
<|start_header_id|>system<|end_header_id|>
{System}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{User}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
See this for reference. (URL Removed by Staff)
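For example, a small helper along these lines (a sketch; buildLlama3Prompt is just an illustrative name) interpolates both prompts into that template and fails loudly if either one is empty:

// Sketch: wrap system and user prompts in Llama 3's chat template.
// Throws instead of silently sending a template with blank slots.
function buildLlama3Prompt(systemPrompt, userPrompt) {
  if (!systemPrompt || !userPrompt) {
    throw new Error('systemPrompt and userPrompt must be non-empty');
  }
  return `<|start_header_id|>system<|end_header_id|>
${systemPrompt}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
${userPrompt}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>`;
}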
That didn't work:
const userPrompt = 'Give me 10 tips for making a great CV';
const systemPrompt = 'You are a helpful assistant.';
const response = await fetch(`https://us-west4-aiplatform.googleapis.com/v1/projects/${project}/locations/us-west4/endpoints/596894076394012672:predict`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    instances: [
      {
        prompt: `<|start_header_id|>system<|end_header_id|>
${systemPrompt}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
${userPrompt}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>`,
      },
    ],
    parameters: {
      // max_output_tokens: maxTokens,
      temperature,
    },
  }),
  cache: 'no-store',
});
Returns:
{
  predictions: [
    'Prompt:\n' +
      '<|start_header_id|>system<|end_header_id|>\n' +
      ' \n' +
      ' <|eot_id|>\n' +
      ' <|start_header_id|>user<|end_header_id|>\n' +
      ' \n' +
      ' <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n' +
      'Output:\n' +
      ' that are easy to aver fit absorbance and have an amount of heat, presents'
  ],
  ...
}
Did string interpolation actually happen? It looks like userPrompt and systemPrompt are missing from the echoed prompt.
I am interpolating userPrompt and systemPrompt correctly when I construct my request. However, I am probably not passing the body/instances that Llama 3 on Vertex expects. Do you have a real working example of making a REST request to Llama 3 on Vertex?
I'm having the same issue. Can anyone share an example object to send for this to work? I deployed the llama3-8b-chat001 model on Vertex, but the answers I get are totally random. Please give us an example object where we can set the system and user prompts properly.
I got it to work after finding this example: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/....
The request body looks something like this, according to the example:
instances = [
    {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "raw_response": raw_response,
    },
]
Setting "raw_response" to true only give you the generated output.