
Question about apply_chat_template in examples #1752

Open

EganGu opened this issue Jun 18, 2024 · 7 comments

Comments

@EganGu

EganGu commented Jun 18, 2024

When I looked at the examples, I found that the DPO example script applies apply_chat_template to chosen and rejected, but not to prompt.

trl/examples/scripts/dpo.py

Lines 150 to 152 in d1ed730

def process(row):
row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)

And it seems that chosen is a complete conversation:

[ { "content": "Hi, I want to learn to play horseshoes. Can you teach me?", "role": "user" }, { "content": "I can, but maybe I should begin by telling you that a typical game consists of 2 players and 6 or 8 horseshoes.", "role": "assistant" }, { "content": "Okay. What else is needed to play, and what are the rules?", "role": "user" }, { "content": "A horseshoe is usually made out of metal and is about 3 to 3.5 inches long and around 1 inch thick. The horseshoe should also have a 2 inch by 3 inch flat at the bottom where the rubber meets the metal. We also need two stakes and six horseshoes.", "role": "assistant" } ]

I think applying the chat_template to the input prompt and keeping only the assistant output as chosen/rejected would be consistent with the inference phase.
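
Concretely, a rough sketch of that processing might look like the following (it assumes chosen and rejected share all turns except the last, and that the template renders messages sequentially so the full conversation starts with the rendered prefix):

def process(row):
    # Apply the chat template to the shared multi-turn prefix only.
    prompt_messages = row["chosen"][:-1]
    row["prompt"] = tokenizer.apply_chat_template(prompt_messages, tokenize=False)
    # Render the full conversations, then strip the rendered prompt prefix
    # so chosen/rejected keep only the final assistant turn.
    chosen_full = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    rejected_full = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    row["chosen"] = chosen_full[len(row["prompt"]):]
    row["rejected"] = rejected_full[len(row["prompt"]):]
    return row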

@AIR-hl
Contributor

AIR-hl commented Jun 18, 2024


@EganGu Hi! The same question was raised in #1541. I think the example is wrong: the chosen and rejected responses should include only the final-turn response, and the chat template should also be applied to the prompt.

@EganGu
Author

EganGu commented Jun 18, 2024


https://github.com/huggingface/alignment-handbook/blob/606d2e954fd17999af40e6fb4f712055ca11b2f0/src/alignment/data.py#L42-L108
Yes! Thanks for the reply. The implementation in the alignment-handbook seems to make sense.

@AIR-hl
Contributor

AIR-hl commented Jun 18, 2024


@EganGu No problem; alignment-handbook, Firefly, and LLaMA-Factory all handle it the way described above.

@EganGu
Author

EganGu commented Jun 18, 2024


Thanks for the explanation!

@muzhi1991

I have a similar problem.
First, both row["chosen"] = tokenizer.apply_chat_template(row["chosen"][-1:], tokenize=False)
and row["rejected"] = tokenizer.apply_chat_template(row["rejected"][-1:], tokenize=False) add some tokens at the beginning of the sequence (such as a system prompt).
Is this the expected behavior? Shouldn't these tokens be added only before the prompt?

And a second question:
should row["prompt"] = tokenizer.apply_chat_template(row["chosen"][:-1], tokenize=False) also pass add_generation_prompt=True?

@AIR-hl
Contributor

AIR-hl commented Jun 22, 2024


@muzhi1991 Hi! You raise a good question.

  1. What these lines are meant to illustrate is that, for multi-turn chat data where only the final response differs, the template should be applied to the final chosen/rejected response only, not to the entire conversation. This is just a simple example and does not cover all cases; for a chat_template that inserts a system prompt, you will need to do additional processing.
  2. Whether to set add_generation_prompt=True depends on your tokenizer's chat template, and it only needs to be set at inference time. If your chat_template already appends the special tokens marking the beginning of the assistant response to the end of the user prompt, you don't need to set add_generation_prompt for the prompt at all. For tokenizers such as HuggingFaceH4/zephyr-7b-beta, you should set it to True on the prompt to tell the model to generate. If you set it to True when building training data, you get a malformed chat like the one in the screenshot (a sketch follows below):
    [screenshot: example of the resulting malformed chat]
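
A minimal sketch of point 2 (the exact marker string depends on the template; for zephyr it is the assistant header):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
prompt = [{"role": "user", "content": "Teach me to play horseshoes."}]

# Inference: append the assistant header so the model starts generating.
text_infer = tok.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
# -> ends with the assistant marker, e.g. "<|assistant|>\n"

# Training data: the templated response already begins with its own
# assistant header, so adding the generation prompt here would
# duplicate the marker and produce the malformed chat shown above.
text_train = tok.apply_chat_template(prompt, tokenize=False, add_generation_prompt=False)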

@muzhi1991


Thank you for your reply! It answered my question, and I agree with you about whether to add the generation prompt or not. But I found what looks like an inconsistency inside the trl library (I may be misreading the code). In the implementation of DataCollatorForCompletionOnlyLM, the labels are masked up to and including the response-template tokens, so the response-template tokens themselves do not participate in the loss calculation. That does not seem to be equivalent to the behavior described above.

Here:

trl/trl/trainer/utils.py

Lines 195 to 198 in 94d53e6

response_token_ids_end_idx = response_token_ids_start_idx + len(self.response_token_ids)
# Make pytorch loss function ignore all tokens up through the end of the response key
batch["labels"][i, :response_token_ids_end_idx] = self.ignore_index

And here

response_token_ids_idxs.append(assistant_idx + len(self.response_token_ids))
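
For reference, the masking can be observed directly; here is a sketch adapted from the pattern in the TRL docs (model and strings are illustrative):

from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tok = AutoTokenizer.from_pretrained("facebook/opt-350m")
collator = DataCollatorForCompletionOnlyLM(" ### Answer:", tokenizer=tok)

example = tok("### Question: What color is the sky?\n ### Answer: Blue.")
batch = collator([example])
# Labels up to and including the " ### Answer:" tokens are set to
# ignore_index (-100), so the response-template tokens themselves
# contribute nothing to the loss.
print(batch["labels"])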
