
Question about apply_chat_template in examples #1752

Open

EganGu opened this issue Jun 18, 2024 · 7 comments

Comments

@EganGu

EganGu commented Jun 18, 2024

When I looked at the examples, I found that the DPO example script applies apply_chat_template to chosen and rejected, but not to prompt.

trl/examples/scripts/dpo.py

Lines 150 to 152 in d1ed730

def process(row):
row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)

And it seems that chosen is a complete conversation:

[ { "content": "Hi, I want to learn to play horseshoes. Can you teach me?", "role": "user" }, { "content": "I can, but maybe I should begin by telling you that a typical game consists of 2 players and 6 or 8 horseshoes.", "role": "assistant" }, { "content": "Okay. What else is needed to play, and what are the rules?", "role": "user" }, { "content": "A horseshoe is usually made out of metal and is about 3 to 3.5 inches long and around 1 inch thick. The horseshoe should also have a 2 inch by 3 inch flat at the bottom where the rubber meets the metal. We also need two stakes and six horseshoes.", "role": "assistant" } ]

I think applying the chat_template to the input prompt and keeping only the assistant output as chosen/rejected would be consistent with the inference phase.
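
Concretely, a rough sketch of that processing might look like the following (it assumes chosen and rejected share all turns except the last, and that the template renders messages sequentially so the full conversation starts with the rendered prefix):

def process(row):
    # Apply the chat template to the shared multi-turn prefix only.
    prompt_messages = row["chosen"][:-1]
    row["prompt"] = tokenizer.apply_chat_template(prompt_messages, tokenize=False)
    # Render the full conversations, then strip the rendered prompt prefix
    # so chosen/rejected keep only the final assistant turn.
    chosen_full = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
    rejected_full = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
    row["chosen"] = chosen_full[len(row["prompt"]):]
    row["rejected"] = rejected_full[len(row["prompt"]):]
    return row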

@AIR-hl
Contributor

AIR-hl commented Jun 18, 2024


@EganGu Hi! The same question was raised in #1541. I think the example is wrong: the chosen and rejected responses should include only the final-turn response, and the chat template should also be applied to the prompt.

@EganGu
Author

EganGu commented Jun 18, 2024


https://github.com/huggingface/alignment-handbook/blob/606d2e954fd17999af40e6fb4f712055ca11b2f0/src/alignment/data.py#L42-L108
Yes! Thanks for the reply. The implementation in the alignment-handbook seems to make sense.

@AIR-hl
Contributor

AIR-hl commented Jun 18, 2024


@EganGu No problem; alignment-handbook, Firefly, and LLaMA-Factory all handle it the way described above.

@EganGu
Author

EganGu commented Jun 18, 2024


Thanks for the explanation!

@muzhi1991

I have a similar problem.
First, both row["chosen"] = tokenizer.apply_chat_template(row["chosen"][-1:], tokenize=False)
and row["rejected"] = tokenizer.apply_chat_template(row["rejected"][-1:], tokenize=False) add some tokens at the beginning of the sequence (such as a system prompt).
Is this the expected behavior? Shouldn't these tokens be added only before the prompt?

And a second question:
should row["prompt"] = tokenizer.apply_chat_template(row["chosen"][:-1], tokenize=False) also pass add_generation_prompt=True?

@AIR-hl
Contributor

AIR-hl commented Jun 22, 2024


@muzhi1991 Hi! You raise a good question.

  1. What these lines are meant to illustrate is that, for multi-turn chat data where only the final response differs, the template should be applied to the final chosen/rejected response only, not to the entire conversation. This is just a simple example and does not cover all cases; for a chat_template that inserts a system prompt, you will need to do additional processing.
  2. Whether to set add_generation_prompt=True depends on your tokenizer's chat template, and it only needs to be set at inference time. If your chat_template already appends the special tokens marking the beginning of the assistant response to the end of the user prompt, you don't need to set add_generation_prompt for the prompt at all. For tokenizers such as HuggingFaceH4/zephyr-7b-beta, you should set it to True on the prompt to tell the model to generate. If you set it to True when building training data, you get a malformed chat like the one in the screenshot (a sketch follows below):
    [screenshot: example of the resulting malformed chat]
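
A minimal sketch of point 2 (the exact marker string depends on the template; for zephyr it is the assistant header):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
prompt = [{"role": "user", "content": "Teach me to play horseshoes."}]

# Inference: append the assistant header so the model starts generating.
text_infer = tok.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
# -> ends with the assistant marker, e.g. "<|assistant|>\n"

# Training data: the templated response already begins with its own
# assistant header, so adding the generation prompt here would
# duplicate the marker and produce the malformed chat shown above.
text_train = tok.apply_chat_template(prompt, tokenize=False, add_generation_prompt=False)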

@muzhi1991


Thank you for your reply! It answered my question, and I agree with you about whether to add the generation prompt or not. But I found what looks like an inconsistency inside the trl library (I may be misreading the code). In the implementation of DataCollatorForCompletionOnlyLM, the labels are masked up to and including the response-template tokens, so the response-template tokens themselves do not participate in the loss calculation. That does not seem to be equivalent to the behavior described above.

Here:

trl/trl/trainer/utils.py

Lines 195 to 198 in 94d53e6

response_token_ids_end_idx = response_token_ids_start_idx + len(self.response_token_ids)
# Make pytorch loss function ignore all tokens up through the end of the response key
batch["labels"][i, :response_token_ids_end_idx] = self.ignore_index

And here

response_token_ids_idxs.append(assistant_idx + len(self.response_token_ids))
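
For reference, the masking can be observed directly; here is a sketch adapted from the pattern in the TRL docs (model and strings are illustrative):

from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

tok = AutoTokenizer.from_pretrained("facebook/opt-350m")
collator = DataCollatorForCompletionOnlyLM(" ### Answer:", tokenizer=tok)

example = tok("### Question: What color is the sky?\n ### Answer: Blue.")
batch = collator([example])
# Labels up to and including the " ### Answer:" tokens are set to
# ignore_index (-100), so the response-template tokens themselves
# contribute nothing to the loss.
print(batch["labels"])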
