[Code Improvement] Support concatenate forward in reward trainer #1769

1485840691 · 2024-06-24T12:02:16Z

This PR addresses an earlier code-improvement suggestion: in the reward trainer, we can borrow the approach used in DPOTrainer and concatenate the chosen and rejected tokens, so that a single model forward call replaces two. The drawback of the concatenated forward pass is higher GPU memory usage, so this PR adds a flag to turn the feature on or off.
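For readers unfamiliar with the idea, here is a minimal, framework-free sketch of a concatenated forward pass. It is not the code in this PR: `ToyRewardModel`, `pad_to_length`, `concatenated_forward`, and `PAD_ID` are hypothetical names used only for illustration, and plain Python lists stand in for tensors. The chosen and rejected sequences are padded to a common length, stacked into one batch, scored in a single call, and the scores are split back apart.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # hypothetical padding token id, chosen for this sketch only


def pad_to_length(seqs: Sequence[Sequence[int]], length: int) -> List[List[int]]:
    """Right-pad every sequence with PAD_ID so all rows have the same length."""
    return [list(s) + [PAD_ID] * (length - len(s)) for s in seqs]


class ToyRewardModel:
    """Stand-in for a reward model: scores a sequence by its non-pad token count."""

    def __init__(self) -> None:
        self.calls = 0  # counts forward calls so the saving is visible

    def __call__(self, batch: List[List[int]]) -> List[float]:
        self.calls += 1
        return [float(sum(1 for t in seq if t != PAD_ID)) for seq in batch]


def concatenated_forward(
    model: ToyRewardModel,
    chosen: Sequence[Sequence[int]],
    rejected: Sequence[Sequence[int]],
) -> Tuple[List[float], List[float]]:
    """Score chosen and rejected sequences with one forward call instead of two."""
    # Both halves must share one length before they can form a single batch.
    max_len = max(len(s) for s in list(chosen) + list(rejected))
    batch = pad_to_length(chosen, max_len) + pad_to_length(rejected, max_len)

    rewards = model(batch)  # the only forward call

    # The first len(chosen) rows belong to the chosen half, the rest to rejected.
    n = len(chosen)
    return rewards[:n], rewards[n:]


model = ToyRewardModel()
chosen_rewards, rejected_rewards = concatenated_forward(
    model,
    chosen=[[5, 6, 7], [8, 9]],
    rejected=[[5], [8, 9, 10, 11]],
)
```

The sketch also shows where the extra memory comes from: the single batch has twice as many rows, and every row is padded to the longest sequence across both halves.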

vwxyzjn · 2024-06-24T15:36:42Z

Looks like a great change! Thanks @1485840691 for the PR

HuggingFaceDocBuilderDev · 2024-06-24T15:40:33Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vwxyzjn · 2024-06-24T16:03:56Z

Make sure you run `make precommit`.

1485840691-eng · 2024-06-27T00:34:27Z

> Make sure you run `make precommit`.

I have run the precommit check. Please help review. Thanks.

Support concatnate forward in reward trainer

2cf3902

1485840691 marked this pull request as draft June 24, 2024 12:02

Fix format error

65be756

1485840691 marked this pull request as ready for review June 25, 2024 06:56
