Insights: huggingface/trl
Overview
4 Pull requests merged by 3 people
- Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (#1794, merged Jul 3, 2024)
- [SFT] add model_init_kwargs to training_args (#1787, merged Jul 3, 2024; see the sketch after this list)
- Fixed typo in SFT trainer docs (#1788, merged Jun 29, 2024)
- [DOCS] fix docs and cli example script (#1780, merged Jun 27, 2024)
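
PR #1787 above lets model-loading options ride along in the training config. A minimal sketch of how that is used, assuming a trl release that ships SFTConfig (roughly 0.9+); the model name and toy dataset are illustrative:

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Tiny illustrative dataset with a "text" column.
train_dataset = Dataset.from_dict({"text": ["Hello world.", "TRL makes SFT simple."]})

# model_init_kwargs are forwarded to from_pretrained when the trainer
# is given a model name string instead of an instantiated model.
training_args = SFTConfig(
    output_dir="sft-out",
    dataset_text_field="text",
    max_seq_length=512,
    model_init_kwargs={"torch_dtype": "bfloat16"},
)

trainer = SFTTrainer(
    model="facebook/opt-350m",  # a string here makes the trainer load the model itself
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```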
6 Pull requests opened by 4 people
- Nash md (#1779, opened Jun 27, 2024)
- Fix `start` index under `batched_forward_pass` (#1782, opened Jun 27, 2024)
- Clean examples (#1791, opened Jul 1, 2024)
- online dpo trainer based on rloo trainer (#1795, opened Jul 3, 2024)
- DPO Llava support (#1797, opened Jul 3, 2024)
- Remove extra print in reward_trainer.py (#1799, opened Jul 3, 2024)
6 Issues closed by 2 people
- OOM with DPO Trainer on A100 GPU (#1667, closed Jul 3, 2024)
- Llama 3 Unsloth Fixes (#1668, closed Jul 3, 2024)
- Seq2SeqTrainer with DataCollatorForCompletionOnlyLM: incorrect masking for evaluation (#1634, closed Jul 1, 2024)
- Bug in example DPO script in dataloading (#1541, closed Jun 30, 2024)
- When using GaLore with ORPO, the learning rate was set to 8e-6, but the training rate was 0.01 (#1638, closed Jun 30, 2024)
- DPO rewards stuck at zero (#1311, closed Jun 27, 2024)
12 Issues opened by 12 people
- Drop `use_cache=False if training_args.gradient_checkpointing` (#1798, opened Jul 3, 2024; the pattern in question is sketched after this list)
- [Feature] Add DiscoPOP algorithm (#1796, opened Jul 3, 2024)
- What's the difference between PPO Trainer and PPOv2 Trainer? (#1793, opened Jul 1, 2024)
- In PPOv2Config and RLOOConfig, the base_model parameter doesn't seem to be used; why does it exist? (#1792, opened Jul 1, 2024)
- Low loss but can't get the expected output during inference (#1790, opened Jul 1, 2024)
- Optimizing an LLM Using DPO: nan Loss Values During Evaluation (#1789, opened Jun 29, 2024)
- LoRA seems to be invalid when using vsft_llava.py (#1786, opened Jun 28, 2024)
- Error with SFT of LLaVA-Next (#1785, opened Jun 28, 2024)
- Support for SFTTrainer / PPOTrainer / DPOTrainer with LLaVA-like models (#1784, opened Jun 27, 2024)
- Clarification on reward/value heads in PPOv2 (#1783, opened Jun 27, 2024)
- Conflict in `start` index under `batched_forward_pass` (#1781, opened Jun 27, 2024)
- DPO reports 'grad_norm': 0.0 (#1778, opened Jun 27, 2024)
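
Issue #1798 (first item above) refers to a pattern common in the example scripts: tying `use_cache` to the gradient-checkpointing flag, since the KV cache goes unused during checkpointed training. A hedged sketch of that pattern, with an illustrative model name:

```python
from transformers import AutoModelForCausalLM, TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
)

# The pattern the issue proposes dropping: disable the KV cache at load
# time whenever gradient checkpointing is enabled.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    use_cache=False if training_args.gradient_checkpointing else True,
)
```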
15 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (values = values * mask) (#1691, commented on Jun 29, 2024 • 3 new comments)
- fix model to save in ppov2 (#1776, commented on Jul 1, 2024 • 3 new comments)
- rloo and ppov2 trainer with trainer callbacks (#1729, commented on Jun 27, 2024 • 2 new comments)
- DDPO-trained model errors when used to generate images (#1775, commented on Jun 27, 2024 • 1 new comment)
- Path for supporting generative eval in DPOTrainer (#1671, commented on Jun 27, 2024 • 1 new comment)
- ImportError: cannot import name 'DPOConfig' from 'trl' (#1642, commented on Jun 29, 2024 • 1 new comment; see the note after this list)
- Can we use SFTTrainer for pre-training? (#1657, commented on Jun 30, 2024 • 1 new comment)
- DPOTrainer: deepspeed.initialize causes ref_model not to stay fixed (#1652, commented on Jun 30, 2024 • 1 new comment)
- vsft_llava: ValueError: Expected input batch_size (78528) to match target batch_size (41728). (#1685, commented on Jul 1, 2024 • 1 new comment)
- FSDP: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float32 (#1723, commented on Jul 2, 2024 • 1 new comment)
- Can BERT be used for DPO training? (#1768, commented on Jul 3, 2024 • 1 new comment)
- Issue #1751 Fix (#1754, commented on Jul 2, 2024 • 1 new comment)
- [DRAFT] vLLM integration (#1628, commented on Jul 3, 2024 • 0 new comments)
- Add PPOv2 sentiment example (as a replacement for the IMDB example) (#1759, commented on Jul 1, 2024 • 0 new comments)
- Add SRPO algorithm (#1772, commented on Jul 2, 2024 • 0 new comments)
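
For #1642 in the list above, the ImportError typically means an outdated trl release: DPOConfig only exists in newer versions (the exact version bound is an assumption here). Upgrading with `pip install -U trl` and re-importing resolves it:

```python
# Raises ImportError on older trl releases that predate DPOConfig.
from trl import DPOConfig

config = DPOConfig(output_dir="dpo-out", beta=0.1)
```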