
CUDA OOM Issues #8

Open

rikabi89 opened this issue Feb 17, 2023 · 10 comments

Comments

@rikabi89

Sorry to be here again.

I have a 3070 8GB

Now my dataset is fine, but I keep getting CUDA errors. I've identified three places in the yml where I can reduce batch sizes, but even setting them to 1 gets me an error.

I've also tried changing mega_batch_factor: as per your notes.

I tried a much smaller dataset of 600 wav files.

I get this:

```
Traceback (most recent call last):
  H:\DL-Art-School\codes\train.py:370, in <module>
    trainer.do_training()
  H:\DL-Art-School\codes\train.py:325, in do_training
    self.do_step(train_data)
  H:\DL-Art-School\codes\train.py:206, in do_step
    gradient_norms_dict = self.model.optimize_parameters(self.current_step, return_g
  H:\DL-Art-School\codes\trainer\ExtensibleTrainer.py:302, in optimize_parameters
    ns = step.do_forward_backward(state, m, step_num, train=train_step, no_d
  H:\DL-Art-School\codes\trainer\steps.py:214, in do_forward_backward
    local_state[k] = v[grad_accum_step]
IndexError: list index out of range
```

@152334H
Owner

152334H commented Feb 17, 2023

> I have a 3070 8GB

That's a big problem... batch size and VRAM usage are only partially correlated; there is a minimum amount of VRAM needed just to load the full optimizer states of the GPT model.
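To make that concrete with back-of-the-envelope numbers (the parameter count below is a hypothetical placeholder, not an exact figure for this model):

```python
# Illustrative arithmetic only: plain AdamW with fp32 weights keeps, per parameter,
#   4 B weights + 4 B gradients + 4 B exp_avg + 4 B exp_avg_sq = 16 B
# before counting activations, which are what batch size actually controls.
n_params = 400e6  # hypothetical parameter count for a GPT-sized acoustic model

bytes_per_param = 4 + 4 + 4 + 4
static_gb = n_params * bytes_per_param / 1e9
print(f"~{static_gb:.1f} GB just for weights + grads + optimizer state")
# With ~6.4 GB already spoken for under that assumption, an 8GB card leaves
# very little headroom for activations, no matter how small the batch is.
```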

There is one immediate thing you could attempt: enable FP16 training. Keep the batch size at a reasonable level (some multiple of mega_batch_factor) to prevent the IndexError above from occurring.
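For context, this is what fp16/mixed-precision training boils down to in generic PyTorch terms (a sketch of the mechanism, not the actual DLAS code path; in DLAS it's just a flag in the yml):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()        # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()               # scales losses to avoid fp16 underflow

x = torch.randn(8, 1024, device="cuda")
with torch.cuda.amp.autocast():                    # activations computed in half precision
    loss = model(x).pow(2).mean()

scaler.scale(loss).backward()                      # backward on the scaled loss
scaler.step(opt)                                   # unscales grads, skips step on inf/nan
scaler.update()
opt.zero_grad(set_to_none=True)
```

Most of the savings come from the half-precision activations, which is why it helps even when the optimizer states stay in fp32.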

If that is not sufficient, then more complicated efforts will be required to reduce VRAM usage. One option would be to preprocess the dataset into quantized mels ahead of time, rather than running the VQVAE on the fly. But personally, at that point I would recommend using the new colab notebook.
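Rough shape of the preprocessing idea, for reference (a sketch only: `load_dvae`, the paths, and the mel settings are placeholders, and `get_codebook_indices` assumes a lucidrains-style DiscreteVAE, not the exact DLAS entry points):

```python
import torch
import torchaudio
from pathlib import Path

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80
)
dvae = load_dvae("dvae.pth").cuda().eval()  # hypothetical loader for the pretrained VQVAE

out_dir = Path("precomputed_codes")
out_dir.mkdir(exist_ok=True)

with torch.no_grad():
    for wav_path in Path("dataset_wavs").glob("*.wav"):
        wav, sr = torchaudio.load(wav_path)
        wav = torchaudio.functional.resample(wav, sr, 22050)
        # Discrete mel codes, so the VQVAE never has to be run during training.
        codes = dvae.get_codebook_indices(mel(wav).cuda())
        torch.save(codes.cpu(), out_dir / (wav_path.stem + ".pth"))
```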

@rikabi89
Author

Ah, ok, fair enough. I did mess around with FP16 and other settings.

Now when I changed heads to 1, I got this:

[screenshot]
I guess this means nothing is happening?

@152334H
Owner

152334H commented Feb 17, 2023

> changed heads to 1,

That will not work. The architecture of the model must not be adjusted; you will get nonsense results if the model isn't fully loaded.

@152334H
Owner

152334H commented Feb 17, 2023

if someone knew how to implement LORA, it might be applicable to this situation
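For anyone curious, the core of LoRA is small: freeze the original weights and learn a low-rank update on top of them. A minimal generic PyTorch sketch, not wired into DLAS:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and learns a rank-r update on top of it."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # original GPT weights stay frozen
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: starts as identity
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Only the tiny A/B matrices (and their optimizer state) would need gradients, which is what could make an 8GB card plausible.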

but I think colab is the best option for now. I will close this issue until the situation changes.

152334H closed this as completed Feb 17, 2023
@Anomyous1

Anomyous1 commented Feb 22, 2023

Have you tried implementing it in Colossal-AI? It claims to get a 1.5x to 8x speedup on consumer PCs for training OPT- and GPT-type models through larger RAM/pagefile offloading magic.

> if someone knew how to implement LORA, it might be applicable to this situation
>
> but I think colab is the best option for now. I will close this issue until the situation changes.

@152334H
Owner

152334H commented Feb 23, 2023

> have you tried implementing it in colossal AI?

The primary problem with using ColossalAI, or any other "GPT-2 infer/train speedup" project, is that the GPT model here is not exactly the same as a normal GPT-2 model: it injects the conditional latents into the input embeddings on the first forward pass (or on every forward pass when there is no kv_cache). A speedup framework that doesn't expose callbacks at the forward pass (which is all I have seen so far) would have to be reworked in some manner.
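Roughly what that injection looks like, sketched with HuggingFace-style kwargs (names and shapes here are illustrative, not the actual tortoise/DLAS code):

```python
import torch

def forward_with_conditioning(gpt, text_emb, cond_latent, kv_cache=None):
    # cond_latent: (B, C, D) conditioning latents computed from reference audio.
    # On the first pass (no cache yet) they are prepended to the input embeddings,
    # which a framework that only accepts token ids can't reproduce.
    if kv_cache is None:
        inputs_embeds = torch.cat([cond_latent, text_emb], dim=1)
    else:
        inputs_embeds = text_emb          # later steps reuse the cached keys/values
    return gpt(inputs_embeds=inputs_embeds, past_key_values=kv_cache)
```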

It is possible I am missing some obvious performance gains, but so far integration has not been a straightforward process for me.

@152334H
Owner

152334H commented Feb 23, 2023

bitsandbytes

152334H reopened this Feb 23, 2023
@152334H
Owner

152334H commented Feb 23, 2023

Following the mrq implementation, I have added 8-bit training with bitsandbytes in 091c6b1.
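The change itself is essentially an optimizer swap; the pattern looks something like this (a sketch of the idea, not the exact code in 091c6b1):

```python
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()  # stand-in for the GPT network being trained

# AdamW8bit stores both moment buffers quantized to int8, so the optimizer-state
# part of the VRAM floor drops to roughly a quarter of its fp32 size;
# weights and gradients are unaffected.
opt = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4, weight_decay=0.01)
```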

However, this will only work on Linux, because Windows has issues with direct pip installation of bnb. Paging @devilismyfriend for help here.

@152334H
Owner

152334H commented Feb 23, 2023

no bnb, bs=125:

[screenshot]

bnb (without fp16), bs=125:

[screenshot]

For some reason, my training seems to be substantially slower when I apply bnb with fp16 checked, while also not lowering memory use at all. To investigate.

@devilismyfriend
Collaborator

Yeah, should be an easy fix for Windows.
