Releases · databricks/megablocks

v0.5.1

@tgale96

What's Changed

Update dependencies and package organization. by @tgale96 in #52
Remove errant "*" in README by @tgale96 in #54
Update Megatron-LM scripts and integration for latest Docker container. by @tgale96 in #55
Update setup.py to support multiple device capabilities by @simon-mo in #56
Enable argument-controlled normalization of routing weights by @vchiley in #58
More customizable norm for expert weights by @snarayan21 in #60
Update README.md by @eltociear in #63
Enable custom activation functions by @vchiley in #65
Skip updating load balancing loss on eval by @sedrick-keh-tri in #69
Change router weight norm from in-place by @sashaDoubov in #70
Add memory-optimized grouped GLU by @vchiley in #66
Add cast to tensor for DTensor inputs for groupedmlp by @eracah in #71
Extend DTensor handling to all paths by @mvpatel2000 in #73
Refactor DTensor handling by @mvpatel2000 in #74
Memory-optimized GLU backward pass by @mvpatel2000 in #72
Add dmlp registry args by @j316chuck in #75
Fix default to be sparse by @mvpatel2000 in #76
Fix moe_normalize_expert_weights when top_k=1 by @152334H in #87
Update Triton pin by @vchiley in #89
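Several entries above (#58, #60, #87) concern normalizing the routing weights of the experts a token is sent to. The sketch below is a minimal, library-independent illustration. It assumes that after top-k selection, each token's selected weights are divided by their p-norm. The function names `top_k_routing` and `normalize_expert_weights` are hypothetical; this is not MegaBlocks' implementation of `moe_normalize_expert_weights`.

```python
import math
from typing import List, Tuple


def top_k_routing(scores: List[float], top_k: int) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: pick the top_k experts for a single token.

    Returns the selected expert indices (highest score first) and their scores.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    idx = order[:top_k]
    return idx, [scores[i] for i in idx]


def normalize_expert_weights(weights: List[float], p: float) -> List[float]:
    """Hypothetical helper: divide one token's selected weights by their p-norm.

    Assumes at least one weight is nonzero so the norm is positive.
    """
    norm = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    return [w / norm for w in weights]
```

With `top_k=2` and `p=1`, the two selected weights are rescaled to sum to 1. With a single selected expert, any p-norm normalization maps a positive weight to 1.0, so `top_k=1` behaves differently from larger `top_k` under this scheme.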

New Contributors

@simon-mo made their first contribution in #56
@snarayan21 made their first contribution in #60
@eltociear made their first contribution in #63
@sedrick-keh-tri made their first contribution in #69
@eracah made their first contribution in #71
@j316chuck made their first contribution in #75
@152334H made their first contribution in #87

Full Changelog: v0.5.0...v0.5.1

v0.5.0

@mvpatel2000

What's New

Several improvements to avoid CPU <> GPU device synchronizations, GLU support, and support for some new models 👀

What's Changed

Update version by @mvpatel2000 in #36
Avoid duplicate .cpu() call by @mvpatel2000 in #37
Have megablocks rely on torch default precision by @mvpatel2000 in #39
Add GLU support by @sashaDoubov in #38
Enable generic dimensionality for input by @vchiley in #41
Removing an extra size call by @bcui19 in #43
Fix bug in topology kernel for ffn_hidden_size>4096. by @tgale96 in #47
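GLU support (#38) changes each expert's feed-forward block from a plain MLP to a gated one. The following is a minimal sketch of the common GLU formulation, `act(x·W1) ⊙ (x·V1)` projected by `W2`, for a single token. It assumes that formulation and uses SiLU purely as an example activation. The names `glu_mlp`, `w1`, `v1` and `w2` are hypothetical and do not describe MegaBlocks' kernels or API.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def matvec(x: List[float], w: Matrix) -> List[float]:
    """Row vector x (length d_in) times matrix w (d_in x d_out)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def silu(z: float) -> float:
    """SiLU / swish activation, used here only as an example gate activation."""
    return z / (1.0 + math.exp(-z))


def glu_mlp(x: List[float], w1: Matrix, v1: Matrix, w2: Matrix,
            act: Callable[[float], float] = silu) -> List[float]:
    """Hypothetical gated MLP: act(x @ w1) * (x @ v1), then project with w2."""
    gate = [act(h) for h in matvec(x, w1)]   # gated branch
    up = matvec(x, v1)                       # linear branch
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise product
    return matvec(hidden, w2)                # down projection
```

Compared with a plain MLP, the gated form adds a third weight matrix (`v1`) per expert. The activation is passed in as a parameter here to mirror the idea of swappable activation functions.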

New Contributors

@sashaDoubov made their first contribution in #38
@bcui19 made their first contribution in #43

Full Changelog: v0.4.0...v0.5.0

v0.4.0

@mvpatel2000

What's Changed

Unpack saved context once by @mvpatel2000 in #33
Refactoring class hierarchy for FSDP wrapping by @tgale96 in #34

Full Changelog: v0.3.3...v0.4.0

v0.3.3

@vchiley

What's Changed

Enable running MegaBlocks MoE without bias by @vchiley in #31

Full Changelog: v0.3.2...v0.3.3

v0.3.2

@dblalock

What's Changed

Support for bfloat16
Optimizations for top_k > 1
Support for fully-sharded data parallelism
Support tensor model parallelism when expert_parallel_world_size > num_experts
Optimizations for activation memory
Support activation quantization (thanks @dblalock!)
Optimizations for SM90 (Hopper)
Lots of bug fixes, cleanup and small optimizations

New Contributors

@vchiley made their first contribution in #9
@deepakn94 made their first contribution in #16
@b-chu made their first contribution in #19

Full Changelog: v0.1...v0.3.2

Version 0.1

Initial release documenting repository state prior to MLSys'23 camera-ready publication.
