
Sharding the llama2 70b on v5e-16 more efficiently. #706

Merged: 1 commit merged into main from zhihaoshan_dev on Jun 18, 2024

Conversation

zhihaoshan-google (Collaborator) opened this pull request. The review comments below refer to this hunk from kv_cache_autoregressive:
@@ -759,9 +762,6 @@ def kv_cache_autoregressive(
assert cached_ar_key_var[0].value.shape == self.cached_kv_shape((batch, self.max_target_length - self.max_prefill_predict_length, heads, kv_head_size), self.ar_key_axis_order)
assert cached_ar_value_var[0].value.shape == self.cached_kv_shape((batch, self.max_target_length - self.max_prefill_predict_length, heads, kv_head_size), self.ar_value_axis_order)

- key = nn.with_logical_constraint(key, (BATCH, LENGTH, HEAD, D_KV))
- value = nn.with_logical_constraint(value, (BATCH, LENGTH, HEAD, D_KV))

Reviewer (Collaborator):

Does removing the logical constraint have any impact? I know that in some cases the compiler is not able to propagate the sharding, which is why hard-coded nn.with_logical_constraint calls are in place.

Reviewer (Collaborator):

Instead of removing them, could you replace them with the more specific axis names?

zhihaoshan-google (Collaborator, Author):

If they are already annotated in https://github.com/google/maxtext/blob/main/MaxText/layers/attentions.py#L1057-L1062, I don't think we need to annotate them again. (Just as we don't annotate them in kv_cache_prefill, I don't think we need to annotate them in kv_cache_autoregressive either.)

Resolved review threads:
MaxText/layers/llama2.py (outdated)
MaxText/layers/attentions.py (outdated)
MaxText/layers/llama2.py (outdated)
MaxText/max_utils.py
getting_started/Data_Input_Pipeline.md (outdated)
@vipannalla (Collaborator) left a review:

Looks good. I have similar concerns as Morgan and Matthew regarding configs vs. model-specific code.

Resolved review threads:
MaxText/layers/attentions.py (outdated)
MaxText/layers/llama2.py (outdated)
MaxText/max_utils.py (outdated)
@vipannalla (Collaborator):

Talked offline. George is OOO soon and doesn't have time right now; he can make these changes once he is back in two weeks. I'm OK with merging this as a short-term fix and will let @gobbleturk decide.

zhihaoshan-google force-pushed the zhihaoshan_dev branch 2 times, most recently from b44df5e to 5e6f7a4 (June 17, 2024, 21:20).
@gobbleturk (Collaborator) left a review:

Thanks George!

@zhihaoshan-google (Collaborator, Author):

Thanks for the review, Matt, Vipan and Morgan!

zhihaoshan-google force-pushed the zhihaoshan_dev branch 6 times, most recently from d1c9cc7 to e8a4961 (June 18, 2024, 00:59).
copybara-service bot merged commit 180a780 into main on Jun 18, 2024 (13 checks passed).
copybara-service bot deleted the zhihaoshan_dev branch on June 18, 2024, 03:12.