InternLM/InternEvo issues and pull requests

#176 - [QA] Question about splitting and merging models when using tensor parallelism or pipeline parallelism

Issue - State: open - Opened by BaiBlanc 6 months ago
Labels: question

#170 - remove timer_diagnosis and bench_gpu

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#166 - fix internlm_accelerator

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#156 - fix(dummy_dataset): fixed_random_dataset_seqlen default is true

Pull Request - State: closed - Opened by sunpengsdu 6 months ago

#142 - replace is_cuda with get_accelerator_backend

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#140 - fix is_cuda to support npu

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#133 - remove global variable internlm_accelerator

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#132 - remove global variable internlm_accelerator

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#126 - refactor(model): refactor model architecture

Pull Request - State: closed - Opened by mwiacx 6 months ago

#123 - fix apply_rotary_torch not inplace problem

Pull Request - State: closed - Opened by sallyjunjun 6 months ago

#108 - support npu fa and nofa use sp

Pull Request - State: closed - Opened by sallyjunjun 7 months ago

#105 - fix(embedding.py): fix triton apply_rotary to rotary_emb version

Pull Request - State: closed - Opened by sallyjunjun 7 months ago

#101 - [Feature] a very simple hugging-face dataloader

Issue - State: open - Opened by sunpengsdu 7 months ago - 1 comment
Labels: enhancement

#100 - feat: rm grad profiling

Pull Request - State: closed - Opened by JiaoPL 7 months ago

#99 - [Feature] only overlap sync_grad in pp0 with pipeline parallelism

Issue - State: closed - Opened by sunpengsdu 7 months ago
Labels: enhancement

#98 - [Bug] no need to uniscale_monitoring in public repo

Issue - State: closed - Opened by sunpengsdu 7 months ago
Labels: bug

#97 - [Bug] parallel output may be error in no flashattention case

Issue - State: closed - Opened by sunpengsdu 7 months ago - 3 comments
Labels: bug

#96 - [Feature] CI should have a true no flashattention env

Issue - State: open - Opened by sunpengsdu 7 months ago - 1 comment
Labels: bug, enhancement

#95 - [Feature] support sequence parallel in head layer and embedding layer

Issue - State: open - Opened by sunpengsdu 7 months ago - 1 comment
Labels: enhancement

#94 - [Feature] clear the two stage norm calc logic in the optimizer

Issue - State: closed - Opened by sunpengsdu 7 months ago - 1 comment
Labels: enhancement

#93 - [Feature] clear the norm value collection in the optimizer

Issue - State: closed - Opened by sunpengsdu 7 months ago - 1 comment
Labels: enhancement

#92 - feat(npu): add support for Ascend 910B

Pull Request - State: closed - Opened by SolenoidWGT 7 months ago - 1 comment

#91 - feat(multimodal): support train llava with dummy dataset

Pull Request - State: closed - Opened by Khoray 7 months ago

#90 - fix(transformers): fix no white space when chatting with fast tokenizer

Pull Request - State: closed - Opened by x54-729 7 months ago
Labels: bug

#89 - Fix(QA): fix check ckpt loss

Pull Request - State: closed - Opened by li126com 7 months ago

#88 - fix(tokenized/packed_dataset.py): fix packed dataset when train_folder is not None

Pull Request - State: closed - Opened by huangting4201 7 months ago

#87 - add gradient sharding

Pull Request - State: open - Opened by ChenQiaoling00 7 months ago

#86 - [Bug] OSError: [Errno 9] Bad file descriptor

Issue - State: closed - Opened by kkscilife 7 months ago - 1 comment
Labels: bug

#85 - This error always occurs when installing the docker environment and cannot be resolved

Issue - State: open - Opened by zjtggssg 7 months ago - 1 comment
Labels: bug

#84 - Speedup grad norm computation

Pull Request - State: closed - Opened by Godricly 7 months ago

#83 - fix(embedding.py): fix flash attn error of llama and internlm2

Pull Request - State: closed - Opened by sallyjunjun 7 months ago

#82 - feat(internlm): refactor code structure based on InternTrain

Pull Request - State: closed - Opened by huangting4201 7 months ago
Labels: enhancement

#81 - Update version.txt

Pull Request - State: closed - Opened by sunpengsdu 7 months ago

#80 - Fix missing requirments for NUMA

Pull Request - State: closed - Opened by Godricly 7 months ago - 1 comment

#79 - fix(ckpt): fix load funcs when loading llama & hf_llama

Pull Request - State: closed - Opened by gaoyang07 7 months ago - 1 comment
Labels: bug

#78 - Upgrade the CUDA version to support the Windows version of flash-attention

Issue - State: open - Opened by SkyblueMr 7 months ago - 1 comment
Labels: enhancement

#77 - [Bug] StopIteration occurs when there is not enough data

Issue - State: closed - Opened by Carol-gutianle 7 months ago - 2 comments
Labels: bug

#76 - feat(moe): impl moe with megablock kernel

Pull Request - State: closed - Opened by blankde 7 months ago

#75 - Feat(QA): temp no fa

Pull Request - State: closed - Opened by li126com 7 months ago

#74 - fix(transformers): fix parameter error of `safe_open` in revert scripts

Pull Request - State: closed - Opened by x54-729 7 months ago
Labels: bug

#73 - (feat/demo) add internlm2 1.8b config

Pull Request - State: closed - Opened by 00INDEX 7 months ago

#72 - fix(communication/isp.py): fix redundant callback and remove head embed hook

Pull Request - State: closed - Opened by huangting4201 7 months ago - 1 comment

#71 - [Feature] add internlm2-1.8b finetuning config

Issue - State: open - Opened by 00INDEX 7 months ago
Labels: enhancement

#70 - test(workflow): add workflow for norm_weight_test

Pull Request - State: closed - Opened by kkscilife 7 months ago

#69 - feat(modeling_internlm2.py): update model type to INTERNLM2_PUBLIC

Pull Request - State: closed - Opened by huangting4201 7 months ago - 1 comment

#68 - feat(model/linear.py): support norm head for model internlm2

Pull Request - State: closed - Opened by huangting4201 7 months ago

#67 - feat(optimizer/hybrid_zero_optim.py): update optim to adapt DiT model

Pull Request - State: closed - Opened by huangting4201 7 months ago

#66 - Delete .github/workflows/stale.yml

Pull Request - State: closed - Opened by del-zhenwu 7 months ago

#65 - feat(ckpt): optimize model checkpointing in Volc and Ali

Pull Request - State: closed - Opened by zigzagcai 7 months ago
Labels: enhancement

#64 - Fix/fix broadcast overlap with isp

Pull Request - State: closed - Opened by mwiacx 7 months ago

#63 - fix(QA): fix test_swap_nb_loss_and_gradnorm

Pull Request - State: closed - Opened by li126com 7 months ago

#62 - Feat(QA norm)：check norm weights for different ranks

Pull Request - State: closed - Opened by li126com 7 months ago

#61 - Feat(QA norm)：check norm weights for different ranks

Pull Request - State: closed - Opened by li126com 7 months ago

#60 - feat(*): remove unnecessary communication

Pull Request - State: closed - Opened by mwiacx 7 months ago

#59 - chore(tools):update tools examples

Pull Request - State: open - Opened by x54-729 7 months ago

#59 - chore(tools):update tools examples

Pull Request - State: closed - Opened by x54-729 7 months ago

#58 - fix(context/process_group_initializer.py): fix gqa process group

Pull Request - State: closed - Opened by huangting4201 7 months ago

#57 - feat(moe): support isp for moe

Pull Request - State: open - Opened by blankde 7 months ago

#56 - test(ci): add write permissions for actions

Pull Request - State: closed - Opened by kkscilife 7 months ago

#55 - feat(switch topology): add control switch

Pull Request - State: closed - Opened by li126com 8 months ago
Labels: enhancement

#54 - feat(parallel_context.py): add gqa process group to allreduce dkv

Pull Request - State: closed - Opened by huangting4201 8 months ago

#53 - fix(optimizer/hybrid_zero_optim.py): fix layer norm grad allreduce when sp is True

Pull Request - State: closed - Opened by huangting4201 8 months ago

#52 - Fix (unitest, interleaved pp and other bugs): re-adapt unitest for isp and adapt interleaved pp for no flash_attention

Pull Request - State: closed - Opened by li126com 8 months ago

#51 - [Doc] https://arxiv.org/pdf/2401.09149.pdf Typo

Issue - State: closed - Opened by hxdtest 8 months ago - 1 comment
Labels: documentation

#50 - fix(moe): fix bugs for moe sequence parallel and memory pool

Pull Request - State: closed - Opened by blankde 8 months ago

#49 - [Typo] `schedulder` -> `scheduler`

Issue - State: open - Opened by Spico197 8 months ago - 1 comment
Labels: question

#48 - test(workflow): change env into flash2 and add rerun workflow

Pull Request - State: closed - Opened by kkscilife 8 months ago

#47 - feat(model): update model internlm2

Pull Request - State: closed - Opened by huangting4201 8 months ago - 1 comment
