epfllm/megatron-llm issues and pull requests

#100 - Any plans to rebase the codebase to most recent Megatron-LM for MoE?

Issue - State: open - Opened by xingyaoww 11 months ago

#100 - Any plans to rebase the codebase to most recent Megatron-LM for MoE?

Issue - State: open - Opened by xingyaoww 11 months ago

#99 - Correctness when enabling FlashAttention + Sequence Parallel at the same time?

Issue - State: closed - Opened by xingyaoww 11 months ago - 2 comments

#99 - Correctness when enabling FlashAttention + Sequence Parallel at the same time?

Issue - State: closed - Opened by xingyaoww 11 months ago - 2 comments

#98 - Multi nodes

Issue - State: closed - Opened by wodeqiansuihan 11 months ago - 1 comment

#98 - Multi nodes

Issue - State: closed - Opened by wodeqiansuihan 11 months ago - 1 comment

#97 - update conversion script to support codellama-70b

Pull Request - State: open - Opened by panx27 12 months ago

#97 - update conversion script to support codellama-70b

Pull Request - State: open - Opened by panx27 12 months ago

#96 - Support QWen？

Issue - State: open - Opened by Vincent131499 12 months ago - 1 comment

#96 - Support QWen？

Issue - State: open - Opened by Vincent131499 12 months ago - 1 comment

#95 - How to load from a saved intermediate checkpoint?

Issue - State: closed - Opened by jjzha about 1 year ago - 3 comments

#95 - How to load from a saved intermediate checkpoint?

Issue - State: closed - Opened by jjzha about 1 year ago - 3 comments

#94 - error: preprocess.py file error while working on custom data

Issue - State: open - Opened by toqeer618 about 1 year ago

#94 - error: preprocess.py file error while working on custom data

Issue - State: open - Opened by toqeer618 about 1 year ago

#93 - Replace 1F1B with ZB-H1

Pull Request - State: open - Opened by QPHutu about 1 year ago - 4 comments

#93 - Replace 1F1B with ZB-H1

Pull Request - State: open - Opened by QPHutu about 1 year ago - 4 comments

#92 - LLaMA2-70B Inference Optmization

Issue - State: closed - Opened by RaymondHQR about 1 year ago - 1 comment

#92 - LLaMA2-70B Inference Optmization

Issue - State: closed - Opened by RaymondHQR about 1 year ago - 1 comment

#91 - LLaMa and Mistral 7B pretraining support

Issue - State: closed - Opened by StephennFernandes about 1 year ago - 2 comments

#91 - LLaMa and Mistral 7B pretraining support

Issue - State: closed - Opened by StephennFernandes about 1 year ago - 2 comments

#90 - added mistral docs

Pull Request - State: closed - Opened by AleHD about 1 year ago

#90 - added mistral docs

Pull Request - State: closed - Opened by AleHD about 1 year ago

#89 - One question about the permute function code in permute_qkv.py

Issue - State: open - Opened by drxmy about 1 year ago - 2 comments

#89 - One question about the permute function code in permute_qkv.py

Issue - State: open - Opened by drxmy about 1 year ago - 2 comments

#88 - Add Mistral Model

Pull Request - State: closed - Opened by xingyaoww about 1 year ago

#88 - Add Mistral Model

Pull Request - State: closed - Opened by xingyaoww about 1 year ago

#87 - Evalonly and wbresume

Pull Request - State: closed - Opened by AleHD about 1 year ago

#87 - Evalonly and wbresume

Pull Request - State: closed - Opened by AleHD about 1 year ago

#86 - Fix missing position_ids argument when recompute_granularity == full

Pull Request - State: open - Opened by xingyaoww over 1 year ago

#86 - Fix missing position_ids argument when recompute_granularity == full

Pull Request - State: open - Opened by xingyaoww over 1 year ago

#85 - Typo Fixes in docs/

Pull Request - State: closed - Opened by tmsagarofficial over 1 year ago

#85 - Typo Fixes in docs/

Pull Request - State: closed - Opened by tmsagarofficial over 1 year ago

#84 - Support specifying load_iters for checkpoint

Pull Request - State: closed - Opened by xingyaoww over 1 year ago - 2 comments

#84 - Support specifying load_iters for checkpoint

Pull Request - State: closed - Opened by xingyaoww over 1 year ago - 2 comments

#83 - Use --no_new_tokens to stop adding built-in special tokens

Pull Request - State: closed - Opened by xingyaoww over 1 year ago - 4 comments

#83 - Use --no_new_tokens to stop adding built-in special tokens

Pull Request - State: closed - Opened by xingyaoww over 1 year ago - 4 comments

#82 - args.make_vocab_size_divisible_by set failed

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#82 - args.make_vocab_size_divisible_by set failed

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#81 - llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256)

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#81 - llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256)

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#80 - RuntimeError: seq_len <= 2048 INTERNAL ASSERT FAILED

Issue - State: closed - Opened by 13416157913 over 1 year ago - 4 comments

#80 - RuntimeError: seq_len <= 2048 INTERNAL ASSERT FAILED

Issue - State: closed - Opened by 13416157913 over 1 year ago - 4 comments

#79 - finetune llama2-7B when set --seq_length 4096 error

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#79 - finetune llama2-7B when set --seq_length 4096 error

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#78 - run finetune llama2-7B error

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#78 - run finetune llama2-7B error

Issue - State: closed - Opened by 13416157913 over 1 year ago - 1 comment

#77 - run finetune llama2-7B error

Issue - State: closed - Opened by 13416157913 over 1 year ago - 2 comments

#77 - run finetune llama2-7B error

Issue - State: closed - Opened by 13416157913 over 1 year ago - 2 comments

#76 - Support for Mistral

Issue - State: closed - Opened by philschmid over 1 year ago - 7 comments

#76 - Support for Mistral

Issue - State: closed - Opened by philschmid over 1 year ago - 7 comments

#75 - Add eval-only arguments and W&B resume options

Pull Request - State: closed - Opened by eric11eca over 1 year ago - 4 comments
Labels: enhancement

#75 - Add eval-only arguments and W&B resume options

Pull Request - State: closed - Opened by eric11eca over 1 year ago - 4 comments
Labels: enhancement

#74 - Update getting_started.md

Pull Request - State: closed - Opened by AleHD over 1 year ago

#74 - Update getting_started.md

Pull Request - State: closed - Opened by AleHD over 1 year ago

#73 - RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096)

Issue - State: closed - Opened by liuxm117 over 1 year ago - 2 comments

#73 - RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096)

Issue - State: closed - Opened by liuxm117 over 1 year ago - 2 comments

#72 - Add pointer to the shm-size docker arg to the docs

Pull Request - State: closed - Opened by kylematoba over 1 year ago

#72 - Add pointer to the shm-size docker arg to the docs

Pull Request - State: closed - Opened by kylematoba over 1 year ago

#71 - support falcon 180B

Issue - State: open - Opened by martinjaggi over 1 year ago

#71 - support falcon 180B

Issue - State: open - Opened by martinjaggi over 1 year ago

#70 - Getting started "shard" model not working

Issue - State: closed - Opened by philschmid over 1 year ago - 9 comments

#70 - Getting started "shard" model not working

Issue - State: closed - Opened by philschmid over 1 year ago - 9 comments

#69 - [Save checkpoint needs long time]

Issue - State: closed - Opened by mynewstart over 1 year ago - 2 comments

#69 - [Save checkpoint needs long time]

Issue - State: closed - Opened by mynewstart over 1 year ago - 2 comments

#68 - add support to finetune with use_distributed_optimizer

Pull Request - State: closed - Opened by dumpmemory over 1 year ago - 11 comments

#68 - add support to finetune with use_distributed_optimizer

Pull Request - State: closed - Opened by dumpmemory over 1 year ago - 11 comments

#67 - [Megatron Base Version] Would mind share the based version of Megatron ?

Issue - State: closed - Opened by dumpmemory over 1 year ago - 7 comments

#67 - [Megatron Base Version] Would mind share the based version of Megatron ?

Issue - State: closed - Opened by dumpmemory over 1 year ago - 7 comments

#66 - Tokens per second metric

Pull Request - State: closed - Opened by AleHD over 1 year ago

#66 - Tokens per second metric

Pull Request - State: closed - Opened by AleHD over 1 year ago

#65 - Feature Request: Can we directly use the huggingface dataset for training

Issue - State: closed - Opened by dumpmemory over 1 year ago - 4 comments
Labels: enhancement

#65 - Feature Request: Can we directly use the huggingface dataset for training

Issue - State: closed - Opened by dumpmemory over 1 year ago - 4 comments
Labels: enhancement

#64 - [Swiglu] question about swiglu

Issue - State: closed - Opened by mynewstart over 1 year ago - 6 comments
Labels: question

#64 - [Swiglu] question about swiglu

Issue - State: closed - Opened by mynewstart over 1 year ago - 6 comments
Labels: question

#63 - Loading weights from hf conversion with different TP,PP settings

Issue - State: closed - Opened by binwang777 over 1 year ago - 14 comments

#63 - Loading weights from hf conversion with different TP,PP settings

Issue - State: closed - Opened by binwang777 over 1 year ago - 14 comments

#62 - Fixed linear time increase observed when micro=1

Pull Request - State: closed - Opened by AleHD over 1 year ago - 2 comments

#62 - Fixed linear time increase observed when micro=1

Pull Request - State: closed - Opened by AleHD over 1 year ago - 2 comments

#61 - From custom hf source

Pull Request - State: closed - Opened by AleHD over 1 year ago

#61 - From custom hf source

Pull Request - State: closed - Opened by AleHD over 1 year ago

#60 - iteration-time increases linearly when micro_batch_size=1

Issue - State: closed - Opened by LlinWing over 1 year ago - 1 comment

#60 - iteration-time increases linearly when micro_batch_size=1

Issue - State: closed - Opened by LlinWing over 1 year ago - 1 comment

#59 - Update hf_to_megatron.py

Pull Request - State: closed - Opened by AleHD over 1 year ago