Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / epfllm/megatron-llm issues and pull requests
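The listing below is the kind of data this service exposes over its REST API. As a rough illustration only, the Python sketch below shows how such a listing might be fetched programmatically; the endpoint path and response field names are assumptions based on the service's REST conventions (see https://issues.ecosyste.ms/docs for the actual API), not something confirmed by this page.

```python
# Minimal sketch: fetching a repository's issue/PR metadata from the
# Ecosyste.ms Issues API. The endpoint path and field names below are
# assumptions; consult https://issues.ecosyste.ms/docs for the real API.
import requests

BASE = "https://issues.ecosyste.ms/api/v1"
# Repository full names contain a slash, so the name is URL-encoded in the path.
url = f"{BASE}/hosts/GitHub/repositories/epfllm%2Fmegatron-llm/issues"

resp = requests.get(url, params={"per_page": 50}, timeout=30)
resp.raise_for_status()

for item in resp.json():
    # Assumed fields: "number", "title", "state", and a flag marking PRs.
    kind = "Pull Request" if item.get("pull_request") else "Issue"
    print(f"#{item['number']} - {item['title']} ({kind}, {item['state']})")
```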
#100 - Any plans to rebase the codebase to most recent Megatron-LM for MoE?
Issue - State: open - Opened by xingyaoww 9 months ago
#99 - Correctness when enabling FlashAttention + Sequence Parallel at the same time?
Issue - State: closed - Opened by xingyaoww 9 months ago - 2 comments
#98 - Multi nodes
Issue - State: closed - Opened by wodeqiansuihan 9 months ago - 1 comment
#97 - update conversion script to support codellama-70b
Pull Request - State: open - Opened by panx27 10 months ago
#96 - Support QWen?
Issue - State: open - Opened by Vincent131499 10 months ago - 1 comment
#95 - How to load from a saved intermediate checkpoint?
Issue - State: closed - Opened by jjzha 10 months ago - 3 comments
#94 - error: preprocess.py file error while working on custom data
Issue - State: open - Opened by toqeer618 10 months ago
#93 - Replace 1F1B with ZB-H1
Pull Request - State: open - Opened by QPHutu 11 months ago - 4 comments
#92 - LLaMA2-70B Inference Optmization
Issue - State: closed - Opened by RaymondHQR 11 months ago - 1 comment
#91 - LLaMa and Mistral 7B pretraining support
Issue - State: closed - Opened by StephennFernandes 11 months ago - 2 comments
#90 - added mistral docs
Pull Request - State: closed - Opened by AleHD about 1 year ago
#89 - One question about the permute function code in permute_qkv.py
Issue - State: open - Opened by drxmy about 1 year ago - 2 comments
#88 - Add Mistral Model
Pull Request - State: closed - Opened by xingyaoww about 1 year ago
#87 - Evalonly and wbresume
Pull Request - State: closed - Opened by AleHD about 1 year ago
#86 - Fix missing position_ids argument when recompute_granularity == full
Pull Request - State: open - Opened by xingyaoww about 1 year ago
#85 - Typo Fixes in docs/
Pull Request - State: closed - Opened by tmsagarofficial about 1 year ago
#84 - Support specifying load_iters for checkpoint
Pull Request - State: closed - Opened by xingyaoww about 1 year ago - 2 comments
#83 - Use --no_new_tokens to stop adding built-in special tokens
Pull Request - State: closed - Opened by xingyaoww about 1 year ago - 4 comments
#82 - args.make_vocab_size_divisible_by set failed
Issue - State: closed - Opened by 13416157913 about 1 year ago - 1 comment
#81 - llama2-7B AssertionError: padded_vocab_size value from checkpoint (32000) is not equal to the input argument value (32256)
Issue - State: closed - Opened by 13416157913 about 1 year ago - 1 comment
#80 - RuntimeError: seq_len <= 2048 INTERNAL ASSERT FAILED
Issue - State: closed - Opened by 13416157913 about 1 year ago - 4 comments
#79 - finetune llama2-7B when set --seq_length 4096 error
Issue - State: closed - Opened by 13416157913 about 1 year ago - 1 comment
#78 - run finetune llama2-7B error
Issue - State: closed - Opened by 13416157913 about 1 year ago - 1 comment
#77 - run finetune llama2-7B error
Issue - State: closed - Opened by 13416157913 about 1 year ago - 2 comments
#76 - Support for Mistral
Issue - State: closed - Opened by philschmid about 1 year ago - 7 comments
#75 - Add eval-only arguments and W&B resume options
Pull Request - State: closed - Opened by eric11eca about 1 year ago - 4 comments - Labels: enhancement
#74 - Update getting_started.md
Pull Request - State: closed - Opened by AleHD about 1 year ago
#73 - RuntimeError: mat1 and mat2 shapes cannot be multiplied (29056x22016 and 11008x4096)
Issue - State: closed - Opened by liuxm117 about 1 year ago - 2 comments
#72 - Add pointer to the shm-size docker arg to the docs
Pull Request - State: closed - Opened by kylematoba about 1 year ago
#71 - support falcon 180B
Issue - State: open - Opened by martinjaggi about 1 year ago
#70 - Getting started "shard" model not working
Issue - State: closed - Opened by philschmid about 1 year ago - 9 comments
#69 - [Save checkpoint needs long time]
Issue - State: closed - Opened by mynewstart about 1 year ago - 2 comments
#68 - add support to finetune with use_distributed_optimizer
Pull Request - State: closed - Opened by dumpmemory about 1 year ago - 11 comments
#67 - [Megatron Base Version] Would mind share the based version of Megatron ?
Issue - State: closed - Opened by dumpmemory about 1 year ago - 7 comments
#66 - Tokens per second metric
Pull Request - State: closed - Opened by AleHD about 1 year ago
#65 - Feature Request: Can we directly use the huggingface dataset for training
Issue - State: closed - Opened by dumpmemory about 1 year ago - 4 comments - Labels: enhancement
#64 - [Swiglu] question about swiglu
Issue - State: closed - Opened by mynewstart about 1 year ago - 6 comments - Labels: question
#63 - Loading weights from hf conversion with different TP,PP settings
Issue - State: closed - Opened by binwang777 about 1 year ago - 14 comments
#62 - Fixed linear time increase observed when micro=1
Pull Request - State: closed - Opened by AleHD about 1 year ago - 2 comments
#61 - From custom hf source
Pull Request - State: closed - Opened by AleHD about 1 year ago
#60 - iteration-time increases linearly when micro_batch_size=1
Issue - State: closed - Opened by LlinWing about 1 year ago - 1 comment
#59 - Update hf_to_megatron.py
Pull Request - State: closed - Opened by AleHD about 1 year ago
#58 - Instruct loss scalar
Pull Request - State: closed - Opened by AleHD over 1 year ago - 1 comment
#57 - Better documentation
Pull Request - State: closed - Opened by AleHD over 1 year ago - 1 comment
#56 - Llama v1 import from HF support
Pull Request - State: closed - Opened by AleHD over 1 year ago - 3 comments
#55 - Metrics support
Pull Request - State: closed - Opened by AleHD over 1 year ago - 1 comment
#54 - Prepend bos token
Issue - State: closed - Opened by panx27 over 1 year ago - 1 comment
#53 - Make llama2 vocab size divisible by 128 by default
Pull Request - State: closed - Opened by AleHD over 1 year ago - 1 comment
#52 - dose 8 A100 80g enough to finetune 70b llama2 ?
Issue - State: closed - Opened by james2v over 1 year ago - 5 comments
#51 - Add CodeLlama support
Pull Request - State: closed - Opened by andreaskoepf over 1 year ago - 6 comments