Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / alibaba/Megatron-LLaMA issues and pull requests
#70 - Error when converting HF format to Megatron format
Issue - State: open - Opened by Lilypad97 about 1 month ago
#69 - TypeError: perform_nocolor_split(): incompatible function arguments.
Issue - State: open - Opened by vedantgoswami 3 months ago
#68 - When training BERT: AttributeError: 'FullTokenizer' object has no attribute 'save_pretrained'
Issue - State: open - Opened by yuzhiguo07 3 months ago
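Context for #68: Megatron's BERT FullTokenizer is not a Hugging Face tokenizer, so it has no save_pretrained method; that API belongs to the transformers library. A minimal workaround sketch, assuming the transformers package is available (the vocab file and output directory below are placeholders):

```python
# Sketch: wrap the same BERT vocab file in a Hugging Face tokenizer,
# which does implement save_pretrained. Paths are placeholders.
from transformers import BertTokenizer

hf_tok = BertTokenizer("bert-vocab.txt", do_lower_case=True)
hf_tok.save_pretrained("out_dir")  # writes the vocab and tokenizer config
```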
#67 - No module named 'megatron.tokenizer.file_utils'
Issue - State: closed - Opened by yuzhiguo07 4 months ago
#66 - How to resume training from a checkpoint
Issue - State: open - Opened by MAxx8371 5 months ago
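Context for #66: in upstream Megatron-LM, a relaunched job resumes automatically when --load points at the checkpoint directory written via --save. A hedged sketch of the relevant arguments (assembled as a Python list here for illustration; paths and values are placeholders):

```python
# Sketch: checkpoint-related arguments in upstream Megatron-LM.
# Pointing --load at the same directory as --save makes a relaunched
# job pick up from the latest saved iteration.
resume_args = [
    "--save", "/checkpoints/llama-7b",  # where checkpoints are written
    "--load", "/checkpoints/llama-7b",  # resume from the latest one here
    "--save-interval", "1000",          # iterations between checkpoints
]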
#65 - No update for a long time
Issue - State: open - Opened by dong-liuliu 5 months ago
#64 - Is llama2 GQA supported yet?
Issue - State: open - Opened by JiwenJ 6 months ago
#63 - Is there a DingTalk or WeChat group where we can discuss?
Issue - State: open - Opened by felix0080 6 months ago
#62 - Llama 3 Support
Issue - State: open - Opened by john-theo 7 months ago
#61 - About batch_size
Issue - State: open - Opened by tszslovewanpu 9 months ago
#60 - sh LLaMA2_7B_standalone.sh
Issue - State: open - Opened by yangzhipeng1108 9 months ago
#59 - Is training a small-scale LLaMA model (e.g. 1B) from scratch supported?
Issue - State: open - Opened by liubo12 9 months ago - 1 comment
#58 - Problem converting attention weights
Issue - State: open - Opened by noob-ctrl 9 months ago - 2 comments
#57 - During weight conversion: "Zarr-based strategies will not be registered because of missing packages"
Issue - State: open - Opened by ZhangEnmao 10 months ago
#56 - Question about grad_norm accuracy when using the distributed optimizer
Issue - State: open - Opened by chivychao 11 months ago - 1 comment
#55 - Error: forward() missing 1 required positional argument: 'memory_efficient'
Issue - State: closed - Opened by TongLi3701 11 months ago
#54 - Add ALiBi position embedding and support Baichuan
Pull Request - State: open - Opened by qyccc 12 months ago - 3 comments
#53 - Question about how labels and logits are aligned in LLaMAModel._causal_lm_process
Issue - State: open - Opened by chivychao 12 months ago - 3 comments
#52 - Converting Megatron-LM weights to HF format
Issue - State: open - Opened by Yang-QW 12 months ago - 4 comments
#51 - Unable to import Megatron
Issue - State: closed - Opened by fyf2016 about 1 year ago - 8 comments
#50 - Question about the MLP inside the LLaMA decoder layers
Issue - State: closed - Opened by yuanzhoulvpi2017 about 1 year ago - 4 comments
#49 - Could you share the configuration behind the 32-node throughput in the README? We have not been able to reproduce it
Issue - State: open - Opened by jianzi123 about 1 year ago - 5 comments
#48 - Request for a serving tutorial with example code
Issue - State: open - Opened by xealml about 1 year ago - 1 comment
#47 - Small bug in the HF weight conversion code
Issue - State: open - Opened by yuanzhoulvpi2017 about 1 year ago
#46 - Are INT4-quantized models supported by Megatron-LLaMA?
Issue - State: open - Opened by Jeff123z about 1 year ago - 1 comment
#45 - Does Megatron-LLaMA currently support training LLaMA2-70B?
Issue - State: open - Opened by 13416157913 about 1 year ago - 1 comment
#44 - Is sequence parallelism compatible?
Issue - State: closed - Opened by jingjie01ai about 1 year ago - 2 comments
#43 - Question about the CUDA_DEVICE_MAX_CONNECTIONS setting
Issue - State: closed - Opened by Richie-yan about 1 year ago
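Context for #43 (and #26 below): upstream Megatron-LM requires CUDA_DEVICE_MAX_CONNECTIONS=1 when tensor parallelism is enabled, so that communication kernels are enqueued on the GPU ahead of the compute kernels they must overlap with. A minimal sketch, assuming the variable is set before any CUDA context is created:

```python
# Sketch: CUDA_DEVICE_MAX_CONNECTIONS must be set before the CUDA context
# exists; Megatron-LM's launch scripts export it in the shell for the
# same reason.
import os
os.environ["CUDA_DEVICE_MAX_CONNECTIONS"] = "1"

import torch  # imported only after the environment variable is set

def main() -> None:
    # ... initialize torch.distributed and start pretraining here ...
    pass

if __name__ == "__main__":
    main()
```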
#42 - Communication is required on every gradient-accumulation backward pass
Issue - State: closed - Opened by jingjie01ai about 1 year ago - 5 comments
#41 - Question about fp16 support
Issue - State: open - Opened by XUWeijiang about 1 year ago - 1 comment
#40 - TypeError: OverlappedDistributedOptimizer.gather_parameters() got an unexpected keyword argument 'skip_if_not_stepped'
Issue - State: closed - Opened by Double-bear about 1 year ago - 4 comments
#39 - Supporting overlapping AG with forward computation
Pull Request - State: closed - Opened by li-yi-dong about 1 year ago
#38 - Adopt OverlappedDistributedOptimizer to PP
Pull Request - State: closed - Opened by li-yi-dong about 1 year ago - 1 comment
#37 - Does OverlappedDistributedOptimizer support pipeline parallelism > 1 and data parallelism > 1 at the same time?
Issue - State: open - Opened by Baibaifan about 1 year ago - 8 comments
#36 - How can GPU memory usage be estimated from a model's configuration parameters?
Issue - State: open - Opened by 13416157913 about 1 year ago - 1 comment
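Context for #36: a common first-order estimate is that mixed-precision Adam keeps roughly 16 bytes of model state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance, per the ZeRO paper's accounting); activation memory comes on top and depends on batch size, sequence length, and recomputation settings. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope GPU memory estimate for mixed-precision Adam.
# Model states only; activations are not included.
def model_state_gib(num_params: float, bytes_per_param: int = 16) -> float:
    # 16 = 2 (fp16 weights) + 2 (fp16 grads) + 4 + 4 + 4 (fp32 Adam states)
    return num_params * bytes_per_param / 1024**3

print(f"LLaMA2-7B:  ~{model_state_gib(7e9):.0f} GiB of model states")
print(f"LLaMA2-70B: ~{model_state_gib(70e9):.0f} GiB of model states")
```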
#35 - Question about how the GLOBAL_BATCH_SIZE value is calculated
Issue - State: open - Opened by 13416157913 about 1 year ago - 1 comment
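Context for #35 (and #61 above): in Megatron-style training the standard relation is global_batch_size = micro_batch_size × data_parallel_size × gradient_accumulation_steps, where data_parallel_size = world_size / (tensor_parallel × pipeline_parallel). A small sketch of the arithmetic:

```python
# Sketch of Megatron-LM's batch-size bookkeeping (standard upstream relation).
world_size = 8         # total GPUs
tensor_parallel = 4
pipeline_parallel = 1
micro_batch_size = 2
grad_accum_steps = 16  # number of micro-batches per optimizer step

data_parallel = world_size // (tensor_parallel * pipeline_parallel)  # = 2
global_batch_size = micro_batch_size * data_parallel * grad_accum_steps
print(global_batch_size)  # 64
```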
#34 - Training LLaMA2-70B on 4 nodes (A800 80GB) with tensor parallelism 8 and pipeline parallelism 1 fails with an error
Issue - State: open - Opened by 13416157913 about 1 year ago
#33 - Solve the RuntimeError: Tensors must be CUDA and dense
Pull Request - State: open - Opened by 13416157913 about 1 year ago - 2 comments
#32 - Multi-node training with the NCCL backend errors out when saving the checkpoint after training
Issue - State: open - Opened by 13416157913 about 1 year ago - 1 comment
#31 - Loss alignment
Issue - State: open - Opened by wuziyou199217 about 1 year ago - 3 comments
#30 - Training llama-30b fails with an error; is llama-30b unsupported?
Issue - State: open - Opened by 13416157913 about 1 year ago
#29 - Why is optimizer.backward_epilogue() not called in the pipeline-parallel fwd/bwd?
Issue - State: closed - Opened by jingjie01ai about 1 year ago - 4 comments
#28 - fused_kernels/build/scaled_upper_triang_masked_softmax_cuda.so not found at runtime
Issue - State: closed - Opened by xikaluo about 1 year ago - 5 comments
#27 - On 4 machines with 8×A100 each, overlapped-distributed-optimizer is much slower than use-distributed-optimizer
Issue - State: closed - Opened by silingtong123 about 1 year ago - 2 comments
#26 - Why does CUDA_DEVICE_MAX_CONNECTIONS no longer need to be 1 when using overlapped_distributed_optimizer?
Issue - State: closed - Opened by yinzhijian about 1 year ago - 5 comments
#25 - Does ParameterSchedule actually have any effect?
Issue - State: closed - Opened by yinzhijian about 1 year ago - 1 comment
#24 - Pretraining LLaMA2-7B on a single node with 8×A800 80GB: the TP4-PP1-DP2 and TP1-PP1-DP8 timings seem unreasonable
Issue - State: open - Opened by 13416157913 about 1 year ago - 1 comment
#23 - Error in the NGC 22.08 environment
Issue - State: closed - Opened by EthanChen1234 about 1 year ago - 2 comments
#22 - On an 8×A800 machine, overlapped-distributed-optimizer is about 8% slower than use-distributed-optimizer
Issue - State: closed - Opened by tingkuanpei about 1 year ago - 6 comments
#21 - llama2-34b shape mismatch
Issue - State: closed - Opened by cdj0311 about 1 year ago - 4 comments
#20 - Error converting the saved Megatron-format checkpoint to HF format after training
Issue - State: closed - Opened by 13416157913 about 1 year ago - 7 comments
#19 - Has the author tried deepspeed+megatron+llama?
Issue - State: open - Opened by Chandler-Bing about 1 year ago - 1 comment
#18 - Add Megatron-LLaMA/examples/LLaMA/LLaMA2_7B_standalone.sh file
Pull Request - State: closed - Opened by 13416157913 about 1 year ago - 4 comments
#17 - NCCL communication boundary issue?
Issue - State: open - Opened by Baibaifan about 1 year ago - 10 comments
#16 - Training a 13B LLaMA model on 2 nodes only reaches 840 tokens/sec/GPU
Issue - State: open - Opened by YaboSun about 1 year ago - 13 comments
#15 - RuntimeError: CUDA error: device-side assert triggered
Issue - State: closed - Opened by Double-bear about 1 year ago - 2 comments
#14 - Single-node training fails with a CUDA error
Issue - State: closed - Opened by XUWeijiang about 1 year ago - 12 comments
#13 - Is training Qwen models supported?
Issue - State: open - Opened by sxthunder about 1 year ago - 2 comments
#12 - Question about DistributedOptimizer gradient aggregation
Issue - State: closed - Opened by EthanChen1234 about 1 year ago - 3 comments
#11 - llama2-7b runs out of memory on 2 GPUs of a single machine, and also on 2 nodes with 8 GPUs (4 per node); is this normal? (hardware: 8×A800 80GB)
Issue - State: open - Opened by 13416157913 about 1 year ago - 5 comments
#10 - Shape error when converting HF to Megatron
Issue - State: open - Opened by Double-bear about 1 year ago - 10 comments
#9 - When will there be a step-by-step beginner tutorial, e.g. for running the full 7B training pipeline?
Issue - State: open - Opened by iMountTai about 1 year ago - 11 comments
#8 - Is there a llama2 distributed training script that does not require container-based deployment?
Issue - State: open - Opened by 13416157913 about 1 year ago - 1 comment
#7 - Could you clarify the network bandwidth behind the experimental results?
Issue - State: closed - Opened by donghucey about 1 year ago - 1 comment
#6 - Compatibility with Huggingface
Issue - State: open - Opened by YuanLiuuuuuu about 1 year ago - 1 comment
#5 - RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead
Issue - State: closed - Opened by 13416157913 about 1 year ago - 2 comments
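Context for #5: this is standard PyTorch behavior; .view() requires a memory layout compatible with the requested shape, while .reshape() falls back to a copy when needed. A minimal reproduction and fix:

```python
import torch

x = torch.randn(4, 6).transpose(0, 1)  # transpose makes x non-contiguous
# x.view(24)  # would raise the "view size is not compatible ..." RuntimeError
y = x.reshape(24)            # reshape copies when a pure view is impossible
z = x.contiguous().view(24)  # equivalent explicit fix
assert torch.equal(y, z)
```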
#4 - Running the LLaMA_13_standalone.sh script exits quickly with no training happening
Issue - State: open - Opened by 13416157913 about 1 year ago - 2 comments
#3 - What format should the training dataset be in? Does it need to be converted to the .bin/.idx format in advance?
Issue - State: closed - Opened by 13416157913 about 1 year ago - 1 comment
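Context for #3: yes, Megatron-style training consumes data preprocessed into the indexed .bin/.idx format. A hedged sketch of invoking the preprocessing tool, with flag names taken from upstream Megatron-LM's tools/preprocess_data.py (they may differ in this fork) and placeholder paths:

```python
# Sketch: convert a JSONL corpus (one {"text": ...} object per line) into
# the indexed dataset (.bin/.idx) that Megatron training scripts consume.
# Flags follow upstream Megatron-LM and may differ in Megatron-LLaMA.
import subprocess

subprocess.run([
    "python", "tools/preprocess_data.py",
    "--input", "corpus.jsonl",
    "--output-prefix", "my_corpus",  # yields my_corpus_text_document.{bin,idx}
    "--tokenizer-type", "GPT2BPETokenizer",
    "--vocab-file", "gpt2-vocab.json",   # tokenizer files are placeholders
    "--merge-file", "gpt2-merges.txt",
    "--workers", "8",
    "--append-eod",
], check=True)
```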