Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / TencentARC/LLaMA-Pro issues and pull requests
#33 - Can this method be extended to ViT-style vision encoders?
Issue - State: open - Opened by lucasjinreal 22 days ago
#32 - Can Qwen2-7B be trained with this method?
Issue - State: open - Opened by jqtian123 28 days ago
#31 - On the paper's general-ability benchmarks showing almost no degradation, with some even improving
Issue - State: closed - Opened by bestpredicts 3 months ago
#30 - Question about the running procedure
Issue - State: open - Opened by GOOD-N-LCM 3 months ago - 4 comments
#29 - Loss converges after 10B training tokens and stops decreasing
Issue - State: closed - Opened by bestpredicts 3 months ago - 1 comment
#28 - On zero initialization and the placement of the expanded layers
Issue - State: open - Opened by ouyanxi1125 3 months ago - 4 comments
#27 - How to train the 8B model with finetune_cosmopedia.sh
Issue - State: open - Opened by RuipingWang1986 4 months ago
#26 - How to build the dataset for continued pretraining with the finetune_cosmopedia.sh script
Issue - State: open - Opened by RuipingWang1986 4 months ago - 2 comments
#25 - Thanks for the wonderful project! Why do I always get results showing an apparent loss of the original ability?
Issue - State: open - Opened by hzgdeerHo 4 months ago - 8 comments
#24 - Question about the experiments in the paper
Issue - State: closed - Opened by ChrisXULC 5 months ago - 1 comment
#23 - Training on arbitrary data
Issue - State: open - Opened by HelloWorldLTY 5 months ago - 2 comments
#22 - Pretrain code of Mistral-Pro-8B-v0.1
Issue - State: open - Opened by shawnricecake 6 months ago - 1 comment
#21 - Do we need to freeze the embedding layer and the lm_head as well during LLaMA-Pro-style training?
Issue - State: closed - Opened by shamanez 6 months ago - 2 comments
#20 - Question about GPU memory requirements for training
Issue - State: open - Opened by denghj3 6 months ago - 4 comments
#19 - Comparison with PEFT
Issue - State: open - Opened by LaVieEnRose365 6 months ago - 1 comment
#18 - Do larger models need more blocks?
Issue - State: open - Opened by PoseidomWong 6 months ago - 1 comment
#17 - add `pip install fire` to requirements.txt
Issue - State: open - Opened by r4dm 6 months ago - 1 comment
#16 - Do the newly added transformer layers share parameters with the preceding layer?
Issue - State: closed - Opened by CharlinChen 7 months ago - 3 comments
#15 - Is the llama-pro implementation in LLaMA Factory incorrect?
Issue - State: closed - Opened by HuXinjing 7 months ago - 2 comments
#14 - What are the advantages compared to LoRA?
Issue - State: open - Opened by xiaozhu1106 7 months ago - 1 comment
#13 - Questions about incremental pretraining
Issue - State: closed - Opened by zhuxiaobin 7 months ago - 6 comments
#12 - Issue with Model Saving After Layer Expansion: Removed Shared Tensors
Issue - State: closed - Opened by yumingfan-0219 7 months ago - 2 comments
#11 - Guide to run the code
Issue - State: open - Opened by Abolfazl-kr 7 months ago - 2 comments
#10 - Hello, a question about post-pretraining
Issue - State: closed - Opened by ray075hl 8 months ago - 8 comments
#9 - Question regarding the difference between llama-pro and the regular llama
Issue - State: open - Opened by WUHU-G 8 months ago - 8 comments
#8 - How to load the new model weights
Issue - State: open - Opened by khalil-Hennara 8 months ago - 1 comment
#7 - Should I freeze norm.weight?
Issue - State: open - Opened by metterian 8 months ago - 1 comment
#6 - Full code to continue pre-training
Issue - State: open - Opened by Abolfazl-kr 8 months ago - 1 comment
#5 - Question about Llama-7B and Llama-7B-Pro comparison
Issue - State: open - Opened by ryusaeba 8 months ago - 2 comments
#4 - Arxiv Data
Issue - State: open - Opened by ZhengTang1120 8 months ago - 2 comments
#3 - How do we fine-tune the expanded blocks?
Issue - State: open - Opened by win10ogod 8 months ago - 5 comments
#2 - Code for training llama pro?
Issue - State: open - Opened by yhyu13 9 months ago - 8 comments
#1 - Question about Table 7 in the paper
Issue - State: closed - Opened by XiaoYee 9 months ago - 5 comments