DLLXW/baby-llama2-chinese issues and pull requests

#84 - Which code generates the tokenizer-processed pretraining corpus?

Issue - State: open - Opened by livevivaer 2 months ago

#83 - Why is data saved in Parquet format during data cleaning when tokenization later still uses JSON?

Issue - State: open - Opened by yangwenche 3 months ago

#82 - ChatGLMTokenizer class

Issue - State: open - Opened by licx102359 3 months ago - 2 comments

#81 - Doesn't the final slice of the pretraining input make the model input one token shorter?

Issue - State: open - Opened by AI-Study-Han 4 months ago

#80 - How to fix incomplete output when the model's answer is long?

Issue - State: open - Opened by MSJeinlong 4 months ago

#79 - smallvocab tokenizer

Issue - State: open - Opened by iangellove 5 months ago

#78 - Are there open-source projects to reference for reinforcement learning of language models?

Issue - State: open - Opened by AI-Study-Han 5 months ago - 1 comment

#77 - How do I load a large volume of data?

Issue - State: open - Opened by CaesarGo 6 months ago

#76 - At which step are positional embeddings added?

Issue - State: closed - Opened by buhe 6 months ago - 1 comment

#75 - Which package contains the chatglm_tokenizer module?

Issue - State: open - Opened by PANASV 6 months ago - 2 comments

#74 - During pretraining, each training sample mixes different sentences (separated by <eos>)

Issue - State: open - Opened by Itochiee 6 months ago

#73 - Why is text length limited when processing the fine-tuning dataset?

Issue - State: open - Opened by jzzzf 6 months ago - 1 comment

#72 - To the author: does this project support resuming training from a checkpoint?

Issue - State: open - Opened by 1737686924 7 months ago - 2 comments

#71 - Is TensorRT-LLM deployment supported?

Issue - State: open - Opened by Ss-shuang123 7 months ago

#70 - Handing in my homework

Issue - State: closed - Opened by yasohasakii 7 months ago

#69 - process single file in foreach, avoid oom

Pull Request - State: open - Opened by maoxiangyi 7 months ago

#68 - Dimension mismatch between pretrained model parameters and eval parameters

Issue - State: open - Opened by 1019245175 7 months ago

#67 - Problem with the c4-zh data

Issue - State: closed - Opened by yasohasakii 7 months ago - 3 comments

#66 - How to continue training after the machine loses power partway through a run

Issue - State: open - Opened by GromZhang 8 months ago - 2 comments

#65 - fix: Fix attribute error and reduce memory usage during data processing

Pull Request - State: open - Opened by noahc1510 8 months ago

#64 - Can a single 4060Ti with 16G of VRAM be used for training?

Issue - State: closed - Opened by XiaoluJiayou 8 months ago - 1 comment

#63 - Problem with tokenizer?

Issue - State: open - Opened by shokhjakhonone 8 months ago - 3 comments

#62 - Is this error caused by a misconfiguration somewhere?

Issue - State: open - Opened by beginner-wj 8 months ago

#61 - What does this error message mean?

Issue - State: closed - Opened by beginner-wj 8 months ago

#60 - To enrich and extend this project, code and weights (1.75B) for training with deepspeed are open-sourced here

Issue - State: closed - Opened by AI-Study-Han 9 months ago

#59 - Ignore the `freqs_cis` buffer so that DDP does not broadcast it at construction time

Issue - State: open - Opened by xiaoguzai 9 months ago

#58 - Error when running training

Issue - State: closed - Opened by singeleaf 10 months ago - 3 comments

#57 - For personal use

Pull Request - State: closed - Opened by life-peace 10 months ago

#56 - fix: multi gpu ddp save error

Pull Request - State: closed - Opened by billvsme 10 months ago

#55 - In the optimizer configuration, why are parameters with 2 or more dimensions decayed while those with fewer than 2 are not?

Issue - State: closed - Opened by zerozhoujie 10 months ago - 2 comments

#54 - /track1/train_valid.json

Issue - State: closed - Opened by cj401 10 months ago - 1 comment

#53 - What needs to be modified to support 4k and 16k context?

Issue - State: closed - Opened by 937739823 10 months ago - 1 comment

#52 - Handing in my homework

Issue - State: closed - Opened by ljg-lixufeng 10 months ago

#51 - Tip: after setting compile = True in training, sft must be set the same way, otherwise the model fails to load

Issue - State: closed - Opened by Hong-Shuo 11 months ago

#50 - Attention!! A fatal typo in the inference code is the reason everyone has been seeing poor results. Please take note!

Issue - State: closed - Opened by DLLXW 11 months ago

#49 - Multi-node, multi-GPU pretrain

Issue - State: closed - Opened by lixin716 11 months ago - 2 comments

#48 - After changing the vocabulary size, dimensions no longer match the pretrained model; how is everyone handling this?

Issue - State: open - Opened by ghost 11 months ago

#47 - Model performance

Issue - State: closed - Opened by AI-Study-Han 11 months ago - 3 comments

#46 - This file was not found

Issue - State: closed - Opened by servlet1111 12 months ago

#45 - The latest version of transformers raises an error

Issue - State: open - Opened by somewordstoolate 12 months ago - 2 comments

#44 - Calculating the model parameter count

Issue - State: open - Opened by zxx20231119 12 months ago - 2 comments

#43 - Differences in early-stage data processing

Issue - State: closed - Opened by wujianqiangwjq about 1 year ago - 1 comment

#42 - A very strange error: IndexError: index 35930 is out of bounds for axis 1 with size 2048

Issue - State: closed - Opened by zhaodice about 1 year ago - 1 comment

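#41 - Finished the first round of training and now running the second. Could a format-conversion script be added so the model can be used with obabooga?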

Issue - State: open - Opened by limao999666 about 1 year ago

#40 - Why is no mask needed for attention during pretraining?

Issue - State: closed - Opened by LLH1818 about 1 year ago

#39 - Question about the training data and number of epochs

Issue - State: open - Opened by YuzhouPeng about 1 year ago - 4 comments

#38 - The eos token is an empty string

Issue - State: closed - Opened by Destiny-Lu about 1 year ago - 4 comments

#37 - The following exception occurs during multi-GPU pretrain

Issue - State: open - Opened by GromZhang about 1 year ago - 7 comments

#36 - If pretraining is done from scratch, why does the project refer to llama2? As a beginner I don't quite understand

Issue - State: closed - Opened by GromZhang about 1 year ago - 1 comment

#35 - A summary of several issues

Issue - State: open - Opened by Vincent-ZHQ about 1 year ago - 3 comments

#34 - Why does model.compile at line 309 of pretrain need the prefix '_orig_mod'?

Issue - State: closed - Opened by ToxicNeil about 1 year ago - 1 comment

#33 - RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

Issue - State: open - Opened by sunhao about 1 year ago - 1 comment

#32 - Windows support modifications in pretrain script

Pull Request - State: closed - Opened by jh01231230 about 1 year ago

#31 - Windows support modifications in pretrain script

Pull Request - State: closed - Opened by jh01231230 about 1 year ago - 1 comment

#30 - Inference raises an error without SFT; please take a look

Issue - State: open - Opened by hopeforus about 1 year ago - 2 comments

#29 - '../track1/train_valid.json': where can this file be downloaded?

Issue - State: open - Opened by hopeforus about 1 year ago - 2 comments

#28 - sft dataset

Issue - State: open - Opened by paopao0226 about 1 year ago - 2 comments

#27 - Data process modifications

Pull Request - State: closed - Opened by jh01231230 about 1 year ago

#26 - How many epochs of training are needed for reasonably good results?

Issue - State: closed - Opened by binwang672012 about 1 year ago - 4 comments

#25 - Error when processing the Baidu dataset

Issue - State: open - Opened by hopeforus about 1 year ago - 6 comments

#24 - Handing in my homework

Issue - State: open - Opened by AClolinta about 1 year ago - 13 comments

#23 - Dataset question

Issue - State: open - Opened by zhihui-shao about 1 year ago - 3 comments

#22 - Running sft.py fails with CUDA out of memory; how can this be fixed?

Issue - State: closed - Opened by qxj about 1 year ago - 6 comments

#21 - Hello, roughly how long does it take to pretrain a model of this parameter count on a 3090 with 24G of VRAM?

Issue - State: open - Opened by LePanda026 about 1 year ago - 3 comments

#20 - Could you provide a trained model?

Issue - State: open - Opened by PeterouZh about 1 year ago - 5 comments

#19 - fix: remove redundant pkg

Pull Request - State: closed - Opened by jianhu-chen about 1 year ago
