mosaicml/llm-foundry issues and pull requests

#299 - Default to debug level debug

Pull Request - State: closed - Opened by samhavens over 1 year ago

#294 - Update README.md

Pull Request - State: closed - Opened by abhi-mosaic over 1 year ago - 2 comments

#292 - Upgrade to `mosaicml-streaming==0.5.x`

Pull Request - State: closed - Opened by abhi-mosaic over 1 year ago - 3 comments

#290 - Constant training loss observed when using mpt-7b_dolly_sft.yaml config

Issue - State: closed - Opened by suehyunpark over 1 year ago - 5 comments

#288 - Add shift_labels arg to HF wrappers

Pull Request - State: closed - Opened by dakinggg over 1 year ago - 1 comment

#285 - Small formatting fix in eval README

Pull Request - State: closed - Opened by sashaDoubov over 1 year ago

#281 - Can't produce same answer

Issue - State: closed - Opened by apachemycat over 1 year ago - 1 comment
Labels: question

#279 - Add 8-bit LION optimizer

Pull Request - State: closed - Opened by dblalock over 1 year ago - 4 comments

#278 - Adding custom embedding

Pull Request - State: closed - Opened by bcui19 over 1 year ago - 1 comment

#271 - adding te Linear for fp8 support

Pull Request - State: closed - Opened by vchiley over 1 year ago - 8 comments

#264 - Model loading on local machine

Issue - State: closed - Opened by Devangkaruskar over 1 year ago - 6 comments
Labels: question

#260 - updt tritonpremlir to sm90 version

Pull Request - State: closed - Opened by vchiley over 1 year ago - 4 comments

#259 - Multiple models inference on Single-GPU

Issue - State: closed - Opened by gitsand996 over 1 year ago - 1 comment
Labels: question

#258 - ERROR:composer.cli.launcher:Rank 2 crashed with exit code -7

Issue - State: closed - Opened by tb852 over 1 year ago - 3 comments

#248 - Configure eval to give 'loss/eval' that is analgous to 'loss/train'

Issue - State: closed - Opened by tginart over 1 year ago - 6 comments

#247 - ERROR: expected to be in states [<TrainingState_.IDLE: 1>] but current state is TrainingState_.BACKWARD_PRE

Issue - State: closed - Opened by NarenZen over 1 year ago - 4 comments

#245 - Generated sample equals to input samples

Issue - State: closed - Opened by germanjke over 1 year ago - 4 comments

#243 - Multi-nodes slurm training

Issue - State: closed - Opened by j-Gaow over 1 year ago - 2 comments

#240 - Fix Typing (part 1)

Pull Request - State: closed - Opened by hanlint over 1 year ago

#234 - Refactor logging

Pull Request - State: closed - Opened by hanlint over 1 year ago - 2 comments

#224 - timeout error

Issue - State: closed - Opened by NarenZen over 1 year ago - 4 comments

#223 - Why does convert_dataset_json.py only support 'train' for the --split argument?"

Issue - State: closed - Opened by sysusicily over 1 year ago - 1 comment

#217 - "triton-pre-mlir" installation issue using the command pip install -e ".[gpu]"

Issue - State: closed - Opened by satyaskada over 1 year ago - 7 comments

#214 - G-Eval

Pull Request - State: closed - Opened by samhavens over 1 year ago - 1 comment

#212 - Torch 1.13.1 doesn't support sm_90

Issue - State: open - Opened by jwatte over 1 year ago

#211 - Question

Issue - State: open - Opened by ChiefBlacktail over 1 year ago

#210 - Kv cache speed

Pull Request - State: closed - Opened by vchiley over 1 year ago

#209 - Use $RUN_NAME rather than $COMPOSER_RUN_NAME

Pull Request - State: open - Opened by abhi-mosaic over 1 year ago

#208 - Refresh Mosaicml platform yamls

Pull Request - State: open - Opened by aspfohl over 1 year ago

#207 - Update README.md - Slack Link

Pull Request - State: open - Opened by ejyuen over 1 year ago

#206 - Removed unused `tokenizer_name` config field

Pull Request - State: closed - Opened by dakinggg over 1 year ago

#205 - Onboarding tutorial and related improvements

Pull Request - State: open - Opened by alextrott16 over 1 year ago

#204 - Update inference README

Pull Request - State: closed - Opened by abhi-mosaic over 1 year ago

#203 - Error:"Watchdog caught collective operation timeout" when finetuning MPT-7B on a local dataset using 2 A100 GPUs

Issue - State: closed - Opened by satyaskada over 1 year ago - 15 comments

#202 - S3 ckpt saving

Issue - State: open - Opened by germanjke over 1 year ago - 1 comment

#201 - ONNX conversion is too memory expensive

Issue - State: open - Opened by makermotion over 1 year ago

#200 - Providing an input context.

Issue - State: closed - Opened by thusithaC over 1 year ago - 2 comments

#199 - eos tokens

Issue - State: open - Opened by tginart over 1 year ago

#198 - Update README.md

Pull Request - State: closed - Opened by jacobfulano over 1 year ago

#197 - test ci

Pull Request - State: closed - Opened by vchiley over 1 year ago

#196 - Kernel Crashes when trying to load model to CUDA

Issue - State: open - Opened by souvik0306 over 1 year ago - 2 comments

#195 - Inferencing with multigpu

Issue - State: open - Opened by singhalshikha518 over 1 year ago

#194 - learning rate for pre-training

Issue - State: closed - Opened by sysusicily over 1 year ago - 1 comment

#193 - torch2 updt with hf fixes

Pull Request - State: closed - Opened by vchiley over 1 year ago - 1 comment

#192 - Tensor Parallel MLP with torch2.0

Pull Request - State: closed - Opened by dskhudia over 1 year ago - 7 comments

#191 - How to adapt to different context size?

Issue - State: closed - Opened by jwatte over 1 year ago - 1 comment

#190 - Triton not working on A40 and A6000 machines

Issue - State: open - Opened by NarenZen over 1 year ago - 2 comments

#189 - ERROR: Could not build wheels for flash-attn, xentropy-cuda-lib, which is required to install pyproject.toml-based projects on A6000 machine

Issue - State: open - Opened by NarenZen over 1 year ago - 2 comments

#188 - GPTQ support for quantization

Issue - State: open - Opened by casperbh96 over 1 year ago - 2 comments

#187 - downloading datasets

Issue - State: open - Opened by germanjke over 1 year ago - 1 comment

#186 - ALiBi with `causal=True` unexpected bias?

Issue - State: open - Opened by KeremTurgutlu over 1 year ago - 7 comments

#185 - unexpected results in inference

Issue - State: open - Opened by OmarMohammed88 over 1 year ago - 4 comments

#184 - WandB integration?

Issue - State: closed - Opened by germanjke over 1 year ago - 1 comment

#183 - GPU OOM while fine-tuning MPT-7B

Issue - State: closed - Opened by karthikmurugadoss over 1 year ago - 5 comments

#182 - Add community links to README

Pull Request - State: closed - Opened by hanlint over 1 year ago

#181 - Revert "Torch2 (#177) (#178)"

Pull Request - State: closed - Opened by dakinggg over 1 year ago - 2 comments

#180 - Flash Attention vs Triton Flash Attention

Issue - State: open - Opened by germanjke over 1 year ago - 6 comments

#179 - MPT-7B strange inference speed

Issue - State: closed - Opened by SinanAkkoyun over 1 year ago - 1 comment

#178 - Torch2 (#177)

Pull Request - State: closed - Opened by vchiley over 1 year ago - 1 comment

#177 - Torch2

Pull Request - State: closed - Opened by vchiley over 1 year ago

#176 - Error while loading converted hf model(from composer checkpoint)

Issue - State: closed - Opened by singhalshikha518 over 1 year ago - 4 comments

#175 - Rename datasets to avoid hf conflict

Pull Request - State: closed - Opened by hanlint over 1 year ago

#174 - TypeError: __init__() got an unexpected keyword argument 'approximate' when using mosaicml/mpt-7b-instruct model

Issue - State: closed - Opened by souvik0306 over 1 year ago - 10 comments

#173 - Can I use mpt-7b_dolly_sft.yaml used to train MPT-Instruct model

Issue - State: closed - Opened by NarenZen over 1 year ago - 1 comment

#172 - Dynamic range of ALiBi

Issue - State: closed - Opened by tginart over 1 year ago - 1 comment

#171 - where can I find training code or configs about MPT-7B-StoryWriter

Issue - State: open - Opened by metacryptom over 1 year ago

#170 - FileNotFoundError: [Errno 2] No such file or directory: '/000001_barrier'

Issue - State: closed - Opened by julianfaraone over 1 year ago - 5 comments

#169 - Convert MPT checkpoints to FT format

Pull Request - State: closed - Opened by dskhudia over 1 year ago - 2 comments

#168 - clean up dataset conversion readme

Pull Request - State: closed - Opened by codestar12 over 1 year ago

#167 - Remove health checker

Pull Request - State: closed - Opened by mvpatel2000 over 1 year ago

#166 - Add Tensorboard logger to yaml config

Pull Request - State: closed - Opened by hanlint over 1 year ago - 2 comments

#165 - Remove `pynvml`

Pull Request - State: closed - Opened by hanlint over 1 year ago - 1 comment

#164 - Explain `composer` command

Pull Request - State: closed - Opened by hanlint over 1 year ago

#163 - Error while saving checkpoint

Issue - State: closed - Opened by singhalshikha518 over 1 year ago - 14 comments

#162 - ValueError: --out_root=./my-copy-c4 contains ['train_small'] which cannot overlap with the requested splits ['train_small', 'val_small'].

Issue - State: closed - Opened by NishaDeepak over 1 year ago - 3 comments

#161 - ValueError("Please specify `target_modules` in `peft_config`")

Issue - State: closed - Opened by NarenZen over 1 year ago - 1 comment

#160 - Finetune MPT with transformers

Issue - State: closed - Opened by yangjianxin1 over 1 year ago - 2 comments

#159 - an error while training

Issue - State: open - Opened by ChrisXULC over 1 year ago - 3 comments

#158 - support for tensorboard

Issue - State: closed - Opened by sysusicily over 1 year ago - 3 comments

#157 - Update StreamingDataset defaults

Pull Request - State: closed - Opened by abhi-mosaic over 1 year ago

#156 - Adds a concrete finetuning example from a custom dataset

Pull Request - State: closed - Opened by alextrott16 over 1 year ago - 1 comment

#155 - Not getting a proper response

Issue - State: closed - Opened by gauravkaliadev over 1 year ago - 4 comments

#154 - Use the nvidia-supplied nvidia-ml-py instead of pynvml

Issue - State: closed - Opened by mattip over 1 year ago - 3 comments

#153 - Slow on V100

Issue - State: open - Opened by Louis-y-nlp over 1 year ago - 2 comments

#152 - Docker Image with CUDA 12.1 for ADA Gen cards

Issue - State: open - Opened by danzeeeman over 1 year ago - 2 comments

#151 - Add cloud upload to checkpoint conversion script

Pull Request - State: closed - Opened by dakinggg over 1 year ago

#149 - Enable Torch2

Pull Request - State: closed - Opened by vchiley over 1 year ago - 12 comments

#148 - Adds precision to eval

Pull Request - State: closed - Opened by mvpatel2000 over 1 year ago

#147 - make triton attn req pre-mlri tagged triton

Pull Request - State: closed - Opened by vchiley over 1 year ago - 2 comments

#146 - error 'Getting requirements to build wheel'... is the docker image okay??

Issue - State: closed - Opened by jewbot over 1 year ago - 7 comments

#144 - Explicit composer mention

Issue - State: closed - Opened by StrangeTcy over 1 year ago - 2 comments

#143 - fine tuning mpt7b using local dataset

Issue - State: open - Opened by singhalshikha518 over 1 year ago - 10 comments

#142 - the select for multi-GPU card

Issue - State: closed - Opened by sysusicily over 1 year ago - 1 comment

#141 - the error of streaming

Issue - State: open - Opened by sysusicily over 1 year ago - 3 comments

#140 - Why Flash Attention do not support attn bias [Alibi]?

Issue - State: closed - Opened by srn-source over 1 year ago - 2 comments

#139 - KeyError: 'attn_pdrop' with t5-small_dolly_sft.yaml when running inference/convert_composer_to_hf.py

Issue - State: open - Opened by Paladiamors over 1 year ago - 1 comment

#137 - Circular import error when using data/packing.py

Issue - State: closed - Opened by Paladiamors over 1 year ago - 2 comments

#132 - How to load a dataset with multiple rounds of conversation like sharegpt

Issue - State: closed - Opened by 0xDing over 1 year ago - 1 comment

#129 - code example for the onnx model

Issue - State: closed - Opened by therealadityashankar over 1 year ago - 4 comments

#128 - Unable to use triton? How to handle context windows >4k?

Issue - State: closed - Opened by tginart over 1 year ago - 2 comments
