EleutherAI/gpt-neox issues and pull requests

#1323 - Error when converting sequential model to HF

Issue - State: open - Opened by SilverSulfide 6 days ago
Labels: bug

#1322 - Runtime per step linearly increases with training step number.

Issue - State: open - Opened by iPRET 13 days ago - 1 comment
Labels: bug

#1321 - Can `preprocess_data.py` support Huggingface Dataset?

Issue - State: open - Opened by cafeii 14 days ago - 1 comment
Labels: feature request

#1320 - _forward_step_fn does not always return two values so eval.py breaks if is_pipe_parallel is false

Issue - State: open - Opened by markNZed 14 days ago - 2 comments
Labels: bug

#1319 - Llama MLP project layers mismatch with HF config during conversion

Issue - State: closed - Opened by Vmjkom 20 days ago - 2 comments
Labels: bug

#1318 - Fix documentation for converting SFT/DPO weights back to HF Llama

Pull Request - State: closed - Opened by jacobthebanana 23 days ago

#1317 - KeyError when converting DPO weights from GPTNeoX format to HuggingFace Llama in post-training documentations

Issue - State: closed - Opened by jacobthebanana 23 days ago

#1316 - Update text_generation_utils.py to work with pipe_parallel_size of 0

Pull Request - State: open - Opened by markNZed 27 days ago

#1315 - fix a GQA issue (#1314)

Pull Request - State: closed - Opened by tiandeyu-cs 29 days ago

#1314 - Training crashes when "(hidden_size * num_kv_heads) / (num_attention_heads * num_attention_heads)" is not an integer.

Issue - State: closed - Opened by tiandeyu-cs 29 days ago
Labels: bug

#1313 - Python 3.10 support

Pull Request - State: closed - Opened by markNZed 29 days ago - 1 comment

#1312 - Add support for dropout in sparse attention

Pull Request - State: closed - Opened by michaelc-yu about 1 month ago

#1311 - Add default bf16 precision setting when bf16 config option is set but precision is unset.

Pull Request - State: closed - Opened by AI-WAIFU about 1 month ago

#1310 - [Question] Running gpt-neox on AMD-based LUMI HPC centre.

Issue - State: closed - Opened by iPRET about 1 month ago - 1 comment
Labels: bug

#1309 - fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed

Pull Request - State: closed - Opened by tiandeyu-cs about 1 month ago - 1 comment

#1308 - Add ERROR logging prefix and sort the prefixes alphabetically

Pull Request - State: closed - Opened by TheBatmanofButler about 1 month ago - 2 comments

#1307 - DeeperSpeed cannot support BFloat16 and PipelineParallelism

Issue - State: open - Opened by jahatef about 1 month ago - 1 comment
Labels: bug

#1306 - Latest DeepSpeed not supported

Issue - State: open - Opened by jahatef about 1 month ago
Labels: bug

#1305 - Error with rotary embeddings and BFloat16

Issue - State: closed - Opened by jahatef about 1 month ago - 1 comment
Labels: bug

#1304 - CUDA/Pytorch multiprocessing workaround and test fixes

Pull Request - State: open - Opened by AI-WAIFU about 1 month ago

#1303 - pytest-forked alternative to get around CUDA/pytorch multiprocessing limitation

Pull Request - State: open - Opened by AI-WAIFU about 1 month ago

#1302 - adds pyproject files and tests

Pull Request - State: closed - Opened by LouisCastricato about 2 months ago

#1301 - Fix failing tests

Pull Request - State: closed - Opened by AI-WAIFU about 2 months ago

#1300 - Add additional asserts and update post training readme

Pull Request - State: closed - Opened by AI-WAIFU about 2 months ago

#1299 - Add support for context parallelism

Pull Request - State: open - Opened by bclyang about 2 months ago - 1 comment

#1298 - Improve Profiling Docs

Pull Request - State: closed - Opened by Quentin-Anthony about 2 months ago

#1297 - TE integration via full TransformerLayer

Pull Request - State: open - Opened by tf-nv about 2 months ago

#1296 - hotfix for tp >= 2 and pp > 2 in autoitercount

Pull Request - State: closed - Opened by AI-WAIFU about 2 months ago

#1295 - readded RM training removed during merge conflict in KTO

Pull Request - State: closed - Opened by dmahan93 2 months ago

#1294 - Add KTO Post-training example

Pull Request - State: closed - Opened by dmahan93 2 months ago

#1293 - update args docs

Pull Request - State: closed - Opened by Quentin-Anthony 2 months ago

#1292 - update neox arg docs

Pull Request - State: closed - Opened by Quentin-Anthony 2 months ago - 1 comment

#1291 - mamba flop calculations

Pull Request - State: closed - Opened by jahatef 2 months ago

#1290 - Fix dataset bug

Pull Request - State: closed - Opened by Quentin-Anthony 2 months ago

#1288 - Reinforce PR

Pull Request - State: open - Opened by dmahan93 2 months ago - 1 comment

#1287 - Remove the remaining two hanging wandb config fields

Pull Request - State: closed - Opened by Quentin-Anthony 2 months ago

#1286 - Make monitors consistent

Pull Request - State: closed - Opened by Quentin-Anthony 2 months ago

#1285 - Fix off by 1 error on masked tokens for RM training

Pull Request - State: closed - Opened by dmahan93 2 months ago

#1284 - Update Comet integration instructions

Pull Request - State: closed - Opened by Lothiraldan 2 months ago

#1283 - Automatically compute train_iters when train_epochs is specified.

Pull Request - State: closed - Opened by AI-WAIFU 2 months ago - 1 comment

#1282 - TransformerEngine Integration

Pull Request - State: open - Opened by aurelion-source 2 months ago - 3 comments

#1281 - Add model parallel group to reduce scatter

Pull Request - State: closed - Opened by bclyang 2 months ago

#1280 - Do not fail when git is not installed

Pull Request - State: closed - Opened by gcaillaut 3 months ago - 1 comment

#1279 - fix the imports needed for comet integration

Pull Request - State: closed - Opened by Quentin-Anthony 3 months ago

#1278 - fix gpt-j residual bias assumption

Pull Request - State: closed - Opened by dmahan93 3 months ago

#1277 - Post training examples

Pull Request - State: closed - Opened by dmahan93 3 months ago - 3 comments

#1276 - Hotfix llama models

Pull Request - State: closed - Opened by dmahan93 3 months ago - 1 comment

#1275 - Add more informative checks for ZeRO incompatibility.

Pull Request - State: closed - Opened by AI-WAIFU 3 months ago

#1274 - Fix weight decay module check

Pull Request - State: closed - Opened by aurelion-source 3 months ago

#1273 - Expand Docstring

Pull Request - State: closed - Opened by AI-WAIFU 3 months ago

#1272 - TE Import Hotfix

Pull Request - State: closed - Opened by Quentin-Anthony 3 months ago - 1 comment

#1271 - Hotfix Activation Typo

Pull Request - State: closed - Opened by Quentin-Anthony 3 months ago

#1270 - Formatting and Fix Mamba Config

Pull Request - State: closed - Opened by Quentin-Anthony 3 months ago

#1269 - LayerNorm Refactor

Pull Request - State: closed - Opened by aurelion-source 3 months ago - 3 comments

#1268 - Allow training without knowing num_iters

Issue - State: closed - Opened by StellaAthena 3 months ago - 1 comment
Labels: feature request

#1267 - Add assert to check for missing tokenizer_type in config. [#1231]

Pull Request - State: closed - Opened by AI-WAIFU 3 months ago - 1 comment

#1266 - Add initial ring flash attention support

Pull Request - State: open - Opened by dmahan93 3 months ago - 1 comment

#1265 - add Apex fused RMS norm

Pull Request - State: closed - Opened by dmahan93 3 months ago - 1 comment

#1264 - Frontier

Pull Request - State: closed - Opened by jahatef 3 months ago - 1 comment

#1263 - Improve performance of sequence parallel gather, scatter, and reduce

Pull Request - State: closed - Opened by bclyang 3 months ago

#1262 - mamba fixes and cleaning

Pull Request - State: closed - Opened by jahatef 3 months ago - 2 comments

#1261 - Comet integration

Pull Request - State: closed - Opened by jverre 3 months ago - 2 comments

#1260 - Fix gather and reduce scatter ops on sequence dimension

Pull Request - State: closed - Opened by bclyang 3 months ago

#1259 - Fix LayerNorm all reduce gradient hook

Pull Request - State: closed - Opened by bclyang 4 months ago - 1 comment

#1258 - bugfix: chat turns instead of repeating the conversation in preprocess_data_with_chat_template.py

Pull Request - State: closed - Opened by dmahan93 4 months ago

#1257 - Megatron-LM style Sequence Parallel

Pull Request - State: closed - Opened by haileyschoelkopf 4 months ago - 3 comments

#1256 - GitHub actions fix

Pull Request - State: closed - Opened by jahatef 4 months ago

#1255 - Add new cites

Pull Request - State: closed - Opened by StellaAthena 4 months ago - 1 comment

#1254 - How to Load Model from pytorch_model.bin into Trained Model for Text Generation?

Issue - State: open - Opened by lieh1203 4 months ago
Labels: feature request

#1253 - what's the biggest dataset you've tried?

Issue - State: open - Opened by exnx 4 months ago
Labels: bug

#1252 - too many .bin files for dataloader, crashed

Issue - State: closed - Opened by exnx 5 months ago
Labels: bug

#1251 - Assertion Error when Setting pipe_parallel_size or model_parallel_size in GPT-NeoX

Issue - State: open - Opened by lieh1203 5 months ago - 3 comments
Labels: bug

#1250 - For nucleus sampling, top-p sampling appears to happen on the softmax-normalized top-k logits

Issue - State: closed - Opened by j-frei 5 months ago - 3 comments
Labels: bug

#1248 - batch_input and elapsed time per iteration suddenly slow down during model training

Issue - State: open - Opened by Yuhanleeee 5 months ago - 4 comments
Labels: bug

#1247 - Add hf llama to neox conversion

Pull Request - State: closed - Opened by dmahan93 5 months ago - 1 comment

#1246 - Add Reward Model training

Pull Request - State: closed - Opened by dmahan93 5 months ago

#1245 - Conversion for CI from self-hosted hardware

Pull Request - State: closed - Opened by jaimemcc-intel 5 months ago

#1244 - Add KTO training

Pull Request - State: closed - Opened by dmahan93 5 months ago

#1243 - Replace unsafe `pyyaml` loader with `SafeLoader` (#2)

Pull Request - State: closed - Opened by pixeeai 5 months ago - 1 comment

#1242 - Add DPO training

Pull Request - State: closed - Opened by dmahan93 5 months ago - 1 comment

#1241 - Fix paper reference in init_functions.py

Pull Request - State: closed - Opened by rasbt 5 months ago - 2 comments

#1240 - SFT improvements (labeling fixes, different packing implementations)

Pull Request - State: closed - Opened by dmahan93 5 months ago

#1239 - Add a chat data preprocessing script

Pull Request - State: closed - Opened by dmahan93 5 months ago

#1238 - Pr1212

Pull Request - State: closed - Opened by jahatef 5 months ago

#1237 - Add tensor parallelism for RWKV

Pull Request - State: open - Opened by jahatef 5 months ago

#1236 - Ville dev

Pull Request - State: closed - Opened by Vmjkom 5 months ago - 1 comment

#1235 - Add Transformer Engine's version of RMSNorm and LayerNorm

Pull Request - State: closed - Opened by lintangsutawika 6 months ago - 2 comments

#1234 - fix python version and pytest install

Pull Request - State: closed - Opened by jahatef 6 months ago - 5 comments

#1233 - add workflow_dispatch to gh actions pr so we can run on command

Pull Request - State: closed - Opened by jahatef 6 months ago

#1232 - init changes to README

Pull Request - State: closed - Opened by jaimemcc-intel 6 months ago

#1231 - Cannot convert neox model to HF

Issue - State: open - Opened by srivassid 6 months ago - 2 comments
Labels: bug

#1230 - How to set the ffn hidden size parameter in gpt neox

Issue - State: closed - Opened by IronMan-WangJinxi 6 months ago - 2 comments
Labels: feature request

#1228 - Cannot perform inference, be it unconditional, input-file, or interactive

Issue - State: closed - Opened by srivassid 6 months ago - 2 comments
Labels: bug

#1227 - The results of running eval show only 1 digit after decimal point for acc on all tested tasks

Issue - State: closed - Opened by lernerjenny 6 months ago - 2 comments
Labels: bug

#1226 - Add Torch Profiler Support

Pull Request - State: closed - Opened by DayOfThePenguin 6 months ago

#1225 - Add lora support

Pull Request - State: open - Opened by mkerin 6 months ago

#1224 - fixed fused_rope naming in JIT + Readme

Pull Request - State: closed - Opened by R0n12 6 months ago

#1223 - Change python invocation syntax

Pull Request - State: closed - Opened by jaimemcc-intel 6 months ago

#1222 - Small tidying

Pull Request - State: closed - Opened by yang 6 months ago