Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / stas00/ml-engineering issues and pull requests
#66 - [Question] `FSDP` vs `Deepspeed ZeRO3 / ZeRO++`
Issue - State: closed - Opened by jeromeku 20 days ago - 2 comments
#65 - Change "Gbps" to "GBps" to fix a tiny typo in network.README.md
Pull Request - State: closed - Opened by Txxx926 24 days ago
#64 - grad checkpoint tiny error
Pull Request - State: closed - Opened by baochi0212 24 days ago - 4 comments
#63 - fix tiny error
Pull Request - State: closed - Opened by baochi0212 24 days ago
#62 - slurm job array change nodes
Pull Request - State: closed - Opened by ethanhe42 26 days ago - 1 comment
#61 - slurm job array change nodes
Issue - State: closed - Opened by ethanhe42 about 1 month ago - 1 comment
#60 - Update GH200 MAMF
Pull Request - State: closed - Opened by yaolu about 1 month ago - 1 comment
#59 - Max Achievable TFLOP/s on H100 without warmup
Pull Request - State: closed - Opened by OrenLeung about 1 month ago - 1 comment
#58 - MAMF - GH200
Issue - State: closed - Opened by frankschae about 1 month ago - 10 comments
#56 - MAMAF + AMD debug
Pull Request - State: closed - Opened by stas00 about 2 months ago
#55 - fix table
Pull Request - State: closed - Opened by 152334H 2 months ago - 1 comment
#54 - fix typo
Pull Request - State: closed - Opened by yaolu 3 months ago - 1 comment
#53 - add citation
Pull Request - State: closed - Opened by stas00 4 months ago
#52 - Adding another logbook (kinda)
Issue - State: open - Opened by boweiliu 4 months ago - 2 comments
#50 - Fix in ai-battlefield.md
Pull Request - State: closed - Opened by andy-yangz 5 months ago - 1 comment
#49 - Fix incorrect Nvidia retired GPU page size mention.
Pull Request - State: closed - Opened by cf-natali 5 months ago - 1 comment
#48 - Fix a couple formulas rendering.
Pull Request - State: closed - Opened by cf-natali 5 months ago - 1 comment
#47 - MFU + HFU redux
Pull Request - State: closed - Opened by stas00 5 months ago - 2 comments
#46 - SWIGLU: clarifications
Pull Request - State: closed - Opened by stas00 5 months ago - 4 comments
#45 - Question about the right hidden dim when using SwiGLU
Issue - State: closed - Opened by Thytu 5 months ago - 3 comments
#44 - fix bf16 <-> fp16 dtype statement
Pull Request - State: closed - Opened by stas00 6 months ago
#43 - fix tpu v4 hbm2 bw
Pull Request - State: closed - Opened by stas00 6 months ago
#42 - fix typo in emulate multi node
Pull Request - State: closed - Opened by Thytu 6 months ago - 1 comment
#41 - Question about changing precision post training
Issue - State: closed - Opened by Thytu 6 months ago - 2 comments
#40 - TPU v4 has 1,200GB/s of mem bandwidth and not 2,400, right?
Issue - State: closed - Opened by rodrigo-f-nogueira 6 months ago - 1 comment
#39 - Fix broken links.
Pull Request - State: closed - Opened by cf-natali 6 months ago - 1 comment
#38 - [AI battlefield] Update NVLink bandwidths to uni-directional numbers.
Pull Request - State: closed - Opened by cf-natali 6 months ago - 1 comment
#36 - Add num_processes and num_machines to accelerate launcher
Pull Request - State: closed - Opened by adamlin120 6 months ago - 1 comment
#35 - [Network] Complete missing sentence
Pull Request - State: closed - Opened by patrickvonplaten 7 months ago - 1 comment
#34 - [Network] Some typos in the README
Pull Request - State: closed - Opened by patrickvonplaten 7 months ago - 1 comment
#32 - discuss the solutions to Not fully recovering spikes
Issue - State: closed - Opened by pengzhangzhi 7 months ago - 7 comments
#31 - Update README.md in network chapter, update bandwidth info
Pull Request - State: closed - Opened by kisseternity 7 months ago - 1 comment
#30 - Conflicting opinions about streaming data from cloud storage?
Issue - State: closed - Opened by hacobe 7 months ago - 2 comments
#29 - Update ai-battlefield.md
Pull Request - State: closed - Opened by findmyway 7 months ago - 1 comment
#28 - Quarto Site
Issue - State: closed - Opened by saforem2 7 months ago - 3 comments
#27 - Fix single node networking analysis
Pull Request - State: closed - Opened by haidark 7 months ago - 1 comment
#26 - Update README.md
Pull Request - State: closed - Opened by pitmonticone 7 months ago - 1 comment
#25 - Reorg 2
Pull Request - State: closed - Opened by stas00 7 months ago
#24 - Add flash attention to overview
Pull Request - State: closed - Opened by Quentin-Anthony 7 months ago - 1 comment
#23 - Clarification for gradient memory in mixed precision training
Issue - State: closed - Opened by SumanthRH 8 months ago - 3 comments
#22 - Add cookbook and model co-design refs
Pull Request - State: closed - Opened by Quentin-Anthony 8 months ago - 1 comment
#21 - restructuring tools
Pull Request - State: closed - Opened by stas00 8 months ago
#20 - pip install -r build/requirements.txt fails due to github_md_utils
Issue - State: closed - Opened by ebowman 8 months ago - 3 comments
#19 - Fix typo in README.md
Pull Request - State: closed - Opened by nicolapace 8 months ago - 1 comment
#18 - fix typo
Pull Request - State: closed - Opened by g1y5x3 9 months ago - 1 comment
#17 - Update emulate-multi-node.md
Pull Request - State: closed - Opened by saforem2 9 months ago - 2 comments
#16 - Fix typo
Pull Request - State: closed - Opened by pitmonticone 9 months ago - 1 comment
#15 - Improve folder structure
Issue - State: closed - Opened by heyimjonas 10 months ago - 3 comments
#14 - Update ai-battlefield.md
Pull Request - State: closed - Opened by eryk-mazus 10 months ago - 1 comment
#13 - Daisy chain batch jobs
Issue - State: closed - Opened by adammoody 10 months ago - 1 comment
#12 - Update ai-battlefield.md
Pull Request - State: closed - Opened by evelynmitchell 10 months ago - 1 comment
#11 - Update GPU guide with IPU info
Pull Request - State: closed - Opened by thecharlieblake 10 months ago - 1 comment
#10 - Typo fixes
Pull Request - State: closed - Opened by BioGeek 10 months ago - 3 comments
#9 - GPU requirements and cost estimation.
Issue - State: closed - Opened by Anindyadeep 11 months ago - 4 comments
#8 - Minor Typo in emulate multi node
Issue - State: closed - Opened by anindya-saha 11 months ago - 4 comments
#7 - [feat] md2pdf
Pull Request - State: closed - Opened by pengzhangzhi 11 months ago - 13 comments
#6 - convert markdown to pdf
Issue - State: closed - Opened by pengzhangzhi 11 months ago - 10 comments
#5 - Missing `hparams` section
Issue - State: closed - Opened by jvmncs 11 months ago - 2 comments
#4 - PaLM training instability
Pull Request - State: closed - Opened by cx0 11 months ago - 1 comment
#3 - Fix typos
Pull Request - State: closed - Opened by pitmonticone 12 months ago - 2 comments
#2 - Convert to bfloat16 failing
Issue - State: closed - Opened by mhillebrand about 1 year ago - 2 comments
#1 - Parallel training hangs
Issue - State: closed - Opened by mhillebrand over 2 years ago - 10 comments