Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / stas00/ml-engineering issues and pull requests

#79 - Support more dtype calculate max flops in mamf-finder tools

Pull Request - State: open - Opened by BBuf 5 days ago

#76 - AMD MI250X MAMF efficiency is wrong

Issue - State: closed - Opened by rlrs about 2 months ago - 3 comments

#74 - MAMF Finder usage of MxNxK differs from BLAS gemm MxNxK

Issue - State: closed - Opened by GKolling about 2 months ago - 2 comments

#73 - PDF link in readme doesn't work

Issue - State: closed - Opened by sytelus about 2 months ago - 3 comments

#72 - GPU utilization monitoring

Issue - State: closed - Opened by fortminors about 2 months ago - 6 comments

#71 - Performance Profiling

Issue - State: closed - Opened by jeromeku 2 months ago - 2 comments

#70 - Namespaces as a solution for performance measurements in shared clusters.

Pull Request - State: closed - Opened by BrunoScaglione 2 months ago - 4 comments

#68 - f

Issue - State: closed - Opened by sbhavani 3 months ago

#67 - bb

Issue - State: closed - Opened by sbhavani 3 months ago

#66 - [Question] `FSDP` vs `Deepspeed ZeRO3 / ZeRO++`

Issue - State: closed - Opened by jeromeku 3 months ago - 2 comments

#65 - Change "Gbps" to "GBps" to fix a tiny typo in network.README.md

Pull Request - State: closed - Opened by Txxx926 3 months ago

#64 - grad checkpoint tiny error

Pull Request - State: closed - Opened by baochi0212 3 months ago - 4 comments

#63 - fix tiny error

Pull Request - State: closed - Opened by baochi0212 3 months ago

#62 - slurm job array change nodes

Pull Request - State: closed - Opened by ethanhe42 3 months ago - 1 comment

#61 - slurm job array change nodes

Issue - State: closed - Opened by ethanhe42 3 months ago - 1 comment

#60 - Update GH200 MAMF

Pull Request - State: closed - Opened by yaolu 4 months ago - 1 comment

#59 - Max Achievable TFLOP/s on H100 without warmup

Pull Request - State: closed - Opened by OrenLeung 4 months ago - 1 comment

#58 - MAMF - GH200

Issue - State: closed - Opened by frankschae 4 months ago - 10 comments

#56 - MAMF + AMD debug

Pull Request - State: closed - Opened by stas00 4 months ago

#55 - fix table

Pull Request - State: closed - Opened by 152334H 5 months ago - 1 comment

#54 - fix typo

Pull Request - State: closed - Opened by yaolu 5 months ago - 1 comment

#53 - add citation

Pull Request - State: closed - Opened by stas00 6 months ago

#52 - Adding another logbook (kinda)

Issue - State: closed - Opened by boweiliu 7 months ago - 4 comments

#50 - Fix in ai-battlefield.md

Pull Request - State: closed - Opened by andy-yangz 7 months ago - 1 comment

#49 - Fix incorrect Nvidia retired GPU page size mention.

Pull Request - State: closed - Opened by cf-natali 8 months ago - 1 comment

#48 - Fix a couple formulas rendering.

Pull Request - State: closed - Opened by cf-natali 8 months ago - 1 comment

#47 - MFU + HFU redux

Pull Request - State: closed - Opened by stas00 8 months ago - 2 comments

#46 - SWIGLU: clarifications

Pull Request - State: closed - Opened by stas00 8 months ago - 4 comments

#45 - Question about the right hidden dim when using SwiGLU

Issue - State: closed - Opened by Thytu 8 months ago - 3 comments

#44 - fix bf16 <-> fp16 dtype statement

Pull Request - State: closed - Opened by stas00 8 months ago

#43 - fix tpu v4 hbm2 bw

Pull Request - State: closed - Opened by stas00 8 months ago

#42 - fix typo in emulate multi node

Pull Request - State: closed - Opened by Thytu 8 months ago - 1 comment

#41 - Question about changing precision post training

Issue - State: closed - Opened by Thytu 8 months ago - 2 comments

#40 - TPU v4 has 1,200GB/s of mem bandwidth and not 2,400, right?

Issue - State: closed - Opened by rodrigo-f-nogueira 8 months ago - 1 comment

#39 - Fix broken links.

Pull Request - State: closed - Opened by cf-natali 8 months ago - 1 comment

#38 - [AI battlefield] Update NVLink bandwidths to uni-directional numbers.

Pull Request - State: closed - Opened by cf-natali 8 months ago - 1 comment

#37 - ML

Issue - State: closed - Opened by lelikdr 8 months ago

#36 - Add num_processes and num_machines to accelerate launcher

Pull Request - State: closed - Opened by adamlin120 8 months ago - 1 comment

#35 - [Network] Complete missing sentence

Pull Request - State: closed - Opened by patrickvonplaten 9 months ago - 1 comment

#34 - [Network] Some typos in the README

Pull Request - State: closed - Opened by patrickvonplaten 9 months ago - 1 comment

#32 - discuss the solutions to Not fully recovering spikes

Issue - State: closed - Opened by pengzhangzhi 9 months ago - 7 comments

#31 - Update README.md in network chapter, update bandwidth info

Pull Request - State: closed - Opened by kisseternity 9 months ago - 1 comment

#30 - Conflicting opinions about streaming data from cloud storage?

Issue - State: closed - Opened by hacobe 9 months ago - 2 comments

#29 - Update ai-battlefield.md

Pull Request - State: closed - Opened by findmyway 9 months ago - 1 comment

#28 - Quarto Site

Issue - State: closed - Opened by saforem2 9 months ago - 3 comments

#27 - Fix single node networking analysis

Pull Request - State: closed - Opened by haidark 9 months ago - 1 comment

#26 - Update README.md

Pull Request - State: closed - Opened by pitmonticone 10 months ago - 1 comment

#25 - Reorg 2

Pull Request - State: closed - Opened by stas00 10 months ago

#24 - Add flash attention to overview

Pull Request - State: closed - Opened by Quentin-Anthony 10 months ago - 1 comment

#23 - Clarification for gradient memory in mixed precision training

Issue - State: closed - Opened by SumanthRH 10 months ago - 3 comments

#22 - Add cookbook and model co-design refs

Pull Request - State: closed - Opened by Quentin-Anthony 10 months ago - 1 comment

#21 - restructuring tools

Pull Request - State: closed - Opened by stas00 10 months ago

#20 - pip install -r build/requirements.txt fails due to github_md_utils

Issue - State: closed - Opened by ebowman 10 months ago - 3 comments

#19 - Fix typo in README.md

Pull Request - State: closed - Opened by nicolapace 10 months ago - 1 comment

#18 - fix typo

Pull Request - State: closed - Opened by g1y5x3 11 months ago - 1 comment

#17 - Update emulate-multi-node.md

Pull Request - State: closed - Opened by saforem2 11 months ago - 2 comments

#16 - Fix typo

Pull Request - State: closed - Opened by pitmonticone 11 months ago - 1 comment

#15 - Improve folder structure

Issue - State: closed - Opened by heyimjonas 12 months ago - 3 comments

#14 - Update ai-battlefield.md

Pull Request - State: closed - Opened by eryk-mazus 12 months ago - 1 comment

#13 - Daisy chain batch jobs

Issue - State: closed - Opened by adammoody 12 months ago - 1 comment

#12 - Update ai-battlefield.md

Pull Request - State: closed - Opened by evelynmitchell 12 months ago - 1 comment

#11 - Update GPU guide with IPU info

Pull Request - State: closed - Opened by thecharlieblake about 1 year ago - 1 comment

#10 - Typo fixes

Pull Request - State: closed - Opened by BioGeek about 1 year ago - 3 comments

#9 - GPU requirements and cost estimation.

Issue - State: closed - Opened by Anindyadeep about 1 year ago - 4 comments

#8 - Minor Typo in emulate multi node

Issue - State: closed - Opened by anindya-saha about 1 year ago - 4 comments

#7 - [feat] md2pdf

Pull Request - State: closed - Opened by pengzhangzhi about 1 year ago - 13 comments

#6 - convert markdown to pdf

Issue - State: closed - Opened by pengzhangzhi about 1 year ago - 10 comments

#5 - Missing `hparams` section

Issue - State: closed - Opened by jvmncs about 1 year ago - 2 comments

#4 - PaLM training instability

Pull Request - State: closed - Opened by cx0 about 1 year ago - 1 comment

#3 - Fix typos

Pull Request - State: closed - Opened by pitmonticone about 1 year ago - 2 comments

#2 - Convert to bfloat16 failing

Issue - State: closed - Opened by mhillebrand over 1 year ago - 2 comments

#1 - Parallel training hangs

Issue - State: closed - Opened by mhillebrand over 2 years ago - 10 comments