Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / Lightning-AI/lightning issues and pull requests
#10888 - [RFC] Support a `Trainer.train()` API
Issue -
State: closed - Opened by ananthsub almost 3 years ago
- 10 comments
Labels: feature, discussion, trainer
#10867 - Remove `return_result` argument from `DDPSpawnPlugin.spawn()`
Pull Request -
State: closed - Opened by awaelchli almost 3 years ago
Labels: ready, design, breaking change, strategy: ddp
#10865 - Deprecate and redefine `add_to_queue`/`get_from_queue` in spawn plugins
Issue -
State: closed - Opened by awaelchli almost 3 years ago
Labels: feature, deprecation, strategy: ddp
#10855 - Remove redundant None check from spawn plugins
Pull Request -
State: closed - Opened by awaelchli almost 3 years ago
Labels: ready, accelerator: tpu, refactor, strategy: ddp
#10854 - Add missing deprecation rerouting for `add_to_queue` in `TPUSpawnPlugin`
Pull Request -
State: closed - Opened by awaelchli almost 3 years ago
- 1 comment
Labels: ready, accelerator: tpu, deprecation, strategy: ddp
#10849 - Removed unnecessary `_move_optimizer_state` method overrides
Pull Request -
State: closed - Opened by four4fish almost 3 years ago
Labels: ready, accelerator: tpu, refactor, strategy: ddp
#10846 - Save and load with CheckpointIO in DDPSpawn plugins
Pull Request -
State: closed - Opened by awaelchli almost 3 years ago
- 2 comments
Labels: ready, accelerator: tpu, refactor, strategy: ddp
#10806 - 🚀 Memory Sharing of Datasets Across Different Processes
Issue -
State: closed - Opened by rusty1s almost 3 years ago
- 4 comments
Labels: feature
#10530 - Label tracking meta-issue (edit me to get automatically CC'ed on issues!)
Issue -
State: open - Opened by carmocca almost 3 years ago
- 9 comments
#10510 - DeepSpeed stage 3 and mixed precision cause an error
Issue -
State: open - Opened by ktrapeznikov almost 3 years ago
- 19 comments
Labels: bug, 3rd party, strategy: deepspeed
#10411 - Test - ignore me
Pull Request -
State: closed - Opened by carmocca almost 3 years ago
#10290 - IterableDataset with wrong length causes validation loop to be skipped.
Issue -
State: closed - Opened by jopo666 almost 3 years ago
- 5 comments
Labels: bug, help wanted, won't fix
#10260 - Guarantee call order for callbacks
Issue -
State: open - Opened by z-a-f almost 3 years ago
- 8 comments
Labels: question, callback
#10034 - 1/n Simplify spawn plugins: Simplify handling of multiprocessing queue
Pull Request -
State: closed - Opened by awaelchli almost 3 years ago
- 2 comments
Labels: feature, ready, refactor, design, strategy: ddp
#9759 - IPU hotfix for #9721
Pull Request -
State: closed - Opened by carmocca about 3 years ago
- 2 comments
Labels: bug, ready, priority: 0, ci, accelerator: ipu (external)
#9641 - code stuck at All DDP processes registered
Issue -
State: closed - Opened by derrick-xwp about 3 years ago
- 13 comments
Labels: bug, help wanted, distributed, priority: 2
#9318 - dictionary update sequence element #0 has length 1; 2 is required
Issue -
State: closed - Opened by cristianegea about 3 years ago
- 17 comments
Labels: bug, help wanted
#9054 - [Ignore me if your name is not Adrian] Fix mypy for 8642
Pull Request -
State: closed - Opened by carmocca about 3 years ago
- 1 comment
#8732 - [RFC] Deprecate direct support for truncated backprop through time
Issue -
State: closed - Opened by ananthsub about 3 years ago
- 19 comments
Labels: discussion, design, deprecation
#8563 - Main script runs multiple times - once per GPU
Issue -
State: closed - Opened by ViktorThink about 3 years ago
- 8 comments
Labels: help wanted, question
#8395 - how to put ```trainer.fit()``` in for loop?
Issue -
State: closed - Opened by anik123 about 3 years ago
- 10 comments
Labels: question
#8275 - Carmocca/wip 7724 update new loops
Pull Request -
State: closed - Opened by carmocca about 3 years ago
- 1 comment
#8080 - Destroy process group in DDP destructor
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 17 comments
Labels: bug, distributed
#8043 - OOM issues with loading large model checkpoints w/ FSDP after checkpoint refactor
Issue -
State: closed - Opened by mleshen over 3 years ago
- 10 comments
Labels: bug, help wanted, checkpointing, 3rd party
#7957 - Failed Manual Backward during DeepSpeed training.
Issue -
State: closed - Opened by Zasder3 over 3 years ago
- 5 comments
Labels: bug, help wanted, priority: 1
#7817 - CUDA OOM when using "ddp" mode in training
Issue -
State: closed - Opened by choieq over 3 years ago
- 14 comments
Labels: bug, help wanted, distributed
#7638 - Freeze on restore from checkpoint
Issue -
State: closed - Opened by Rizhiy over 3 years ago
- 19 comments
Labels: bug, help wanted, priority: 1
#7568 - TensorBoardLogger does not store hparams inside a dataclass
Issue -
State: closed - Opened by sebp over 3 years ago
- 3 comments
Labels: feature, logger
#7540 - Callbacks are not saved to the config file
Issue -
State: closed - Opened by tshu-w over 3 years ago
- 13 comments
Labels: help wanted, question, argparse (removed)
#7390 - Update `configure_optimizers` docs
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 1 comment
Labels: ready, docs
#7233 - DDP - Worse performance with 2 GPUs compared to 1.
Issue -
State: closed - Opened by ManiadisG over 3 years ago
- 17 comments
Labels: bug, help wanted
#7028 - [Grid] You must call wandb.init() before wandb.log()
Issue -
State: closed - Opened by turian over 3 years ago
- 7 comments
Labels: bug, help wanted
#6782 - Not found output when run trainer.test
Issue -
State: closed - Opened by toandaominh1997 over 3 years ago
- 5 comments
Labels: bug, help wanted, docs, waiting on author
#6641 - External MLFlow logging failures cause training job to fail
Issue -
State: closed - Opened by nathancooperjones over 3 years ago
- 2 comments
Labels: bug, help wanted, won't fix, logger, 3rd party
#6622 - Revisit CONTRIBUTING.md
Issue -
State: closed - Opened by camruta over 3 years ago
- 8 comments
Labels: docs, priority: 1
#6415 - Fails to install on google colab
Issue -
State: closed - Opened by simonm3 over 3 years ago
- 8 comments
Labels: bug, help wanted, 3rd party
#6389 - Disable automatic SLURM Detection
Issue -
State: closed - Opened by amogkam over 3 years ago
- 34 comments
Labels: feature, help wanted, priority: 0, design, environment: slurm
#6362 - ImportError: cannot import name 'Batch' from 'torchtext.data' - pytorch 1.8 and torchtext 0.9
Issue -
State: closed - Opened by shrinath-suresh over 3 years ago
- 2 comments
Labels: bug, help wanted
#6154 - Cherry-picking 1.2.1 release [full merge, no squash]
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 2 comments
Labels: ready
#6094 - API consistency: "val" vs "validation"
Issue -
State: closed - Opened by msegado over 3 years ago
- 11 comments
Labels: feature, refactor, design, deprecation
#5943 - Reduce LR On Plateau after validation epoch when val_check_interval < 1
Issue -
State: closed - Opened by MatthieuToulemont over 3 years ago
- 3 comments
Labels: feature, help wanted, won't fix
#5904 - Delete unused autopep8 config
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 1 comment
Labels: ready, ci, refactor
#5874 - Refactor utilities/imports.py
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 2 comments
Labels: ready, refactor
#5825 - Fix Pruning callback and add a few features
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 6 comments
Labels: bug, feature, ready, callback
#5580 - Add new CHANGELOG section
Pull Request -
State: closed - Opened by carmocca over 3 years ago
Labels: ready
#5576 - Prepare 1.1.5 release
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 1 comment
Labels: ready
#5564 - Update README help steps
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 2 comments
Labels: ready, docs
#5563 - Drop greetings comment
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 1 comment
Labels: ready, ci
#5561 - Update CODEOWNERS
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 5 comments
Labels: ready
#5558 - Mixed precision: scheduler and optimizer are called in the wrong order
Issue -
State: open - Opened by kilianovski over 3 years ago
- 35 comments
Labels: bug, priority: 2, precision: amp, lr scheduler
#5434 - Document exceptions
Issue -
State: closed - Opened by akihironitta over 3 years ago
- 20 comments
Labels: good first issue, docs
#5378 - Add 1.1.4 section to CHANGELOG
Pull Request -
State: closed - Opened by carmocca over 3 years ago
- 1 comment
Labels: ready
#5339 - Resuming should allow to differentiate what to resume (steps/opti/weights)
Issue -
State: open - Opened by thoglu over 3 years ago
- 24 comments
Labels: feature, help wanted, priority: 1
#5302 - Add methods for reset dataloader during custom training
Issue -
State: closed - Opened by rwbfd almost 4 years ago
- 8 comments
Labels: feature, help wanted, design, priority: 1
#5243 - Returning None from training_step with multi GPU DDP training
Issue -
State: open - Opened by iamkucuk almost 4 years ago
- 26 comments
Labels: feature, help wanted, distributed, priority: 1
#5145 - Dataloader with custom batch sampler
Issue -
State: closed - Opened by chagmgang almost 4 years ago
- 12 comments
Labels: question, distributed
#5057 - Do not warn when the name key is used in the lr_scheduler dict
Pull Request -
State: closed - Opened by carmocca almost 4 years ago
- 5 comments
Labels: bug, ready, priority: 1
#5049 - Improve some tests
Pull Request -
State: closed - Opened by carmocca almost 4 years ago
- 2 comments
Labels: ready, ci, refactor, priority: 2
#5038 - Add carmocca to core
Pull Request -
State: closed - Opened by carmocca almost 4 years ago
- 1 comment
Labels: ready, docs
#5008 - Start version suffixes at 1
Pull Request -
State: closed - Opened by carmocca almost 4 years ago
- 4 comments
Labels: feature, ready, design, checkpointing
#4956 - how to properly skip samples that cause inf/nan gradients/loss
Issue -
State: closed - Opened by levhaikin almost 4 years ago
- 21 comments
Labels: feature, question, won't fix
#4875 - Improve epoch_result_store code quality
Pull Request -
State: closed - Opened by carmocca almost 4 years ago
- 1 comment
Labels: feature, ready
#4870 - Auto-scale batch size triggers "on_train_end"
Issue -
State: closed - Opened by victorjoos almost 4 years ago
- 5 comments
Labels: bug, help wanted, tuner, priority: 1
#4853 - Reduce missing in DDP's training/validation_step_end
Issue -
State: closed - Opened by apacha almost 4 years ago
- 12 comments
Labels: bug, help wanted, working as intended, distributed
#4721 - Add current_score to ModelCheckpoint.on_save_checkpoint
Pull Request -
State: closed - Opened by carmocca almost 4 years ago
- 1 comment
Labels: feature, ready, checkpointing
#4666 - RuntimeError: Error(s) in loading state_dict when adding/updating metrics to a trained model.
Issue -
State: closed - Opened by Vichoko almost 4 years ago
- 12 comments
Labels: question
#4612 - Code stuck on "initalizing ddp" when using more than one gpu
Issue -
State: closed - Opened by JosephGatto almost 4 years ago
- 80 comments
Labels: bug, help wanted, distributed, priority: 1
#4504 - DDP bug with ModelCheckpoint on ckp file saving
Issue -
State: closed - Opened by zhiruiluo almost 4 years ago
- 11 comments
Labels: bug, help wanted, distributed
#4471 - Help with understanding unknown 'c10::Error' thrown during DDP training
Issue -
State: closed - Opened by neergaard almost 4 years ago
- 17 comments
Labels: bug, help wanted