Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / Lightning-AI/lightning issues and pull requests

#10888 - [RFC] Support a `Trainer.train()` API

Issue - State: closed - Opened by ananthsub almost 3 years ago - 10 comments
Labels: feature, discussion, trainer

#10867 - Remove `return_result` argument from `DDPSpawnPlugin.spawn()`

Pull Request - State: closed - Opened by awaelchli almost 3 years ago
Labels: ready, design, breaking change, strategy: ddp

#10865 - Deprecate and redefine `add_to_queue`/`get_from_queue` in spawn plugins

Issue - State: closed - Opened by awaelchli almost 3 years ago
Labels: feature, deprecation, strategy: ddp

#10855 - Remove redundant None check from spawn plugins

Pull Request - State: closed - Opened by awaelchli almost 3 years ago
Labels: ready, accelerator: tpu, refactor, strategy: ddp

#10854 - Add missing deprecation rerouting for `add_to_queue` in `TPUSpawnPlugin`

Pull Request - State: closed - Opened by awaelchli almost 3 years ago - 1 comment
Labels: ready, accelerator: tpu, deprecation, strategy: ddp

#10849 - Removed unnecessary `_move_optimizer_state` method overrides

Pull Request - State: closed - Opened by four4fish almost 3 years ago
Labels: ready, accelerator: tpu, refactor, strategy: ddp

#10846 - Save and load with CheckpointIO in DDPSpawn plugins

Pull Request - State: closed - Opened by awaelchli almost 3 years ago - 2 comments
Labels: ready, accelerator: tpu, refactor, strategy: ddp

#10806 - 🚀 Memory Sharing of Datasets Across Different Processes

Issue - State: closed - Opened by rusty1s almost 3 years ago - 4 comments
Labels: feature

#10530 - Label tracking meta-issue (edit me to get automatically CC'ed on issues!)

Issue - State: open - Opened by carmocca almost 3 years ago - 9 comments

#10510 - DeepSpeed stage 3 and mixed precision cause an error

Issue - State: open - Opened by ktrapeznikov almost 3 years ago - 19 comments
Labels: bug, 3rd party, strategy: deepspeed

#10411 - Test - ignore me

Pull Request - State: closed - Opened by carmocca almost 3 years ago

#10290 - IterableDataset with wrong length causes validation loop to be skipped.

Issue - State: closed - Opened by jopo666 almost 3 years ago - 5 comments
Labels: bug, help wanted, won't fix

#10260 - Guarantee call order for callbacks

Issue - State: open - Opened by z-a-f almost 3 years ago - 8 comments
Labels: question, callback

#10034 - 1/n Simplify spawn plugins: Simplify handling of multiprocessing queue

Pull Request - State: closed - Opened by awaelchli almost 3 years ago - 2 comments
Labels: feature, ready, refactor, design, strategy: ddp

#9759 - IPU hotfix for #9721

Pull Request - State: closed - Opened by carmocca about 3 years ago - 2 comments
Labels: bug, ready, priority: 0, ci, accelerator: ipu (external)

#9641 - code stuck at All DDP processes registered

Issue - State: closed - Opened by derrick-xwp about 3 years ago - 13 comments
Labels: bug, help wanted, distributed, priority: 2

#9318 - dictionary update sequence element #0 has length 1; 2 is required

Issue - State: closed - Opened by cristianegea about 3 years ago - 17 comments
Labels: bug, help wanted

#9054 - [Ignore me if your name is not Adrian] Fix mypy for 8642

Pull Request - State: closed - Opened by carmocca about 3 years ago - 1 comment

#8732 - [RFC] Deprecate direct support for truncated backprop through time

Issue - State: closed - Opened by ananthsub about 3 years ago - 19 comments
Labels: discussion, design, deprecation

#8563 - Main script runs multiple times - once per GPU

Issue - State: closed - Opened by ViktorThink about 3 years ago - 8 comments
Labels: help wanted, question

#8395 - how to put ```trainer.fit()``` in for loop?

Issue - State: closed - Opened by anik123 about 3 years ago - 10 comments
Labels: question

#8275 - Carmocca/wip 7724 update new loops

Pull Request - State: closed - Opened by carmocca about 3 years ago - 1 comment

#8080 - Destroy process group in DDP destructor

Pull Request - State: closed - Opened by carmocca over 3 years ago - 17 comments
Labels: bug, distributed

#8043 - OOM issues with loading large model checkpoints w/ FSDP after checkpoint refactor

Issue - State: closed - Opened by mleshen over 3 years ago - 10 comments
Labels: bug, help wanted, checkpointing, 3rd party

#7957 - Failed Manual Backward during DeepSpeed training.

Issue - State: closed - Opened by Zasder3 over 3 years ago - 5 comments
Labels: bug, help wanted, priority: 1

#7817 - CUDA OOM when using "ddp" mode in training

Issue - State: closed - Opened by choieq over 3 years ago - 14 comments
Labels: bug, help wanted, distributed

#7638 - Freeze on restore from checkpoint

Issue - State: closed - Opened by Rizhiy over 3 years ago - 19 comments
Labels: bug, help wanted, priority: 1

#7568 - TensorBoardLogger does not store hparams inside a dataclass

Issue - State: closed - Opened by sebp over 3 years ago - 3 comments
Labels: feature, logger

#7540 - Callbacks are not saved to the config file

Issue - State: closed - Opened by tshu-w over 3 years ago - 13 comments
Labels: help wanted, question, argparse (removed)

#7390 - Update `configure_optimizers` docs

Pull Request - State: closed - Opened by carmocca over 3 years ago - 1 comment
Labels: ready, docs

#7233 - DDP - Worse performance with 2 GPUs compared to 1.

Issue - State: closed - Opened by ManiadisG over 3 years ago - 17 comments
Labels: bug, help wanted

#7028 - [Grid] You must call wandb.init() before wandb.log()

Issue - State: closed - Opened by turian over 3 years ago - 7 comments
Labels: bug, help wanted

#6782 - Not found output when run trainer.test

Issue - State: closed - Opened by toandaominh1997 over 3 years ago - 5 comments
Labels: bug, help wanted, docs, waiting on author

#6641 - External MLFlow logging failures cause training job to fail

Issue - State: closed - Opened by nathancooperjones over 3 years ago - 2 comments
Labels: bug, help wanted, won't fix, logger, 3rd party

#6622 - Revisit CONTRIBUTING.md

Issue - State: closed - Opened by camruta over 3 years ago - 8 comments
Labels: docs, priority: 1

#6415 - Fails to install on google colab

Issue - State: closed - Opened by simonm3 over 3 years ago - 8 comments
Labels: bug, help wanted, 3rd party

#6389 - Disable automatic SLURM Detection

Issue - State: closed - Opened by amogkam over 3 years ago - 34 comments
Labels: feature, help wanted, priority: 0, design, environment: slurm

#6362 - ImportError: cannot import name 'Batch' from 'torchtext.data' - pytorch 1.8 and torchtext 0.9

Issue - State: closed - Opened by shrinath-suresh over 3 years ago - 2 comments
Labels: bug, help wanted

#6154 - Cherry-picking 1.2.1 release [full merge, no squash]

Pull Request - State: closed - Opened by carmocca over 3 years ago - 2 comments
Labels: ready

#6094 - API consistency: "val" vs "validation"

Issue - State: closed - Opened by msegado over 3 years ago - 11 comments
Labels: feature, refactor, design, deprecation

#5943 - Reduce LR On Plateau after validation epoch when val_check_interval < 1

Issue - State: closed - Opened by MatthieuToulemont over 3 years ago - 3 comments
Labels: feature, help wanted, won't fix

#5904 - Delete unused autopep8 config

Pull Request - State: closed - Opened by carmocca over 3 years ago - 1 comment
Labels: ready, ci, refactor

#5874 - Refactor utilities/imports.py

Pull Request - State: closed - Opened by carmocca over 3 years ago - 2 comments
Labels: ready, refactor

#5825 - Fix Pruning callback and add a few features

Pull Request - State: closed - Opened by carmocca over 3 years ago - 6 comments
Labels: bug, feature, ready, callback

#5580 - Add new CHANGELOG section

Pull Request - State: closed - Opened by carmocca over 3 years ago
Labels: ready

#5576 - Prepare 1.1.5 release

Pull Request - State: closed - Opened by carmocca over 3 years ago - 1 comment
Labels: ready

#5564 - Update README help steps

Pull Request - State: closed - Opened by carmocca over 3 years ago - 2 comments
Labels: ready, docs

#5563 - Drop greetings comment

Pull Request - State: closed - Opened by carmocca over 3 years ago - 1 comment
Labels: ready, ci

#5561 - Update CODEOWNERS

Pull Request - State: closed - Opened by carmocca over 3 years ago - 5 comments
Labels: ready

#5558 - Mixed precision: scheduler and optimizer are called in the wrong order

Issue - State: open - Opened by kilianovski over 3 years ago - 35 comments
Labels: bug, priority: 2, precision: amp, lr scheduler

#5434 - Document exceptions

Issue - State: closed - Opened by akihironitta over 3 years ago - 20 comments
Labels: good first issue, docs

#5378 - Add 1.1.4 section to CHANGELOG

Pull Request - State: closed - Opened by carmocca over 3 years ago - 1 comment
Labels: ready

#5339 - Resuming should allow to differentiate what to resume (steps/opti/weights)

Issue - State: open - Opened by thoglu over 3 years ago - 24 comments
Labels: feature, help wanted, priority: 1

#5302 - Add methods for reset dataloader during custom training

Issue - State: closed - Opened by rwbfd almost 4 years ago - 8 comments
Labels: feature, help wanted, design, priority: 1

#5243 - Returning None from training_step with multi GPU DDP training

Issue - State: open - Opened by iamkucuk almost 4 years ago - 26 comments
Labels: feature, help wanted, distributed, priority: 1

#5145 - Dataloader with custom batch sampler

Issue - State: closed - Opened by chagmgang almost 4 years ago - 12 comments
Labels: question, distributed

#5057 - Do not warn when the name key is used in the lr_scheduler dict

Pull Request - State: closed - Opened by carmocca almost 4 years ago - 5 comments
Labels: bug, ready, priority: 1

#5049 - Improve some tests

Pull Request - State: closed - Opened by carmocca almost 4 years ago - 2 comments
Labels: ready, ci, refactor, priority: 2

#5038 - Add carmocca to core

Pull Request - State: closed - Opened by carmocca almost 4 years ago - 1 comment
Labels: ready, docs

#5008 - Start version suffixes at 1

Pull Request - State: closed - Opened by carmocca almost 4 years ago - 4 comments
Labels: feature, ready, design, checkpointing

#4956 - how to properly skip samples that cause inf/nan gradients/loss

Issue - State: closed - Opened by levhaikin almost 4 years ago - 21 comments
Labels: feature, question, won't fix

#4875 - Improve epoch_result_store code quality

Pull Request - State: closed - Opened by carmocca almost 4 years ago - 1 comment
Labels: feature, ready

#4870 - Auto-scale batch size triggers "on_train_end"

Issue - State: closed - Opened by victorjoos almost 4 years ago - 5 comments
Labels: bug, help wanted, tuner, priority: 1

#4853 - Reduce missing in DDP's training/validation_step_end

Issue - State: closed - Opened by apacha almost 4 years ago - 12 comments
Labels: bug, help wanted, working as intended, distributed

#4721 - Add current_score to ModelCheckpoint.on_save_checkpoint

Pull Request - State: closed - Opened by carmocca almost 4 years ago - 1 comment
Labels: feature, ready, checkpointing

#4666 - RuntimeError: Error(s) in loading state_dict when adding/updating metrics to a trained model.

Issue - State: closed - Opened by Vichoko almost 4 years ago - 12 comments
Labels: question

#4612 - Code stuck on "initalizing ddp" when using more than one gpu

Issue - State: closed - Opened by JosephGatto almost 4 years ago - 80 comments
Labels: bug, help wanted, distributed, priority: 1

#4504 - DDP bug with ModelCheckpoint on ckp file saving

Issue - State: closed - Opened by zhiruiluo almost 4 years ago - 11 comments
Labels: bug, help wanted, distributed

#4471 - Help with understanding unknown 'c10::Error' thrown during DDP training

Issue - State: closed - Opened by neergaard almost 4 years ago - 17 comments
Labels: bug, help wanted