Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / Lightning-AI/pytorch-lightning issues and pull requests

#15073 - [App] Support app checkpointing

Pull Request - State: closed - Opened by manskx about 2 years ago - 9 comments
Labels: feature, checkpointing, has conflicts, app

#14674 - `auto_lr_find` does not work if there is a BackboneFinetuning callback

Issue - State: open - Opened by ejm714 about 2 years ago - 2 comments
Labels: bug, help wanted, tuner, callback: finetuning

#14645 - Unable to run works in the `LightningList` structure on cloud

Issue - State: closed - Opened by krshrimali about 2 years ago - 1 comment
Labels: bug, app

#14579 - `TrainingEpochLoop._should_check_val_fx` discrepancy between continued run <> restore from ckpt

Issue - State: closed - Opened by Anner-deJong about 2 years ago - 4 comments
Labels: bug, help wanted, checkpointing, loops

#14559 - Version mismatches between package, CITATION file, and Zenodo

Issue - State: open - Opened by timothygebhard about 2 years ago - 4 comments
Labels: ci, priority: 2, admin, pl

#14545 - External IP address does not bind to streamlit functions.

Issue - State: closed - Opened by tmquan about 2 years ago - 1 comment
Labels: priority: 1, app

#14523 - Can port addresses be changed when launching Lightning App?

Issue - State: closed - Opened by Felonious-Spellfire about 2 years ago
Labels: docs, app

#14344 - make App pytest independent

Issue - State: closed - Opened by Borda about 2 years ago - 4 comments
Labels: priority: 1, tests, app

#14284 - [WIP] uneven input support for DDP

Pull Request - State: closed - Opened by otaj about 2 years ago - 6 comments
Labels: pl

#14188 - Introduce `Logger.experiment_dir`

Issue - State: open - Opened by awaelchli over 2 years ago - 17 comments
Labels: feature, design, logger

#14167 - Learning Rate finder too strong loss smoothing

Issue - State: open - Opened by hcgasser over 2 years ago - 5 comments
Labels: feature, tuner

#14078 - RFC: Remove `num_nodes` Trainer argument and infer world size from cluster environment directly

Issue - State: open - Opened by awaelchli over 2 years ago - 8 comments
Labels: deprecation, strategy: ddp, environment, trainer: argument

#14036 - Cannot use `torch.jit.trace` to trace `LightningModule` in Lightning v1.7

Issue - State: closed - Opened by J-shang over 2 years ago - 17 comments
Labels: bug, lightningmodule

#13944 - LightningFlow state increases indefinitely

Issue - State: closed - Opened by belerico over 2 years ago - 5 comments
Labels: app

#13931 - Switch from click to google Fire

Issue - State: closed - Opened by nicolai86 over 2 years ago
Labels: discussion, app

#13902 - BackboneFinetuning - even with train_bn batch normalization is still learning.

Issue - State: closed - Opened by perliczka1 over 2 years ago - 6 comments
Labels: bug, callback: finetuning

#13891 - 404 page not found error for all "Web UIs * " pages in the Intermediate skills Level 10 docs.

Issue - State: closed - Opened by Nachimak28 over 2 years ago - 2 comments
Labels: won't fix, app

#13848 - Write a contributing guide for Lightning App

Issue - State: closed - Opened by tchaton over 2 years ago
Labels: docs, app

#13757 - Support remote Lightning apps templates

Issue - State: closed - Opened by manskx over 2 years ago - 3 comments
Labels: app

#13756 - Update List of Components

Issue - State: closed - Opened by oojo12 over 2 years ago - 1 comment
Labels: docs, app

#13745 - Lightning init [app/pl-app] issues

Issue - State: closed - Opened by luca-medeiros over 2 years ago - 3 comments
Labels: bug, won't fix

#13639 - In multinode training with ddp each node duplicates logs and has node_rank=0

Issue - State: closed - Opened by jessecambon over 2 years ago - 25 comments
Labels: feature, distributed, environment

#13521 - Lightning applications fail when Path(".") is used

Issue - State: closed - Opened by krishnakalyan3 over 2 years ago - 2 comments
Labels: won't fix, app

#13507 - Support dynamic dark theme

Issue - State: closed - Opened by MarcSkovMadsen over 2 years ago - 2 comments
Labels: feature, app

#13496 - Add support for embedding a Grid of iframes on the UI

Issue - State: closed - Opened by tchaton over 2 years ago - 2 comments
Labels: feature, won't fix, app

#13407 - Enable me to ignore or solve self signed certificate issue

Issue - State: closed - Opened by MarcSkovMadsen over 2 years ago - 4 comments
Labels: feature, won't fix, waiting on author, app

#13323 - Make the Streamlit frontend multi tenant by default

Issue - State: closed - Opened by zippeurfou over 2 years ago - 1 comment
Labels: feature, won't fix, app

#13124 - Resuming from a mid-epoch checkpoint produces negative time estimates

Issue - State: closed - Opened by fishbotics over 2 years ago - 19 comments
Labels: bug, priority: 0, progress bar: tqdm

#12917 - MisconfigurationException: Trying to inject `DistributedSampler` into the `AnnLoader` instance

Issue - State: closed - Opened by mbuttner over 2 years ago - 6 comments
Labels: bug, data handling, trainer: predict

#12833 - MLFlowLogger used with server crashes training

Issue - State: open - Opened by GinkoBalboa over 2 years ago - 7 comments
Labels: feature, logger: mlflow

#12756 - UserWarning: The flag devices=-1 will be ignored

Issue - State: closed - Opened by mnslarcher over 2 years ago - 4 comments
Labels: question

#12624 - Enable Hyperparameter logging from any hook in the LightningModule

Issue - State: open - Opened by cemde over 2 years ago - 8 comments
Labels: feature, lightningmodule

#12438 - Whether clarification/documentation/redesign is needed for customizing LightningCLI subcommands

Issue - State: open - Opened by mauvilsa over 2 years ago - 3 comments
Labels: docs, design, lightningcli

#12119 - Use :emphasize-lines: in sphinx docs to highlight code.

Issue - State: open - Opened by tchaton over 2 years ago - 6 comments
Labels: good first issue, docs, priority: 1

#12095 - Early stopping conditioned on metric `val_loss` which is not available

Issue - State: closed - Opened by JackRio over 2 years ago - 5 comments
Labels: bug

#12094 - EarlyStopping Callback relative threshold mode

Issue - State: open - Opened by tlpss over 2 years ago - 8 comments
Labels: feature, design, callback: early stopping

#12013 - Cannot pass callable as `model_class` to `LightningCLI`

Issue - State: closed - Opened by yangky11 over 2 years ago - 5 comments
Labels: bug, lightningcli

#11979 - `ModelCheckpoint` does NOT save anything if `every_n_train_steps` is greater than the number of training steps in a epoch

Issue - State: closed - Opened by ShaneTian over 2 years ago - 8 comments
Labels: bug, callback: model checkpoint

#11923 - DDP GPU memory imbalanced

Issue - State: closed - Opened by lukasfolle almost 3 years ago - 8 comments
Labels: bug, strategy: ddp, accelerator: cuda

#11922 - Support user-defined parallelization in the LightningModule

Issue - State: closed - Opened by ananthsub almost 3 years ago - 3 comments
Labels: feature, distributed, strategy

#11841 - [Bug] training (sometimes) freezes in a multi-gpu setting without throwing any errors or warnings.

Issue - State: closed - Opened by ragavsachdeva almost 3 years ago - 6 comments
Labels: bug, won't fix, strategy: ddp

#11547 - Ability to change the number of epochs after initiating the trainer.

Issue - State: closed - Opened by BartekKrzepkowski almost 3 years ago - 5 comments
Labels: feature, won't fix

#11438 - Integrate TorchTensorRt in order to increase speed during inference

Issue - State: open - Opened by Actis92 almost 3 years ago - 7 comments
Labels: feature, 3rd party, performance

#11242 - DDP training randomly stopping

Issue - State: closed - Opened by yoonseok312 almost 3 years ago - 41 comments
Labels: bug, strategy: ddp

#11224 - Add "interval": "validation" to scheduler configuration

Issue - State: open - Opened by de-gozaru almost 3 years ago - 3 comments
Labels: feature, priority: 1, lr scheduler

#11158 - Hang when using Lightning CLI from config file and DDP

Issue - State: closed - Opened by gau-nernst almost 3 years ago - 12 comments
Labels: bug, lightningcli

#11126 - LightningModule self.log add_dataloader_idx doesn't reduce properly the metric across dataloaders

Issue - State: open - Opened by tchaton almost 3 years ago - 13 comments
Labels: bug, priority: 1

#11029 - Resuming training throws the mid-epoch warning everytime

Issue - State: closed - Opened by rohitgr7 almost 3 years ago - 13 comments
Labels: refactor, checkpointing

#10914 - Add feature Exponential Moving Average (EMA)

Issue - State: open - Opened by hankyul2 almost 3 years ago - 53 comments
Labels: feature

#10876 - RichProgressBar is not compatible with nohup command

Issue - State: closed - Opened by quancs almost 3 years ago - 1 comment
Labels: bug, progress bar: rich

#10759 - Proper support for Pytorch SequentialLR Scheduler

Issue - State: open - Opened by marcm-ml almost 3 years ago - 9 comments
Labels: bug, 3rd party, lr scheduler

#10530 - Label tracking meta-issue (edit me to get automatically CC'ed on issues!)

Issue - State: open - Opened by carmocca about 3 years ago - 9 comments

#10389 - Lightning is very slow between epochs, compared to PyTorch.

Issue - State: closed - Opened by TheMrZZ about 3 years ago - 60 comments
Labels: bug, help wanted, priority: 1, performance

#10285 - UserWarning: you defined a validation_step but have no val_dataloader. Skipping val loop

Issue - State: closed - Opened by 7starsea about 3 years ago - 6 comments
Labels: bug, help wanted, won't fix

#10260 - Guarantee call order for callbacks

Issue - State: open - Opened by z-a-f about 3 years ago - 9 comments
Labels: question, callback

#9947 - Support `str(datamodule)`

Issue - State: open - Opened by carmocca about 3 years ago - 11 comments
Labels: feature, good first issue, data handling

#9938 - Support checkpoint save and load with Stochastic Weight Averaging

Pull Request - State: closed - Opened by adamreeve about 3 years ago - 31 comments
Labels: feature, ready, callback: swa, community, pl

#9450 - PyTorch profiler not working with the new version 1.4.6

Issue - State: closed - Opened by aprbw about 3 years ago - 10 comments
Labels: bug, help wanted, priority: 0, profiler

#9318 - dictionary update sequence element #0 has length 1; 2 is required

Issue - State: closed - Opened by cristianegea about 3 years ago - 18 comments
Labels: bug, help wanted

#9254 - Run the test set every epoch on a single GPU

Issue - State: closed - Opened by jipson7 about 3 years ago - 8 comments
Labels: feature, help wanted

#9170 - Enums parsing in hparams.yaml generated

Pull Request - State: closed - Opened by grajat90 about 3 years ago - 11 comments
Labels: bug, ready

#8720 - FineTuning and ReduceLROnPleateau scheduler fail - optimizer.param_groups

Issue - State: closed - Opened by FlorianMF over 3 years ago - 7 comments
Labels: feature, help wanted, won't fix

#8040 - Memory explodes when limit_train_batches argument used

Issue - State: closed - Opened by ejohb over 3 years ago - 10 comments
Labels: bug, help wanted, good first issue, priority: 0

#7653 - Allow returning of test results from Trainer.test

Issue - State: open - Opened by Rizhiy over 3 years ago - 11 comments
Labels: feature, design, trainer: validate, trainer: test

#7028 - [Grid] You must call wandb.init() before wandb.log()

Issue - State: closed - Opened by turian over 3 years ago - 8 comments
Labels: bug, help wanted

#6544 - Random job failures caused by the CheckpointConnector on slurm managed hpc

Issue - State: closed - Opened by dln22 over 3 years ago - 4 comments
Labels: bug, help wanted, priority: 0, waiting on author, checkpointing, environment: slurm

#6480 - on_epoch_end callback is called before on_validation_epoch_end

Issue - State: closed - Opened by dumitrescustefan over 3 years ago - 7 comments
Labels: bug, help wanted, working as intended

#6446 - Early Stopping Min Epochs

Issue - State: closed - Opened by thomasj02 over 3 years ago - 5 comments
Labels: feature, help wanted, won't fix, design, callback

#6389 - Disable automatic SLURM Detection

Issue - State: closed - Opened by amogkam over 3 years ago - 36 comments
Labels: feature, help wanted, priority: 0, design, environment: slurm

#6381 - fit hangs on single GPU

Issue - State: closed - Opened by fonnesbeck over 3 years ago - 9 comments
Labels: bug, help wanted, priority: 2

#6319 - AttributeError in .fit() method for Stallion notebook

Issue - State: closed - Opened by NatashaSvc over 3 years ago - 4 comments
Labels: bug, help wanted, won't fix, priority: 1

#6159 - Model loaded from checkpoint has bad accuracy

Issue - State: closed - Opened by Inspirateur over 3 years ago - 9 comments
Labels: question

#5969 - Lightning throws "bypassing sigterm" on Slurm Cluster for unknown reason

Issue - State: closed - Opened by vitusbenson almost 4 years ago - 15 comments
Labels: bug, help wanted, won't fix, environment: slurm, priority: 2

#5930 - Metrics API when using DDP and multi-GPU freezes on compute() at end of validation phase

Issue - State: closed - Opened by angadkalra almost 4 years ago - 31 comments
Labels: bug, help wanted, priority: 0

#5725 - Training Process hangs. Full RAM and SWAP.

Issue - State: closed - Opened by Arij-Aladel almost 4 years ago - 16 comments
Labels: won't fix

#5469 - WandB dropping items when logging LR or val_loss with accumulate_grad_batches > 1

Issue - State: closed - Opened by tadejsv almost 4 years ago - 9 comments
Labels: bug, help wanted, won't fix, logger, priority: 1

#5384 - Value interpolation with hydra composition

Issue - State: closed - Opened by celsofranssa almost 4 years ago - 14 comments
Labels: bug, help wanted, priority: 1

#5339 - Resuming should allow to differentiate what to resume (steps/opti/weights)

Issue - State: open - Opened by thoglu almost 4 years ago - 25 comments
Labels: feature, help wanted, priority: 1

#5180 - How can I stop WandbLogger instance being instantiated when calling load_from_checkpoint?

Issue - State: closed - Opened by kyoungrok0517 almost 4 years ago - 11 comments
Labels: bug, question, won't fix, logger, 3rd party

#4998 - `LightningModule.log(..., on_epoch=True)` logs with `global_step` instead of `current_epoch`

Issue - State: closed - Opened by quinor almost 4 years ago - 11 comments
Labels: feature, help wanted, logging

#4792 - checkpoint cannot be loaded without source code

Issue - State: closed - Opened by Sushobhan04 almost 4 years ago - 9 comments
Labels: help wanted, question, checkpointing

#4450 - Data loading hangs before first validation step

Issue - State: closed - Opened by jonashaag about 4 years ago - 30 comments
Labels: help wanted, won't fix, waiting on author

#4045 - continue training from checkpoint seems broken (high loss values), while reasonable with .eval()

Issue - State: closed - Opened by yairkit about 4 years ago - 20 comments
Labels: bug, help wanted, priority: 0

#3431 - How to disable printings about GPU/TPU

Issue - State: closed - Opened by 7rick03ligh7 about 4 years ago - 11 comments
Labels: question

#3325 - Support uneven DDP inputs with pytorch model.join

Issue - State: open - Opened by edenlightning about 4 years ago - 25 comments
Labels: feature, help wanted, distributed, 3rd party

#3228 - Log epoch as step when on_epoch=True and on_step=False

Issue - State: closed - Opened by ToucheSir about 4 years ago - 34 comments
Labels: feature, help wanted

#3107 - How automaticly load best model checkpoint on Trainer instance with TestTubeLogger

Issue - State: closed - Opened by Vichoko about 4 years ago - 10 comments
Labels: question

#2974 - fix tb hparams logging

Pull Request - State: closed - Opened by s-rog over 4 years ago - 32 comments
Labels: bug, feature

#2772 - Model alone makes different predictions compared to trainer + model

Issue - State: closed - Opened by JanRuettinger over 4 years ago - 14 comments
Labels: bug, help wanted, priority: 0

#2658 - Pytorch lightning switched to cpu in the middle of training. How can I debug this?

Issue - State: closed - Opened by samikhenissi over 4 years ago - 11 comments
Labels: bug, help wanted

#2351 - Model validation code is not called

Issue - State: closed - Opened by Uroc327 over 4 years ago - 13 comments
Labels: bug, help wanted

#2295 - Stop at Validation sanity check

Issue - State: closed - Opened by hminle over 4 years ago - 8 comments
Labels: question

#2189 - Can you make a new progress bar for each epoch?

Issue - State: closed - Opened by bjourne over 4 years ago - 22 comments
Labels: question, progress bar: tqdm

#2145 - How do you save a trained model in standard pytorch format?

Issue - State: closed - Opened by mm04926412 over 4 years ago - 13 comments
Labels: question