Lightning-AI/pytorch-lightning issues and pull requests

#20191 - Fix: Make `WandbLogger` upload models from all `ModelCheckpoint` callbacks, not just one

Pull Request - State: open - Opened by cgebbe 3 months ago - 1 comment
Labels: pl

#20190 - shortcuts for logging weights and biases norms

Issue - State: open - Opened by heth27 3 months ago
Labels: feature, needs triage

#20189 - Support IO Type Checkpoints for trainer.fit() in ckpt_path Parameter

Issue - State: open - Opened by kimjw0623 3 months ago
Labels: feature, needs triage

#20188 - Seeding and multi-GPU training

Issue - State: open - Opened by tomsons22 3 months ago - 1 comment
Labels: docs, needs triage

#20187 - OnExceptionCheckpoint callback suppresses exceptions and results in NCCL timeout

Issue - State: open - Opened by jackdent 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20186 - make plugin type check more flexible

Pull Request - State: open - Opened by jedyang97 3 months ago - 1 comment
Labels: pl

#20185 - Checkpoint callback run before validation step - stale or none monitor values considered for validation metrics

Issue - State: open - Opened by PheelaV 3 months ago - 2 comments
Labels: bug, needs triage, ver: 2.4.x, ver: 2.3.x

#20184 - MLFlowLogger does not save config.yaml for each run

Issue - State: open - Opened by jeangud 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20183 - Add device property to lazy load functionality

Pull Request - State: closed - Opened by t-vi 3 months ago - 2 comments
Labels: ready, fabric

#20182 - `Error while merging hparams` when using LightningCLI and YAML

Issue - State: open - Opened by cgebbe 3 months ago - 5 comments
Labels: bug, needs triage, ver: 2.4.x

#20181 - trainer test and validate have issues with autograd

Issue - State: open - Opened by bpfrd 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20179 - trainer.validate() gets different results from trainer.fit

Issue - State: open - Opened by matrix72c 3 months ago - 1 comment
Labels: bug, needs triage, ver: 2.2.x

#20177 - Trainer does not switch to train mode after validation step

Issue - State: open - Opened by ClemensSchwarke 3 months ago - 2 comments
Labels: bug, needs triage, ver: 2.4.x, ver: 2.3.x

#20176 - Add `step` parameter to `TensorBoardLogger.log_hyperparams`

Pull Request - State: open - Opened by ringohoffman 3 months ago - 2 comments
Labels: fabric, pl

#20175 - docs: fixed the `init_module` and deepspeed

Pull Request - State: open - Opened by alyakin314 3 months ago - 1 comment
Labels: docs, fabric

#20173 - loss spikes in validation step when the model has multiple losses applied

Issue - State: open - Opened by RainRoboforce 3 months ago - 1 comment
Labels: question

#20172 - Re-enable passing BytesIO as path in `.to_onnx()`

Pull Request - State: closed - Opened by GdoongMathew 3 months ago - 2 comments
Labels: bug, community, pl

#20171 - Inconsistent input io type between `to_onnx` and `torch.onnx.export`.

Issue - State: closed - Opened by GdoongMathew 3 months ago
Labels: bug, ver: 2.3.x

#20170 - fix(docs): remove dead link from readme

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment

#20169 - fix(ci): resolve input str -> num conversion

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: ready, ci

#20168 - ci/docs: disable optional cache pkg

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: docs, ci

#20167 - ci: fix cleaning caches

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: bug, ci

#20166 - False positive iterable dataset warning for LitData StreamingDataset

Issue - State: open - Opened by awaelchli 3 months ago
Labels: bug, data handling

#20165 - Remove the `optimizer_to_device` logic if possible

Issue - State: open - Opened by awaelchli 3 months ago - 3 comments
Labels: refactor, checkpointing, performance

#20164 - docs: fix typo in `linkcheck_ignore`

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: docs, pl

#20163 - Fix parameter count in ModelSummary when parameters are DTensors

Pull Request - State: closed - Opened by awaelchli 3 months ago - 2 comments
Labels: bug, fabric, callback: model summary, strategy: fsdp, pl, fun

#20162 - Add email callback on train complete

Pull Request - State: open - Opened by loucaspapalazarou 3 months ago - 1 comment
Labels: pl

#20161 - Add diffusion example to README

Pull Request - State: closed - Opened by awaelchli 3 months ago - 1 comment

#20160 - Bug: automatic logging doesn't log metric on steps if .update is used

Issue - State: open - Opened by EtayLivne 4 months ago
Labels: bug, needs triage, ver: 2.2.x

#20159 - Count number of modules in train/eval mode in ModelSummary

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: feature, docs, callback: model summary, pl, fun

#20158 - Remove outdated `process_position` reference in progress bar docs.

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: docs, progress bar: tqdm, pl, fun

#20157 - docs: update ref to latest tutorials

Pull Request - State: closed - Opened by pl-ghost 4 months ago - 1 comment
Labels: docs, examples

#20156 - Avoid deprecated distutils for docs build

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: docs, ci, fun

#20155 - Update type check workflow to PyTorch 2.4

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: docs, ci, fabric, code quality, pl, fun, dependencies

#20154 - Prepare Lightning 2.4.0 release

Pull Request - State: closed - Opened by awaelchli 4 months ago - 1 comment
Labels: docs, ci, release, fabric, pl, fun, package

#20153 - Confusing recommendation to use sync_dist=True even with TorchMetrics

Issue - State: open - Opened by srprca 4 months ago - 9 comments
Labels: bug, help wanted, logging, ver: 2.2.x

#20152 - Typing for `_restricted_classmethod` (e.g. for `LightningModule.load_from_checkpoint`) has stopped working for mypy 1.11

Issue - State: closed - Opened by maciejzj 4 months ago - 1 comment
Labels: bug, help wanted, code quality, ver: 2.2.x

#20151 - Support computing parameter count in ModelSummary for FSDP models

Issue - State: closed - Opened by awaelchli 4 months ago
Labels: feature, callback: model summary, strategy: fsdp

#20150 - docs: adding link to obj detect. studio

Pull Request - State: closed - Opened by Borda 4 months ago - 1 comment
Labels: ready, docs

#20149 - How to use Webdataset in DDP setting? ValueError: you need to add an explicit nodesplitter to your input pipeline for multi-node training

Issue - State: open - Opened by cgebbe 4 months ago
Labels: help wanted, docs, ver: 2.2.x

#20148 - Loading `train_dataloader` before estimating `max_batches`

Pull Request - State: open - Opened by shihchengli 4 months ago - 1 comment
Labels: pl

#20147 - `link_arguments` does not work in lightning 2.3

Issue - State: open - Opened by peacekurella 4 months ago - 7 comments
Labels: bug, lightningcli, ver: 2.2.x

#20146 - Docs: Add note about version counter in `ModelCheckpoint`

Pull Request - State: closed - Opened by adosar 4 months ago - 1 comment
Labels: ready, docs, callback: model checkpoint, community, pl

#20145 - mps and manual_seed_all

Issue - State: closed - Opened by Tonys21 4 months ago - 2 comments
Labels: question

#20144 - docs: adding link to img classif. studio

Pull Request - State: closed - Opened by Borda 4 months ago - 1 comment
Labels: ready, docs

#20143 - Add simple LSTM example to demo folder

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: example, pl

#20142 - Add LLM finetuning Studio example to README.md

Pull Request - State: closed - Opened by awaelchli 4 months ago - 1 comment
Labels: ready, docs

#20141 - ModelCheckpoint reduce logic seems wrong

Issue - State: closed - Opened by manbango 4 months ago - 6 comments
Labels: question, logging, callback: model checkpoint

#20140 - StreamingDataset not working in multi-GPU environment

Issue - State: open - Opened by davidpicard 4 months ago - 3 comments
Labels: bug, repro needed

#20138 - FSDP Fails with floating nn.Parameter

Issue - State: open - Opened by schopra8 4 months ago - 6 comments
Labels: bug, duplicate, strategy: fsdp, ver: 2.2.x

#20137 - Support restoring callbacks' status when predicting

Issue - State: closed - Opened by zihaozou 4 months ago - 1 comment
Labels: feature

#20133 - Email Callback on training done

Issue - State: open - Opened by loucaspapalazarou 4 months ago - 6 comments
Labels: feature, discussion

#20130 - Documentation for filename convention of save_top_k in ModelCheckpoint

Issue - State: closed - Opened by adosar 4 months ago - 6 comments
Labels: docs

#20128 - training=False when use a pretrained model like BERT

Issue - State: closed - Opened by huangfu170 4 months ago - 3 comments
Labels: bug, docs

#20126 - Switch to PyTorch 2.4 stable testing

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: ci, fun, dockers

#20125 - Add `ddp_find_unused_parameters_true` alias in Fabric's DDPStrategy

Pull Request - State: closed - Opened by 01AbhiSingh 4 months ago - 4 comments
Labels: bug, fabric, community

#20121 - Fix attribute error on `_NotYetLoadedTensor` after loading checkpoint into quantized model with `_lazy_load()`

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: bug, fabric, precision: bnb

#20111 - docs: update ref to latest tutorials

Pull Request - State: open - Opened by pl-ghost 4 months ago - 1 comment
Labels: docs, examples

#20110 - CSV Logger acts weirdly in Callbacks

Issue - State: open - Opened by oabuhamdan 4 months ago
Labels: bug, needs triage, ver: 2.2.x

#20109 - Remove confusing warning "Missing logger folder"

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: fabric, pl, fun

#20108 - Avoid printing the seed info message multiple times

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: bug, fabric, pl, fun

#20107 - TypeError: on_train_batch_start() takes 3 positional arguments but 4 were given

Issue - State: closed - Opened by cxhagd 4 months ago - 3 comments
Labels: question

#20106 - OptimizerLRScheduler typing does not fit examples

Issue - State: closed - Opened by MalteEbner 4 months ago - 4 comments
Labels: bug, help wanted, example, ver: 2.2.x

#20105 - What happens during training with HuggingFace models in eval mode?

Issue - State: closed - Opened by StevenSong 4 months ago - 2 comments
Labels: bug

#20104 - Get `num_nodes` automatically

Issue - State: closed - Opened by BakerBunker 4 months ago - 2 comments
Labels: duplicate, feature, strategy: ddp

#20103 - LightningCLI doesn't save optimizer's configuration if not explicitly given

Issue - State: closed - Opened by adosar 4 months ago - 7 comments
Labels: question, lightningcli

#20102 - Remove outdated warnings filter for `reduce_op`

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: code quality, pl

#20101 - pl.TrainResult not found in 2.3.3

Issue - State: closed - Opened by manavkulshrestha 4 months ago - 1 comment
Labels: question

#20100 - Pytorch FSDPStrategy saving checkpoint behavior work correctly?

Issue - State: open - Opened by nbqu 4 months ago
Labels: bug, needs triage

#20099 - Fixed positional encoding not used in Demo Transformer

Pull Request - State: closed - Opened by K-H-Ismail 4 months ago - 1 comment
Labels: bug, example, community, pl

#20096 - Adding support for Python 12?

Issue - State: closed - Opened by mohammedsalah-ai 4 months ago - 2 comments
Labels: feature

#20095 - Sometimes error when logging model graph with `functional.interpolate` and `deterministic=True`

Issue - State: open - Opened by pandegaabyan 4 months ago
Labels: bug, needs triage, ver: 2.2.x

#20094 - Please allow automatic optimization for multiple optimizers again.

Issue - State: open - Opened by profPlum 4 months ago - 2 comments
Labels: feature, discussion

#20093 - wandblogger: File handles cannot be properly released

Issue - State: closed - Opened by zhf321 4 months ago - 1 comment
Labels: repro needed

#20092 - dirpath isn't updated when logger changes dir after first run

Issue - State: open - Opened by ScarWar 4 months ago - 2 comments
Labels: bug, ver: 2.2.x

#20091 - Add example to README

Pull Request - State: closed - Opened by awaelchli 4 months ago - 1 comment

#20090 - Remove numpy from base requirements

Pull Request - State: closed - Opened by 01AbhiSingh 4 months ago - 3 comments
Labels: ci, fabric, community, pl, dependencies

#20089 - Checkpoint silently not correctly restored.

Issue - State: closed - Opened by phuntast1c 4 months ago - 3 comments
Labels: bug, ver: 2.0.x, repro needed

#20088 - Sometimes I get Dataset Errors when using the lightning module in a distributed manner

Issue - State: open - Opened by asusdisciple 4 months ago
Labels: bug, needs triage

#20087 - Improve error message when object is passed to Trainer callbacks

Issue - State: closed - Opened by huangfu170 4 months ago - 2 comments
Labels: bug, help wanted, good first issue

#20086 - module statistics has no attribute mean

Issue - State: closed - Opened by FabianKuon 4 months ago - 3 comments
Labels: question, ver: 2.2.x

#20084 - build(deps): bump Lightning-AI/utilities from 0.11.3 to 0.11.4

Pull Request - State: closed - Opened by dependabot[bot] 4 months ago
Labels: ready, ci

#20083 - Make numpy an optional dependency

Pull Request - State: closed - Opened by 01AbhiSingh 4 months ago - 2 comments
Labels: has conflicts, fabric

#20082 - Fix: Use `dirpath` to resolve checkpoint path only when passed

Pull Request - State: closed - Opened by ScarWar 4 months ago - 1 comment
Labels: pl

#20081 - Remove deprecated `pkg_resources`

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: ci, fabric, pl, fun, dependencies, package

#20080 - Made numpy an optional dependency in `apply_func.py` and `logger.py`

Pull Request - State: closed - Opened by 01AbhiSingh 4 months ago - 4 comments
Labels: refactor, fabric, community

#20079 - Update PyTorch 2.4 tests

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: fabric, pl, fun

#20078 - Add Python 3.12 to the CPU test matrix

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: ci, fabric, tests, pl, fun, dependencies

#20077 - `pkg_resources` Deprecation Warnings on import

Issue - State: closed - Opened by LucaBonfiglioli 4 months ago - 2 comments
Labels: bug, duplicate, help wanted, package, ver: 2.2.x

#20076 - training time increases epoch by epoch

Issue - State: open - Opened by Eric-Lin-CVTE 4 months ago - 2 comments
Labels: bug, help wanted, performance, repro needed, ver: 2.2.x

#20075 - ModelCheckpoint save ckpts at the end of every epoch even in step-saving strategy

Issue - State: open - Opened by leonardodalinky 4 months ago
Labels: bug, needs triage, ver: 2.2.x

#20074 - Cannot pass `schedule` for `PyTorchProfiler` using `LightningCLI`

Issue - State: open - Opened by tensorcopy 4 months ago - 6 comments
Labels: bug, lightningcli

#20072 - Drop testing standalone package in GPU CI

Pull Request - State: closed - Opened by awaelchli 4 months ago
Labels: ci

#20071 - Remove support for Python 3.8

Pull Request - State: closed - Opened by awaelchli 4 months ago - 2 comments
Labels: ci, fabric, tests, pl, fun, package

#20070 - Using Stochastic Weight Averaging (SWA) and LearningRateFinder simultaneously can cause issues:

Issue - State: open - Opened by liuzeyu6 4 months ago
Labels: bug, help wanted, callback: swa, ver: 2.2.x

#20069 - Installing lightning 2.3.3 also installs numpy<3

Issue - State: closed - Opened by wsascha 4 months ago - 6 comments
Labels: bug, ver: 2.2.x

#20068 - Fix LightningCLI saving hyperparameters breaking change

Pull Request - State: closed - Opened by mauvilsa 4 months ago - 4 comments
Labels: bug, lightningcli, pl

#20067 - PowerSGD to FSDP Strategy

Issue - State: closed - Opened by anandxpeng 4 months ago
Labels: feature, needs triage

#20066 - Add reference to the `torch.compile` manual

Pull Request - State: closed - Opened by awaelchli 4 months ago - 1 comment
Labels: ready, docs, fabric, pl

#20065 - enable loading `universal checkpointing` checkpoint in `DeepSpeedStrategy`

Issue - State: open - Opened by zhoubay 4 months ago - 1 comment
Labels: feature, help wanted, strategy: deepspeed
