Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / Lightning-AI/pytorch-lightning issues and pull requests

#20310 - `hparams` not loaded when loading checkpoint via LightningCLI

Issue - State: open - Opened by YouRik about 2 months ago
Labels: bug, needs triage, ver: 2.4.x, ver: 2.3.x

#20309 - Split `reload_dataloaders_every_n_epochs` into separate parameters for train, val and test dataloaders

Issue - State: closed - Opened by windring about 2 months ago - 3 comments
Labels: feature, needs triage

#20308 - The problem shows: version incompatibility from v1.3.x to v2.4

Issue - State: open - Opened by sunhan3787 about 2 months ago - 1 comment
Labels: bug, needs triage, ver: 2.4.x

#20307 - `Trainer`'s `.init_module()` context does not initialize model on target device

Issue - State: open - Opened by jin-zhe about 2 months ago - 1 comment
Labels: bug, needs triage, ver: 2.4.x

#20306 - NCCL backend fails during multi-node, multi-GPU training

Issue - State: open - Opened by raketenolli about 2 months ago
Labels: bug, needs triage, ver: 2.4.x

#20305 - minor stable update & update docs [rebase & merge]

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments
Labels: docs, ci, release, fabric, pl, dependencies, package

#20304 - fix(ci): update for breaking change with upload/download

Pull Request - State: closed - Opened by Borda about 2 months ago - 1 comment
Labels: docs, priority: 0, ci

#20303 - the example that shows "The LightningModule also has access to the Hyperparameters" is not correct

Issue - State: open - Opened by XinleiRen about 2 months ago
Labels: docs, needs triage

#20302 - fix(tests): update tests after torch 2.4.1

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments
Labels: fabric, pl, dependencies, package

#20301 - Add str method to datamodule

Pull Request - State: open - Opened by MrWhatZitToYaa about 2 months ago - 3 comments
Labels: waiting on author, lightningdatamodule, pl

#20300 - RichProgressBar: refresh_rate doesn't affect metric_component

Issue - State: open - Opened by marios1861 about 2 months ago
Labels: bug, needs triage, ver: 2.4.x

#20299 - Inconsistent memory usage compared to huggingface trainer when using deepspeed

Issue - State: open - Opened by mickeysun0104 about 2 months ago - 5 comments
Labels: bug, needs triage, ver: 2.4.x

#20298 - docs: fix broken links to W&B

Pull Request - State: closed - Opened by Borda about 2 months ago - 1 comment
Labels: docs, pl

#20297 - docs: update favicon to match Lightning AI app consistency

Pull Request - State: closed - Opened by EmilieLny about 2 months ago
Labels: ready, docs, release, fabric, pl

#20296 - Error encountered while using multiple optimizers inside a loop.

Issue - State: open - Opened by RAraghavarora about 2 months ago
Labels: bug, needs triage

#20293 - Fabric does not sync gradients?

Issue - State: closed - Opened by RuABraun about 2 months ago
Labels: bug, needs triage, ver: 2.2.x

#20290 - Update favicon to match Lightning AI app consistency

Pull Request - State: closed - Opened by EmilieLny about 2 months ago - 2 comments
Labels: ready, docs, fabric, pl

#20289 - Update favicon

Pull Request - State: closed - Opened by EmilieLny about 2 months ago - 1 comment
Labels: docs, pl

#20288 - Mid-epoch resume causes a single unwanted validation step (which is not a sanity check)

Issue - State: open - Opened by Youyoun about 2 months ago - 2 comments
Labels: bug, reproducibility, repro needed

#20285 - Add rtx 4080 super to chips dictionary

Pull Request - State: closed - Opened by kazuar 2 months ago - 2 comments
Labels: ready, fabric

#20284 - docs: update ref to latest tutorials

Pull Request - State: closed - Opened by pl-ghost 2 months ago - 1 comment
Labels: docs, examples

#20282 - Saving a checkpoint every n epochs does not work as expected

Issue - State: closed - Opened by olly-writes-code 2 months ago - 2 comments
Labels: bug, needs triage, ver: 2.4.x

#20281 - `NeptuneCallback` produces lots of `X-coordinates (step) must be strictly increasing` errors

Issue - State: open - Opened by iirekm 2 months ago - 1 comment
Labels: bug, needs triage

#20280 - SLURM resubmission crashes because of multiprocessing error

Issue - State: open - Opened by antonzub99 2 months ago - 2 comments
Labels: bug, needs triage, ver: 2.4.x

#20279 - Incorrect URI Prefix Stripping in MLflowLogger

Issue - State: closed - Opened by awindmann 2 months ago
Labels: bug, ver: 2.4.x

#20278 - WandbLogger will cause error on TPU v3-8

Issue - State: open - Opened by buoyancy99 2 months ago
Labels: bug, needs triage, ver: 2.4.x

#20277 - Validation is incorrectly run on resume

Issue - State: open - Opened by PiotrDabkowski 2 months ago - 3 comments
Labels: bug, needs triage, ver: 2.4.x

#20276 - Lightning places model inputs and model on different devices

Issue - State: closed - Opened by Kami-chanw 2 months ago - 1 comment
Labels: bug, needs triage, ver: 2.4.x

#20275 - comet_ml logger update

Pull Request - State: open - Opened by japdubengsub 2 months ago - 4 comments
Labels: pl

#20274 - `strict = False` does not work when the checkpoint is distributed

Issue - State: open - Opened by NathanGodey 2 months ago - 1 comment
Labels: bug, needs triage, ver: 2.4.x

#20273 - MLFlow logger returns None when MLFlow server is used

Issue - State: open - Opened by lilruwu 2 months ago
Labels: bug, needs triage, ver: 2.4.x

#20272 - Custom batch sampler fails to re-instantiate in `_dataloader_init_kwargs_resolve_sampler`

Issue - State: closed - Opened by Kami-chanw 2 months ago - 1 comment
Labels: refactor, needs triage

#20270 - _atomic_save with transaction cause "Invalid cross-device link" error

Issue - State: open - Opened by RichardChe 2 months ago - 2 comments
Labels: bug, needs triage, ver: 2.4.x

#20269 - Add compile_fn parameter for Trainer

Pull Request - State: open - Opened by mieshkiwrk 2 months ago - 5 comments
Labels: waiting on author, pl, torch.compile

#20268 - rich progress bar shows v_num as 0.000

Issue - State: open - Opened by npuichigo 2 months ago
Labels: bug, needs triage, ver: 2.4.x

#20267 - build(deps): bump Lightning-AI/utilities from 0.11.6 to 0.11.7

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago - 1 comment
Labels: ci

#20266 - build(deps): bump peter-evans/create-pull-request from 6 to 7

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago - 1 comment
Labels: ci

#20264 - Problem in multi-gpu training

Issue - State: closed - Opened by xizaoqu 2 months ago - 2 comments
Labels: bug, needs triage, ver: 2.1.x

#20262 - docs: update ref to latest tutorials

Pull Request - State: closed - Opened by pl-ghost 2 months ago - 1 comment
Labels: docs, examples

#20260 - Make RichProgressBar visible for both light and dark background

Pull Request - State: closed - Opened by tshu-w 2 months ago - 1 comment
Labels: pl

#20258 - Registered buffers not moved to correct device when using DeepSpeed Stage 3

Issue - State: open - Opened by amorehead 2 months ago - 2 comments
Labels: bug, needs triage, ver: 2.4.x

#20255 - Weights are misshapen when using model's forward in on_fit_end() hook with FSDP

Issue - State: open - Opened by QuentinAndre11 2 months ago
Labels: bug, needs triage, ver: 2.3.x

#20253 - Cannot turn off sampler injection at inference time.

Issue - State: open - Opened by ovavourakis 2 months ago
Labels: bug, needs triage, ver: 2.1.x

#20252 - fix a typo of precision help doc

Pull Request - State: closed - Opened by vincentme 2 months ago - 1 comment
Labels: docs, fabric

#20251 - Mixed precision, ddp and torch.no_grad()

Issue - State: open - Opened by tomsons22 2 months ago
Labels: bug, needs triage, ver: 2.1.x

#20250 - LearningRateMonitor broken on MPS backend with Apple silicon

Issue - State: open - Opened by MalteEbner 2 months ago
Labels: bug, needs triage, ver: 2.4.x

#20249 - Shuffle order is the same across runs when using strategy='ddp'

Issue - State: open - Opened by bogdanmagometa 2 months ago - 2 comments
Labels: bug, needs triage, ver: 2.2.x

#20248 - Update LR step scheduler to use total step to work across epochs

Pull Request - State: closed - Opened by falckt 2 months ago - 2 comments
Labels: pl

#20247 - Update model_checkpoint.py

Pull Request - State: closed - Opened by happyfox-dot 2 months ago - 2 comments
Labels: pl

#20246 - build(deps): bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago - 1 comment
Labels: docs, ci, dependencies, package, github_actions

#20245 - ModelCheckpoint's `save_last` does not adhere to documentation

Issue - State: open - Opened by godaup 2 months ago - 1 comment
Labels: bug, needs triage, ver: 2.3.x

#20243 - Checkpoints Saving with different permissions to account defaults

Issue - State: open - Opened by CompRhys 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20242 - Add something like `use_compile` parameter for Trainer

Issue - State: open - Opened by mieshkiwrk 3 months ago - 1 comment
Labels: feature, needs triage

#20241 - typo

Issue - State: closed - Opened by 0x1orz 3 months ago
Labels: feature, needs triage

#20240 - Easier access to train_batch_idx for control

Issue - State: open - Opened by heth27 3 months ago
Labels: feature, needs triage

#20239 - FSDP Strategy not working with bfloat16

Issue - State: open - Opened by whitehathacker-git 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20238 - DDPStrategy under windows is complaining about missing libuv

Issue - State: open - Opened by benHeid 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20237 - Add support to Fairscale Parallel Layers

Issue - State: open - Opened by loretoparisi 3 months ago
Labels: feature, needs triage

#20236 - Support variable batch size in throughput callback

Pull Request - State: open - Opened by alex-hh 3 months ago - 1 comment
Labels: pl

#20235 - Token throughput monitor assumes batch size is fixed but does not raise meaningful error

Issue - State: open - Opened by alex-hh 3 months ago
Labels: bug, callback: throughput, ver: 2.4.x

#20234 - Add support to Llama 3.1

Issue - State: open - Opened by loretoparisi 3 months ago - 1 comment
Labels: feature, needs triage

#20233 - Recommended way to save checkpoints from internal compiled model

Issue - State: open - Opened by fteufel 3 months ago
Labels: feature, needs triage

#20232 - environment variable WORLD_SIZE is incorrectly set to 1 after trainer.fit is done

Issue - State: open - Opened by simon-ging 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20231 - torch.cuda.OutOfMemoryError after running tuner.scale_batch_size() in "binsearch" mode

Issue - State: open - Opened by rittik9 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20229 - RuntimeError: each element in list of batch should be of equal size

Issue - State: closed - Opened by loretoparisi 3 months ago - 1 comment
Labels: bug, needs triage

#20227 - Dashboard

Issue - State: open - Opened by qbilius 3 months ago
Labels: feature, needs triage

#20223 - metric.compute() hangs when using DDP with multiple GPUs

Issue - State: open - Opened by manavkulshrestha 3 months ago - 5 comments
Labels: bug, needs triage, ver: 2.4.x

#20221 - Fix LightningCLI failing when both module and data module save hyperparameters

Pull Request - State: open - Opened by mauvilsa 3 months ago - 1 comment
Labels: waiting on author, pl

#20220 - Can no longer install versions 1.5.10-1.6.5

Issue - State: open - Opened by JonathanBhimani-Burrows 3 months ago - 8 comments
Labels: bug, needs triage

#20219 - NCCL error: Invalid rank requested

Issue - State: closed - Opened by loretoparisi 3 months ago - 2 comments
Labels: bug, needs triage

#20217 - Questions about loading a pre-trained model using Lightning CLI to continue training

Issue - State: open - Opened by HelloWorldLTY 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20216 - Switching into training mode in training_step

Issue - State: open - Opened by heth27 3 months ago - 1 comment
Labels: bug, needs triage, ver: 2.4.x

#20215 - Model does not update its weights

Issue - State: open - Opened by kopalja 3 months ago - 4 comments
Labels: bug, needs triage, ver: 2.4.x

#20213 - added ignore for hyper params

Pull Request - State: open - Opened by aseemk98 3 months ago - 1 comment
Labels: waiting on author, pl

#20212 - KeyError:pytorch_lightning.utilities.argparse_utils

Issue - State: open - Opened by Aaron1993 3 months ago
Labels: bug, needs triage

#20211 - Stepwise LR scheduler

Pull Request - State: open - Opened by 01AbhiSingh 3 months ago - 6 comments
Labels: waiting on author, pl

#20210 - fix: allow loading of nested states in `Fabric.load` [wip]

Pull Request - State: open - Opened by Markus28 3 months ago - 1 comment
Labels: fabric

#20209 - ImportError: cannot import name '_TORCHMETRICS_GREATER_EQUAL_1_0_0' from 'pytorch_lightning.utilities.imports'

Issue - State: open - Opened by Horizon-369 3 months ago
Labels: bug, needs triage, ver: 2.2.x, ver: 2.4.x, ver: 2.3.x

#20208 - Unexpected Behavior: `Fabric.load` operates out-of-place on nested states

Issue - State: open - Opened by Markus28 3 months ago - 1 comment
Labels: bug, needs triage, ver: 2.3.x

#20207 - Stepwise LR scheduler

Pull Request - State: closed - Opened by 01AbhiSingh 3 months ago - 1 comment
Labels: fabric, pl

#20206 - Training crash when using XLA profiler on XLA accelerator and manual optimization

Issue - State: open - Opened by sdsuster 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20204 - Loading a model changes pytorch random state

Issue - State: open - Opened by heth27 3 months ago
Labels: bug, needs triage, ver: 2.4.x

#20203 - fix: correct the positional encoding of Transformer in pytorch examples

Pull Request - State: closed - Opened by Galaxy-Husky 3 months ago - 3 comments
Labels: ready, pl

#20202 - Feat: support reusable instance of ModelCheckpoint

Pull Request - State: open - Opened by ScarWar 3 months ago
Labels: pl

#20201 - 7x slower training speed when switching from lightning 1.0 to 2.0

Issue - State: open - Opened by MaiBe-ctrl 3 months ago - 2 comments
Labels: bug, needs triage, ver: 2.1.x, ver: 2.2.x, ver: 2.4.x, ver: 2.3.x

#20200 - ModelCheckpoint Callback not working/saving unless `save_on_train_epoch_end` is enabled True which considerably slows down training

Issue - State: open - Opened by snknitin 3 months ago
Labels: bug, needs triage, ver: 2.1.x, ver: 2.2.x, ver: 2.4.x, ver: 2.3.x

#20199 - LightningCLI: --help argument given after the subcommand fails

Issue - State: open - Opened by nisar2 3 months ago - 5 comments
Labels: bug, needs triage, ver: 2.4.x

#20198 - Add documentation note for TQDMProgressBar

Pull Request - State: closed - Opened by NishantDahal 3 months ago
Labels: docs, pl

#20197 - docs: adding link to recommendation studio

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: docs

#20196 - docs: adding link to forecasting studio

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: docs

#20194 - Add param_group name for BaseFinetuningCallback

Issue - State: open - Opened by Jserax 3 months ago
Labels: feature, needs triage

#20193 - feat: allow immutable file upload for wandb logger

Pull Request - State: open - Opened by cgebbe 3 months ago
Labels: pl

#20192 - Unable to load Checkpoint

Issue - State: closed - Opened by JaLnYn 3 months ago
Labels: bug, needs triage, ver: 2.4.x