Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / Lightning-AI/pytorch-lightning issues and pull requests
#20310 - `hparams` not loaded when loading checkpoint via LightningCLI
Issue -
State: open - Opened by YouRik about 2 months ago
Labels: bug, needs triage, ver: 2.4.x, ver: 2.3.x
#20309 - Split `reload_dataloaders_every_n_epochs` into separate parameters for train, val and test dataloaders
Issue -
State: closed - Opened by windring about 2 months ago
- 3 comments
Labels: feature, needs triage
#20308 - The problem shows: version incompatibility from v1.3.x to v2.4
Issue -
State: open - Opened by sunhan3787 about 2 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.4.x
#20307 - `Trainer`'s `.init_module()` context does not initialize model on target device
Issue -
State: open - Opened by jin-zhe about 2 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.4.x
#20306 - NCCL backend fails during multi-node, multi-GPU training
Issue -
State: open - Opened by raketenolli about 2 months ago
Labels: bug, needs triage, ver: 2.4.x
#20305 - minor stable update & update docs [rebase & merge]
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 2 comments
Labels: docs, ci, release, fabric, pl, dependencies, package
#20304 - fix(ci): update for breaking change with upload/download
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 1 comment
Labels: docs, priority: 0, ci
#20303 - the example that shows "The LightningModule also has access to the Hyperparameters" is not correct
Issue -
State: open - Opened by XinleiRen about 2 months ago
Labels: docs, needs triage
#20302 - fix(tests): update tests after torch 2.4.1
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 2 comments
Labels: fabric, pl, dependencies, package
#20301 - Add str method to datamodule
Pull Request -
State: open - Opened by MrWhatZitToYaa about 2 months ago
- 3 comments
Labels: waiting on author, lightningdatamodule, pl
#20300 - RichProgressBar: refresh_rate doesn't affect metric_component
Issue -
State: open - Opened by marios1861 about 2 months ago
Labels: bug, needs triage, ver: 2.4.x
#20299 - Inconsistent memory usage compared to huggingface trainer when using deepspeed
Issue -
State: open - Opened by mickeysun0104 about 2 months ago
- 5 comments
Labels: bug, needs triage, ver: 2.4.x
#20298 - docs: fix broken links to W&B
Pull Request -
State: closed - Opened by Borda about 2 months ago
- 1 comment
Labels: docs, pl
#20297 - docs: update favicon to match Lightning AI app consistency
Pull Request -
State: closed - Opened by EmilieLny about 2 months ago
Labels: ready, docs, release, fabric, pl
#20296 - Error encountered while using multiple optimizers inside a loop.
Issue -
State: open - Opened by RAraghavarora about 2 months ago
Labels: bug, needs triage
#20293 - Fabric does not sync gradients?
Issue -
State: closed - Opened by RuABraun about 2 months ago
Labels: bug, needs triage, ver: 2.2.x
#20290 - Update favicon to match Lightning AI app consistency
Pull Request -
State: closed - Opened by EmilieLny about 2 months ago
- 2 comments
Labels: ready, docs, fabric, pl
#20289 - Update favicon
Pull Request -
State: closed - Opened by EmilieLny about 2 months ago
- 1 comment
Labels: docs, pl
#20288 - Mid-epoch resume causes a single unwanted validation step (which is not a sanity check)
Issue -
State: open - Opened by Youyoun about 2 months ago
- 2 comments
Labels: bug, reproducibility, repro needed
#20285 - Add rtx 4080 super to chips dictionary
Pull Request -
State: closed - Opened by kazuar 2 months ago
- 2 comments
Labels: ready, fabric
#20284 - docs: update ref to latest tutorials
Pull Request -
State: closed - Opened by pl-ghost 2 months ago
- 1 comment
Labels: docs, examples
#20282 - Saving a checkpoint every n epochs does not work as expected
Issue -
State: closed - Opened by olly-writes-code 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.4.x
#20281 - `NeptuneCallback` produces lots of `X-coordinates (step) must be strictly increasing` errors
Issue -
State: open - Opened by iirekm 2 months ago
- 1 comment
Labels: bug, needs triage
#20280 - SLURM resubmission crashes because of multiprocessing error
Issue -
State: open - Opened by antonzub99 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.4.x
#20279 - Incorrect URI Prefix Stripping in MLflowLogger
Issue -
State: closed - Opened by awindmann 2 months ago
Labels: bug, ver: 2.4.x
#20278 - WandbLogger will cause error on TPU v3-8
Issue -
State: open - Opened by buoyancy99 2 months ago
Labels: bug, needs triage, ver: 2.4.x
#20277 - Validation is incorrectly run on resume
Issue -
State: open - Opened by PiotrDabkowski 2 months ago
- 3 comments
Labels: bug, needs triage, ver: 2.4.x
#20276 - Lightning places model inputs and model on different devices
Issue -
State: closed - Opened by Kami-chanw 2 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.4.x
#20275 - comet_ml logger update
Pull Request -
State: open - Opened by japdubengsub 2 months ago
- 4 comments
Labels: pl
#20274 - `strict = False` does not work when the checkpoint is distributed
Issue -
State: open - Opened by NathanGodey 2 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.4.x
#20273 - MLFlow logger returns None when MLFlow server is used
Issue -
State: open - Opened by lilruwu 2 months ago
Labels: bug, needs triage, ver: 2.4.x
#20272 - Custom batch sampler fails to re-instantiate in `_dataloader_init_kwargs_resolve_sampler`
Issue -
State: closed - Opened by Kami-chanw 2 months ago
- 1 comment
Labels: refactor, needs triage
#20270 - _atomic_save with transaction cause "Invalid cross-device link" error
Issue -
State: open - Opened by RichardChe 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.4.x
#20269 - Add compile_fn parameter for Trainer
Pull Request -
State: open - Opened by mieshkiwrk 2 months ago
- 5 comments
Labels: waiting on author, pl, torch.compile
#20268 - rich progress bar shows v_num as 0.000
Issue -
State: open - Opened by npuichigo 2 months ago
Labels: bug, needs triage, ver: 2.4.x
#20267 - build(deps): bump Lightning-AI/utilities from 0.11.6 to 0.11.7
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
- 1 comment
Labels: ci
#20266 - build(deps): bump peter-evans/create-pull-request from 6 to 7
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
- 1 comment
Labels: ci
#20265 - `_update_dataloader` improperly copies state of subclassed dataloader with attribute names that differ from `__init__` parameters.
Issue -
State: open - Opened by spenceforce 2 months ago
Labels: bug, needs triage, ver: 2.4.x, ver: 2.3.x
#20264 - Problem in multi-gpu training
Issue -
State: closed - Opened by xizaoqu 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.1.x
#20262 - docs: update ref to latest tutorials
Pull Request -
State: closed - Opened by pl-ghost 2 months ago
- 1 comment
Labels: docs, examples
#20260 - Make RichProgressBar visible for both light and dark background
Pull Request -
State: closed - Opened by tshu-w 2 months ago
- 1 comment
Labels: pl
#20258 - Registered buffers not moved to correct device when using DeepSpeed Stage 3
Issue -
State: open - Opened by amorehead 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.4.x
#20255 - Weights are misshapen when using model's forward in on_fit_end() hook with FSDP
Issue -
State: open - Opened by QuentinAndre11 2 months ago
Labels: bug, needs triage, ver: 2.3.x
#20253 - Cannot turn off sampler injection at inference time.
Issue -
State: open - Opened by ovavourakis 2 months ago
Labels: bug, needs triage, ver: 2.1.x
#20252 - fix a typo of precision help doc
Pull Request -
State: closed - Opened by vincentme 2 months ago
- 1 comment
Labels: docs, fabric
#20251 - Mixed precision, ddp and torch.no_grad()
Issue -
State: open - Opened by tomsons22 2 months ago
Labels: bug, needs triage, ver: 2.1.x
#20250 - LearningRateMonitor broken on MPS backend with Apple silicon
Issue -
State: open - Opened by MalteEbner 2 months ago
Labels: bug, needs triage, ver: 2.4.x
#20249 - Shuffle order is the same across runs when using strategy='ddp'
Issue -
State: open - Opened by bogdanmagometa 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.2.x
#20248 - Update LR step scheduler to use total step to work across epochs
Pull Request -
State: closed - Opened by falckt 2 months ago
- 2 comments
Labels: pl
#20247 - Update model_checkpoint.py
Pull Request -
State: closed - Opened by happyfox-dot 2 months ago
- 2 comments
Labels: pl
#20246 - build(deps): bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
- 1 comment
Labels: docs, ci, dependencies, package, github_actions
#20245 - ModelCheckpoint's `save_last` does not adhere to documentation
Issue -
State: open - Opened by godaup 2 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.3.x
#20244 - RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: Invalid --2a886c8_slice_builder_worker_addresses specified. Expected 4 worker addresses, got 1.
Issue -
State: open - Opened by Bhargav230m 2 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.4.x
#20243 - Checkpoints Saving with different permissions to account defaults
Issue -
State: open - Opened by CompRhys 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20242 - Add something like `use_compile` parameter for Trainer
Issue -
State: open - Opened by mieshkiwrk 3 months ago
- 1 comment
Labels: feature, needs triage
#20241 - typo
Issue -
State: closed - Opened by 0x1orz 3 months ago
Labels: feature, needs triage
#20240 - Easier access to train_batch_idx for control
Issue -
State: open - Opened by heth27 3 months ago
Labels: feature, needs triage
#20239 - FSDP Strategy not working with bfloat16
Issue -
State: open - Opened by whitehathacker-git 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20238 - DDPStrategy under windows is complaining about missing libuv
Issue -
State: open - Opened by benHeid 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20237 - Add support to Fairscale Parallel Layers
Issue -
State: open - Opened by loretoparisi 3 months ago
Labels: feature, needs triage
#20236 - Support variable batch size in throughput callback
Pull Request -
State: open - Opened by alex-hh 3 months ago
- 1 comment
Labels: pl
#20235 - Token throughput monitor assumes batch size is fixed but does not raise meaningful error
Issue -
State: open - Opened by alex-hh 3 months ago
Labels: bug, callback: throughput, ver: 2.4.x
#20234 - Add support to Llama 3.1
Issue -
State: open - Opened by loretoparisi 3 months ago
- 1 comment
Labels: feature, needs triage
#20233 - Recommended way to save checkpoints from internal compiled model
Issue -
State: open - Opened by fteufel 3 months ago
Labels: feature, needs triage
#20232 - environment variable WORLD_SIZE is incorrectly set to 1 after trainer.fit is done
Issue -
State: open - Opened by simon-ging 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20231 - torch.cuda.OutOfMemoryError after running tuner.scale_batch_size() in "binsearch" mode
Issue -
State: open - Opened by rittik9 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20230 - KeyError: 'Trying to restore optimizer state but checkpoint contains only the model. This is probably due to `ModelCheckpoint.save_weights_only` being set to `True`.' But optim_cfg is in model
Issue -
State: open - Opened by CSteinhardt153 3 months ago
Labels: bug, needs triage, ver: 2.2.x
#20229 - RuntimeError: each element in list of batch should be of equal size
Issue -
State: closed - Opened by loretoparisi 3 months ago
- 1 comment
Labels: bug, needs triage
#20227 - Dashboard
Issue -
State: open - Opened by qbilius 3 months ago
Labels: feature, needs triage
#20226 - "FileExistsError: [Errno 17] File exists: '/000000_epoch_shape'" using the ddp_notebook strategy with data stored in MDS (mosaic streaming) format
Issue -
State: open - Opened by elbamos 3 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.4.x
#20223 - metric.compute() hangs when using DDP with multiple GPUs
Issue -
State: open - Opened by manavkulshrestha 3 months ago
- 5 comments
Labels: bug, needs triage, ver: 2.4.x
#20221 - Fix LightningCLI failing when both module and data module save hyperparameters
Pull Request -
State: open - Opened by mauvilsa 3 months ago
- 1 comment
Labels: waiting on author, pl
#20220 - Can no longer install versions 1.5.10-1.6.5
Issue -
State: open - Opened by JonathanBhimani-Burrows 3 months ago
- 8 comments
Labels: bug, needs triage
#20219 - NCCL error: Invalid rank requested
Issue -
State: closed - Opened by loretoparisi 3 months ago
- 2 comments
Labels: bug, needs triage
#20218 - using deepspeed in pytorch lightning, a bug occurred : RuntimeError: Function ConvolutionBackward0 returned an invalid gradient at index 1
Issue -
State: open - Opened by hongsixin 3 months ago
Labels: bug, needs triage
#20217 - Questions about loading a pre-trained model using Lightning CLI to continue training
Issue -
State: open - Opened by HelloWorldLTY 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20216 - Switching into training mode in training_step
Issue -
State: open - Opened by heth27 3 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.4.x
#20215 - Model does not update its weights
Issue -
State: open - Opened by kopalja 3 months ago
- 4 comments
Labels: bug, needs triage, ver: 2.4.x
#20214 - You really should make the access to optimizers and schedulers more comprehensible and more detailed.
Issue -
State: open - Opened by onbigion13 3 months ago
Labels: docs, needs triage
#20213 - added ignore for hyper params
Pull Request -
State: open - Opened by aseemk98 3 months ago
- 1 comment
Labels: waiting on author, pl
#20212 - KeyError:pytorch_lightning.utilities.argparse_utils
Issue -
State: open - Opened by Aaron1993 3 months ago
Labels: bug, needs triage
#20211 - Stepwise LR scheduler
Pull Request -
State: open - Opened by 01AbhiSingh 3 months ago
- 6 comments
Labels: waiting on author, pl
#20210 - fix: allow loading of nested states in `Fabric.load` [wip]
Pull Request -
State: open - Opened by Markus28 3 months ago
- 1 comment
Labels: fabric
#20209 - ImportError: cannot import name '_TORCHMETRICS_GREATER_EQUAL_1_0_0' from 'pytorch_lightning.utilities.imports'
Issue -
State: open - Opened by Horizon-369 3 months ago
Labels: bug, needs triage, ver: 2.2.x, ver: 2.4.x, ver: 2.3.x
#20208 - Unexpected Behavior: `Fabric.load` operates out-of-place on nested states
Issue -
State: open - Opened by Markus28 3 months ago
- 1 comment
Labels: bug, needs triage, ver: 2.3.x
#20207 - Stepwise LR scheduler
Pull Request -
State: closed - Opened by 01AbhiSingh 3 months ago
- 1 comment
Labels: fabric, pl
#20206 - Training crash when using XLA profiler on XLA accelerator and manual optimization
Issue -
State: open - Opened by sdsuster 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20205 - Allow passing custom reader/writer in _distributed_checkpoint_save and _distributed_checkpoint_load.
Issue -
State: open - Opened by Yash9060 3 months ago
Labels: feature, needs triage
#20204 - Loading a model changes pytorch random state
Issue -
State: open - Opened by heth27 3 months ago
Labels: bug, needs triage, ver: 2.4.x
#20203 - fix: correct the positional encoding of Transformer in pytorch examples
Pull Request -
State: closed - Opened by Galaxy-Husky 3 months ago
- 3 comments
Labels: ready, pl
#20202 - Feat: support reusable instance of ModelCheckpoint
Pull Request -
State: open - Opened by ScarWar 3 months ago
Labels: pl
#20201 - 7x slower training speed when switching from lightning 1.0 to 2.0
Issue -
State: open - Opened by MaiBe-ctrl 3 months ago
- 2 comments
Labels: bug, needs triage, ver: 2.1.x, ver: 2.2.x, ver: 2.4.x, ver: 2.3.x
#20200 - ModelCheckpoint Callback not working/saving unless `save_on_train_epoch_end` is enabled True which considerably slows down training
Issue -
State: open - Opened by snknitin 3 months ago
Labels: bug, needs triage, ver: 2.1.x, ver: 2.2.x, ver: 2.4.x, ver: 2.3.x
#20199 - LightningCLI: --help argument given after the subcommand fails
Issue -
State: open - Opened by nisar2 3 months ago
- 5 comments
Labels: bug, needs triage, ver: 2.4.x
#20198 - Add documentation note for TQDMProgressBar
Pull Request -
State: closed - Opened by NishantDahal 3 months ago
Labels: docs, pl
#20197 - docs: adding link to recommendation studio
Pull Request -
State: closed - Opened by Borda 3 months ago
- 1 comment
Labels: docs
#20196 - docs: adding link to forecasting studio
Pull Request -
State: closed - Opened by Borda 3 months ago
- 1 comment
Labels: docs
#20194 - Add param_group name for BaseFinetuningCallback
Issue -
State: open - Opened by Jserax 3 months ago
Labels: feature, needs triage
#20193 - feat: allow immutable file upload for wandb logger
Pull Request -
State: open - Opened by cgebbe 3 months ago
Labels: pl
#20192 - Unable to load Checkpoint
Issue -
State: closed - Opened by JaLnYn 3 months ago
Labels: bug, needs triage, ver: 2.4.x