Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / lightning-ai/litdata issues and pull requests

#415 - Change S3Client to use user-provided storage_options even in Studio

Pull Request - State: open - Opened by grez72 1 day ago - 1 comment

#414 - use storage_options even when IS_IN_STUDIO

Issue - State: open - Opened by grez72 1 day ago
Labels: enhancement

#413 - Multithreading function for merge_datasets

Pull Request - State: closed - Opened by yhl48 6 days ago - 1 comment

#411 - `StreamingDataloader` is not split on each rank when training

Issue - State: closed - Opened by Aceticia 8 days ago - 8 comments
Labels: bug, help wanted

#410 - Bump version to 0.2.30

Pull Request - State: closed - Opened by bhimrazy 10 days ago - 1 comment

#409 - Clear Examples of use with different dataset types and code changes.

Issue - State: open - Opened by Woodr7 10 days ago - 2 comments
Labels: enhancement

#408 - training hangs with lightning ddp and cloud dir?

Issue - State: open - Opened by rxqy 14 days ago - 3 comments
Labels: bug, help wanted

#405 - πŸ“ docs: specify custom cache directory

Pull Request - State: closed - Opened by bhimrazy 17 days ago - 1 comment
Labels: documentation

#404 - Fix broken link for CONTRIBUTING.md

Pull Request - State: closed - Opened by bhimrazy 17 days ago - 1 comment

#403 - `use_checkpoint=True` creates invalid config.json file

Issue - State: closed - Opened by cyrildiagne 17 days ago - 4 comments
Labels: bug, help wanted

#402 - incorrect dataloader length when `drop_last=False`

Issue - State: open - Opened by grez72 17 days ago - 1 comment
Labels: bug, help wanted

#401 - Feat/add support for numpy datatypes in tokensloader

Pull Request - State: closed - Opened by bhimrazy 18 days ago - 1 comment
Labels: enhancement

#400 - Feature: Add support for numpy datatypes in TokensLoader

Issue - State: closed - Opened by bhimrazy 18 days ago
Labels: enhancement

#399 - Feat: add support for custom cache dir in Streaming Dataset

Pull Request - State: closed - Opened by bhimrazy 18 days ago - 1 comment
Labels: enhancement

#398 - Existing Cache files leads to permanent DataLoader hang

Issue - State: closed - Opened by lilavocado 30 days ago - 5 comments
Labels: bug, help wanted

#397 - pass storage options to s5cmd

Pull Request - State: closed - Opened by bhimrazy about 1 month ago - 2 comments
Labels: enhancement

#396 - Combine Small StreamingDatasets into 1 Large StreamingDataset

Issue - State: closed - Opened by schopra8 about 1 month ago - 5 comments
Labels: enhancement

#395 - correct the chunk size by adding header size

Pull Request - State: closed - Opened by tchaton about 1 month ago - 1 comment

#394 - correct the chunk size by adding header size

Pull Request - State: closed - Opened by dangthatsright about 1 month ago - 2 comments

#393 - Writing / Reading Bug involving writer `chunk_bytes` information

Issue - State: closed - Opened by dangthatsright about 1 month ago - 5 comments
Labels: bug, help wanted

#392 - Add Support for Custom S3 Configuration in s5cmd

Issue - State: closed - Opened by csy1204 about 1 month ago - 2 comments
Labels: enhancement

#391 - CONTRIBUTING.md for LitData

Pull Request - State: closed - Opened by deependujha about 1 month ago - 5 comments

#390 - fix: non-deterministic CI test failure

Pull Request - State: closed - Opened by deependujha about 1 month ago - 1 comment

#389 - `One of the worker has failed` error in test

Issue - State: closed - Opened by deependujha about 1 month ago - 1 comment
Labels: bug, help wanted

#388 - TreeSpec Error Accessing Data

Issue - State: closed - Opened by jmoller93 about 1 month ago - 5 comments
Labels: bug, help wanted

#386 - Improve CombinedStreamingDataset to handle multiple subdatasets efficiently

Issue - State: open - Opened by bhimrazy about 1 month ago
Labels: enhancement

#385 - πŸ“ Update Docs: Merge multiple optimized datasets into one

Pull Request - State: closed - Opened by bhimrazy about 1 month ago - 1 comment
Labels: documentation

#384 - update tags in pkg metadata

Pull Request - State: closed - Opened by Borda about 2 months ago - 1 comment
Labels: documentation

#383 - Bump version 0.2.29

Pull Request - State: closed - Opened by deependujha about 2 months ago - 3 comments

#382 - Update `PL Data` to `LitData`

Pull Request - State: closed - Opened by bhimrazy about 2 months ago - 1 comment

#381 - Fix/large num chunks error

Pull Request - State: closed - Opened by bhimrazy about 2 months ago - 3 comments

#380 - Revert "Feat: Using fsspec to download files"

Pull Request - State: closed - Opened by tchaton about 2 months ago - 4 comments

#379 - Bump version to 0.2.27

Pull Request - State: closed - Opened by bhimrazy about 2 months ago - 2 comments

#378 - Bump version to 0.2.27.dev

Pull Request - State: closed - Opened by rasbt about 2 months ago - 2 comments

#377 - fix import & assignment issue

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments
Labels: bug

#376 - improve hint readability

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments

#375 - Fix: Chunks deletion issue

Pull Request - State: closed - Opened by deependujha about 2 months ago - 11 comments

#374 - fixing docstrings

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments

#373 - reduce unnecessary `pass`

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments

#372 - remove not violated bandit rules from ignore

Pull Request - State: closed - Opened by Borda about 2 months ago - 1 comment

#371 - fixing typos in errors & docs

Pull Request - State: closed - Opened by Borda about 2 months ago - 2 comments

#370 - The config isn't consistent between chunks

Issue - State: open - Opened by AugustDev about 2 months ago - 5 comments
Labels: bug, help wanted

#369 - switch `lightning-cloud` to lightning SDK

Pull Request - State: closed - Opened by Borda about 2 months ago - 3 comments
Labels: enhancement, dependencies

#368 - How can I shut down automatically distributing data when using StreamingDataset?

Issue - State: open - Opened by ygtxr1997 2 months ago - 3 comments
Labels: enhancement, question

#367 - RuntimeError: All the chunks should have been deleted. Found ['chunk-0-0.bin']

Issue - State: closed - Opened by rasbt 2 months ago - 11 comments
Labels: bug, help wanted

#366 - Large number of chunks causes `OSError: [Errno 24] Too many open files`

Issue - State: closed - Opened by fdalvi 2 months ago - 8 comments
Labels: bug, help wanted

#365 - azure storage options

Pull Request - State: closed - Opened by mohanreddypmr 2 months ago - 3 comments

#364 - Bump cryptography from 42.0.8 to 43.0.1 in /requirements

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago - 1 comment
Labels: dependencies

#363 - Failed to Resume Training w/ CombinedStreamingDataset

Issue - State: open - Opened by schopra8 2 months ago - 1 comment
Labels: bug, duplicate, help wanted

#362 - [WIP] : Fix resume issues with combined streaming dataset in dataloader

Pull Request - State: open - Opened by bhimrazy 2 months ago - 6 comments

#361 - ci: drop dependabot

Pull Request - State: closed - Opened by Borda 2 months ago - 1 comment

#360 - LitData release 0.2.26

Pull Request - State: closed - Opened by tchaton 2 months ago - 1 comment

#359 - Update README.md

Pull Request - State: closed - Opened by tchaton 2 months ago - 1 comment

#358 - Update README.md

Pull Request - State: closed - Opened by tchaton 2 months ago - 1 comment

#357 - tchaton patch 1

Pull Request - State: closed - Opened by tchaton 2 months ago - 1 comment

#356 - Update README.md

Pull Request - State: closed - Opened by tchaton 2 months ago - 1 comment

#355 - bump/ci: update to `0.11.7`

Pull Request - State: closed - Opened by Borda 2 months ago - 1 comment
Labels: ci / tests

#354 - A contributing.md for the project

Issue - State: closed - Opened by deependujha 2 months ago - 1 comment
Labels: enhancement, good first issue

#353 - Fix: Prevent multiple processes from copying the same file when using…

Pull Request - State: closed - Opened by dallmann-uniwue 2 months ago - 5 comments

#352 - Bump pypa/gh-action-pypi-publish from 1.9.0 to 1.10.0

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago
Labels: ci / tests

#350 - Adds check for existence of dataset path before loading index file

Pull Request - State: closed - Opened by bhimrazy 2 months ago - 1 comment

#349 - Error Should Indicate Missing Folder Instead of Missing index.json File

Issue - State: closed - Opened by bhimrazy 2 months ago - 1 comment
Labels: bug, help wanted

#348 - Feat: Using fsspec to download files

Pull Request - State: closed - Opened by deependujha 2 months ago - 6 comments

#347 - Update numpy requirement from <2.0 to <3.0

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago - 1 comment
Labels: ci / tests

#346 - Bump mosaicml-streaming from 0.8.0 to 0.8.1

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago
Labels: ci / tests

#345 - Bump coverage from 7.5.3 to 7.6.1

Pull Request - State: closed - Opened by dependabot[bot] 2 months ago - 1 comment
Labels: ci / tests

#344 - Tests related to torchaudio fail

Issue - State: closed - Opened by deependujha 2 months ago - 1 comment
Labels: bug, help wanted

#343 - Bump: release version 0.2.25

Pull Request - State: closed - Opened by bhimrazy 3 months ago - 1 comment

#342 - Fix: Ensure Compression Algorithm is Installed Before Reading Compressed Data

Pull Request - State: closed - Opened by bhimrazy 3 months ago - 3 comments

#341 - Bug: Loading compressed data fails silently (no error message, the application simply hangs up)

Issue - State: closed - Opened by AugustDev 3 months ago - 3 comments
Labels: bug, help wanted

#340 - CombinedStreamingDataset causes NCCL timeout when using multiple nodes

Issue - State: open - Opened by hubenjm 3 months ago - 15 comments
Labels: bug, help wanted

#339 - Lazyload subsamples if subsample=1.0

Issue - State: open - Opened by deependujha 3 months ago
Labels: enhancement, question

#338 - boost(ci): run tests in parallel

Pull Request - State: closed - Opened by Borda 3 months ago - 2 comments

#337 - StreamingDataset intermittently fails due to lack of index.json

Issue - State: open - Opened by plra 3 months ago - 2 comments
Labels: bug, help wanted

#336 - bump: use the latest/fixed version of `RequirementCache`

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment
Labels: enhancement

#335 - ci: enable testing `py3.10` & prune unused workflows

Pull Request - State: closed - Opened by Borda 3 months ago - 1 comment

#334 - fix(lint): prune invalid configurations

Pull Request - State: closed - Opened by Borda 3 months ago

#333 - fix(ci): prune duplicated tests/checks

Pull Request - State: closed - Opened by Borda 3 months ago

#332 - Bump: release version 0.2.24

Pull Request - State: closed - Opened by bhimrazy 3 months ago

#330 - Reset state_dict after resume

Pull Request - State: closed - Opened by vgurev 3 months ago - 5 comments

#328 - Bug: Issues with Dataloader Batching Resulting in Uneven number of Batches and Streamed Items

Issue - State: closed - Opened by bhimrazy 3 months ago - 2 comments
Labels: bug, help wanted

#327 - Use different batch sizes in CombinedStreamingDataset

Issue - State: open - Opened by schopra8 3 months ago - 1 comment
Labels: enhancement, help wanted

#326 - Nitpick: random state best practice

Pull Request - State: closed - Opened by deependujha 3 months ago - 1 comment

#323 - Expose max download param

Pull Request - State: closed - Opened by animan42 3 months ago - 4 comments

#318 - Bugfix: inconsistent streaming dataloader state (specific to StreamingDataset)

Pull Request - State: closed - Opened by bhimrazy 3 months ago - 1 comment
Labels: priority 0

#316 - Bug: Inconsistent Behavior with StreamingDataloader loading states (specific to StreamingDataset)

Issue - State: closed - Opened by bhimrazy 3 months ago
Labels: bug, help wanted, priority 0

#309 - Fix: Optimize function error on linux

Pull Request - State: closed - Opened by deependujha 3 months ago - 2 comments

#272 - Fix: failing tests due to future warning related to torch.load(weights_only=True)

Pull Request - State: closed - Opened by deependujha 4 months ago - 2 comments

#271 - Fix: optimize() with num_workers > 1 leads to deletion issues

Pull Request - State: closed - Opened by deependujha 4 months ago - 4 comments

#263 - Resuming Training with New Dataset Fails

Issue - State: closed - Opened by schopra8 4 months ago - 6 comments
Labels: bug, help wanted

#245 - `optimize()` with `num_workers > 1` leads to deletion issues

Issue - State: closed - Opened by awaelchli 4 months ago - 7 comments
Labels: bug, help wanted, ci / tests

#218 - Is the multinode data processing only available in lightning studio?

Issue - State: closed - Opened by rishabhm12 4 months ago - 6 comments
Labels: enhancement, help wanted

#191 - Add support for parquet files for storing the chunks

Issue - State: open - Opened by tchaton 5 months ago - 3 comments
Labels: enhancement, help wanted

#181 - Using fsspec to download files

Issue - State: open - Opened by samsja 5 months ago - 5 comments
Labels: enhancement, help wanted

#173 - Resolve num_workers when the user provides 0

Pull Request - State: closed - Opened by tchaton 5 months ago

#172 - Warning Message When Using StreamingDataset with DDP

Issue - State: closed - Opened by taemincho 5 months ago - 2 comments
Labels: bug, help wanted

#100 - Fix `map()` failing to create dataset when `input_dir` is None

Pull Request - State: closed - Opened by awaelchli 7 months ago
Labels: bug