Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / mosaicml/streaming issues and pull requests

#729 - Update huggingface-hub requirement from <0.24,>=0.23.4 to >=0.23.4,<0.25

Pull Request - State: closed - Opened by dependabot[bot] 4 months ago - 1 comment
Labels: dependencies

#727 - Incorrect container name in download_from_azure

Issue - State: closed - Opened by jaehwana2z 4 months ago - 1 comment
Labels: bug

#726 - Download optimal for device_per_stream batching method.

Issue - State: open - Opened by huxuan 4 months ago - 4 comments

#725 - Replication changes sample order

Issue - State: open - Opened by CodeCreator 4 months ago - 3 comments
Labels: bug

#724 - Bump fastapi from 0.111.0 to 0.111.1

Pull Request - State: closed - Opened by dependabot[bot] 4 months ago
Labels: dependencies

#722 - Using mosaicml streaming with accelerate ?

Issue - State: closed - Opened by benihime91 4 months ago - 3 comments

#721 - Add hf - fix lints

Pull Request - State: closed - Opened by XiaohanZhangCMU 5 months ago - 2 comments

#720 - Too much disk usage after transforming to MDS format

Issue - State: closed - Opened by LingxiaoShawn 5 months ago - 5 comments

#719 - Fix Linting from Pillow version update

Pull Request - State: closed - Opened by XiaohanZhangCMU 5 months ago

#718 - Bump pydantic from 2.7.4 to 2.8.2

Pull Request - State: closed - Opened by dependabot[bot] 5 months ago
Labels: dependencies

#717 - 'File exists: "/00000_locals"' when integrated with deepspeed training scripts

Issue - State: open - Opened by Clement25 5 months ago - 4 comments
Labels: bug

#715 - Bump databricks-sdk from 0.28.0 to 0.29.0

Pull Request - State: closed - Opened by dependabot[bot] 5 months ago
Labels: dependencies

#714 - Upgrade ci_testing, remove codeql

Pull Request - State: closed - Opened by snarayan21 5 months ago

#713 - enable adaptive retry for s3 download

Pull Request - State: closed - Opened by bigning 5 months ago

#712 - Remove duplicate `dbfs:` prefix from error message

Pull Request - State: closed - Opened by vanshcsingh 5 months ago

#711 - Add HF File System Support to Streaming

Pull Request - State: closed - Opened by orionw 5 months ago - 14 comments

#710 - Bump pytest-split from 0.8.2 to 0.9.0

Pull Request - State: closed - Opened by dependabot[bot] 5 months ago - 1 comment
Labels: dependencies

#709 - Optional dependency for different storages?

Issue - State: open - Opened by huxuan 5 months ago - 2 comments
Labels: enhancement

#708 - fix convert imagenet

Pull Request - State: closed - Opened by Hprairie 5 months ago

#707 - AttributeError when trying to convert Imagenet1k

Issue - State: closed - Opened by Hprairie 5 months ago - 3 comments
Labels: bug

#705 - Fix linting issues with numpy 2

Pull Request - State: closed - Opened by snarayan21 5 months ago

#704 - Bump pydantic from 2.7.3 to 2.7.4

Pull Request - State: closed - Opened by dependabot[bot] 5 months ago - 1 comment
Labels: dependencies

#703 - Error writing to databricks UC volume

Issue - State: closed - Opened by JK87iab 5 months ago - 5 comments
Labels: bug

#702 - Fix edge cases with scalar or empty numpy array encoding

Pull Request - State: closed - Opened by snarayan21 5 months ago

#701 - Raise IndexError in `Spanner` object instead of `ValueError`

Pull Request - State: closed - Opened by snarayan21 5 months ago - 1 comment

#700 - Enable correct resumption from the end of an epoch

Pull Request - State: closed - Opened by snarayan21 5 months ago

#698 - Different batch_size for different streams

Issue - State: closed - Opened by huxuan 5 months ago - 2 comments
Labels: enhancement

#697 - Bump pytest from 8.2.1 to 8.2.2

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 1 comment
Labels: dependencies

#696 - Bump pydantic from 2.7.2 to 2.7.3

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago
Labels: dependencies

#695 - Handle zero-sized ndarray more gracefully

Issue - State: closed - Opened by huxuan 6 months ago - 1 comment
Labels: bug

#694 - fix: expand user path for Writer's output directory.

Pull Request - State: closed - Opened by huxuan 6 months ago - 1 comment

#693 - Make sure epoch_size is an int

Pull Request - State: closed - Opened by snarayan21 6 months ago

#692 - Bump pydantic from 2.7.1 to 2.7.2

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 1 comment
Labels: dependencies

#691 - Bump uvicorn from 0.29.0 to 0.30.1

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 1 comment
Labels: dependencies

#690 - DeltaTorch Compatability?

Issue - State: closed - Opened by rangi513 6 months ago - 3 comments

#689 - Bug that causes FileExistsError in shm

Issue - State: closed - Opened by Shade5 6 months ago - 6 comments
Labels: bug

#688 - Warning condition changed for Sequence Parallelism

Pull Request - State: closed - Opened by XiaohanZhangCMU 6 months ago

#687 - Bump databricks-sdk from 0.27.1 to 0.28.0

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 2 comments
Labels: dependencies

#685 - Fix node calculation in `replication` for `World` object

Pull Request - State: closed - Opened by snarayan21 6 months ago

#684 - Heterogeneous

Pull Request - State: open - Opened by XiaohanZhangCMU 6 months ago

#683 - Improve local temp directory error when only `remote` is specified

Pull Request - State: closed - Opened by snarayan21 6 months ago - 4 comments

#682 - Fix `batch_size` typo for `Stream` object in docs

Pull Request - State: closed - Opened by snarayan21 6 months ago

#681 - Update CODEOWNERS

Pull Request - State: closed - Opened by karan6181 6 months ago

#680 - Bump pytest from 8.2.0 to 8.2.1

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 1 comment
Labels: dependencies

#679 - Bump databricks-sdk from 0.27.0 to 0.27.1

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 2 comments
Labels: dependencies

#678 - Reading all formats (parquet, csv, tsv, json) etc natively without conversion steps

Issue - State: closed - Opened by abhijithneilabraham 6 months ago - 2 comments
Labels: enhancement

#677 - Last entry in the dataset is causing "Relative sample index $x is not present" error

Issue - State: open - Opened by isidentical 6 months ago - 3 comments
Labels: bug

#676 - Using minio with StreamingDataset

Issue - State: closed - Opened by abhijithneilabraham 6 months ago - 1 comment
Labels: bug

#675 - Update platform references

Pull Request - State: closed - Opened by aspfohl 6 months ago - 1 comment

#674 - Use IndexError instead of ValueError in __getitem__

Issue - State: closed - Opened by keaganlong 6 months ago - 1 comment

#673 - Helpful error on `py1e` for improperly written datasets

Pull Request - State: closed - Opened by snarayan21 6 months ago

#672 - Ensure shards cannot be larger than 4GB

Pull Request - State: closed - Opened by snarayan21 6 months ago

#671 - Shard maximum size should be 4GB for MDS

Issue - State: closed - Opened by smspillaz 6 months ago - 1 comment
Labels: bug

#670 - Bump fastapi from 0.110.2 to 0.111.0

Pull Request - State: closed - Opened by dependabot[bot] 6 months ago - 1 comment
Labels: dependencies

#669 - Version bump to v0.7.6

Pull Request - State: closed - Opened by snarayan21 7 months ago

#667 - Bump databricks-sdk from 0.23.0 to 0.27.0

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago - 1 comment
Labels: dependencies

#666 - Bump pydantic from 2.7.0 to 2.7.1

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago - 3 comments
Labels: dependencies

#665 - Bump databricks-sdk from 0.23.0 to 0.26.0

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago - 1 comment
Labels: dependencies

#664 - Bump pytest from 8.1.1 to 8.2.0

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago - 1 comment
Labels: dependencies

#662 - Support large size index.json (20GB +)

Issue - State: open - Opened by andreamad8 7 months ago - 2 comments
Labels: enhancement

#661 - Adding `device_per_stream` batching

Pull Request - State: closed - Opened by snarayan21 7 months ago

#660 - Bump fastapi from 0.110.0 to 0.110.2

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago
Labels: dependencies

#658 - How does a LRU local cache help with multi-epoch training

Issue - State: closed - Opened by liangjuf 7 months ago - 4 comments
Labels: enhancement

#657 - batching_method=random doesn't seem to work properly

Issue - State: closed - Opened by oscarfossey 7 months ago - 2 comments
Labels: bug

#656 - Does it support Preference data (for training Reward / DPO)?

Issue - State: open - Opened by ericxsun 7 months ago - 4 comments
Labels: enhancement

#654 - Bump databricks-sdk from 0.23.0 to 0.25.1

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago - 1 comment
Labels: dependencies

#653 - Bump pydantic from 2.6.4 to 2.7.0

Pull Request - State: closed - Opened by dependabot[bot] 7 months ago
Labels: dependencies

#652 - Out of Memory when using Streaming Dataloader

Issue - State: open - Opened by VikaasVarma 7 months ago - 15 comments
Labels: bug

#651 - Add support for Alipan Storage backend

Pull Request - State: closed - Opened by PeterDing 8 months ago - 1 comment

#650 - Version bump to 0.7.5

Pull Request - State: closed - Opened by snarayan21 8 months ago

#649 - Bump fastapi from 0.110.0 to 0.110.1

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies

#648 - Bump databricks-sdk from 0.23.0 to 0.24.0

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies

#647 - COCO Dataset fix -- avoids `allow_unsafe_types=True`

Pull Request - State: closed - Opened by snarayan21 8 months ago

#646 - Augment existing dataset

Issue - State: open - Opened by LWprogramming 8 months ago - 3 comments
Labels: enhancement

#645 - Bump gitpython from 3.1.41 to 3.1.43

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 2 comments
Labels: dependencies

#644 - Bump databricks-sdk from 0.22.0 to 0.23.0

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago
Labels: dependencies

#643 - GPU utilisation drop between epochs

Issue - State: closed - Opened by rishabhm12 8 months ago - 21 comments
Labels: bug

#642 - Bump pytest and fix failing test

Pull Request - State: closed - Opened by snarayan21 8 months ago - 1 comment

#641 - Update google-cloud-storage requirement from <2.11.0,>=2.9.0 to >=2.9.0,<2.17.0

Pull Request - State: open - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies

#640 - Bump uvicorn from 0.28.0 to 0.29.0

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 2 comments
Labels: dependencies

#639 - Bump pydantic from 2.5.3 to 2.6.4

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago
Labels: dependencies

#638 - Update pytest-cov requirement from <5,>=4 to >=4,<6

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 3 comments
Labels: dependencies

#636 - Major overhaul of Streaming documentation

Pull Request - State: closed - Opened by snarayan21 8 months ago - 3 comments

#635 - Add batch_size to 1 if not provided for regression testing

Pull Request - State: closed - Opened by karan6181 8 months ago

#634 - Modify StreamingDataset to support passing process_group as construct…

Pull Request - State: closed - Opened by jasonkrone 8 months ago - 4 comments

#633 - Integrating MDS Streaming with HF Dataset Streaming

Issue - State: closed - Opened by siddk 8 months ago - 12 comments
Labels: enhancement

#632 - Fixed docstring note for getting sequential sample ordering

Pull Request - State: closed - Opened by snarayan21 8 months ago

#631 - Bump furo from 2023.7.26 to 2024.1.29

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago
Labels: dependencies

#630 - Bump pypandoc from 1.12 to 1.13

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies