Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / mosaicml/streaming issues and pull requests

#328 - Bump pydantic from 1.10.9 to 1.10.11

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#327 - Bump uvicorn from 0.22.0 to 0.23.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#326 - Bump pydantic from 1.10.9 to 2.0.3

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#324 - DDP with streaming got duplicate data

Issue - State: closed - Opened by gongel over 1 year ago - 4 comments
Labels: bug

#323 - Bump pydantic from 1.10.9 to 2.0.2

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies

#322 - Bump fastapi from 0.98.0 to 0.100.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#321 - Bump pydantic from 1.10.9 to 2.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#320 - Bump fastapi from 0.98.0 to 0.99.1

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#319 - Add a regression test for StreamingDataset using cloud providers

Pull Request - State: closed - Opened by b-chu over 1 year ago

#318 - Add a regression test for StreamingDataset instantiation and iteration

Pull Request - State: closed - Opened by b-chu over 1 year ago

#317 - Transfer json folder to Streaming

Issue - State: closed - Opened by germanjke over 1 year ago - 2 comments

#316 - Sync tmp directory

Pull Request - State: closed - Opened by b-chu over 1 year ago

#315 - Add GCS authentication for service accounts

Pull Request - State: closed - Opened by b-chu over 1 year ago

#314 - Bump fastapi from 0.97.0 to 0.98.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#313 - Bump pytest from 7.3.2 to 7.4.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#312 - Add secrets check as part of pre-commit

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#311 - Added files to support azure datalake storage

Pull Request - State: closed - Opened by shivshandilya over 1 year ago - 7 comments

#310 - Can't load dataset from S3

Issue - State: closed - Opened by germanjke over 1 year ago - 15 comments

#309 - Bump myst-parser from 1.0.0 to 2.0.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#308 - Bump version to 0.5.1

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#307 - StreamingDataset with DDP hangs and then crashes with NCCL timeout error

Issue - State: open - Opened by greeneggsandyaml over 1 year ago - 17 comments
Labels: bug

#306 - Why can't I run two experiments in parallel which will load from the same dataset location?

Issue - State: closed - Opened by eldarkurtic over 1 year ago - 8 comments
Labels: bug

#305 - Fix LocalDataset (property size for fancy __getitem__).

Pull Request - State: closed - Opened by knighton over 1 year ago

#303 - fix: :bug: LocalDataset

Pull Request - State: closed - Opened by tungdq212 over 1 year ago - 1 comment

#302 - py1bs shuffle algorithm ("staggered py1b")

Pull Request - State: closed - Opened by knighton over 1 year ago - 1 comment

#301 - Round drop_first to be divisible by num_physical_nodes.

Pull Request - State: closed - Opened by knighton over 1 year ago

#300 - LocalDataset bug

Issue - State: closed - Opened by tungdq212 over 1 year ago - 6 comments
Labels: bug

#299 - Added a utility method to clean stale shared memory

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#298 - Improved existing exception and exception messages

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#297 - Terminate the main process if thread died unexpectedly

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#296 - Bump pydantic from 1.10.8 to 1.10.9

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#293 - Timeout Error when local=None in StreamingDataset and training in distributed mode

Issue - State: closed - Opened by vancoykendall over 1 year ago - 1 comment
Labels: bug

#292 - Support for azure Data Lake Gen2 type storage

Issue - State: closed - Opened by shivshandilya over 1 year ago - 6 comments
Labels: enhancement

#288 - Parallel writing of MDS files

Issue - State: closed - Opened by mpetri over 1 year ago - 2 comments
Labels: enhancement

#273 - Update README.md - slack

Pull Request - State: closed - Opened by ejyuen over 1 year ago

#272 - Fix README slack link

Pull Request - State: closed - Opened by growlix over 1 year ago - 2 comments

#271 - Composable datasets

Issue - State: closed - Opened by jacobwjs over 1 year ago - 4 comments
Labels: enhancement

#270 - Bump furo from 2022.9.29 to 2023.5.20

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies

#269 - Bump fastapi from 0.95.1 to 0.95.2

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies

#268 - keep_raw=False doesn't actually delete shards.

Issue - State: open - Opened by tbenthompson over 1 year ago - 3 comments
Labels: bug

#267 - Update Stream documentation

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#266 - Add `Stream` usage example to README

Pull Request - State: closed - Opened by hanlint over 1 year ago - 3 comments

#265 - Support any S3-compatible object store (R2, Coreweave, Backblaze, etc.)

Pull Request - State: open - Opened by abhi-mosaic over 1 year ago - 1 comment

#264 - Memory leak when using `StreamingDataset`'s `__iter__` method.

Issue - State: closed - Opened by wadimiusz over 1 year ago - 8 comments
Labels: bug

#263 - Bugfix in user_guide.md sample code

Pull Request - State: closed - Opened by tginart over 1 year ago

#262 - Fix slack link in readme

Pull Request - State: closed - Opened by growlix over 1 year ago

#261 - Resume support for MDSWriter?

Issue - State: closed - Opened by tbenthompson over 1 year ago - 3 comments
Labels: enhancement

#260 - Add support for any S3 compatible object storage

Issue - State: open - Opened by vancoykendall over 1 year ago - 5 comments

#259 - Fix typo in documentation's conversion `pile.py` link

Pull Request - State: closed - Opened by ouhenio over 1 year ago

#258 - Fixed Pile documentation link

Pull Request - State: closed - Opened by karan6181 over 1 year ago - 1 comment

#256 - Add support for Azure cloud storage

Pull Request - State: closed - Opened by hlky over 1 year ago - 1 comment

#255 - Add support for Cloudflare R2 cloud storage

Pull Request - State: closed - Opened by hlky over 1 year ago - 6 comments

#254 - Is there an ETA for adding azure support?

Issue - State: closed - Opened by njb-ms over 1 year ago - 5 comments
Labels: enhancement

#253 - Bump uvicorn from 0.21.1 to 0.22.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#252 - Bump sphinx from 4.4.0 to 7.0.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies

#251 - Create a new boto3 session per thread

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#250 - Shared lock

Pull Request - State: open - Opened by knighton over 1 year ago

#249 - Update readthedocs python version to 3.9

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#248 - Write example for RedPajama

Issue - State: closed - Opened by mitchellnw over 1 year ago - 8 comments
Labels: enhancement

#247 - Support dataset.filter sample filtering

Issue - State: closed - Opened by mpetri over 1 year ago - 3 comments
Labels: enhancement

#246 - Better organize code

Pull Request - State: closed - Opened by knighton over 1 year ago

#245 - Added py.typed to indicate that the repository has typing annotations

Pull Request - State: closed - Opened by karan6181 over 1 year ago - 1 comment

#244 - Add py.typed marker file for type checking

Issue - State: closed - Opened by micimize over 1 year ago - 2 comments
Labels: bug

#243 - Rename "samples" to "choose" (distinguish underlying vs resampled)

Pull Request - State: closed - Opened by knighton over 1 year ago

#242 - Raise descriptive error message when index.json is corrupted

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#241 - Propagate an exception raise by a thread to its caller

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#240 - Skip distributed all_gather test since CI non-deterministically hangs

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#239 - Bump version to 0.4.1

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#238 - Fixed local directory check

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#237 - Update torch dependency pin to <2.1

Pull Request - State: closed - Opened by bandish-shah over 1 year ago - 1 comment

#236 - Redesign shard index

Pull Request - State: closed - Opened by knighton over 1 year ago - 1 comment

#235 - Removed pushing auto release branch due to GH action permission

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#234 - Support of torch 2.0

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#233 - Bump yamllint from 1.30.0 to 1.31.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#232 - Add documentation for MDSWriter, conversion scripts, and supported format

Pull Request - State: open - Opened by karan6181 over 1 year ago - 1 comment

#230 - Bump pytest from 7.3.0 to 7.3.1

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#229 - Bump sphinx-copybutton from 0.5.1 to 0.5.2

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#228 - Bump sphinxext-opengraph from 0.8.1 to 0.8.2

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#227 - Bump fastapi from 0.95.0 to 0.95.1

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#226 - Virtually split the repeats of repeated shards

Pull Request - State: closed - Opened by knighton over 1 year ago - 2 comments

#225 - StreamingDataset with torch.nn.parallel.DistributedDataParallel

Issue - State: closed - Opened by amallia over 1 year ago - 6 comments
Labels: enhancement

#224 - Switch documentation search to use Algolia

Pull Request - State: closed - Opened by bandish-shah over 1 year ago

#223 - Add two shuffling algos: naive (globally) and py1b (fixed-size blocks).

Pull Request - State: closed - Opened by knighton over 1 year ago - 3 comments

#222 - Bump pytest from 7.2.2 to 7.3.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#221 - Add installation and environments documentation

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#220 - Add a readme for multimodal convert script modal type

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#219 - Cold shard eviction

Pull Request - State: open - Opened by knighton over 1 year ago - 4 comments

#218 - Refactor StreamingDataset shared memory prefix setup

Pull Request - State: closed - Opened by knighton over 1 year ago

#217 - Shared dir selection method prone to collisions in concurrent scenarios

Issue - State: open - Opened by mx781 over 1 year ago - 5 comments
Labels: bug

#216 - Bump version to 0.4.0

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#215 - Register atexit handler for resource cleanup

Pull Request - State: closed - Opened by karan6181 over 1 year ago - 1 comment

#214 - Allow for accessing slices of dataset

Issue - State: closed - Opened by VictorSanh over 1 year ago - 5 comments
Labels: enhancement

#213 - Questions about `StreamingDataset` in the case of limited (fast) local disk storage

Issue - State: closed - Opened by VictorSanh over 1 year ago - 2 comments
Labels: enhancement

#212 - Raise an exception if bucket does not exist during upload

Pull Request - State: closed - Opened by karan6181 over 1 year ago

#211 - Bump pydantic from 1.10.6 to 1.10.7

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#210 - Bump furo from 2022.9.29 to 2023.3.27

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies