Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#5795 - Fix spark imports

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 3 comments

#5794 - CI ZeroDivisionError

Issue - State: closed - Opened by albertvillanova over 1 year ago - 2 comments
Labels: bug

#5793 - IterableDataset.with_format("torch") not working

Issue - State: open - Opened by jiangwy99 over 1 year ago - 1 comment
Labels: bug, enhancement, streaming

#5791 - TIFF/TIF support

Issue - State: closed - Opened by sebasmos over 1 year ago - 5 comments
Labels: enhancement

#5790 - Allow to run CI on push to ci-branch

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 2 comments

#5789 - Support streaming datasets that use jsonlines

Issue - State: open - Opened by albertvillanova over 1 year ago
Labels: enhancement

#5788 - Prepare tests for hfh 0.14

Pull Request - State: closed - Opened by Wauplin over 1 year ago - 6 comments

#5787 - Fix inferring module for unsupported data files

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 4 comments

#5786 - Multiprocessing in a `filter` or `map` function with a Pytorch model

Issue - State: closed - Opened by HugoLaurencon over 1 year ago - 2 comments

#5784 - Raise subprocesses traceback when interrupting

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 4 comments

#5783 - Offset overflow while doing regex on a text column

Issue - State: open - Opened by nishanthcgit over 1 year ago - 7 comments

#5782 - Support for various audio-loading backends instead of always relying on SoundFile

Issue - State: closed - Opened by BoringDonut over 1 year ago - 3 comments
Labels: enhancement

#5781 - Error using `load_datasets`

Issue - State: closed - Opened by gjyoungjr over 1 year ago - 2 comments

#5779 - Call fs.makedirs in save_to_disk

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 3 comments

#5775 - ArrowDataset.save_to_disk lost some logic of remote

Issue - State: closed - Opened by Zoupers over 1 year ago - 1 comment

#5773 - train_dataset does not implement __len__

Issue - State: open - Opened by v-yunbin over 1 year ago - 8 comments

#5771 - Support cloud storage for loading datasets

Issue - State: closed - Opened by eli-osherovich over 1 year ago - 1 comment
Labels: duplicate, enhancement

#5770 - Add IterableDataset.from_spark

Pull Request - State: closed - Opened by maddiedawson over 1 year ago - 8 comments

#5769 - Tiktoken tokenizers are not picklable

Issue - State: closed - Opened by markovalexander over 1 year ago - 1 comment

#5766 - Support custom feature types

Issue - State: open - Opened by jmontalt over 1 year ago - 4 comments
Labels: enhancement

#5760 - Multi-image loading in Imagefolder dataset

Issue - State: open - Opened by vvvm23 over 1 year ago - 5 comments
Labels: enhancement

#5752 - Streaming dataset loses `.feature` method after `.add_column`

Issue - State: open - Opened by sanchit-gandhi over 1 year ago - 2 comments
Labels: bug

#5747 - [WIP] Add Dataset.to_spark

Pull Request - State: closed - Opened by maddiedawson over 1 year ago

#5736 - FORCE_REDOWNLOAD raises "Directory not empty" exception on second run

Issue - State: open - Opened by rcasero over 1 year ago - 3 comments

#5735 - Implement sharding on merged iterable datasets

Pull Request - State: closed - Opened by Hubert-Bonisseur over 1 year ago - 11 comments

#5729 - Fix nondeterministic sharded data split order

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 3 comments

#5728 - The order of data split names is nondeterministic

Issue - State: closed - Opened by albertvillanova over 1 year ago
Labels: bug

#5727 - load_dataset fails with FileNotFound error on Windows

Issue - State: open - Opened by joelkowalewski over 1 year ago - 3 comments

#5720 - Streaming IterableDatasets do not work with torch DataLoaders

Issue - State: open - Opened by jlehrer1 over 1 year ago - 6 comments

#5718 - Reorder default data splits to have validation before test

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 3 comments

#5717 - Error when saving to disk a dataset of images

Issue - State: open - Opened by jplu over 1 year ago - 15 comments

#5716 - Handle empty audio

Issue - State: closed - Opened by v-yunbin over 1 year ago - 2 comments

#5708 - Dataset sizes are in MiB instead of MB in dataset cards

Issue - State: closed - Opened by albertvillanova over 1 year ago - 12 comments
Labels: bug, dataset-viewer

#5706 - Support categorical data types for Parquet

Issue - State: closed - Opened by kklemon over 1 year ago - 17 comments
Labels: enhancement

#5701 - Add Dataset.from_spark

Pull Request - State: closed - Opened by maddiedawson over 1 year ago - 19 comments

#5699 - Issue when wanting to split in memory a cached dataset

Issue - State: open - Opened by FrancoisNoyez over 1 year ago - 2 comments

#5695 - Loading big dataset raises pyarrow.lib.ArrowNotImplementedError

Issue - State: closed - Opened by amariucaitheodor over 1 year ago - 7 comments

#5688 - Wikipedia download_and_prepare for GCS

Issue - State: closed - Opened by adrianfagerland over 1 year ago - 3 comments

#5678 - Add support to create a Dataset from spark dataframe

Issue - State: closed - Opened by lu-wang-dl over 1 year ago - 5 comments
Labels: enhancement

#5674 - Stored XSS

Issue - State: closed - Opened by Fadavvi over 1 year ago - 1 comment

#5665 - Feature request: IterableDataset.push_to_hub

Issue - State: open - Opened by NielsRogge over 1 year ago - 5 comments
Labels: enhancement

#5651 - expanduser in save_to_disk

Issue - State: closed - Opened by RmZeta2718 over 1 year ago - 5 comments
Labels: good first issue

#5613 - Version mismatch with multiprocess and dill on Python 3.10

Issue - State: open - Opened by adampauls over 1 year ago - 6 comments

#5612 - Arrow map type in parquet files unsupported

Issue - State: open - Opened by TevenLeScao over 1 year ago - 4 comments

#5610 - use datasets streaming mode in trainer ddp mode cause memory leak

Issue - State: open - Opened by gromzhu over 1 year ago - 3 comments

#5604 - Problems with downloading The Pile

Issue - State: closed - Opened by sentialx over 1 year ago - 7 comments

#5594 - Error while downloading the xtreme udpos dataset

Issue - State: closed - Opened by simran-khanuja over 1 year ago - 21 comments

#5589 - Revert "pass the dataset features to the IterableDataset.from_generator"

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 5 comments

#5575 - Metadata for each column

Issue - State: open - Opened by parsa-ra over 1 year ago - 5 comments
Labels: enhancement

#5574 - c4 dataset streaming fails with `FileNotFoundError`

Issue - State: closed - Opened by krasserm over 1 year ago - 12 comments

#5554 - Add resampy dep

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 5 comments

#5545 - Added return methods for URL-references to the pushed dataset

Pull Request - State: open - Opened by davidberenstein1957 over 1 year ago - 6 comments

#5537 - Increase speed of data files resolution

Issue - State: closed - Opened by lhoestq over 1 year ago - 5 comments
Labels: enhancement, good second issue

#5536 - Failure to hash function when using .map()

Issue - State: closed - Opened by venzen over 1 year ago - 14 comments

#5528 - Push to hub in a pull request

Pull Request - State: open - Opened by AJDERS over 1 year ago - 11 comments

#5519 - Lint code with `ruff`

Pull Request - State: closed - Opened by mariosasko over 1 year ago - 6 comments

#5517 - `with_format("numpy")` silently downcasts float64 to float32 features

Issue - State: open - Opened by ernestum over 1 year ago - 13 comments

#5511 - Creating a dummy dataset from a bigger one

Issue - State: closed - Opened by patrickvonplaten over 1 year ago - 8 comments

#5492 - Push_to_hub in a pull request

Issue - State: closed - Opened by lhoestq over 1 year ago - 2 comments
Labels: enhancement, good first issue

#5484 - Update docs for `nyu_depth_v2` dataset

Pull Request - State: closed - Opened by awsaf49 over 1 year ago - 6 comments

#5481 - Load a cached dataset as iterable

Issue - State: open - Opened by lhoestq over 1 year ago - 16 comments
Labels: enhancement, good second issue

#5477 - Unpin sqlalchemy once issue is fixed

Issue - State: closed - Opened by albertvillanova over 1 year ago - 2 comments

#5467 - Fix conda command in readme

Pull Request - State: closed - Opened by lhoestq over 1 year ago - 4 comments

#5459 - Disable aiohttp requoting of redirection URL

Pull Request - State: closed - Opened by albertvillanova over 1 year ago - 7 comments

#5454 - Save and resume the state of a DataLoader

Issue - State: open - Opened by lhoestq over 1 year ago - 18 comments
Labels: enhancement, generic discussion

#5451 - ImageFolder BadZipFile: Bad offset for central directory

Issue - State: closed - Opened by hmartiro over 1 year ago - 3 comments

#5430 - Support Apache Beam >= 2.44.0

Issue - State: closed - Opened by albertvillanova over 1 year ago - 1 comment
Labels: enhancement

#5422 - Datasets load error for saved github issues

Issue - State: open - Opened by folterj over 1 year ago - 7 comments

#5364 - Support for writing arrow files directly with BeamWriter

Pull Request - State: closed - Opened by mariosasko almost 2 years ago - 6 comments

#5354 - Consider using "Sequence" instead of "List"

Issue - State: open - Opened by tranhd95 almost 2 years ago - 8 comments
Labels: enhancement, good first issue

#5339 - Add Video feature, videofolder, and video-classification task

Pull Request - State: closed - Opened by nateraw almost 2 years ago - 4 comments

#5337 - Support webdataset format

Issue - State: closed - Opened by lhoestq almost 2 years ago - 5 comments

#5335 - Update tasks.json

Pull Request - State: closed - Opened by sayakpaul almost 2 years ago - 11 comments

#5331 - Support for multiple configs in packaged modules via metadata yaml info

Pull Request - State: open - Opened by polinaeterna almost 2 years ago - 15 comments

#5324 - Fix docstrings and types in documentation that appears on the website

Issue - State: open - Opened by polinaeterna almost 2 years ago - 5 comments
Labels: documentation

#5312 - Add DatasetDict.to_pandas

Pull Request - State: closed - Opened by lhoestq almost 2 years ago - 12 comments

#5301 - Return a split Dataset in load_dataset

Pull Request - State: closed - Opened by lhoestq almost 2 years ago - 2 comments

#5281 - Support cloud storage in load_dataset

Issue - State: open - Opened by lhoestq almost 2 years ago - 28 comments
Labels: enhancement, good second issue

#5274 - load_dataset possibly broken for gated datasets?

Issue - State: closed - Opened by TristanThrush almost 2 years ago - 8 comments

#5272 - Use pyarrow Tensor dtype

Issue - State: open - Opened by franz101 almost 2 years ago - 16 comments
Labels: enhancement

#5264 - `datasets` can't read a Parquet file in Python 3.9.13

Issue - State: closed - Opened by loubnabnl almost 2 years ago - 16 comments
Labels: bug

#5249 - Protect the main branch from inadvertent direct pushes

Issue - State: closed - Opened by albertvillanova almost 2 years ago - 1 comment
Labels: maintenance

#5243 - Download only split data

Issue - State: open - Opened by capsabogdan almost 2 years ago - 6 comments
Labels: enhancement

#5230 - dataclasses error when importing the library in python 3.11

Issue - State: closed - Opened by yonikremer almost 2 years ago - 4 comments

#5224 - Seems to freeze when loading audio dataset with wav files from local folder

Issue - State: closed - Opened by uriii3 almost 2 years ago - 4 comments

#5207 - Connection error of the HuggingFace's dataset Hub due to SSLError with proxy

Issue - State: open - Opened by leemgs almost 2 years ago - 13 comments

#5156 - Unable to download dataset using Azure Data Lake Gen 2

Issue - State: closed - Opened by clarissesimoes almost 2 years ago - 4 comments