Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#6147 - ValueError when running BeamBasedBuilder with GCS path in cache_dir

Issue - State: closed - Opened by ktrk115 about 1 year ago - 2 comments

#6146 - DatasetGenerationError when load glue benchmark datasets from `load_dataset`

Issue - State: closed - Opened by yusx-swapp about 1 year ago - 4 comments

#6145 - Export to_iterable_dataset to document

Pull Request - State: closed - Opened by npuichigo about 1 year ago - 2 comments

#6144 - NIH exporter file not found

Issue - State: open - Opened by brando90 about 1 year ago - 6 comments

#6142 - the-stack-dedup fails to generate

Issue - State: closed - Opened by michaelroyzen about 1 year ago - 4 comments

#6139 - Offline dataset viewer

Issue - State: closed - Opened by yuvalkirstain about 1 year ago - 7 comments
Labels: enhancement, dataset-viewer

#6138 - Ignore CI lint rule violation in Pickler.memoize

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6136 - CI check_code_quality error: E721 Do not compare types, use `isinstance()`

Issue - State: closed - Opened by albertvillanova about 1 year ago
Labels: maintenance

#6135 - Remove unused allowed_extensions param

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 4 comments

#6134 - `datasets` cannot be installed alongside `apache-beam`

Issue - State: closed - Opened by boyleconnor about 1 year ago - 1 comment

#6133 - Dataset is slower after calling `to_iterable_dataset`

Issue - State: open - Opened by npuichigo about 1 year ago - 2 comments

#6132 - to_iterable_dataset is missing in document

Issue - State: closed - Opened by npuichigo about 1 year ago - 1 comment

#6131 - AttributeError: type object 'tqdm' has no attribute '_lock'

Issue - State: open - Opened by NielsRogge about 1 year ago - 1 comment

#6130 - default config name doesn't work when config kwargs are specified.

Issue - State: closed - Opened by npuichigo about 1 year ago - 15 comments

#6129 - Release 2.14.4

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 5 comments

#6128 - IndexError: Invalid key: 88 is out of bounds for size 0

Issue - State: closed - Opened by TomasAndersonFang about 1 year ago - 5 comments

#6127 - Fix authentication issues

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 8 comments

#6126 - Private datasets do not load when passing token

Issue - State: closed - Opened by albertvillanova about 1 year ago - 4 comments
Labels: bug

#6124 - Datasets crashing runs due to KeyError

Issue - State: closed - Opened by conceptofmind about 1 year ago - 7 comments

#6123 - Inaccurate Bounding Boxes in "wildreceipt" Dataset

Issue - State: closed - Opened by HamzaGbada about 1 year ago - 1 comment

#6122 - Upload README via `push_to_hub`

Issue - State: closed - Opened by liyucheng09 about 1 year ago - 1 comment
Labels: enhancement

#6121 - Small typo in the code example of create imagefolder dataset

Pull Request - State: closed - Opened by WangXin93 about 1 year ago - 1 comment

#6120 - Lookahead streaming support?

Issue - State: open - Opened by PicoCreator about 1 year ago - 1 comment
Labels: enhancement

#6119 - [Docs] Add description of `select_columns` to guide

Pull Request - State: closed - Opened by unifyh about 1 year ago - 2 comments

#6117 - Set dev version

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6116 - [Docs] The "Process" how-to guide lacks description of `select_columns` function

Issue - State: closed - Opened by unifyh about 1 year ago - 1 comment
Labels: enhancement

#6115 - Release: 2.14.3

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 6 comments

#6114 - Cache not being used when loading commonvoice 8.0.0

Issue - State: closed - Opened by clabornd about 1 year ago - 2 comments

#6113 - load_dataset() fails with streamlit caching inside docker

Issue - State: closed - Opened by fierval about 1 year ago - 1 comment

#6112 - yaml error using push_to_hub with generated README.md

Issue - State: closed - Opened by kevintee about 1 year ago - 1 comment

#6110 - [BUG] Dataset initialized from in-memory data does not create cache.

Issue - State: closed - Opened by MattYoon about 1 year ago - 1 comment

#6109 - Problems in downloading Amazon reviews from HF

Issue - State: closed - Opened by 610v4nn1 about 1 year ago - 2 comments

#6108 - Loading local datasets got strangely stuck

Issue - State: open - Opened by LoveCatc about 1 year ago - 6 comments

#6107 - Fix deprecation of use_auth_token in file_utils

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6106 - load local json_file as dataset

Issue - State: closed - Opened by CiaoHe about 1 year ago - 2 comments

#6105 - Fix error when loading from GCP bucket

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 5 comments

#6104 - HF Datasets data access is extremely slow even when in memory

Issue - State: open - Opened by NightMachinery about 1 year ago - 1 comment

#6103 - Set dev version

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6102 - Release 2.14.2

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 4 comments

#6101 - Release 2.14.2

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6100 - TypeError when loading from GCP bucket

Issue - State: closed - Opened by bilelomrani1 about 1 year ago - 2 comments

#6099 - How do i get "amazon_us_reviews

Issue - State: closed - Opened by IqraBaluch about 1 year ago - 10 comments
Labels: enhancement

#6098 - Expanduser in save_to_disk()

Pull Request - State: closed - Opened by Unknown3141592 about 1 year ago - 3 comments

#6096 - Add `fsspec` support for `to_json`, `to_csv`, and `to_parquet`

Pull Request - State: closed - Opened by alvarobartt about 1 year ago - 5 comments

#6095 - Fix deprecation of errors in TextConfig

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6094 - Fix deprecation of use_auth_token in DownloadConfig

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6093 - Deprecate `download_custom`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 6 comments

#6092 - Minor fix in `iter_files` for hidden files

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6091 - Bump fsspec from 2021.11.1 to 2022.3.0

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6090 - FilesIterable skips all the files after a hidden file

Issue - State: closed - Opened by dkrivosic about 1 year ago - 1 comment

#6089 - AssertionError: daemonic processes are not allowed to have children

Issue - State: open - Opened by codingl2k1 about 1 year ago - 2 comments

#6088 - Loading local data files initiates web requests

Issue - State: closed - Opened by lytning98 about 1 year ago

#6087 - fsspec dependency is set too low

Issue - State: closed - Opened by iXce about 1 year ago - 1 comment

#6086 - Support `fsspec` in `Dataset.to_<format>` methods

Issue - State: closed - Opened by mariosasko about 1 year ago - 5 comments
Labels: enhancement

#6085 - Fix `fsspec` download

Pull Request - State: open - Opened by mariosasko about 1 year ago - 3 comments

#6083 - set dev version

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments

#6082 - Release: 2.14.1

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 6 comments

#6081 - Deprecate `Dataset.export`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 2 comments

#6080 - Remove README link to deprecated Colab notebook

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6079 - Iterating over DataLoader based on HF datasets is stuck forever

Issue - State: closed - Opened by arindamsarkar93 about 1 year ago - 15 comments

#6078 - resume_download with streaming=True

Issue - State: closed - Opened by NicolasMICAUX about 1 year ago - 3 comments

#6077 - Mapping gets stuck at 99%

Issue - State: open - Opened by Laurent2916 about 1 year ago - 6 comments

#6076 - No gzip encoding from github

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments

#6075 - Error loading music files using `load_dataset`

Issue - State: closed - Opened by susnato about 1 year ago - 2 comments

#6074 - Misc doc improvements

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6073 - version2.3.2 load_dataset()data_files can't include .xxxx in path

Issue - State: closed - Opened by BUAAChuanWang about 1 year ago - 1 comment

#6072 - Fix fsspec storage_options from load_dataset

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 6 comments

#6070 - Fix Quickstart notebook link

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6069 - KeyError: dataset has no key "image"

Issue - State: closed - Opened by etetteh about 1 year ago - 7 comments

#6068 - fix tqdm lock deletion

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 5 comments

#6066 - AttributeError: '_tqdm_cls' object has no attribute '_lock'

Issue - State: closed - Opened by codingl2k1 about 1 year ago - 7 comments

#6065 - Add column type guessing from map return function

Pull Request - State: closed - Opened by piercefreeman about 1 year ago - 5 comments

#6060 - Dataset.map() execute twice when in PyTorch DDP mode

Issue - State: closed - Opened by wanghaoyucn about 1 year ago - 4 comments

#6059 - Provide ability to load label mappings from file

Issue - State: open - Opened by david-waterworth about 1 year ago - 3 comments
Labels: enhancement

#6057 - Why is the speed difference of gen example so big?

Issue - State: closed - Opened by pixeli99 about 1 year ago - 1 comment

#6053 - Change package name from "datasets" to something less generic

Issue - State: closed - Opened by geajack about 1 year ago - 1 comment
Labels: enhancement

#6049 - Update `ruff` version in pre-commit config

Pull Request - State: closed - Opened by polinaeterna about 1 year ago - 2 comments

#6046 - Support proxy and user-agent in fsspec calls

Issue - State: open - Opened by lhoestq about 1 year ago - 8 comments
Labels: enhancement, good second issue

#6036 - Deprecate search API

Pull Request - State: open - Opened by mariosasko about 1 year ago - 9 comments

#6014 - Request to Share/Update Dataset Viewer Code

Issue - State: closed - Opened by lilyorlilypad about 1 year ago - 10 comments
Labels: duplicate

#6012 - [FR] Transform Chaining, Lazy Mapping

Issue - State: open - Opened by NightMachinery about 1 year ago - 7 comments
Labels: enhancement

#6010 - Improve `Dataset`'s string representation

Issue - State: open - Opened by mariosasko about 1 year ago - 3 comments
Labels: enhancement

#5990 - Pushing a large dataset on the hub consistently hangs

Issue - State: open - Opened by AntreasAntoniou over 1 year ago - 45 comments
Labels: bug

#5984 - AutoSharding IterableDataset's when num_workers > 1

Issue - State: open - Opened by mathephysicist over 1 year ago - 8 comments
Labels: enhancement

#5981 - Only two cores are getting used in sagemaker with pytorch 3.10 kernel

Issue - State: closed - Opened by mmr-crexi over 1 year ago - 4 comments

#5968 - Common Voice datasets still need `use_auth_token=True`

Issue - State: closed - Opened by patrickvonplaten over 1 year ago - 4 comments