Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#7181 - [WIP] Fix datasets export to JSON

Pull Request - State: open - Opened by varadhbhatnagar about 10 hours ago

#7179 - Support Python 3.11

Pull Request - State: open - Opened by albertvillanova 3 days ago - 1 comment

#7178 - Support Python 3.11

Issue - State: open - Opened by albertvillanova 3 days ago
Labels: enhancement

#7177 - Fix release instructions

Pull Request - State: closed - Opened by albertvillanova 3 days ago - 1 comment

#7176 - fix grammar in fingerprint.py

Pull Request - State: open - Opened by jxmorris12 3 days ago

#7175 - [FSTimeoutError] load_dataset

Issue - State: open - Opened by cosmo3769 3 days ago - 2 comments

#7174 - Set dev version

Pull Request - State: closed - Opened by albertvillanova 4 days ago - 1 comment

#7173 - Release: 3.0.1

Pull Request - State: closed - Opened by albertvillanova 4 days ago - 1 comment

#7172 - Add torchdata as a regular test dependency

Pull Request - State: closed - Opened by albertvillanova 4 days ago - 1 comment

#7171 - CI is broken: No solution found when resolving dependencies

Issue - State: closed - Opened by albertvillanova 4 days ago
Labels: bug

#7170 - Support JSON lines with missing columns

Pull Request - State: closed - Opened by albertvillanova 5 days ago - 1 comment

#7169 - JSON lines with missing columns raise CastError

Issue - State: closed - Opened by albertvillanova 5 days ago
Labels: bug

#7168 - sd1.5 diffusers controlnet training script gives new error

Issue - State: open - Opened by Night1099 5 days ago - 2 comments

#7166 - fix docstring code example for distributed shuffle

Pull Request - State: closed - Opened by lhoestq 5 days ago - 1 comment

#7165 - fix increase_load_count

Pull Request - State: closed - Opened by lhoestq 6 days ago - 3 comments

#7164 - fsspec.exceptions.FSTimeoutError when downloading dataset

Issue - State: open - Opened by timonmerk 6 days ago - 2 comments

#7163 - Set explicit seed in iterable dataset ddp shuffling example

Issue - State: closed - Opened by alex-hh 6 days ago - 1 comment

#7162 - Support JSON lines with empty struct

Pull Request - State: closed - Opened by albertvillanova 6 days ago - 1 comment

#7161 - JSON lines with empty struct raise ArrowTypeError

Issue - State: closed - Opened by albertvillanova 7 days ago
Labels: bug

#7160 - Support JSON lines with missing struct fields

Pull Request - State: closed - Opened by albertvillanova 7 days ago - 1 comment

#7158 - google colab ex

Pull Request - State: open - Opened by docfhsp 7 days ago

#7157 - Fix zero proba interleave datasets

Pull Request - State: closed - Opened by lhoestq 8 days ago - 1 comment

#7156 - interleave_datasets resets shuffle state

Issue - State: open - Opened by jonathanasdf 9 days ago

#7155 - Dataset viewer not working! Failure due to more than 32 splits.

Issue - State: closed - Opened by sleepingcat4 11 days ago - 1 comment

#7154 - Support ndjson data files

Pull Request - State: closed - Opened by albertvillanova 12 days ago - 2 comments

#7153 - Support data files with .ndjson extension

Issue - State: closed - Opened by albertvillanova 12 days ago
Labels: enhancement

#7151 - Align filename prefix splitting with WebDataset library

Pull Request - State: closed - Opened by albertvillanova 14 days ago

#7149 - Datasets Unknown Keyword Argument Error - task_templates

Issue - State: closed - Opened by varungupta31 17 days ago - 1 comment

#7148 - Bug: Error when downloading mteb/mtop_domain

Issue - State: closed - Opened by ZiyiXia 17 days ago - 4 comments

#7147 - IterableDataset strange deadlock

Issue - State: closed - Opened by jonathanasdf 17 days ago - 6 comments

#7146 - Set dev version

Pull Request - State: closed - Opened by albertvillanova 18 days ago - 1 comment

#7145 - Release: 3.0.0

Pull Request - State: closed - Opened by albertvillanova 18 days ago - 1 comment

#7144 - Fix key error in webdataset

Pull Request - State: closed - Opened by ragavsachdeva 18 days ago - 7 comments

#7143 - Modify add_column() to optionally accept a FeatureType as param

Pull Request - State: closed - Opened by varadhbhatnagar 21 days ago - 6 comments

#7142 - Specifying datatype when adding a column to a dataset.

Issue - State: closed - Opened by varadhbhatnagar 22 days ago - 1 comment
Labels: enhancement

#7141 - Older datasets throwing safety errors with 2.21.0

Issue - State: closed - Opened by alvations 23 days ago - 17 comments

#7138 - Cache only changed columns?

Issue - State: open - Opened by Modexus 24 days ago - 2 comments
Labels: enhancement

#7137 - [BUG] dataset_info sequence unexpected behavior in README.md YAML

Issue - State: open - Opened by ain-soph 25 days ago - 1 comment

#7136 - Do not consume unnecessary memory during sharding

Pull Request - State: open - Opened by janEbert 25 days ago

#7135 - Bug: Type Mismatch in Dataset Mapping

Issue - State: open - Opened by marko1616 26 days ago - 3 comments

#7133 - remove filecheck to enable symlinks

Pull Request - State: open - Opened by fschlatt about 1 month ago - 5 comments

#7132 - Fix data file module inference

Pull Request - State: open - Opened by HennerM about 1 month ago - 3 comments

#7128 - Filter Large Dataset Entry by Entry

Issue - State: open - Opened by QiyaoWei about 1 month ago - 3 comments
Labels: enhancement

#7127 - Caching shuffles by np.random.Generator results in unintuitive behavior

Issue - State: open - Opened by el-hult about 1 month ago - 5 comments

#7126 - Disable implicit token in CI

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7125 - Fix wrong SHA in CI tests of HubDatasetModuleFactoryWithParquetExport

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7124 - Test get_dataset_config_info with non-existing/gated/private dataset

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7123 - Make dataset viewer more flexible in displaying metadata alongside images

Issue - State: open - Opened by egrace479 about 1 month ago - 1 comment
Labels: enhancement

#7122 - [interleave_dataset] sample batches from a single source at a time

Issue - State: open - Opened by memray about 1 month ago
Labels: enhancement

#7121 - Fix typed examples iterable state dict

Pull Request - State: closed - Opened by lhoestq about 1 month ago - 2 comments

#7120 - don't mention the script if trust_remote_code=False

Pull Request - State: closed - Opened by severo about 1 month ago - 3 comments

#7119 - Install transformers with numpy-2 CI

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7118 - Allow numpy-2.1 and test it without audio extra

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7117 - Audio dataset load everything in RAM and is very slow

Issue - State: open - Opened by Jourdelune about 1 month ago - 3 comments

#7116 - datasets cannot handle nested json if features is given.

Issue - State: closed - Opened by ljw20180420 about 1 month ago - 3 comments

#7115 - module 'pyarrow.lib' has no attribute 'ListViewType'

Issue - State: closed - Opened by neurafusionai about 1 month ago - 1 comment

#7114 - Temporarily pin numpy<2.1 to fix CI

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7111 - CI is broken for numpy-2: Failed to fetch wheel: llvmlite==0.34.0

Issue - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7110 - Fix ConnectionError for gated datasets and unauthenticated users

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 4 comments

#7107 - load_dataset broken in 2.21.0

Issue - State: closed - Opened by anjor about 1 month ago - 4 comments

#7106 - Rename LargeList.dtype to LargeList.feature

Pull Request - State: closed - Opened by albertvillanova about 1 month ago - 2 comments

#7105 - Use `huggingface_hub` cache

Pull Request - State: closed - Opened by lhoestq about 2 months ago - 7 comments

#7104 - remove more script docs

Pull Request - State: closed - Opened by lhoestq about 2 months ago - 2 comments

#7103 - Fix args of feature docstrings

Pull Request - State: closed - Opened by albertvillanova about 2 months ago - 2 comments

#7099 - Set dev version

Pull Request - State: closed - Opened by albertvillanova about 2 months ago - 2 comments

#7098 - Release: 2.21.0

Pull Request - State: closed - Opened by albertvillanova about 2 months ago - 1 comment

#7096 - Automatically create `cache_dir` from `cache_file_name`

Pull Request - State: closed - Opened by ringohoffman about 2 months ago - 3 comments

#7094 - Add Arabic Docs to Datasets

Pull Request - State: open - Opened by AhmedAlmaghz about 2 months ago

#7093 - Add Arabic Docs to datasets

Issue - State: open - Opened by AhmedAlmaghz about 2 months ago
Labels: enhancement

#7092 - load_dataset with multiple jsonlines files interprets datastructure too early

Issue - State: open - Opened by Vipitis about 2 months ago - 5 comments

#7088 - Disable warning when using with_format format on tensors

Issue - State: open - Opened by Haislich about 2 months ago
Labels: enhancement

#7087 - Unable to create dataset card for Lushootseed language

Issue - State: closed - Opened by vaishnavsudarshan about 2 months ago - 2 comments
Labels: enhancement

#7085 - [Regression] IterableDataset is broken on 2.20.0

Issue - State: closed - Opened by AjayP13 2 months ago - 3 comments

#7084 - More easily support streaming local files

Issue - State: open - Opened by fschlatt 2 months ago
Labels: enhancement

#7083 - fix streaming from arrow files

Pull Request - State: closed - Opened by fschlatt 2 months ago

#7082 - Support HTTP authentication in non-streaming mode

Pull Request - State: closed - Opened by albertvillanova 2 months ago - 2 comments

#7081 - Set load_from_disk path type as PathLike

Pull Request - State: closed - Opened by albertvillanova 2 months ago - 2 comments

#7080 - Generating train split takes a long time

Issue - State: open - Opened by alexanderswerdlow 2 months ago

#7079 - HfHubHTTPError: 500 Server Error: Internal Server Error for url:

Issue - State: closed - Opened by neoneye 2 months ago - 17 comments

#7078 - Fix CI test_convert_to_parquet

Pull Request - State: closed - Opened by albertvillanova 2 months ago - 2 comments

#7077 - column_names ignored by load_dataset() when loading CSV file

Issue - State: open - Opened by luismsgomes 2 months ago - 1 comment

#7076 - 🧪 Do not mock create_commit

Pull Request - State: closed - Opened by coyotte508 2 months ago - 1 comment