Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/datasets issues and pull requests
#7414 - Gracefully cancel async tasks
Pull Request -
State: open - Opened by lhoestq 2 days ago
- 1 comment
#7413 - Documentation on multiple media files of the same type with WebDataset
Issue -
State: open - Opened by DCNemesis 3 days ago
#7412 - Index Error Invalid Ket is out of bounds for size 0 for code-search-net/code_search_net dataset
Issue -
State: open - Opened by harshakhmk 3 days ago
#7411 - Attempt to fix multiprocessing hang by closing and joining the pool before termination
Pull Request -
State: closed - Opened by dakinggg 4 days ago
- 1 comment
#7410 - Set dev version
Pull Request -
State: closed - Opened by lhoestq 4 days ago
- 1 comment
#7409 - Release: 3.3.1
Pull Request -
State: closed - Opened by lhoestq 4 days ago
- 1 comment
#7408 - Fix filter speed regression
Pull Request -
State: closed - Opened by lhoestq 4 days ago
- 1 comment
#7407 - Update use_with_pandas.mdx: to_pandas() correction in last section
Pull Request -
State: open - Opened by ibarrien 4 days ago
#7406 - Adding Core Maintainer List to CONTRIBUTING.md
Issue -
State: open - Opened by jp1924 4 days ago
- 3 comments
Labels: enhancement
#7405 - Lazy loading of environment variables
Issue -
State: open - Opened by nikvaessen 5 days ago
- 1 comment
#7404 - Performance regression in `dataset.filter`
Issue -
State: closed - Opened by ttim 5 days ago
- 2 comments
#7402 - Fix a typo in arrow_dataset.py
Pull Request -
State: open - Opened by jingedawang 5 days ago
#7401 - set dev version
Pull Request -
State: closed - Opened by lhoestq 7 days ago
- 1 comment
#7400 - 504 Gateway Timeout when uploading large dataset to Hugging Face Hub
Issue -
State: open - Opened by hotchpotch 7 days ago
- 4 comments
#7399 - Synchronize parameters for various datasets
Issue -
State: open - Opened by grofte 7 days ago
- 2 comments
#7398 - Release: 3.3.0
Pull Request -
State: closed - Opened by lhoestq 7 days ago
- 1 comment
#7397 - Kannada dataset(Conversations, Wikipedia etc)
Pull Request -
State: open - Opened by Likhith2612 7 days ago
#7396 - Update README.md
Pull Request -
State: closed - Opened by lhoestq 8 days ago
- 1 comment
#7395 - Update docs
Pull Request -
State: closed - Opened by lhoestq 8 days ago
- 1 comment
#7394 - Using load_dataset with data_files and split arguments yields an error
Issue -
State: open - Opened by devon-research 9 days ago
#7393 - Optimized sequence encoding for scalars
Pull Request -
State: closed - Opened by lukasgd 10 days ago
- 1 comment
#7392 - push_to_hub payload too large error when using large ClassLabel feature
Issue -
State: open - Opened by DavidRConnell 10 days ago
- 1 comment
#7391 - AttributeError: module 'pyarrow.lib' has no attribute 'ListViewType'
Issue -
State: open - Opened by LinXin04 10 days ago
#7390 - Re-add py.typed
Issue -
State: open - Opened by NeilGirdhar 11 days ago
Labels: enhancement
#7389 - Getting statistics about filtered examples
Issue -
State: closed - Opened by jonathanasdf 11 days ago
- 2 comments
#7388 - OSError: [Errno 22] Invalid argument forbidden character
Issue -
State: closed - Opened by langflogit 11 days ago
- 2 comments
#7387 - Dynamic adjusting dataloader sampling weight
Issue -
State: open - Opened by whc688 11 days ago
- 3 comments
#7386 - Add bookfolder Dataset Builder for Digital Book Formats
Issue -
State: closed - Opened by shikanime 13 days ago
- 1 comment
Labels: enhancement
#7385 - Make IterableDataset (optionally) resumable
Pull Request -
State: open - Opened by yzhangcs 17 days ago
- 1 comment
#7384 - Support async functions in map()
Pull Request -
State: closed - Opened by lhoestq 18 days ago
- 2 comments
#7382 - Add Pandas, PyArrow and Polars docs
Pull Request -
State: closed - Opened by lhoestq 21 days ago
- 1 comment
#7381 - Iterating over values of a column in the IterableDataset
Issue -
State: open - Opened by TopCoder2K 24 days ago
- 2 comments
Labels: enhancement
#7380 - fix: dill default for version bigger 0.3.8
Pull Request -
State: open - Opened by sam-hey 26 days ago
#7378 - Allow pushing config version to hub
Issue -
State: open - Opened by momeara about 1 month ago
- 1 comment
Labels: enhancement
#7377 - Support for sparse arrays with the Arrow Sparse Tensor format?
Issue -
State: open - Opened by JulesGM about 1 month ago
- 1 comment
Labels: enhancement
#7376 - [docs] uv install
Pull Request -
State: open - Opened by stevhliu about 1 month ago
#7375 - vllm batch inference error
Issue -
State: open - Opened by YuShengzuishuai about 1 month ago
- 1 comment
#7374 - Remove .h5 from imagefolder extensions
Pull Request -
State: closed - Opened by lhoestq about 1 month ago
#7373 - Excessive RAM Usage After Dataset Concatenation concatenate_datasets
Issue -
State: open - Opened by sam-hey about 1 month ago
- 1 comment
#7372 - Inconsistent Behavior Between `load_dataset` and `load_from_disk` When Loading Sharded Datasets
Issue -
State: open - Opened by gaohongkui about 1 month ago
#7371 - 500 Server error with pushing a dataset
Issue -
State: open - Opened by martinmatak about 1 month ago
- 1 comment
#7370 - Support faster processing using pandas or polars functions in `IterableDataset.map()`
Pull Request -
State: closed - Opened by lhoestq about 1 month ago
- 2 comments
#7369 - Importing dataset gives unhelpful error message when filenames in metadata.csv are not found in the directory
Issue -
State: open - Opened by svencornetsdegroot about 1 month ago
- 1 comment
#7368 - Add with_split to DatasetDict.map
Pull Request -
State: open - Opened by jp1924 about 1 month ago
- 5 comments
#7366 - Dataset.from_dict() can't handle large dict
Issue -
State: open - Opened by CSU-OSS about 1 month ago
#7365 - A parameter is specified but not used in datasets.arrow_dataset.Dataset.from_pandas()
Issue -
State: open - Opened by NourOM02 about 1 month ago
#7364 - API endpoints for gated dataset access requests
Issue -
State: closed - Opened by jerome-white about 1 month ago
- 3 comments
Labels: enhancement
#7363 - ImportError: To support decoding images, please install 'Pillow'.
Issue -
State: open - Opened by jamessdixon about 1 month ago
- 3 comments
#7362 - HuggingFace CLI dataset download raises error
Issue -
State: closed - Opened by ajayvohra2005 about 1 month ago
- 3 comments
#7361 - Fix lock permission
Pull Request -
State: open - Opened by cih9088 about 2 months ago
#7360 - error when loading dataset in Hugging Face: NoneType error is not callable
Issue -
State: open - Opened by nanu23333 about 2 months ago
- 3 comments
#7359 - There are multiple 'mteb/arguana' configurations in the cache: default, corpus, queries with HF_HUB_OFFLINE=1
Issue -
State: open - Opened by Bhavya6187 about 2 months ago
- 1 comment
#7358 - Fix remove_columns in the formatted case
Pull Request -
State: open - Opened by lhoestq about 2 months ago
- 1 comment
#7357 - Python process aborded with GIL issue when using image dataset
Issue -
State: open - Opened by AlexKoff88 about 2 months ago
- 1 comment
#7356 - How about adding a feature to pass the key when performing map on DatasetDict?
Issue -
State: open - Opened by jp1924 about 2 months ago
- 6 comments
Labels: enhancement
#7355 - Not available datasets[audio] on python 3.13
Issue -
State: open - Opened by sergiosinlimites about 2 months ago
- 1 comment
#7354 - A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.2 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
Issue -
State: closed - Opened by jamessdixon about 2 months ago
- 1 comment
#7353 - changes to MappedExamplesIterable to resolve #7345
Pull Request -
State: closed - Opened by vttrifonov about 2 months ago
- 2 comments
#7352 - fsspec 2024.12.0
Pull Request -
State: closed - Opened by lhoestq about 2 months ago
- 1 comment
#7350 - Bump hfh to 0.24 to fix ci
Pull Request -
State: closed - Opened by lhoestq about 2 months ago
- 1 comment
#7349 - Webdataset special columns in last position
Pull Request -
State: closed - Opened by lhoestq about 2 months ago
- 1 comment
#7348 - Catch OSError for arrow
Pull Request -
State: closed - Opened by lhoestq about 2 months ago
- 1 comment
#7347 - Converting Arrow to WebDataset TAR Format for Offline Use
Issue -
State: closed - Opened by katie312 about 2 months ago
- 4 comments
Labels: enhancement
#7346 - OSError: Invalid flatbuffers message.
Issue -
State: closed - Opened by antecede about 2 months ago
- 3 comments
#7345 - Different behaviour of IterableDataset.map vs Dataset.map with remove_columns
Issue -
State: closed - Opened by vttrifonov about 2 months ago
- 1 comment
#7344 - HfHubHTTPError: 429 Client Error: Too Many Requests for URL when trying to access SlimPajama-627B or c4 on TPUs
Issue -
State: closed - Opened by clankur 2 months ago
- 2 comments
#7343 - [Bug] Inconsistent behavior of data_files and data_dir in load_dataset method.
Issue -
State: closed - Opened by JasonCZH4 2 months ago
- 4 comments
#7342 - Update LICENSE
Pull Request -
State: closed - Opened by eliebak 2 months ago
- 1 comment
#7341 - minor video docs on how to install
Pull Request -
State: closed - Opened by lhoestq 2 months ago
- 1 comment
#7340 - don't import soundfile in tests
Pull Request -
State: closed - Opened by lhoestq 2 months ago
- 1 comment
#7339 - Update CONTRIBUTING.md
Pull Request -
State: closed - Opened by lhoestq 2 months ago
- 1 comment
#7337 - One or several metadata.jsonl were found, but not in the same directory or in a parent directory of
Issue -
State: open - Opened by mst272 2 months ago
- 1 comment
#7336 - Clarify documentation or Create DatasetCard
Issue -
State: open - Opened by August-murr 2 months ago
Labels: enhancement
#7335 - Too many open files: '/root/.cache/huggingface/token'
Issue -
State: open - Opened by kopyl 2 months ago
#7334 - TypeError: Value.__init__() missing 1 required positional argument: 'dtype'
Issue -
State: open - Opened by kakamond 2 months ago
#7328 - Fix typo in arrow_dataset
Pull Request -
State: closed - Opened by AndreaFrancis 2 months ago
- 1 comment
#7327 - .map() is not caching and ram goes OOM
Issue -
State: open - Opened by simeneide 2 months ago
- 1 comment
#7326 - Remove upper bound for fsspec
Issue -
State: open - Opened by fellhorn 2 months ago
- 1 comment
#7325 - Introduce pdf support (#7318)
Pull Request -
State: open - Opened by yabramuvdi 2 months ago
- 3 comments
#7323 - Unexpected cache behaviour using load_dataset
Issue -
State: closed - Opened by Moritz-Wirth 2 months ago
- 1 comment
#7322 - ArrowInvalid: JSON parse error: Column() changed from object to array in row 0
Issue -
State: open - Opened by CLL112 2 months ago
- 1 comment
#7321 - ImportError: cannot import name 'set_caching_enabled' from 'datasets'
Issue -
State: open - Opened by sankexin 2 months ago
- 2 comments
#7320 - ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided ['label']
Issue -
State: closed - Opened by atrompeterog 2 months ago
- 1 comment
#7319 - set dev version
Pull Request -
State: closed - Opened by lhoestq 2 months ago
- 1 comment
#7318 - Introduce support for PDFs
Issue -
State: open - Opened by yabramuvdi 2 months ago
- 6 comments
Labels: enhancement
#7317 - Release: 3.2.0
Pull Request -
State: closed - Opened by lhoestq 2 months ago
- 1 comment
#7316 - More docs to from_dict to mention that the result lives in RAM
Pull Request -
State: closed - Opened by lhoestq 2 months ago
- 1 comment
#7315 - Allow manual configuration of Dataset Viewer for datasets not created with the `datasets` library
Issue -
State: open - Opened by diarray-hub 3 months ago
- 13 comments
#7314 - Resolved for empty datafiles
Pull Request -
State: open - Opened by sahillihas 2 months ago
- 2 comments
#7313 - Cannot create a dataset with relative audio path
Issue -
State: open - Opened by sedol1339 2 months ago
- 3 comments
#7312 - [Audio Features - DO NOT MERGE] PoC for adding an offset+sliced reading to audio file.
Pull Request -
State: open - Opened by TParcollet 3 months ago
#7311 - How to get the original dataset name with username?
Issue -
State: open - Opened by npuichigo 3 months ago
- 2 comments
Labels: enhancement