Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/datasets issues and pull requests

#6256 - load_dataset() function's cache_dir does not seems to work

Issue - State: open - Opened by andyzhu about 1 year ago - 4 comments

#6255 - Parallelize builder configs creation

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 5 comments

#6253 - Check builder cls default config name in inspect

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 4 comments

#6252 - exif_transpose not done to Image (PIL problem)

Issue - State: closed - Opened by rhajou about 1 year ago - 2 comments
Labels: enhancement

#6251 - Support streaming datasets with pyarrow.parquet.read_table

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 10 comments

#6247 - Update create_dataset.mdx

Pull Request - State: closed - Opened by EswarDivi about 1 year ago - 2 comments

#6246 - Add new column to dataset

Issue - State: closed - Opened by andysingal about 1 year ago - 4 comments

#6244 - Add support for `fsspec>=2023.9.0`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 19 comments

#6243 - Fix cast from fixed size list to variable size list

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 6 comments

#6242 - Data alteration when loading dataset with unspecified inner sequence length

Issue - State: closed - Opened by qgallouedec about 1 year ago - 2 comments

#6241 - Remove unused global variables in `audio.py`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 4 comments

#6240 - Dataloader stuck on multiple GPUs

Issue - State: closed - Opened by kuri54 about 1 year ago - 2 comments

#6239 - Load local audio data doesn't work

Issue - State: closed - Opened by abodacs about 1 year ago - 2 comments

#6237 - Tokenization with multiple workers is too slow

Issue - State: closed - Opened by macabdul9 about 1 year ago - 1 comment

#6236 - Support buffer shuffle for to_tf_dataset

Issue - State: open - Opened by EthanRock about 1 year ago - 3 comments
Labels: enhancement

#6235 - Support multiprocessing for download/extract nestedly

Issue - State: open - Opened by hgt312 about 1 year ago
Labels: enhancement

#6233 - Update README.md

Pull Request - State: closed - Opened by NinoRisteski about 1 year ago - 2 comments

#6232 - Improve error message for missing function parameters

Pull Request - State: closed - Opened by suavemint about 1 year ago - 3 comments

#6231 - Overwrite legacy default config name in `dataset_infos.json` in packaged datasets

Pull Request - State: open - Opened by polinaeterna about 1 year ago - 9 comments

#6230 - Don't skip hidden files in `dl_manager.iter_files` when they are given as input

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 4 comments

#6229 - Apply inference on all images in the dataset

Issue - State: closed - Opened by andysingal about 1 year ago - 3 comments

#6228 - Remove RGB -> BGR image conversion in Object Detection tutorial

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6226 - Add push_to_hub with multiple configs docs

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments

#6225 - Conversion from RGB to BGR in Object Detection tutorial

Issue - State: closed - Opened by samokhinv about 1 year ago - 1 comment

#6224 - Ignore `dataset_info.json` in data files resolution

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6223 - Update README.md

Pull Request - State: closed - Opened by NinoRisteski about 1 year ago - 2 comments

#6222 - fix typo in Audio dataset documentation

Pull Request - State: closed - Opened by prassanna-ravishankar about 1 year ago - 2 comments

#6221 - Support saving datasets with custom formatting

Issue - State: open - Opened by mariosasko about 1 year ago - 1 comment

#6220 - Set dev version

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6219 - Release: 2.14.5

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 4 comments

#6218 - Rename old push_to_hub configs to "default" in dataset_infos

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 8 comments

#6217 - `Dataset.to_dict()` ignore `decode=True` with Image feature

Issue - State: open - Opened by qgallouedec about 1 year ago - 1 comment

#6216 - Release: 2.13.2

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 5 comments

#6215 - Fix checking patterns to infer packaged builder

Pull Request - State: closed - Opened by polinaeterna about 1 year ago - 3 comments

#6214 - Unpin fsspec < 2023.9.0

Issue - State: closed - Opened by albertvillanova about 1 year ago
Labels: enhancement

#6213 - Better list array values handling in cast/embed storage

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 5 comments

#6212 - Tilde (~) is not supported for data_files

Issue - State: open - Opened by exs-avianello about 1 year ago - 2 comments

#6211 - Fix empty splitinfo json

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 4 comments

#6210 - Temporarily pin fsspec < 2023.9.0

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6209 - CI is broken with AssertionError: 3 failed, 12 errors

Issue - State: closed - Opened by albertvillanova about 1 year ago
Labels: bug

#6208 - Do not filter out .zip extensions from no-script datasets

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 6 comments

#6207 - No-script datasets with ZIP files do not load

Issue - State: closed - Opened by albertvillanova about 1 year ago
Labels: bug

#6203 - Support loading from a DVC remote repository

Issue - State: closed - Opened by bilelomrani1 about 1 year ago - 4 comments
Labels: enhancement

#6202 - avoid downgrading jax version

Issue - State: closed - Opened by chrisflesher about 1 year ago - 1 comment
Labels: enhancement

#6201 - Fix to_json ValueError and remove pandas pin

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 4 comments

#6200 - Temporarily pin pandas < 2.1.0

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 3 comments

#6199 - Use load_dataset for local json files, but it not works

Issue - State: open - Opened by Garen-in-bush about 1 year ago - 2 comments

#6198 - Preserve split order in DataFilesDict

Pull Request - State: closed - Opened by albertvillanova about 1 year ago - 4 comments

#6196 - Split order is not preserved

Issue - State: closed - Opened by albertvillanova about 1 year ago
Labels: bug

#6195 - Force to reuse cache at given path

Issue - State: closed - Opened by Luosuu about 1 year ago - 2 comments

#6194 - Support custom fingerprinting with `Dataset.from_generator`

Issue - State: open - Opened by bilelomrani1 about 1 year ago - 5 comments
Labels: enhancement

#6193 - Dataset loading script method does not work with .pyc file

Issue - State: open - Opened by riteshkumarumassedu about 1 year ago - 3 comments

#6192 - Set minimal fsspec version requirement to 2023.1.0

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 5 comments

#6191 - Add missing `revision` argument

Pull Request - State: closed - Opened by qgallouedec about 1 year ago - 4 comments

#6190 - `Invalid user token` even when correct user token is passed!

Issue - State: closed - Opened by Vaibhavs10 about 1 year ago - 2 comments

#6189 - Don't alter input in Features.from_dict

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments

#6186 - Feature request: add code example of multi-GPU processing

Issue - State: closed - Opened by NielsRogge about 1 year ago - 16 comments
Labels: documentation, enhancement

#6184 - Map cache does not detect function changes in another module

Issue - State: closed - Opened by jonathanasdf about 1 year ago - 2 comments
Labels: duplicate

#6183 - Load dataset with non-existent file

Issue - State: closed - Opened by freQuensy23-coder about 1 year ago - 2 comments

#6181 - Fix import in `image_load` doc

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6180 - Use `hf-internal-testing` repos for hosting test dataset repos

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 4 comments

#6179 - Map cache with tokenizer

Issue - State: open - Opened by jonathanasdf about 1 year ago - 4 comments

#6178 - 'import datasets' throws "invalid syntax error"

Issue - State: closed - Opened by elia-ashraf about 1 year ago - 1 comment

#6177 - Use object detection images from `huggingface/documentation-images`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 4 comments

#6176 - how to limit the size of memory mapped file?

Issue - State: open - Opened by williamium3000 about 1 year ago - 6 comments

#6175 - PyArrow 13 CI fixes

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 3 comments

#6173 - Fix CI for pyarrow 13.0.0

Issue - State: closed - Opened by lhoestq about 1 year ago

#6172 - Make Dataset streaming queries retryable

Issue - State: open - Opened by rojagtap about 1 year ago - 4 comments
Labels: enhancement

#6171 - Fix typo in about_mapstyle_vs_iterable.mdx

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments

#6170 - feat: Return the name of the currently loaded file

Pull Request - State: open - Opened by Amitesh-Patel about 1 year ago - 1 comment

#6169 - Configurations in yaml not working

Issue - State: open - Opened by tsor13 about 1 year ago - 4 comments

#6168 - Fix ArrayXD YAML conversion

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 6 comments

#6167 - Allow hyphen in split name

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 5 comments

#6166 - Document BUILDER_CONFIG_CLASS

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments

#6165 - Fix multiprocessing with spawn in iterable datasets

Pull Request - State: closed - Opened by Hubert-Bonisseur about 1 year ago - 5 comments

#6161 - Fix protocol prefix for Beam

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 5 comments

#6160 - Fix Parquet loading with `columns`

Pull Request - State: closed - Opened by mariosasko about 1 year ago - 4 comments

#6159 - Add `BoundingBox` feature

Issue - State: open - Opened by mariosasko about 1 year ago
Labels: enhancement

#6158 - [docs] Complete `to_iterable_dataset`

Pull Request - State: closed - Opened by stevhliu about 1 year ago - 2 comments

#6155 - Raise FileNotFoundError when passing data_files that don't exist

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 5 comments

#6154 - Use yaml instead of get data patterns when possible

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 6 comments

#6153 - custom load dataset to hub

Issue - State: closed - Opened by andysingal about 1 year ago - 5 comments

#6152 - FolderBase Dataset automatically resolves under current directory when data_dir is not specified

Issue - State: open - Opened by npuichigo about 1 year ago - 14 comments
Labels: good first issue

#6151 - Faster sorting for single key items

Issue - State: closed - Opened by jackapbutler about 1 year ago - 2 comments
Labels: enhancement

#6150 - Allow dataset implement .take

Issue - State: open - Opened by brando90 about 1 year ago - 4 comments
Labels: enhancement

#6149 - Dataset.from_parquet cannot load subset of columns

Issue - State: closed - Opened by dwyatte about 1 year ago - 1 comment

#6148 - Ignore parallel warning in map_nested

Pull Request - State: closed - Opened by lhoestq about 1 year ago - 3 comments