Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/datasets issues and pull requests
#6256 - load_dataset() function's cache_dir does not seems to work
Issue -
State: open - Opened by andyzhu about 1 year ago
- 4 comments
#6255 - Parallelize builder configs creation
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 5 comments
#6254 - Dataset.from_generator() cost much more time in vscode debugging mode then running mode
Issue -
State: closed - Opened by dontnet-wuenze about 1 year ago
- 1 comment
#6253 - Check builder cls default config name in inspect
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 4 comments
#6252 - exif_transpose not done to Image (PIL problem)
Issue -
State: closed - Opened by rhajou about 1 year ago
- 2 comments
Labels: enhancement
#6251 - Support streaming datasets with pyarrow.parquet.read_table
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 10 comments
#6247 - Update create_dataset.mdx
Pull Request -
State: closed - Opened by EswarDivi about 1 year ago
- 2 comments
#6246 - Add new column to dataset
Issue -
State: closed - Opened by andysingal about 1 year ago
- 4 comments
#6244 - Add support for `fsspec>=2023.9.0`
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 19 comments
#6243 - Fix cast from fixed size list to variable size list
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 6 comments
#6242 - Data alteration when loading dataset with unspecified inner sequence length
Issue -
State: closed - Opened by qgallouedec about 1 year ago
- 2 comments
#6241 - Remove unused global variables in `audio.py`
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 4 comments
#6240 - Dataloader stuck on multiple GPUs
Issue -
State: closed - Opened by kuri54 about 1 year ago
- 2 comments
#6239 - Load local audio data doesn't work
Issue -
State: closed - Opened by abodacs about 1 year ago
- 2 comments
#6238 - `dataset.filter` ALWAYS removes the first item from the dataset when using batched=True
Issue -
State: closed - Opened by Taytay about 1 year ago
- 2 comments
#6237 - Tokenization with multiple workers is too slow
Issue -
State: closed - Opened by macabdul9 about 1 year ago
- 1 comment
#6236 - Support buffer shuffle for to_tf_dataset
Issue -
State: open - Opened by EthanRock about 1 year ago
- 3 comments
Labels: enhancement
#6235 - Support multiprocessing for download/extract nestedly
Issue -
State: open - Opened by hgt312 about 1 year ago
Labels: enhancement
#6233 - Update README.md
Pull Request -
State: closed - Opened by NinoRisteski about 1 year ago
- 2 comments
#6232 - Improve error message for missing function parameters
Pull Request -
State: closed - Opened by suavemint about 1 year ago
- 3 comments
#6231 - Overwrite legacy default config name in `dataset_infos.json` in packaged datasets
Pull Request -
State: open - Opened by polinaeterna about 1 year ago
- 9 comments
#6230 - Don't skip hidden files in `dl_manager.iter_files` when they are given as input
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 4 comments
#6229 - Apply inference on all images in the dataset
Issue -
State: closed - Opened by andysingal about 1 year ago
- 3 comments
#6228 - Remove RGB -> BGR image conversion in Object Detection tutorial
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 3 comments
#6226 - Add push_to_hub with multiple configs docs
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 3 comments
#6225 - Conversion from RGB to BGR in Object Detection tutorial
Issue -
State: closed - Opened by samokhinv about 1 year ago
- 1 comment
#6224 - Ignore `dataset_info.json` in data files resolution
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 3 comments
#6223 - Update README.md
Pull Request -
State: closed - Opened by NinoRisteski about 1 year ago
- 2 comments
#6222 - fix typo in Audio dataset documentation
Pull Request -
State: closed - Opened by prassanna-ravishankar about 1 year ago
- 2 comments
#6221 - Support saving datasets with custom formatting
Issue -
State: open - Opened by mariosasko about 1 year ago
- 1 comment
#6220 - Set dev version
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 3 comments
#6219 - Release: 2.14.5
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 4 comments
#6218 - Rename old push_to_hub configs to "default" in dataset_infos
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 8 comments
#6217 - `Dataset.to_dict()` ignore `decode=True` with Image feature
Issue -
State: open - Opened by qgallouedec about 1 year ago
- 1 comment
#6216 - Release: 2.13.2
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 5 comments
#6215 - Fix checking patterns to infer packaged builder
Pull Request -
State: closed - Opened by polinaeterna about 1 year ago
- 3 comments
#6214 - Unpin fsspec < 2023.9.0
Issue -
State: closed - Opened by albertvillanova about 1 year ago
Labels: enhancement
#6213 - Better list array values handling in cast/embed storage
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 5 comments
#6212 - Tilde (~) is not supported for data_files
Issue -
State: open - Opened by exs-avianello about 1 year ago
- 2 comments
#6211 - Fix empty splitinfo json
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 4 comments
#6210 - Temporarily pin fsspec < 2023.9.0
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 3 comments
#6209 - CI is broken with AssertionError: 3 failed, 12 errors
Issue -
State: closed - Opened by albertvillanova about 1 year ago
Labels: bug
#6208 - Do not filter out .zip extensions from no-script datasets
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 6 comments
#6207 - No-script datasets with ZIP files do not load
Issue -
State: closed - Opened by albertvillanova about 1 year ago
Labels: bug
#6206 - When calling load_dataset, raise error: pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays
Issue -
State: closed - Opened by aihao2000 about 1 year ago
- 2 comments
#6203 - Support loading from a DVC remote repository
Issue -
State: closed - Opened by bilelomrani1 about 1 year ago
- 4 comments
Labels: enhancement
#6202 - avoid downgrading jax version
Issue -
State: closed - Opened by chrisflesher about 1 year ago
- 1 comment
Labels: enhancement
#6201 - Fix to_json ValueError and remove pandas pin
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 4 comments
#6200 - Temporarily pin pandas < 2.1.0
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 3 comments
#6199 - Use load_dataset for local json files, but it not works
Issue -
State: open - Opened by Garen-in-bush about 1 year ago
- 2 comments
#6198 - Preserve split order in DataFilesDict
Pull Request -
State: closed - Opened by albertvillanova about 1 year ago
- 4 comments
#6197 - ValueError: 'index=True' is only valid when 'orient' is 'split', 'table', 'index', or 'columns'
Issue -
State: closed - Opened by exs-avianello about 1 year ago
- 3 comments
#6196 - Split order is not preserved
Issue -
State: closed - Opened by albertvillanova about 1 year ago
Labels: bug
#6195 - Force to reuse cache at given path
Issue -
State: closed - Opened by Luosuu about 1 year ago
- 2 comments
#6194 - Support custom fingerprinting with `Dataset.from_generator`
Issue -
State: open - Opened by bilelomrani1 about 1 year ago
- 5 comments
Labels: enhancement
#6193 - Dataset loading script method does not work with .pyc file
Issue -
State: open - Opened by riteshkumarumassedu about 1 year ago
- 3 comments
#6192 - Set minimal fsspec version requirement to 2023.1.0
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 5 comments
#6191 - Add missing `revision` argument
Pull Request -
State: closed - Opened by qgallouedec about 1 year ago
- 4 comments
#6190 - `Invalid user token` even when correct user token is passed!
Issue -
State: closed - Opened by Vaibhavs10 about 1 year ago
- 2 comments
#6189 - Don't alter input in Features.from_dict
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 3 comments
#6188 - [Feature Request] Check the length of batch before writing so that empty batch is allowed
Issue -
State: closed - Opened by namespace-Pt about 1 year ago
- 1 comment
#6187 - Couldn't find a dataset script at /content/tsv/tsv.py or any data file in the same directory
Issue -
State: open - Opened by andysingal about 1 year ago
- 1 comment
#6186 - Feature request: add code example of multi-GPU processing
Issue -
State: closed - Opened by NielsRogge about 1 year ago
- 16 comments
Labels: documentation, enhancement
#6185 - Error in saving the PIL image into *.arrow files using datasets.arrow_writer
Issue -
State: open - Opened by HaozheZhao about 1 year ago
- 1 comment
#6184 - Map cache does not detect function changes in another module
Issue -
State: closed - Opened by jonathanasdf about 1 year ago
- 2 comments
Labels: duplicate
#6183 - Load dataset with non-existent file
Issue -
State: closed - Opened by freQuensy23-coder about 1 year ago
- 2 comments
#6182 - Loading Meteor metric in HF evaluate module crashes due to datasets import issue
Issue -
State: closed - Opened by dsashulya about 1 year ago
- 4 comments
#6181 - Fix import in `image_load` doc
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 3 comments
#6180 - Use `hf-internal-testing` repos for hosting test dataset repos
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 4 comments
#6179 - Map cache with tokenizer
Issue -
State: open - Opened by jonathanasdf about 1 year ago
- 4 comments
#6178 - 'import datasets' throws "invalid syntax error"
Issue -
State: closed - Opened by elia-ashraf about 1 year ago
- 1 comment
#6177 - Use object detection images from `huggingface/documentation-images`
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 4 comments
#6176 - how to limit the size of memory mapped file?
Issue -
State: open - Opened by williamium3000 about 1 year ago
- 6 comments
#6175 - PyArrow 13 CI fixes
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 3 comments
#6173 - Fix CI for pyarrow 13.0.0
Issue -
State: closed - Opened by lhoestq about 1 year ago
#6172 - Make Dataset streaming queries retryable
Issue -
State: open - Opened by rojagtap about 1 year ago
- 4 comments
Labels: enhancement
#6171 - Fix typo in about_mapstyle_vs_iterable.mdx
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 3 comments
#6170 - feat: Return the name of the currently loaded file
Pull Request -
State: open - Opened by Amitesh-Patel about 1 year ago
- 1 comment
#6169 - Configurations in yaml not working
Issue -
State: open - Opened by tsor13 about 1 year ago
- 4 comments
#6168 - Fix ArrayXD YAML conversion
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 6 comments
#6167 - Allow hyphen in split name
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 5 comments
#6166 - Document BUILDER_CONFIG_CLASS
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 3 comments
#6165 - Fix multiprocessing with spawn in iterable datasets
Pull Request -
State: closed - Opened by Hubert-Bonisseur about 1 year ago
- 5 comments
#6164 - Fix: Missing a MetadataConfigs init when the repo has a `datasets_info.json` but no README
Pull Request -
State: closed - Opened by clefourrier about 1 year ago
- 3 comments
#6163 - Error type: ArrowInvalid Details: Failed to parse string: '[254,254]' as a scalar of type int32
Issue -
State: open - Opened by shishirCTC about 1 year ago
- 1 comment
#6162 - load_dataset('json',...) from togethercomputer/RedPajama-Data-1T errors when jsonl rows contains different data fields
Issue -
State: open - Opened by rbrugaro about 1 year ago
- 4 comments
#6161 - Fix protocol prefix for Beam
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 5 comments
#6160 - Fix Parquet loading with `columns`
Pull Request -
State: closed - Opened by mariosasko about 1 year ago
- 4 comments
#6159 - Add `BoundingBox` feature
Issue -
State: open - Opened by mariosasko about 1 year ago
Labels: enhancement
#6158 - [docs] Complete `to_iterable_dataset`
Pull Request -
State: closed - Opened by stevhliu about 1 year ago
- 2 comments
#6157 - DatasetInfo.__init__() got an unexpected keyword argument '_column_requires_decoding'
Issue -
State: closed - Opened by aihao2000 about 1 year ago
- 13 comments
#6156 - Why not use self._epoch as seed to shuffle in distributed training with IterableDataset
Issue -
State: closed - Opened by npuichigo about 1 year ago
- 3 comments
#6155 - Raise FileNotFoundError when passing data_files that don't exist
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 5 comments
#6154 - Use yaml instead of get data patterns when possible
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 6 comments
#6153 - custom load dataset to hub
Issue -
State: closed - Opened by andysingal about 1 year ago
- 5 comments
#6152 - FolderBase Dataset automatically resolves under current directory when data_dir is not specified
Issue -
State: open - Opened by npuichigo about 1 year ago
- 14 comments
Labels: good first issue
#6151 - Faster sorting for single key items
Issue -
State: closed - Opened by jackapbutler about 1 year ago
- 2 comments
Labels: enhancement
#6150 - Allow dataset implement .take
Issue -
State: open - Opened by brando90 about 1 year ago
- 4 comments
Labels: enhancement
#6149 - Dataset.from_parquet cannot load subset of columns
Issue -
State: closed - Opened by dwyatte about 1 year ago
- 1 comment
#6148 - Ignore parallel warning in map_nested
Pull Request -
State: closed - Opened by lhoestq about 1 year ago
- 3 comments