Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / rom1504/img2dataset issues and pull requests
#334 - Cloudflare R2 Compatibility
Issue - State: closed - Opened by zanussbaum over 1 year ago - 6 comments
#333 - is there a way to retry failed urls after the job has completed?
Issue - State: open - Opened by yxchng over 1 year ago - 8 comments
#332 - Do not retry 404 links
Issue - State: open - Opened by Skylion007 over 1 year ago - 1 comment
#331 - Implement Exponential Backoff
Issue - State: open - Opened by Skylion007 over 1 year ago - 7 comments
#330 - distributed downloader freeze under 'cluster mode'
Issue - State: open - Opened by zwsjink over 1 year ago
#329 - Any examples on how to pass in a url_list stored on OSS (s3 like)
Issue - State: open - Opened by zwsjink over 1 year ago - 3 comments
#328 - Adding DataComp-1B info
Pull Request - State: closed - Opened by gabrielilharco over 1 year ago
#327 - many write error while using oss (s3-like) remote bucket storage
Issue - State: open - Opened by ldfandian over 1 year ago - 3 comments
#326 - support more intput formats (txt.gz, csv.gz, json.gz, jsonl, jsonl.gz) and add test cases for it
Pull Request - State: closed - Opened by ldfandian over 1 year ago - 1 comment
#325 - support more intput formats (txt.gz, csv.gz, json.gz, jsonl, jsonl.gz) and add test cases for it
Pull Request - State: closed - Opened by ldfandian over 1 year ago
#324 - does img2dataset support jsonl as input format?
Issue - State: closed - Opened by ldfandian over 1 year ago - 5 comments
#323 - Add instructions to get datacomp1B
Issue - State: open - Opened by rom1504 over 1 year ago
#322 - The link to commonpool.md is dead
Issue - State: closed - Opened by wwfcnu over 1 year ago - 1 comment
#321 - pyarrow.lib.ArrowInvalid: CSV parse error: Expected 17 columns, got 1
Issue - State: open - Opened by jelech over 1 year ago
#320 - Remove tmp_dir only if the output dir is not in s3
Pull Request - State: closed - Opened by erezzarum over 1 year ago - 3 comments
#319 - Temp dir removal - FileNotFoundError: ['mybucket/data/tests/test_1000_parquet/5/_tmp']
Issue - State: closed - Opened by erezzarum over 1 year ago - 2 comments
#318 - error: pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, g
Issue - State: open - Opened by tom666tom666 over 1 year ago - 6 comments
#317 - pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(URL) in NSFW: string
Issue - State: closed - Opened by qnyan over 1 year ago - 1 comment
#316 - use a proxy when downloading images
Issue - State: open - Opened by jelech over 1 year ago - 1 comment
#315 - Support mosaic streaming
Issue - State: open - Opened by rom1504 over 1 year ago
#314 - Consider switching to fiddle config
Issue - State: open - Opened by rom1504 over 1 year ago
#313 - [laion high resolution] How to only extract a certain number of images?
Issue - State: closed - Opened by jS5t3r over 1 year ago - 4 comments
#312 - Replace specific opt-out support with datadiligence package for more general opt-out support
Pull Request - State: open - Opened by Padge91 over 1 year ago - Labels: filtering
#311 - MacOS hidden files cause logger process exit
Issue - State: open - Opened by FlyHighest over 1 year ago - 1 comment
#310 - some meta data are missing
Issue - State: closed - Opened by zhangvia over 1 year ago - 1 comment
#309 - Add code highlighting to the README
Pull Request - State: closed - Opened by bryant1410 over 1 year ago - 2 comments
#308 - Implement the W3C TDM Reservation Protocol and enable a more standard opt-out mechanism
Issue - State: open - Opened by llemeurfr over 1 year ago - 7 comments
#307 - Switch to requests to check headers before streaming content
Pull Request - State: open - Opened by raincoastchris over 1 year ago - 1 comment - Labels: filtering
#306 - Adding CommonPool instructions
Pull Request - State: closed - Opened by gabrielilharco over 1 year ago
#304 - Documentation enhancement on robots.txt and scraping (PR issue)
Issue - State: open - Opened by maathieu over 1 year ago - 4 comments
#303 - Noncompliance by default with the General Data Protection Regulation (GDPR/RGPD)
Issue - State: closed - Opened by jmaris over 1 year ago - 11 comments
#302 - Read and cache robots.txt files for each host using thread-local storage
Pull Request - State: open - Opened by ephphatha over 1 year ago - 6 comments - Labels: filtering
#301 - Include none directive if either of noindex,nofollow are specified
Pull Request - State: open - Opened by ephphatha over 1 year ago - Labels: filtering
#300 - Set a user agent string that matches convention used by libraries/tools
Pull Request - State: open - Opened by ephphatha over 1 year ago - Labels: filtering
#299 - Implement HEAD followed by GET to reduce traffic when headers are present
Issue - State: open - Opened by rom1504 over 1 year ago - 1 comment
#298 - img2dataset ignores X-Robots-Tag
Issue - State: closed - Opened by Catbuttes over 1 year ago - 8 comments
#297 - Correct webp lossless statement
Pull Request - State: closed - Opened by jonasricker over 1 year ago - 1 comment
#295 - Test pex in ci
Pull Request - State: closed - Opened by rom1504 over 1 year ago
#294 - Support recording image license
Issue - State: closed - Opened by tomchiverton over 1 year ago - 7 comments
#293 - Please make this tool "opt-in" by default
Issue - State: closed - Opened by edent over 1 year ago - 29 comments
#292 - Unable to use img2dataset to download laion-high-resolution without install chardet
Issue - State: open - Opened by Shamik-07 over 1 year ago
#291 - Unable to use img2dataset to download laion-high-resolution without install chardet
Pull Request - State: closed - Opened by Shamik-07 over 1 year ago - 5 comments
#290 - Widen pyarrow dependency range
Pull Request - State: closed - Opened by malcolmgreaves over 1 year ago - 1 comment
#289 - Process hanging forever before the end
Issue - State: open - Opened by HugoLaurencon over 1 year ago - 13 comments
#288 - Fix README regarding lossless webp
Issue - State: closed - Opened by jonasricker over 1 year ago - 1 comment
#287 - Are there plans to support WebP?
Issue - State: closed - Opened by CS123n over 1 year ago - 1 comment
#286 - Official .pex File Does not Support output_format="tfrecord"
Issue - State: closed - Opened by zw615 over 1 year ago - 4 comments
#285 - Can I try downloading LAION400M with multiple PC?
Issue - State: open - Opened by sunggukcha over 1 year ago - 1 comment
#284 - Respect robots.txt
Issue - State: closed - Opened by slavakurilyak over 1 year ago - 1 comment
#283 - Add option to allow newlines in captions
Pull Request - State: open - Opened by achalddave over 1 year ago - 5 comments
#282 - When use parquet as output format decode the bytes in jpg, the image result color seems wrong.
Issue - State: closed - Opened by svjack over 1 year ago - 1 comment
#281 - Multiple Alerts of Malicious URLs
Issue - State: closed - Opened by vatsalmoradiya over 1 year ago - 6 comments
#280 - ERROR: Could not build wheels for pyarrow, scikit-image, which is required to install pyproject.toml-based projects
Issue - State: closed - Opened by precurcor over 1 year ago - 1 comment
#279 - Lower Success Rate when output_format=files
Issue - State: open - Opened by zw615 over 1 year ago
#278 - Downloading cc3m with some wrong
Issue - State: open - Opened by knight4u13 over 1 year ago - 1 comment
#277 - Failed to download all of LAION-400M
Issue - State: closed - Opened by zw615 over 1 year ago - 10 comments
#276 - Bbox crop implementation
Pull Request - State: open - Opened by vanga almost 2 years ago - 3 comments
#275 - Support for bounding box cropping
Issue - State: open - Opened by vanga almost 2 years ago - 4 comments
#274 - white borders when downloading image
Issue - State: open - Opened by youngfly11 almost 2 years ago - 1 comment
#273 - add databricks notebook
Pull Request - State: closed - Opened by smellslikeml almost 2 years ago
#272 - Adding ray as a distributor
Pull Request - State: closed - Opened by Vaishaal almost 2 years ago - 10 comments
#271 - CItation
Pull Request - State: closed - Opened by dpaleka almost 2 years ago - 1 comment
#270 - Citing this repo
Issue - State: closed - Opened by dpaleka almost 2 years ago - 2 comments
#269 - [wip] Duration + head
Pull Request - State: closed - Opened by rom1504 almost 2 years ago - 1 comment
#268 - consider reverting breaking change md5 -> sha256
Issue - State: open - Opened by rom1504 almost 2 years ago
#267 - Add ocifs for object-storage users.
Pull Request - State: closed - Opened by kuno989 almost 2 years ago - 2 comments
#266 - consider adding option to use only head and compute stats instead of actually downloading
Issue - State: open - Opened by rom1504 almost 2 years ago - 2 comments
#265 - use a pipeline concept to refactor downloader.py
Issue - State: open - Opened by rom1504 almost 2 years ago
#263 - pycurl
Pull Request - State: closed - Opened by rom1504 almost 2 years ago - 1 comment
#262 - wip good bad pool
Pull Request - State: closed - Opened by rom1504 almost 2 years ago - 3 comments
#261 - Figure out how to timeout
Issue - State: open - Opened by rom1504 almost 2 years ago - 19 comments
#260 - Add asyncio implementation of downloader
Pull Request - State: closed - Opened by KohakuBlueleaf almost 2 years ago - 4 comments
#259 - opencv-python => opencv-python-headless
Pull Request - State: closed - Opened by shionhonda almost 2 years ago - 2 comments
#258 - Verify hashes during download.
Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago - 6 comments
#257 - Release 1.40.0
Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago
#256 - investigate async
Issue - State: open - Opened by rom1504 almost 2 years ago - 2 comments
#255 - Add support for extra hashes
Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago
#254 - Bump ffspec version to 2022.11
Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago
#253 - Proper blurring when padding/cropping.
Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago - 3 comments
#252 - Very low speed on subcaptions and mscoco dataset
Issue - State: open - Opened by KohakuBlueleaf almost 2 years ago - 13 comments
#251 - Release 1.38.0
Pull Request - State: closed - Opened by rom1504 almost 2 years ago
#250 - Add information in the readme to promote the rights of AI artists and AI trainers
Issue - State: closed - Opened by rom1504 almost 2 years ago - 4 comments
#249 - Respect x robots tag by default
Pull Request - State: closed - Opened by Stealcase almost 2 years ago - 19 comments
#248 - Respect x-robots-tag directives by default
Issue - State: closed - Opened by Stealcase almost 2 years ago - 1 comment
#247 - Does an 'noai' opt-out directive even work?
Issue - State: closed - Opened by milotheirself almost 2 years ago - 5 comments
#246 - Release 1.37.0
Pull Request - State: closed - Opened by rom1504 almost 2 years ago
#245 - Respect x-robots-directives by default
Issue - State: closed - Opened by Stealcase almost 2 years ago - 5 comments
#244 - Duplicate images in ms coco
Issue - State: open - Opened by tungdop2 almost 2 years ago - 3 comments
#243 - Consider to not recommend skip_reencode
Issue - State: open - Opened by rom1504 almost 2 years ago - 3 comments
#242 - Failed to download all of CC12M
Issue - State: closed - Opened by AlaaKhaddaj almost 2 years ago - 4 comments
#241 - Incorporate face blurring with bounding boxes.
Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago - 16 comments
#240 - Add support for resizing with fixed aspect ratio while fixing the largest image dimension
Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago - 3 comments
#239 - update fsspec
Issue - State: closed - Opened by rom1504 almost 2 years ago - 6 comments
#238 - Move resizer arg check in resizer.
Pull Request - State: closed - Opened by rom1504 almost 2 years ago
#237 - Better logger
Issue - State: open - Opened by rom1504 almost 2 years ago - 1 comment
#236 - Remove support for python 3.6.
Pull Request - State: closed - Opened by rom1504 almost 2 years ago
#235 - bumping webdataset version to 0.2.5+
Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago - 1 comment
#234 - Export as Arrow
Issue - State: open - Opened by lhoestq almost 2 years ago - 2 comments
#233 - Resize mode based on bounding boxes
Pull Request - State: closed - Opened by vanga almost 2 years ago - 3 comments
#232 - Error when using "/img2dataset.pex"
Issue - State: closed - Opened by SCZwangxiao almost 2 years ago - 1 comment