Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / rom1504/img2dataset issues and pull requests

#334 - Cloudflare R2 Compatibility

Issue - State: closed - Opened by zanussbaum over 1 year ago - 6 comments

#333 - is there a way to retry failed urls after the job has completed?

Issue - State: open - Opened by yxchng over 1 year ago - 8 comments

#332 - Do not retry 404 links

Issue - State: open - Opened by Skylion007 over 1 year ago - 1 comment

#331 - Implement Exponential Backoff

Issue - State: open - Opened by Skylion007 over 1 year ago - 7 comments

#330 - distributed downloader freeze under 'cluster mode'

Issue - State: open - Opened by zwsjink over 1 year ago

#329 - Any examples on how to pass in a url_list stored on OSS (s3 like)

Issue - State: open - Opened by zwsjink over 1 year ago - 3 comments

#328 - Adding DataComp-1B info

Pull Request - State: closed - Opened by gabrielilharco over 1 year ago

#327 - many write error while using oss (s3-like) remote bucket storage

Issue - State: open - Opened by ldfandian over 1 year ago - 3 comments

#324 - does img2dataset support jsonl as input format?

Issue - State: closed - Opened by ldfandian over 1 year ago - 5 comments

#323 - Add instructions to get datacomp1B

Issue - State: open - Opened by rom1504 over 1 year ago

#322 - The link to commonpool.md is dead

Issue - State: closed - Opened by wwfcnu over 1 year ago - 1 comment

#320 - Remove tmp_dir only if the output dir is not in s3

Pull Request - State: closed - Opened by erezzarum over 1 year ago - 3 comments

#318 - error: pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, g

Issue - State: open - Opened by tom666tom666 over 1 year ago - 6 comments

#317 - pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(URL) in NSFW: string

Issue - State: closed - Opened by qnyan over 1 year ago - 1 comment

#316 - use a proxy when downloading images

Issue - State: open - Opened by jelech over 1 year ago - 1 comment

#315 - Support mosaic streaming

Issue - State: open - Opened by rom1504 over 1 year ago

#314 - Consider switching to fiddle config

Issue - State: open - Opened by rom1504 over 1 year ago

#313 - [laion high resolution] How to only extract a certain number of images?

Issue - State: closed - Opened by jS5t3r over 1 year ago - 4 comments

#312 - Replace specific opt-out support with datadiligence package for more general opt-out support

Pull Request - State: open - Opened by Padge91 over 1 year ago
Labels: filtering

#311 - MacOS hidden files cause logger process exit

Issue - State: open - Opened by FlyHighest over 1 year ago - 1 comment

#310 - some meta data are missing

Issue - State: closed - Opened by zhangvia over 1 year ago - 1 comment

#309 - Add code highlighting to the README

Pull Request - State: closed - Opened by bryant1410 over 1 year ago - 2 comments

#307 - Switch to requests to check headers before streaming content

Pull Request - State: open - Opened by raincoastchris over 1 year ago - 1 comment
Labels: filtering

#306 - Adding CommonPool instructions

Pull Request - State: closed - Opened by gabrielilharco over 1 year ago

#304 - Documentation enhancement on robots.txt and scraping (PR issue)

Issue - State: open - Opened by maathieu over 1 year ago - 4 comments

#303 - Noncompliance by default with the General Data Protection Regulation (GDPR/RGPD)

Issue - State: closed - Opened by jmaris over 1 year ago - 11 comments

#302 - Read and cache robots.txt files for each host using thread-local storage

Pull Request - State: open - Opened by ephphatha over 1 year ago - 6 comments
Labels: filtering

#301 - Include none directive if either of noindex,nofollow are specified

Pull Request - State: open - Opened by ephphatha over 1 year ago
Labels: filtering

#300 - Set a user agent string that matches convention used by libraries/tools

Pull Request - State: open - Opened by ephphatha over 1 year ago
Labels: filtering

#299 - Implement HEAD followed by GET to reduce traffic when headers are present

Issue - State: open - Opened by rom1504 over 1 year ago - 1 comment

#298 - img2dataset ignores X-Robots-Tag

Issue - State: closed - Opened by Catbuttes over 1 year ago - 8 comments

#297 - Correct webp lossless statement

Pull Request - State: closed - Opened by jonasricker over 1 year ago - 1 comment

#295 - Test pex in ci

Pull Request - State: closed - Opened by rom1504 over 1 year ago

#294 - Support recording image license

Issue - State: closed - Opened by tomchiverton over 1 year ago - 7 comments

#293 - Please make this tool "opt-in" by default

Issue - State: closed - Opened by edent over 1 year ago - 29 comments

#291 - Unable to use img2dataset to download laion-high-resolution without install chardet

Pull Request - State: closed - Opened by Shamik-07 over 1 year ago - 5 comments

#290 - Widen pyarrow dependency range

Pull Request - State: closed - Opened by malcolmgreaves over 1 year ago - 1 comment

#289 - Process hanging forever before the end

Issue - State: open - Opened by HugoLaurencon over 1 year ago - 13 comments

#288 - Fix README regarding lossless webp

Issue - State: closed - Opened by jonasricker over 1 year ago - 1 comment

#287 - Are there plans to support WebP?

Issue - State: closed - Opened by CS123n over 1 year ago - 1 comment

#286 - Official .pex File Does not Support output_format="tfrecord"

Issue - State: closed - Opened by zw615 over 1 year ago - 4 comments

#285 - Can I try downloading LAION400M with multiple PC?

Issue - State: open - Opened by sunggukcha over 1 year ago - 1 comment

#284 - Respect robots.txt

Issue - State: closed - Opened by slavakurilyak over 1 year ago - 1 comment

#283 - Add option to allow newlines in captions

Pull Request - State: open - Opened by achalddave over 1 year ago - 5 comments

#281 - Multiple Alerts of Malicious URLs

Issue - State: closed - Opened by vatsalmoradiya over 1 year ago - 6 comments

#279 - Lower Success Rate when output_format=files

Issue - State: open - Opened by zw615 over 1 year ago

#278 - Downloading cc3m with some wrong

Issue - State: open - Opened by knight4u13 over 1 year ago - 1 comment

#277 - Failed to download all of LAION-400M

Issue - State: closed - Opened by zw615 over 1 year ago - 10 comments

#276 - Bbox crop implementation

Pull Request - State: open - Opened by vanga almost 2 years ago - 3 comments

#275 - Support for bounding box cropping

Issue - State: open - Opened by vanga almost 2 years ago - 4 comments

#274 - white borders when downloading image

Issue - State: open - Opened by youngfly11 almost 2 years ago - 1 comment

#273 - add databricks notebook

Pull Request - State: closed - Opened by smellslikeml almost 2 years ago

#272 - Adding ray as a distributor

Pull Request - State: closed - Opened by Vaishaal almost 2 years ago - 10 comments

#271 - CItation

Pull Request - State: closed - Opened by dpaleka almost 2 years ago - 1 comment

#270 - Citing this repo

Issue - State: closed - Opened by dpaleka almost 2 years ago - 2 comments

#269 - [wip] Duration + head

Pull Request - State: closed - Opened by rom1504 almost 2 years ago - 1 comment

#268 - consider reverting breaking change md5 -> sha256

Issue - State: open - Opened by rom1504 almost 2 years ago

#267 - Add ocifs for object-storage users.

Pull Request - State: closed - Opened by kuno989 almost 2 years ago - 2 comments

#265 - use a pipeline concept to refactor downloader.py

Issue - State: open - Opened by rom1504 almost 2 years ago

#263 - pycurl

Pull Request - State: closed - Opened by rom1504 almost 2 years ago - 1 comment

#262 - wip good bad pool

Pull Request - State: closed - Opened by rom1504 almost 2 years ago - 3 comments

#261 - Figure out how to timeout

Issue - State: open - Opened by rom1504 almost 2 years ago - 19 comments

#260 - Add asyncio implementation of downloader

Pull Request - State: closed - Opened by KohakuBlueleaf almost 2 years ago - 4 comments

#259 - opencv-python => opencv-python-headless

Pull Request - State: closed - Opened by shionhonda almost 2 years ago - 2 comments

#258 - Verify hashes during download.

Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago - 6 comments

#257 - Release 1.40.0

Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago

#256 - investigate async

Issue - State: open - Opened by rom1504 almost 2 years ago - 2 comments

#255 - Add support for extra hashes

Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago

#254 - Bump ffspec version to 2022.11

Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago

#253 - Proper blurring when padding/cropping.

Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago - 3 comments

#252 - Very low speed on subcaptions and mscoco dataset

Issue - State: open - Opened by KohakuBlueleaf almost 2 years ago - 13 comments

#251 - Release 1.38.0

Pull Request - State: closed - Opened by rom1504 almost 2 years ago

#250 - Add information in the readme to promote the rights of AI artists and AI trainers

Issue - State: closed - Opened by rom1504 almost 2 years ago - 4 comments

#249 - Respect x robots tag by default

Pull Request - State: closed - Opened by Stealcase almost 2 years ago - 19 comments

#248 - Respect x-robots-tag directives by default

Issue - State: closed - Opened by Stealcase almost 2 years ago - 1 comment

#247 - Does an 'noai' opt-out directive even work?

Issue - State: closed - Opened by milotheirself almost 2 years ago - 5 comments

#246 - Release 1.37.0

Pull Request - State: closed - Opened by rom1504 almost 2 years ago

#245 - Respect x-robots-directives by default

Issue - State: closed - Opened by Stealcase almost 2 years ago - 5 comments

#244 - Duplicate images in ms coco

Issue - State: open - Opened by tungdop2 almost 2 years ago - 3 comments

#243 - Consider to not recommend skip_reencode

Issue - State: open - Opened by rom1504 almost 2 years ago - 3 comments

#242 - Failed to download all of CC12M

Issue - State: closed - Opened by AlaaKhaddaj almost 2 years ago - 4 comments

#241 - Incorporate face blurring with bounding boxes.

Pull Request - State: closed - Opened by GeorgiosSmyrnis almost 2 years ago - 16 comments

#239 - update fsspec

Issue - State: closed - Opened by rom1504 almost 2 years ago - 6 comments

#238 - Move resizer arg check in resizer.

Pull Request - State: closed - Opened by rom1504 almost 2 years ago

#237 - Better logger

Issue - State: open - Opened by rom1504 almost 2 years ago - 1 comment

#236 - Remove support for python 3.6.

Pull Request - State: closed - Opened by rom1504 almost 2 years ago

#235 - bumping webdataset version to 0.2.5+

Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago - 1 comment

#234 - Export as Arrow

Issue - State: open - Opened by lhoestq almost 2 years ago - 2 comments

#233 - Resize mode based on bounding boxes

Pull Request - State: closed - Opened by vanga almost 2 years ago - 3 comments

#232 - Error when using "/img2dataset.pex"

Issue - State: closed - Opened by SCZwangxiao almost 2 years ago - 1 comment