Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / rom1504/img2dataset issues and pull requests

#349 - ValueError: Invalid output format webdataset\--output_folder

Issue - State: closed - Opened by YMKiii over 1 year ago

#348 - How can I download directly to the original size without resizing?

Issue - State: closed - Opened by TiaoziLiao over 1 year ago - 7 comments

#346 - Implement sanity check for hash value

Issue - State: closed - Opened by geroldmeisinger over 1 year ago - 3 comments

#345 - Ctrl+C doesn't work on Windows 10 using Miniconda

Issue - State: open - Opened by geroldmeisinger over 1 year ago - 1 comment

#344 - json not written and text file empty for files omitted from re-encoding

Issue - State: closed - Opened by geroldmeisinger over 1 year ago - 2 comments

#343 - The download process goes on forever

Issue - State: open - Opened by novruzgurbanov over 1 year ago - 6 comments

#342 - [BUGFIX] .txt file with commas in URLs

Pull Request - State: open - Opened by bkj over 1 year ago - 3 comments

#341 - fix int conversion failure caused by macos hidden file in logger.py

Pull Request - State: closed - Opened by FlyHighest over 1 year ago - 1 comment

#340 - fix int conversion failure caused by macos hidden file in logger.py

Pull Request - State: closed - Opened by FlyHighest over 1 year ago

#339 - Refactor as a (self hosted) service

Issue - State: open - Opened by rom1504 over 1 year ago - 2 comments

#338 - High Initial RAM Usage Leads to Crashes

Issue - State: open - Opened by Sypherd over 1 year ago - 2 comments

#337 - Support all compression types via fsspec

Issue - State: open - Opened by Skylion007 over 1 year ago - 1 comment

#336 - Package knot resolver as a python package and use it

Issue - State: open - Opened by rom1504 over 1 year ago - 1 comment
Labels: help wanted

#335 - minor fix laion-high-resolution.md

Pull Request - State: closed - Opened by ShoufaChen over 1 year ago

#334 - Cloudflare R2 Compatibility

Issue - State: closed - Opened by zanussbaum over 1 year ago - 6 comments

#333 - is there a way to retry failed urls after the job has completed?

Issue - State: open - Opened by yxchng over 1 year ago - 8 comments

#332 - Do not retry 404 links

Issue - State: open - Opened by Skylion007 over 1 year ago - 1 comment

#331 - Implement Exponential Backoff

Issue - State: open - Opened by Skylion007 over 1 year ago - 7 comments

#330 - distributed downloader freeze under 'cluster mode'

Issue - State: open - Opened by zwsjink over 1 year ago

#329 - Any examples on how to pass in a url_list stored on OSS (s3 like)

Issue - State: open - Opened by zwsjink over 1 year ago - 3 comments

#328 - Adding DataComp-1B info

Pull Request - State: closed - Opened by gabrielilharco over 1 year ago

#327 - many write error while using oss (s3-like) remote bucket storage

Issue - State: open - Opened by ldfandian over 1 year ago - 3 comments

#324 - does img2dataset support jsonl as input format?

Issue - State: closed - Opened by ldfandian over 1 year ago - 5 comments

#323 - Add instructions to get datacomp1B

Issue - State: open - Opened by rom1504 over 1 year ago

#322 - The link to commonpool.md is dead

Issue - State: closed - Opened by wwfcnu over 1 year ago - 1 comment

#320 - Remove tmp_dir only if the output dir is not in s3

Pull Request - State: closed - Opened by erezzarum over 1 year ago - 3 comments

#318 - error: pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, g

Issue - State: open - Opened by tom666tom666 over 1 year ago - 6 comments

#317 - pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(URL) in NSFW: string

Issue - State: closed - Opened by qnyan over 1 year ago - 2 comments

#316 - use a proxy when downloading images

Issue - State: open - Opened by jelech over 1 year ago - 1 comment

#315 - Support mosaic streaming

Issue - State: open - Opened by rom1504 over 1 year ago

#314 - Consider switching to fiddle config

Issue - State: open - Opened by rom1504 over 1 year ago

#313 - [laion high resolution] How to only extract a certain number of images?

Issue - State: closed - Opened by jS5t3r almost 2 years ago - 4 comments

#312 - Replace specific opt-out support with datadiligence package for more general opt-out support

Pull Request - State: open - Opened by Padge91 almost 2 years ago
Labels: filtering

#311 - MacOS hidden files cause logger process exit

Issue - State: open - Opened by FlyHighest almost 2 years ago - 1 comment

#310 - some meta data are missing

Issue - State: closed - Opened by zhangvia almost 2 years ago - 1 comment

#309 - Add code highlighting to the README

Pull Request - State: closed - Opened by bryant1410 almost 2 years ago - 2 comments

#307 - Switch to requests to check headers before streaming content

Pull Request - State: open - Opened by raincoastchris almost 2 years ago - 1 comment
Labels: filtering

#306 - Adding CommonPool instructions

Pull Request - State: closed - Opened by gabrielilharco almost 2 years ago

#304 - Documentation enhancement on robots.txt and scraping (PR issue)

Issue - State: open - Opened by maathieu almost 2 years ago - 4 comments

#303 - Noncompliance by default with the General Data Protection Regulation (GDPR/RGPD)

Issue - State: closed - Opened by jmaris almost 2 years ago - 11 comments

#302 - Read and cache robots.txt files for each host using thread-local storage

Pull Request - State: open - Opened by ephphatha almost 2 years ago - 6 comments
Labels: filtering

#301 - Include none directive if either of noindex,nofollow are specified

Pull Request - State: open - Opened by ephphatha almost 2 years ago
Labels: filtering

#300 - Set a user agent string that matches convention used by libraries/tools

Pull Request - State: open - Opened by ephphatha almost 2 years ago
Labels: filtering

#299 - Implement HEAD followed by GET to reduce traffic when headers are present

Issue - State: open - Opened by rom1504 almost 2 years ago - 1 comment

#298 - img2dataset ignores X-Robots-Tag

Issue - State: closed - Opened by Catbuttes almost 2 years ago - 8 comments

#297 - Correct webp lossless statement

Pull Request - State: closed - Opened by jonasricker almost 2 years ago - 1 comment

#295 - Test pex in ci

Pull Request - State: closed - Opened by rom1504 almost 2 years ago

#294 - Support recording image license

Issue - State: closed - Opened by tomchiverton almost 2 years ago - 7 comments

#293 - Please make this tool "opt-in" by default

Issue - State: closed - Opened by edent almost 2 years ago - 29 comments

#291 - Unable to use img2dataset to download laion-high-resolution without install chardet

Pull Request - State: closed - Opened by Shamik-07 almost 2 years ago - 5 comments

#290 - Widen pyarrow dependency range

Pull Request - State: closed - Opened by malcolmgreaves almost 2 years ago - 1 comment

#289 - Process hanging forever before the end

Issue - State: open - Opened by HugoLaurencon almost 2 years ago - 13 comments

#288 - Fix README regarding lossless webp

Issue - State: closed - Opened by jonasricker almost 2 years ago - 1 comment

#287 - Are there plans to support WebP?

Issue - State: closed - Opened by CS123n almost 2 years ago - 1 comment

#286 - Official .pex File Does not Support output_format="tfrecord"

Issue - State: closed - Opened by zw615 almost 2 years ago - 4 comments

#285 - Can I try downloading LAION400M with multiple PC?

Issue - State: open - Opened by sunggukcha almost 2 years ago - 1 comment

#284 - Respect robots.txt

Issue - State: closed - Opened by slavakurilyak almost 2 years ago - 1 comment

#283 - Add option to allow newlines in captions

Pull Request - State: open - Opened by achalddave almost 2 years ago - 5 comments

#281 - Multiple Alerts of Malicious URLs

Issue - State: closed - Opened by vatsalmoradiya almost 2 years ago - 6 comments

#279 - Lower Success Rate when output_format=files

Issue - State: open - Opened by zw615 about 2 years ago

#278 - Downloading cc3m with some wrong

Issue - State: open - Opened by knight4u13 about 2 years ago - 1 comment

#277 - Failed to download all of LAION-400M

Issue - State: closed - Opened by zw615 about 2 years ago - 10 comments

#276 - Bbox crop implementation

Pull Request - State: open - Opened by vanga about 2 years ago - 3 comments

#275 - Support for bounding box cropping

Issue - State: open - Opened by vanga about 2 years ago - 4 comments

#274 - white borders when downloading image

Issue - State: open - Opened by youngfly11 about 2 years ago - 1 comment

#273 - add databricks notebook

Pull Request - State: closed - Opened by smellslikeml about 2 years ago

#272 - Adding ray as a distributor

Pull Request - State: closed - Opened by Vaishaal about 2 years ago - 10 comments

#271 - Citation

Pull Request - State: closed - Opened by dpaleka about 2 years ago - 1 comment

#270 - Citing this repo

Issue - State: closed - Opened by dpaleka about 2 years ago - 2 comments

#269 - [wip] Duration + head

Pull Request - State: closed - Opened by rom1504 about 2 years ago - 1 comment

#268 - consider reverting breaking change md5 -> sha256

Issue - State: open - Opened by rom1504 about 2 years ago

#267 - Add ocifs for object-storage users.

Pull Request - State: closed - Opened by kuno989 about 2 years ago - 2 comments

#265 - use a pipeline concept to refactor downloader.py

Issue - State: open - Opened by rom1504 about 2 years ago

#263 - pycurl

Pull Request - State: closed - Opened by rom1504 about 2 years ago - 1 comment

#262 - wip good bad pool

Pull Request - State: closed - Opened by rom1504 about 2 years ago - 3 comments

#261 - Figure out how to timeout

Issue - State: open - Opened by rom1504 about 2 years ago - 19 comments

#260 - Add asyncio implementation of downloader

Pull Request - State: closed - Opened by KohakuBlueleaf about 2 years ago - 4 comments

#259 - opencv-python => opencv-python-headless

Pull Request - State: closed - Opened by shionhonda about 2 years ago - 2 comments

#258 - Verify hashes during download.

Pull Request - State: closed - Opened by GeorgiosSmyrnis about 2 years ago - 6 comments

#257 - Release 1.40.0

Pull Request - State: closed - Opened by gabrielilharco about 2 years ago

#256 - investigate async

Issue - State: open - Opened by rom1504 about 2 years ago - 2 comments

#255 - Add support for extra hashes

Pull Request - State: closed - Opened by GeorgiosSmyrnis about 2 years ago

#254 - Bump fsspec version to 2022.11

Pull Request - State: closed - Opened by gabrielilharco about 2 years ago

#253 - Proper blurring when padding/cropping.

Pull Request - State: closed - Opened by GeorgiosSmyrnis about 2 years ago - 3 comments

#252 - Very low speed on subcaptions and mscoco dataset

Issue - State: open - Opened by KohakuBlueleaf about 2 years ago - 13 comments

#251 - Release 1.38.0

Pull Request - State: closed - Opened by rom1504 about 2 years ago

#250 - Add information in the readme to promote the rights of AI artists and AI trainers

Issue - State: closed - Opened by rom1504 about 2 years ago - 4 comments

#249 - Respect x robots tag by default

Pull Request - State: closed - Opened by Stealcase about 2 years ago - 19 comments

#248 - Respect x-robots-tag directives by default

Issue - State: closed - Opened by Stealcase about 2 years ago - 1 comment

#247 - Does an 'noai' opt-out directive even work?

Issue - State: closed - Opened by milotheirself about 2 years ago - 5 comments