Ecosyste.ms: Issues
An open API service providing issue and pull request metadata for open source projects.
GitHub / adbar/trafilatura issues and pull requests
#743 - docs: remove from published packages
Pull Request -
State: closed - Opened by adbar 4 days ago
- 1 comment
#742 - extraction: move max_tree_size parameter to settings.cfg
Pull Request -
State: closed - Opened by adbar 5 days ago
- 1 comment
#741 - Extraction: move `max_tree_size` to config file
Issue -
State: closed - Opened by adbar 5 days ago
Labels: enhancement
#740 - setup: explicit exports through `__all__`
Pull Request -
State: closed - Opened by adbar 9 days ago
- 1 comment
#739 - Extracting full text from an URL returns None
Issue -
State: open - Opened by vrnch 10 days ago
- 2 comments
Labels: question
#738 - Explicitly and fully support type hinting
Issue -
State: open - Opened by adbar 11 days ago
Labels: enhancement
#737 - build(deps): bump the dependencies group with 5 updates
Pull Request -
State: closed - Opened by dependabot[bot] 15 days ago
- 1 comment
Labels: dependencies
#736 - downloads: cleaner urllib3 code
Pull Request -
State: closed - Opened by adbar 15 days ago
- 1 comment
#735 - downloads: better urllib3 setup
Pull Request -
State: closed - Opened by adbar 16 days ago
#734 - CLI downloads: use all information in settings file
Pull Request -
State: closed - Opened by adbar 16 days ago
- 1 comment
#733 - Downloads: fully use information from both `config` and `options` variables
Issue -
State: closed - Opened by adbar 17 days ago
Labels: maintenance
#732 - CLI downloads: make sure all user-specified options are used
Issue -
State: closed - Opened by andyskipper 19 days ago
- 4 comments
Labels: enhancement
#731 - evaluation: review data, update packages, add magic_html
Pull Request -
State: closed - Opened by adbar 19 days ago
- 1 comment
#730 - extraction: deprecate no_fallback and as_dict parameters
Pull Request -
State: closed - Opened by adbar 23 days ago
- 1 comment
#729 - `bare_extraction()`: deprecate `as_dict` parameter
Issue -
State: closed - Opened by adbar 24 days ago
Labels: maintenance
#728 - typing: fix mypy errors
Pull Request -
State: closed - Opened by adbar 24 days ago
- 1 comment
#727 - simplify trim() function
Pull Request -
State: closed - Opened by adbar 25 days ago
- 1 comment
#726 - Focused crawler returns 404 response for robots.txt and stops crawling
Issue -
State: closed - Opened by Guthman 28 days ago
- 1 comment
#725 - `extract()`: replace `no_fallback` argument by `fast`
Issue -
State: closed - Opened by adbar 29 days ago
Labels: maintenance
#724 - downloads: remove `decode` argument in `fetch_url()`
Pull Request -
State: closed - Opened by adbar 29 days ago
- 1 comment
#723 - refactoring: add type hints
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#722 - Deprecate `fetch_url(decode=False)`
Issue -
State: closed - Opened by adbar about 1 month ago
Labels: maintenance
#721 - fix: more robust mapping for conversion to HTML
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#720 - Review HTML element list and conversion
Issue -
State: open - Opened by adbar about 1 month ago
Labels: enhancement
#718 - setup: set `__all__` in `__init__.py`
Issue -
State: closed - Opened by adbar about 1 month ago
Labels: maintenance
#717 - fix: robust encoding in options.source
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#716 - breaking: remove deprecated functions and args
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#715 - setup: use pyproject.toml file
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#714 - logging: better debug messages in main_extractor
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#713 - setup: deprecate current GUI
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#712 - setup: use `pyproject.toml` file
Issue -
State: closed - Opened by adbar about 1 month ago
Labels: maintenance
#711 - Use rst link instead of markdown link in `docs/index.html`
Pull Request -
State: closed - Opened by nzw0301 about 1 month ago
- 1 comment
#710 - metadata: more robust URL extraction
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#709 - maintenance: deprecate 3.6 & 3.7 and simplify code base
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#708 - maintenance: remove superfluous RuntimeError catch
Pull Request -
State: closed - Opened by adbar about 1 month ago
- 1 comment
#707 - fix: set options.source before raising error on empty doc tree
Pull Request -
State: closed - Opened by dmoklaf about 2 months ago
- 2 comments
#706 - build(deps): bump the dependencies group with 5 updates
Pull Request -
State: closed - Opened by dependabot[bot] about 2 months ago
- 1 comment
Labels: dependencies
#705 - Trafilatura crashing due to `options` variable not backfilled yet
Issue -
State: closed - Opened by dmoklaf about 2 months ago
- 1 comment
Labels: bug
#704 - extract function runs indefinitely on large HTML body content
Issue -
State: closed - Opened by hitesh1997 about 2 months ago
- 1 comment
Labels: question
#703 - Download multiple urls with download timeout
Issue -
State: closed - Opened by vodkaslime about 2 months ago
- 2 comments
Labels: documentation
#702 - I can't extract main content from this html,could anyone help me?
Issue -
State: closed - Opened by CNXDZS about 2 months ago
- 1 comment
Labels: feedback
#701 - HTML_TAG_MAPPING error during scrape
Issue -
State: closed - Opened by beefyandbeef about 2 months ago
- 2 comments
Labels: bug
#700 - prepare v1.12.2
Pull Request -
State: closed - Opened by adbar 2 months ago
- 1 comment
#699 - update docs
Pull Request -
State: closed - Opened by adbar 2 months ago
- 1 comment
#698 - Docs: add page explaining how to run tests
Issue -
State: open - Opened by adbar 2 months ago
Labels: documentation
#697 - Downloads: add support to switch between proxies
Issue -
State: open - Opened by adbar 2 months ago
Labels: enhancement
#696 - Empty Results When Using Spider Function with Category URL
Issue -
State: open - Opened by felipehertzer 2 months ago
- 5 comments
Labels: question
#695 - Link on the quickstart page to the overview notebook is broken
Issue -
State: closed - Opened by cdfuller 2 months ago
- 1 comment
Labels: documentation
#694 - metadata: review and lint code
Pull Request -
State: closed - Opened by adbar 2 months ago
- 1 comment
#693 - ImportError: lxml.html.clean module is now a separate project
Issue -
State: closed - Opened by regstuff 2 months ago
- 2 comments
Labels: feedback
#692 - Javascript port of all 35 files
Pull Request -
State: closed - Opened by vtempest 2 months ago
- 1 comment
#691 - maintenance: make compression libraries optional
Pull Request -
State: closed - Opened by adbar 2 months ago
- 1 comment
#690 - Add max_sitemaps parameter to sitemap_search
Pull Request -
State: closed - Opened by felipehertzer 2 months ago
- 2 comments
#689 - build(deps): bump the dependencies group with 4 updates
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
- 1 comment
Labels: dependencies
#688 - Javascript Version has landed. 🚀
Issue -
State: closed - Opened by vtempest 3 months ago
- 3 comments
Labels: question
#687 - spider: relax strict parameter for link extraction
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#685 - extraction fix: ValueError in table spans
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#684 - Added prune xpath to spider
Pull Request -
State: closed - Opened by felipehertzer 3 months ago
- 9 comments
#682 - Add SOCKS Proxy support
Pull Request -
State: closed - Opened by gremid 3 months ago
- 8 comments
#681 - ValueError in xml
Issue -
State: closed - Opened by Honesty-of-the-Cavernous-Tissue 3 months ago
- 3 comments
Labels: bug
#680 - Crawler doesn't extract any links from Google Cloud documentation website
Issue -
State: closed - Opened by Guthman 3 months ago
- 6 comments
Labels: bug
#679 - prepare version 1.12.1
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#678 - Fixed incorrect variable passed to extract_metadata
Pull Request -
State: closed - Opened by jpigla 3 months ago
- 2 comments
#677 - CLI: review code, add types and tests
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#676 - Remove deprecations (mostly CLI)
Issue -
State: closed - Opened by adbar 3 months ago
Labels: maintenance
#675 - crawler: add params class
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#674 - maintenance: simplify link discovery
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#673 - spider: restrict search to site section targeted by input URL
Pull Request -
State: closed - Opened by adbar 3 months ago
- 1 comment
#672 - spider: restrict search to given URL pattern
Issue -
State: closed - Opened by adbar 3 months ago
Labels: enhancement
#670 - trafilatura version > 1.10.0 doesnt fetch images
Issue -
State: closed - Opened by rkiacnhg 3 months ago
- 3 comments
#669 - build(deps): bump the dependencies group with 2 updates
Pull Request -
State: closed - Opened by dependabot[bot] 4 months ago
- 1 comment
Labels: dependencies
#668 - robust element deletion: fix AttributeError
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#667 - AttributeError in prune_unwanted_sections
Issue -
State: closed - Opened by Honesty-of-the-Cavernous-Tissue 4 months ago
- 3 comments
Labels: bug
#666 - How can I set the proxy IP port and userAgent to avoid the web anti-crawler mechanism?
Issue -
State: closed - Opened by coderwpf 4 months ago
- 2 comments
#665 - table fix: maximum number of header columns
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#664 - prepare v1.12.0
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#663 - feat(cli/lib): Add tqdm based progress bar as an option
Issue -
State: open - Opened by chitralverma 4 months ago
- 1 comment
Labels: enhancement
#662 - Bug or feature, I'm not sure!
Issue -
State: closed - Opened by szj2ys 4 months ago
- 1 comment
Labels: duplicate
#661 - Investigate spacing in element tails
Issue -
State: open - Opened by adbar 4 months ago
- 3 comments
Labels: question
#660 - Faulty extraction for very short documents
Issue -
State: open - Opened by Psynbiotik 4 months ago
- 4 comments
Labels: enhancement
#659 - Duplicating sections, removing spaces between words, simple example
Issue -
State: closed - Opened by nthomas-whistic 4 months ago
#658 - table fix: MemoryError & ValueError during conversion to text
Pull Request -
State: closed - Opened by adbar 4 months ago
- 3 comments
#657 - MemoryError in table conversion
Issue -
State: closed - Opened by Honesty-of-the-Cavernous-Tissue 4 months ago
- 2 comments
Labels: bug
#656 - formatting & markdown fix: add newlines
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#655 - XML-TEI: replace RelaxNG by DTD, remove pickle, and update
Pull Request -
State: closed - Opened by adbar 4 months ago
#654 - images fix: use a length threshold on src attribute
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#653 - extraction: review link and structure checks
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#652 - extraction: improve justext fallback
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#651 - Extraction with `include_images=True` takes too much time
Issue -
State: closed - Opened by Honesty-of-the-Cavernous-Tissue 4 months ago
- 3 comments
Labels: bug
#650 - Add magic_html to benchmarks
Issue -
State: closed - Opened by dantetemplar 4 months ago
- 2 comments
Labels: evaluation
#649 - CLI fix: markdown format should trigger include_formatting
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#648 - CLI: Trigger formatting parameter when the output is in Markdown format
Issue -
State: closed - Opened by adbar 4 months ago
Labels: bug
#647 - output formats: enforce fixed list, deprecate -out on the CLI
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#646 - precision fix: do not use baseline as backup extraction
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#645 - review XPaths for undesirable content
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#644 - Validate value of `output_format` in `extract()` and `bare_extraction()`
Issue -
State: closed - Opened by adbar 4 months ago
Labels: enhancement
#643 - baseline fix: prevent LXML error in JSON-LD
Pull Request -
State: closed - Opened by adbar 4 months ago
- 1 comment
#642 - Missing h1 heading if <header> outside of <article>
Issue -
State: open - Opened by chrisgoddard 4 months ago
- 2 comments
Labels: question
#641 - Impossible to extract Ryan Reynolds website
Issue -
State: closed - Opened by Philrobots 4 months ago
- 1 comment
Labels: feedback
#640 - AttributeError in baseline extraction of JSON text
Issue -
State: closed - Opened by Honesty-of-the-Cavernous-Tissue 4 months ago
Labels: bug