Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / adbar/trafilatura issues and pull requests

#537 - Downloads: Add ZStandard as optional Accept-Encoding header

Issue - State: closed - Opened by adbar 8 months ago
Labels: enhancement

#536 - Wrong order of tail and children

Pull Request - State: closed - Opened by knit-bee 8 months ago - 1 comment

#535 - fix lxml/justext issue

Pull Request - State: closed - Opened by adbar 8 months ago - 1 comment

#534 - Fixed lists inside tables when include_tables=True

Pull Request - State: closed - Opened by mikhainin 8 months ago - 16 comments

#533 - Bump the dependencies group with 8 updates

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies

#532 - LXML 5.2.0 breaks import

Issue - State: closed - Opened by marban 8 months ago - 14 comments
Labels: bug

#531 - List element inside a table is lost

Issue - State: open - Opened by mikhainin 8 months ago - 5 comments
Labels: bug

#530 - XPaths: improve accuracy for major news outlets

Pull Request - State: closed - Opened by adbar 8 months ago - 4 comments

#529 - Link proportion heuristic fails for link paragraph

Issue - State: closed - Opened by adbar 8 months ago
Labels: bug

#528 - fix formatting by correcting order of element generation, space handling

Pull Request - State: closed - Opened by dlwh 8 months ago - 10 comments

#527 - prepare version 1.8.0

Pull Request - State: closed - Opened by adbar 8 months ago - 1 comment

#526 - change license to Apache 2.0

Pull Request - State: closed - Opened by adbar 8 months ago - 1 comment

#525 - Bump the dependencies group with 8 updates

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies

#524 - CI/CD: update workflows

Pull Request - State: closed - Opened by adbar 8 months ago - 1 comment

#523 - Doesn't extract links in table

Issue - State: open - Opened by obeone 8 months ago - 1 comment
Labels: bug

#522 - CLI fixes: parallel cores and processing

Pull Request - State: closed - Opened by adbar 8 months ago - 1 comment

#521 - perf: avoid to load metadata causing perf leak

Pull Request - State: closed - Opened by Spasfonx 8 months ago - 1 comment

#520 - Remove incorrect text from the body

Pull Request - State: closed - Opened by felipehertzer 8 months ago - 3 comments

#519 - PDF as output format?

Issue - State: closed - Opened by adbar 9 months ago
Labels: feedback

#518 - Link section missed at bottom of page

Issue - State: open - Opened by adbar 9 months ago - 3 comments
Labels: bug

#517 - build(deps): bump html2text from 2020.1.16 to 2024.2.26

Pull Request - State: closed - Opened by dependabot[bot] 9 months ago - 2 comments
Labels: dependencies

#514 - Trafilatura to support more robust async library than standard request

Issue - State: closed - Opened by krstp 9 months ago - 4 comments
Labels: question

#513 - fix: lowercase headers in response object

Pull Request - State: closed - Opened by adbar 9 months ago - 1 comment

#512 - Change of license? GPLv3+ → Apache 2.0

Issue - State: closed - Opened by adbar 9 months ago - 25 comments
Labels: maintenance

#511 - Include links and Include formatting do not work together properly

Issue - State: open - Opened by ibestvina 9 months ago - 5 comments
Labels: bug

#510 - OVERALL_DISCARD_XPATH not discarding in some cases

Issue - State: open - Opened by felipehertzer 9 months ago - 1 comment
Labels: question

#509 - Fixed unwanted content in some websites

Pull Request - State: closed - Opened by felipehertzer 9 months ago - 9 comments

#508 - update docs

Pull Request - State: closed - Opened by adbar 9 months ago - 1 comment

#507 - feeds: also use feedparser if available

Pull Request - State: closed - Opened by adbar 9 months ago - 2 comments

#506 - sitemaps: use safeguards

Pull Request - State: closed - Opened by adbar 9 months ago - 1 comment

#505 - Sitemaps: implement sleep and/or backoff strategy

Issue - State: closed - Opened by adbar 9 months ago
Labels: enhancement

#504 - LXML: compile XPath expressions

Pull Request - State: closed - Opened by adbar 9 months ago - 1 comment

#503 - simplify and improve sitemap init

Pull Request - State: closed - Opened by adbar 9 months ago - 1 comment

#501 - is_live_page: use pycurl and urllib3 when available

Pull Request - State: closed - Opened by adbar 9 months ago

#500 - Regroup functions dedicated to output conversion

Issue - State: closed - Opened by adbar 9 months ago - 1 comment
Labels: enhancement

#499 - hack: add symbol to preserve vertical spacing

Pull Request - State: closed - Opened by adbar 10 months ago - 2 comments

#498 - add code formatting in TXT/Markdown output

Pull Request - State: closed - Opened by adbar 10 months ago - 4 comments

#497 - Response class: add convenience functions

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#496 - improve and standardize CSV output

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#495 - build(deps): bump goose3 from 3.1.13 to 3.1.19

Pull Request - State: closed - Opened by dependabot[bot] 10 months ago - 2 comments
Labels: dependencies

#494 - build(deps): bump beautifulsoup4 from 4.12.1 to 4.12.3

Pull Request - State: closed - Opened by dependabot[bot] 10 months ago - 2 comments
Labels: dependencies

#493 - build(deps): bump trafilatura from 1.5.0 to 1.7.0

Pull Request - State: closed - Opened by dependabot[bot] 10 months ago - 2 comments
Labels: dependencies

#492 - build(deps): bump inscriptis from 2.3.2 to 2.4.0.1

Pull Request - State: closed - Opened by dependabot[bot] 10 months ago - 2 comments
Labels: dependencies

#491 - scrap lxml.html.Cleaner

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#490 - Add download/processing date to metadata

Issue - State: closed - Opened by adbar 10 months ago - 1 comment
Labels: enhancement

#489 - Make markdown an explicit output format

Issue - State: closed - Opened by adbar 10 months ago - 1 comment
Labels: enhancement

#488 - Extract more text

Issue - State: open - Opened by vulinh48936 10 months ago - 6 comments
Labels: bug

#487 - Merge multiple nodes returned by XPath

Pull Request - State: closed - Opened by hugoobauer 10 months ago - 8 comments

#486 - update dependencies and prepare v1.7.0

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#485 - apply HTML fix and accept LXML v5+

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#484 - Handle changed behaviour of `lxml` `addnext` method

Pull Request - State: closed - Opened by knit-bee 10 months ago - 2 comments

#483 - better html2txt extraction

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#482 - CLI: raise an error if `--config-file` doesn't exist

Issue - State: closed - Opened by adbar 10 months ago
Labels: enhancement

#481 - update docs

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#480 - Deprecate functions and arguments

Issue - State: closed - Opened by adbar 10 months ago
Labels: documentation

#479 - add advanced fetch_response() and amend fetch_url()

Pull Request - State: closed - Opened by adbar 10 months ago - 1 comment

#478 - save cookies on redirect

Issue - State: open - Opened by zeliboba7 10 months ago - 1 comment
Labels: enhancement

#477 - Update LXML to version 5.1+

Issue - State: closed - Opened by adbar 10 months ago - 1 comment
Labels: dependencies

#476 - include_links option mixes texts and links

Issue - State: open - Opened by hugoobauer 10 months ago - 6 comments
Labels: bug

#475 - License

Issue - State: closed - Opened by fakerybakery 10 months ago - 7 comments
Labels: question

#474 - fetch_url('spiegel.de/....') returns None

Issue - State: closed - Opened by robertour 10 months ago - 5 comments
Labels: question

#473 - Add support for Netscape cookies file format

Issue - State: open - Opened by adbar 10 months ago
Labels: enhancement

#472 - Add HTML output option

Issue - State: closed - Opened by adbar 10 months ago - 1 comment
Labels: enhancement

#470 - Standardize CSV output

Issue - State: closed - Opened by adbar 10 months ago
Labels: enhancement

#469 - Add correct image links for Pypi

Issue - State: closed - Opened by adbar 10 months ago
Labels: documentation

#467 - TXT output doesn't produce markdown-compliant paragraphs

Issue - State: closed - Opened by claudehenchoz 10 months ago - 1 comment
Labels: enhancement

#466 - Configure pre-commit for this repository and update documentation

Issue - State: closed - Opened by adbar 11 months ago - 1 comment
Labels: up for grabs, documentation

#465 - build(deps): bump goose3 from 3.1.13 to 3.1.18

Pull Request - State: closed - Opened by dependabot[bot] 11 months ago - 1 comment
Labels: dependencies

#464 - build(deps): bump news-please from 1.5.22 to 1.5.44

Pull Request - State: closed - Opened by dependabot[bot] 11 months ago - 1 comment
Labels: dependencies

#463 - build(deps): bump lxml from 4.9.2 to 5.0.0

Pull Request - State: closed - Opened by dependabot[bot] 11 months ago - 1 comment
Labels: dependencies

#462 - Drop invalid XML element attributes

Pull Request - State: closed - Opened by vbarbaresi 11 months ago - 1 comment

#461 - introduce `MAX_REDIRECTS` config setting and fix urllib3 redirect handling

Pull Request - State: closed - Opened by vbarbaresi 11 months ago - 7 comments

#460 - fix setup, update htmldate and add tests

Pull Request - State: closed - Opened by adbar 11 months ago - 1 comment

#459 - Here is an interesting example... any tips?

Issue - State: open - Opened by krstp 11 months ago - 1 comment
Labels: question

#458 - fix: remove cyclic imports

Pull Request - State: closed - Opened by adbar 11 months ago - 1 comment

#457 - improve feed detection

Pull Request - State: closed - Opened by adbar 11 months ago - 1 comment

#456 - Enhancements to Documentation, Testing, and Configuration

Pull Request - State: open - Opened by Maddesea 11 months ago - 4 comments

#455 - fix tests: httpbun.org → .com

Pull Request - State: closed - Opened by adbar 11 months ago - 1 comment

#454 - Few issues with tests.

Issue - State: open - Opened by majcl 11 months ago - 1 comment
Labels: evaluation

#453 - i could not extract tables from trafilatura library

Issue - State: closed - Opened by saki021989 11 months ago - 1 comment
Labels: question

#452 - build(deps): bump trafilatura from 1.5.0 to 1.6.3

Pull Request - State: closed - Opened by dependabot[bot] 12 months ago - 2 comments
Labels: dependencies

#451 - build(deps): bump boilerpy3 from 1.0.6 to 1.0.7

Pull Request - State: closed - Opened by dependabot[bot] 12 months ago - 2 comments
Labels: dependencies

#450 - Cannot fetch url with more than 2 redirections

Issue - State: closed - Opened by julienlambert42 12 months ago - 1 comment
Labels: enhancement

#449 - Python 3.12 support for MacOS

Issue - State: closed - Opened by efecan-circlelabs 12 months ago - 2 comments
Labels: bug

#448 - prepare v1.6.3 (setup and docs)

Pull Request - State: closed - Opened by adbar 12 months ago - 1 comment

#447 - update docs: theme, text embeddings, used-by

Pull Request - State: closed - Opened by tonyyanga 12 months ago - 5 comments

#446 - Text extraction performance fix.

Issue - State: open - Opened by majcl 12 months ago - 1 comment
Labels: question

#444 - htmldate and courlan: update setup and tests

Pull Request - State: closed - Opened by adbar almost 1 year ago - 1 comment

#443 - feeds: review code

Pull Request - State: closed - Opened by adbar about 1 year ago - 1 comment

#442 - XML Parsing breaks on valid HTML

Issue - State: closed - Opened by Jufik about 1 year ago - 6 comments
Labels: feedback

#441 - add config option: external URLs for feeds and sitemaps

Pull Request - State: closed - Opened by adbar about 1 year ago - 1 comment

#440 - xml namespace support in sitemaps

Issue - State: open - Opened by stdweird about 1 year ago - 1 comment

#439 - Update tutorial-epsilla.rst

Pull Request - State: closed - Opened by richard-epsilla about 1 year ago - 1 comment

#438 - colors and error in running gui

Issue - State: open - Opened by co2nunes about 1 year ago - 2 comments

#437 - update docs

Pull Request - State: closed - Opened by adbar about 1 year ago - 1 comment

#436 - add_raw_html_prop

Pull Request - State: closed - Opened by HawkClaws about 1 year ago - 2 comments