Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / adbar/trafilatura issues and pull requests
#537 - Downloads: Add ZStandard as optional Accept-Encoding header
Issue -
State: closed - Opened by adbar 8 months ago
Labels: enhancement
#536 - Wrong order of tail and children
Pull Request -
State: closed - Opened by knit-bee 8 months ago
- 1 comment
#535 - fix lxml/justext issue
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#534 - Fixed lists inside tables when include_tables=True
Pull Request -
State: closed - Opened by mikhainin 8 months ago
- 16 comments
#533 - Bump the dependencies group with 8 updates
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies
#532 - LXML 5.2.0 breaks import
Issue -
State: closed - Opened by marban 8 months ago
- 14 comments
Labels: bug
#531 - List element inside a table is lost
Issue -
State: open - Opened by mikhainin 8 months ago
- 5 comments
Labels: bug
#530 - XPaths: improve accuracy for major news outlets
Pull Request -
State: closed - Opened by adbar 8 months ago
- 4 comments
#529 - Link proportion heuristic fails for link paragraph
Issue -
State: closed - Opened by adbar 8 months ago
Labels: bug
#528 - fix formatting by correcting order of element generation, space handling
Pull Request -
State: closed - Opened by dlwh 8 months ago
- 10 comments
#527 - prepare version 1.8.0
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#526 - change license to Apache 2.0
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#525 - Bump the dependencies group with 8 updates
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies
#524 - CI/CD: update workflows
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#523 - Doesn't extract links in table
Issue -
State: open - Opened by obeone 8 months ago
- 1 comment
Labels: bug
#522 - CLI fixes: parallel cores and processing
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#521 - perf: avoid to load metadata causing perf leak
Pull Request -
State: closed - Opened by Spasfonx 8 months ago
- 1 comment
#520 - Remove incorrect text from the body
Pull Request -
State: closed - Opened by felipehertzer 8 months ago
- 3 comments
#519 - PDF as output format?
Issue -
State: closed - Opened by adbar 9 months ago
Labels: feedback
#518 - Link section missed at bottom of page
Issue -
State: open - Opened by adbar 9 months ago
- 3 comments
Labels: bug
#517 - build(deps): bump html2text from 2020.1.16 to 2024.2.26
Pull Request -
State: closed - Opened by dependabot[bot] 9 months ago
- 2 comments
Labels: dependencies
#514 - Trafilatura to support more robust async library than standard request
Issue -
State: closed - Opened by krstp 9 months ago
- 4 comments
Labels: question
#513 - fix: lowercase headers in response object
Pull Request -
State: closed - Opened by adbar 9 months ago
- 1 comment
#512 - Change of license? GPLv3+ → Apache 2.0
Issue -
State: closed - Opened by adbar 9 months ago
- 25 comments
Labels: maintenance
#511 - Include links and Include formatting do not work together properly
Issue -
State: open - Opened by ibestvina 9 months ago
- 5 comments
Labels: bug
#510 - OVERALL_DISCARD_XPATH not discarding in some cases
Issue -
State: open - Opened by felipehertzer 9 months ago
- 1 comment
Labels: question
#509 - Fixed unwanted content in some websites
Pull Request -
State: closed - Opened by felipehertzer 9 months ago
- 9 comments
#508 - update docs
Pull Request -
State: closed - Opened by adbar 9 months ago
- 1 comment
#507 - feeds: also use feedparser if available
Pull Request -
State: closed - Opened by adbar 9 months ago
- 2 comments
#506 - sitemaps: use safeguards
Pull Request -
State: closed - Opened by adbar 9 months ago
- 1 comment
#505 - Sitemaps: implement sleep and/or backoff strategy
Issue -
State: closed - Opened by adbar 9 months ago
Labels: enhancement
#504 - LXML: compile XPath expressions
Pull Request -
State: closed - Opened by adbar 9 months ago
- 1 comment
#503 - simplify and improve sitemap init
Pull Request -
State: closed - Opened by adbar 9 months ago
- 1 comment
#502 - For all the articles from the source https://ognnews.com/ the extracted title is not right.
Issue -
State: closed - Opened by rithvikshetty 9 months ago
#501 - is_live_page: use pycurl and urllib3 when available
Pull Request -
State: closed - Opened by adbar 9 months ago
#500 - Regroup functions dedicated to output conversion
Issue -
State: closed - Opened by adbar 9 months ago
- 1 comment
Labels: enhancement
#499 - hack: add symbol to preserve vertical spacing
Pull Request -
State: closed - Opened by adbar 10 months ago
- 2 comments
#498 - add code formatting in TXT/Markdown output
Pull Request -
State: closed - Opened by adbar 10 months ago
- 4 comments
#497 - Response class: add convenience functions
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#496 - improve and standardize CSV output
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#495 - build(deps): bump goose3 from 3.1.13 to 3.1.19
Pull Request -
State: closed - Opened by dependabot[bot] 10 months ago
- 2 comments
Labels: dependencies
#494 - build(deps): bump beautifulsoup4 from 4.12.1 to 4.12.3
Pull Request -
State: closed - Opened by dependabot[bot] 10 months ago
- 2 comments
Labels: dependencies
#493 - build(deps): bump trafilatura from 1.5.0 to 1.7.0
Pull Request -
State: closed - Opened by dependabot[bot] 10 months ago
- 2 comments
Labels: dependencies
#492 - build(deps): bump inscriptis from 2.3.2 to 2.4.0.1
Pull Request -
State: closed - Opened by dependabot[bot] 10 months ago
- 2 comments
Labels: dependencies
#491 - scrap lxml.html.Cleaner
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#490 - Add download/processing date to metadata
Issue -
State: closed - Opened by adbar 10 months ago
- 1 comment
Labels: enhancement
#489 - Make markdown an explicit output format
Issue -
State: closed - Opened by adbar 10 months ago
- 1 comment
Labels: enhancement
#488 - Extract more text
Issue -
State: open - Opened by vulinh48936 10 months ago
- 6 comments
Labels: bug
#487 - Merge multiple nodes returned by XPath
Pull Request -
State: closed - Opened by hugoobauer 10 months ago
- 8 comments
#486 - update dependencies and prepare v1.7.0
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#485 - apply HTML fix and accept LXML v5+
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#484 - Handle changed behaviour of `lxml` `addnext` method
Pull Request -
State: closed - Opened by knit-bee 10 months ago
- 2 comments
#483 - better html2txt extraction
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#482 - CLI: raise an error if `--config-file` doesn't exist
Issue -
State: closed - Opened by adbar 10 months ago
Labels: enhancement
#481 - update docs
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#480 - Deprecate functions and arguments
Issue -
State: closed - Opened by adbar 10 months ago
Labels: documentation
#479 - add advanced fetch_response() and amend fetch_url()
Pull Request -
State: closed - Opened by adbar 10 months ago
- 1 comment
#478 - save cookies on redirect
Issue -
State: open - Opened by zeliboba7 10 months ago
- 1 comment
Labels: enhancement
#477 - Update LXML to version 5.1+
Issue -
State: closed - Opened by adbar 10 months ago
- 1 comment
Labels: dependencies
#476 - include_links option mixes texts and links
Issue -
State: open - Opened by hugoobauer 10 months ago
- 6 comments
Labels: bug
#475 - License
Issue -
State: closed - Opened by fakerybakery 10 months ago
- 7 comments
Labels: question
#474 - fetch_url('spiegel.de/....') returns None
Issue -
State: closed - Opened by robertour 10 months ago
- 5 comments
Labels: question
#473 - Add support for Netscape cookies file format
Issue -
State: open - Opened by adbar 10 months ago
Labels: enhancement
#472 - Add HTML output option
Issue -
State: closed - Opened by adbar 10 months ago
- 1 comment
Labels: enhancement
#470 - Standardize CSV output
Issue -
State: closed - Opened by adbar 10 months ago
Labels: enhancement
#469 - Add correct image links for Pypi
Issue -
State: closed - Opened by adbar 10 months ago
Labels: documentation
#467 - TXT output doesn't produce markdown-compliant paragraphs
Issue -
State: closed - Opened by claudehenchoz 10 months ago
- 1 comment
Labels: enhancement
#466 - Configure pre-commit for this repository and update documentation
Issue -
State: closed - Opened by adbar 11 months ago
- 1 comment
Labels: up for grabs, documentation
#465 - build(deps): bump goose3 from 3.1.13 to 3.1.18
Pull Request -
State: closed - Opened by dependabot[bot] 11 months ago
- 1 comment
Labels: dependencies
#464 - build(deps): bump news-please from 1.5.22 to 1.5.44
Pull Request -
State: closed - Opened by dependabot[bot] 11 months ago
- 1 comment
Labels: dependencies
#463 - build(deps): bump lxml from 4.9.2 to 5.0.0
Pull Request -
State: closed - Opened by dependabot[bot] 11 months ago
- 1 comment
Labels: dependencies
#462 - Drop invalid XML element attributes
Pull Request -
State: closed - Opened by vbarbaresi 11 months ago
- 1 comment
#461 - introduce `MAX_REDIRECTS` config setting and fix urllib3 redirect handling
Pull Request -
State: closed - Opened by vbarbaresi 11 months ago
- 7 comments
#460 - fix setup, update htmldate and add tests
Pull Request -
State: closed - Opened by adbar 11 months ago
- 1 comment
#459 - Here is an interesting example... any tips?
Issue -
State: open - Opened by krstp 11 months ago
- 1 comment
Labels: question
#458 - fix: remove cyclic imports
Pull Request -
State: closed - Opened by adbar 11 months ago
- 1 comment
#457 - improve feed detection
Pull Request -
State: closed - Opened by adbar 11 months ago
- 1 comment
#456 - Enhancements to Documentation, Testing, and Configuration
Pull Request -
State: open - Opened by Maddesea 11 months ago
- 4 comments
#455 - fix tests: httpbun.org → .com
Pull Request -
State: closed - Opened by adbar 11 months ago
- 1 comment
#454 - Few issues with tests.
Issue -
State: open - Opened by majcl 11 months ago
- 1 comment
Labels: evaluation
#453 - i could not extract tables from trafilatura library
Issue -
State: closed - Opened by saki021989 11 months ago
- 1 comment
Labels: question
#452 - build(deps): bump trafilatura from 1.5.0 to 1.6.3
Pull Request -
State: closed - Opened by dependabot[bot] 12 months ago
- 2 comments
Labels: dependencies
#451 - build(deps): bump boilerpy3 from 1.0.6 to 1.0.7
Pull Request -
State: closed - Opened by dependabot[bot] 12 months ago
- 2 comments
Labels: dependencies
#450 - Cannot fetch url with more than 2 redirections
Issue -
State: closed - Opened by julienlambert42 12 months ago
- 1 comment
Labels: enhancement
#449 - Python 3.12 support for MacOS
Issue -
State: closed - Opened by efecan-circlelabs 12 months ago
- 2 comments
Labels: bug
#448 - prepare v1.6.3 (setup and docs)
Pull Request -
State: closed - Opened by adbar 12 months ago
- 1 comment
#447 - update docs: theme, text embeddings, used-by
Pull Request -
State: closed - Opened by tonyyanga 12 months ago
- 5 comments
#446 - Text extraction performance fix.
Issue -
State: open - Opened by majcl 12 months ago
- 1 comment
Labels: question
#444 - htmldate and courlan: update setup and tests
Pull Request -
State: closed - Opened by adbar almost 1 year ago
- 1 comment
#443 - feeds: review code
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#442 - XML Parsing breaks on valid HTML
Issue -
State: closed - Opened by Jufik about 1 year ago
- 6 comments
Labels: feedback
#441 - add config option: external URLs for feeds and sitemaps
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#440 - xml namespace support in sitemaps
Issue -
State: open - Opened by stdweird about 1 year ago
- 1 comment
#439 - Update tutorial-epsilla.rst
Pull Request -
State: closed - Opened by richard-epsilla about 1 year ago
- 1 comment
#438 - colors and error in running gui
Issue -
State: open - Opened by co2nunes about 1 year ago
- 2 comments
#437 - update docs
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#436 - add_raw_html_prop
Pull Request -
State: closed - Opened by HawkClaws about 1 year ago
- 2 comments