Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / adbar/trafilatura issues and pull requests
#639 - Can I get an extracted element's CSS selector?
Issue -
State: closed - Opened by theabhinavdas 4 months ago
- 2 comments
Labels: question
#638 - Update crawls.rst typo: `known` is an unexpected argument
Pull Request -
State: closed - Opened by tommytyc 5 months ago
- 1 comment
#637 - build(deps): bump the dependencies group with 2 updates
Pull Request -
State: closed - Opened by dependabot[bot] 5 months ago
- 1 comment
Labels: dependencies
#636 - links/urls are not appearing using extract
Issue -
State: closed - Opened by alroythalus 5 months ago
- 1 comment
Labels: feedback
#635 - fix: avoid faulty readability_lxml content
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#634 - some extraction duplicated in xml
Issue -
State: open - Opened by fortyfourforty 5 months ago
- 3 comments
Labels: question
#633 - Account for empty cells in table extraction (xml)
Issue -
State: open - Opened by fortyfourforty 5 months ago
- 3 comments
Labels: enhancement
#632 - weird xml extraction
Issue -
State: closed - Opened by fortyfourforty 5 months ago
- 2 comments
Labels: bug
#631 - prepare v1.11.0
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#630 - Deprecate Python 3.6 & 3.7
Issue -
State: closed - Opened by adbar 5 months ago
Labels: maintenance
#629 - Deprecate GUI in its current form (Gooey)
Issue -
State: closed - Opened by adbar 5 months ago
Labels: maintenance
#628 - extraction: simplify XML handling
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#627 - Sometimes, html tags remain on the string
Issue -
State: closed - Opened by masylum 5 months ago
- 2 comments
Labels: feedback
#626 - Error parsing non-English web pages
Issue -
State: closed - Opened by vodkaslime 5 months ago
- 2 comments
Labels: question
#625 - deduplication: shorter, more efficient code
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#624 - review spider code, add types and tests
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#623 - downloads: review code, tests, and add types
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#622 - Parts of article block are sometimes not being extracted
Issue -
State: closed - Opened by naktinis 5 months ago
- 3 comments
Labels: feedback
#621 - trafilatura.fetch_url Timeout is set but does not work
Issue -
State: closed - Opened by Storm0921 5 months ago
- 2 comments
Labels: question
#620 - metadata: simplify code and tests, add typing
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#619 - baseline: review extractor sequence, JSON parsing, and cleaning
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#618 - docs: update and extend
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#617 - Footer removal
Issue -
State: closed - Opened by hamsarajan 5 months ago
- 1 comment
Labels: bug
#616 - Image/Video caption and credits removal
Issue -
State: open - Opened by hamsarajan 5 months ago
- 3 comments
Labels: question, documentation
#615 - extraction: fix processing syntax and simplify code
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#614 - extraction: add HTML as output format
Pull Request -
State: closed - Opened by adbar 5 months ago
- 1 comment
#613 - use with_metadata argument as switch
Pull Request -
State: closed - Opened by adbar 6 months ago
- 2 comments
#611 - build(deps): bump the dependencies group with 5 updates
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#610 - It's set include_images=True, but there is no picture
Issue -
State: open - Opened by dark2star 6 months ago
- 5 comments
Labels: bug
#609 - Remove HTML doc pages from package and add instructions to build them
Issue -
State: closed - Opened by adbar 6 months ago
Labels: documentation, maintenance
#608 - prepare version 1.10.0
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#607 - CLI fix: read standard input as binary
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#606 - Evaluation adjusted
Pull Request -
State: closed - Opened by LydiaKoerber 6 months ago
- 1 comment
#605 - CLI fixes: file processing options, mtime, and tests
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#604 - New port of readability.js?
Issue -
State: open - Opened by zirkelc 6 months ago
- 4 comments
Labels: question
#603 - fix typos
Pull Request -
State: closed - Opened by RainRat 6 months ago
- 2 comments
#602 - Enhancement using LLM based approach
Issue -
State: closed - Opened by alroythalus 6 months ago
- 1 comment
#601 - Markdown table fixes
Pull Request -
State: closed - Opened by naktinis 6 months ago
- 8 comments
#600 - Unordered list markdown syntax is incorrect
Issue -
State: closed - Opened by naktinis 6 months ago
- 2 comments
#599 - Table markdown syntax incorrect in some cases
Issue -
State: closed - Opened by naktinis 6 months ago
- 2 comments
Labels: bug
#598 - fix: list spacing in TXT output
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#597 - <li> tag output in TXT
Issue -
State: closed - Opened by ethael 6 months ago
- 1 comment
Labels: bug
#596 - Add option to provide XPaths for content extraction
Issue -
State: open - Opened by klvbdmh 6 months ago
- 2 comments
Labels: enhancement
#595 - `utils.decode_file()`: add switch for full detection or GZip only
Issue -
State: open - Opened by adbar 6 months ago
Labels: enhancement
#594 - downloads: fix deflate and add optional zstd to accepted encodings
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#593 - setup: update justext and lxml dependencies
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#591 - simplify code: unique function for length tests
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#590 - spider fix: use internal download utilities for robots.txt
Pull Request -
State: closed - Opened by adbar 6 months ago
- 1 comment
#589 - focused_crawl returns nothing
Issue -
State: closed - Opened by bezir 6 months ago
- 6 comments
Labels: feedback
#588 - <main> Content gets missed out
Issue -
State: closed - Opened by alroythalus 6 months ago
- 1 comment
Labels: feedback
#587 - Port of is_probably_readerable from mozilla
Pull Request -
State: closed - Opened by zirkelc 6 months ago
- 15 comments
#586 - Extracting content from a URL is getting none
Issue -
State: open - Opened by Fabiha15 7 months ago
- 1 comment
Labels: question
#585 - Wrong links position in text from telegram post
Issue -
State: open - Opened by RedHotUnicorn 7 months ago
- 2 comments
Labels: question
#584 - Removing related links at end of article/sidebar on news websites?
Issue -
State: open - Opened by rahulbot 7 months ago
- 3 comments
Labels: bug
#583 - Simple content scoring prototype
Pull Request -
State: closed - Opened by zirkelc 7 months ago
- 8 comments
#582 - re-group classes and functions linked to deduplication
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#581 - breaking: raise errors on deprecated CLI and function arguments
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#580 - prepare version 1.9.0
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#579 - build(deps): bump the dependencies group with 4 updates
Pull Request -
State: closed - Opened by dependabot[bot] 7 months ago
- 1 comment
Labels: dependencies
#578 - docs: general update
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#577 - Update XML-TEI reference data
Issue -
State: closed - Opened by adbar 7 months ago
Labels: maintenance
#576 - Regroup deduplication functions in same submodule
Issue -
State: closed - Opened by adbar 7 months ago
Labels: documentation, maintenance
#575 - maintenance: reflect latest courlan changes
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#574 - tests: upgrade Python versions
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#573 - Extract text from buttons for semantic elements
Issue -
State: open - Opened by zirkelc 7 months ago
- 1 comment
Labels: question
#572 - Question: check if page is readable?
Issue -
State: closed - Opened by zirkelc 7 months ago
- 9 comments
Labels: question
#571 - extractor: improve recall preset
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#570 - fix download tests
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#569 - Content extraction failure on dozens of related sites
Issue -
State: closed - Opened by praveng 7 months ago
- 4 comments
Labels: bug
#568 - Content failed to be extracted
Issue -
State: closed - Opened by alroythalus 7 months ago
- 1 comment
#567 - metadata: add author XPaths
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#566 - No timeout in urllib.robotparser with focused_crawler
Issue -
State: closed - Opened by JER-CE 7 months ago
- 2 comments
Labels: bug
#565 - CLI & downloads: revamp options and make sure they are used
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#564 - docs: convert readme to markdown
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#563 - fix: table cell separators in non-XML output
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#562 - Markdown tables have incorrect format
Issue -
State: closed - Opened by zirkelc 7 months ago
- 1 comment
Labels: bug
#561 - metadata: add file creation date (date extraction, JSON & XML-TEI)
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#560 - Use `with_metadata` parameter to decide whether to run metadata extraction
Issue -
State: closed - Opened by adbar 7 months ago
Labels: enhancement
#559 - Why lzma for data compression?
Issue -
State: closed - Opened by Yomguithereal 7 months ago
- 6 comments
Labels: maintenance
#558 - Scraping websites which are protected by WAF
Issue -
State: closed - Opened by thebigbone 7 months ago
- 7 comments
Labels: question
#557 - Readme.md table is broken.
Issue -
State: closed - Opened by AnishPimpley 7 months ago
- 1 comment
Labels: bug
#556 - restructure code
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#555 - strikethrough text is returned as normal
Issue -
State: closed - Opened by snarb 7 months ago
- 1 comment
Labels: question
#554 - fix: raise error if config file does not exist
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#553 - Preserve horizontal space in code blocks
Issue -
State: open - Opened by mittsommer 7 months ago
- 3 comments
Labels: enhancement
#552 - add global options object for extraction and use it in CLI
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#551 - Scraping directly from wayback machine (newbie question)
Issue -
State: closed - Opened by scaramouche88 7 months ago
- 6 comments
Labels: question
#550 - add markdown as explicit output
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#549 - maintenance: deprecate `process_record()`
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#548 - fix: better encoding detection
Pull Request -
State: closed - Opened by adbar 7 months ago
- 1 comment
#547 - speedup for readability-lxml
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#546 - Refactor and improve readability-lxml syntax
Issue -
State: closed - Opened by adbar 8 months ago
Labels: enhancement
#545 - Fixed Extraction when Meta tag has an empty content
Pull Request -
State: closed - Opened by felipehertzer 8 months ago
- 4 comments
#544 - Respect no_fallback
Pull Request -
State: closed - Opened by co-odw 8 months ago
- 3 comments
#543 - refactoring: simplify code
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#542 - eval: review code, add guidelines and small benchmark
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#541 - Wrong encoding detected: gb2312
Issue -
State: closed - Opened by s-jse 8 months ago
- 3 comments
Labels: bug
#540 - Fixed bug with @data-testid and removed some classes
Pull Request -
State: closed - Opened by felipehertzer 8 months ago
- 14 comments
#539 - prepare version 1.8.1
Pull Request -
State: closed - Opened by adbar 8 months ago
- 1 comment
#538 - Make cascade of different content extractors explicit and configurable
Issue -
State: open - Opened by adbar 8 months ago
Labels: enhancement