Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / adbar/trafilatura issues and pull requests
#330 - Proxy support to Trafilatura
Issue -
State: closed - Opened by andremacola over 1 year ago
- 4 comments
Labels: enhancement
#329 - reflect changes in courlan library
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#328 - fix: html with no metadata image
Pull Request -
State: closed - Opened by andremacola over 1 year ago
- 1 comment
#327 - add is_live test using HTTP HEAD request
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#326 - sitemaps: more efficient processing
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#325 - ValueError: signal only works in main thread
Issue -
State: closed - Opened by pandemosth over 1 year ago
- 2 comments
Labels: bug
#324 - CLI fix: single URL provided with -u
Pull Request -
State: closed - Opened by adbar over 1 year ago
#323 - feat: add basic auth support
Issue -
State: closed - Opened by kondounagi over 1 year ago
- 4 comments
Labels: feedback
#322 - fetch_url doesn't return RawResponse and doesn't provide access to response code
Issue -
State: closed - Opened by edkrueger over 1 year ago
- 10 comments
Labels: enhancement, documentation
#321 - #320 - update deprecated code
Pull Request -
State: closed - Opened by sdondley over 1 year ago
- 3 comments
#320 - Getting warnings while testing extract() function with pytest
Issue -
State: closed - Opened by sdondley over 1 year ago
#319 - 'lxml.etree._Element' object has no attribute 'text_content'
Issue -
State: closed - Opened by asjsrep over 1 year ago
- 17 comments
Labels: bug, documentation
#318 - Doesn't extract li tags content with an id
Issue -
State: closed - Opened by alroythalus over 1 year ago
- 5 comments
Labels: bug
#317 - prepare version 1.5.0
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#316 - setup: simplify CI and remove tests for Python 3.12-dev
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#315 - Fix handling of text content of <div> with empty <p> child
Pull Request -
State: closed - Opened by knit-bee over 1 year ago
- 1 comment
#314 - New content hashes and default file names
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#313 - probe_alternative_homepage no_ssl arg from fetch_url
Issue -
State: closed - Opened by hyshandler over 1 year ago
- 1 comment
Labels: question
#312 - spider: update setup and adjust
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#311 - build(deps): bump goose3 from 3.1.12 to 3.1.13
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#310 - feat: extract pagetype from og:type or ld+json
Pull Request -
State: closed - Opened by andremacola over 1 year ago
- 2 comments
#309 - Cannot extract heading correctly in a list
Issue -
State: closed - Opened by fortyfourforty over 1 year ago
- 5 comments
Labels: bug
#308 - What is the recommended approach to outputting a more readable article
Issue -
State: closed - Opened by rbhalla over 1 year ago
- 1 comment
Labels: question
#307 - Extract page type with og:type or ld+json
Issue -
State: closed - Opened by andremacola over 1 year ago
- 4 comments
Labels: enhancement
#306 - Add as_dict method to Document.
Pull Request -
State: closed - Opened by edkrueger over 1 year ago
- 2 comments
#305 - `cchardet` is recommended now or is replaced by `charset-normalizer` by default?
Issue -
State: closed - Opened by lord-alfred over 1 year ago
- 2 comments
Labels: feedback
#304 - Unable to extract full text from New Yorker article
Issue -
State: closed - Opened by CyberneticTurtle over 1 year ago
- 1 comment
Labels: duplicate
#303 - Prevent extract_metadata from failing when @type is an empty list.
Pull Request -
State: closed - Opened by edkrueger over 1 year ago
- 2 comments
#302 - Method of Sourcing Elements
Issue -
State: closed - Opened by jackHedaya almost 2 years ago
- 3 comments
Labels: feedback
#301 - Cannot extract table from wordpress gutenberg blocks
Issue -
State: closed - Opened by fortyfourforty almost 2 years ago
- 4 comments
Labels: bug
#300 - trafilatura as a server
Issue -
State: closed - Opened by lord-alfred almost 2 years ago
- 4 comments
Labels: duplicate, question
#299 - Empty list value for @type causes extract_metadata to fail
Issue -
State: closed - Opened by edkrueger almost 2 years ago
- 1 comment
Labels: bug
#298 - extract_metadata() doesn't return dict, but documentation says it does
Issue -
State: closed - Opened by edkrueger almost 2 years ago
- 3 comments
Labels: enhancement, documentation
#297 - Error on compare_extraction function when no fallback is False
Issue -
State: closed - Opened by felipehertzer almost 2 years ago
- 3 comments
Labels: question
#296 - Fixed bug on JSON metadata when ld+JSON is formatted wrong
Pull Request -
State: closed - Opened by felipehertzer almost 2 years ago
- 2 comments
#295 - Add new class to metadata title
Pull Request -
State: closed - Opened by felipehertzer almost 2 years ago
- 1 comment
#294 - Sourcery refactored master branch
Pull Request -
State: closed - Opened by sourcery-ai[bot] almost 2 years ago
- 1 comment
#293 - build(deps): bump trafilatura from 1.4.0 to 1.4.1
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
- 1 comment
Labels: dependencies
#292 - build(deps): bump beautifulsoup4 from 4.11.1 to 4.11.2
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
- 1 comment
Labels: dependencies
#291 - Option to remove unreachable pages and pages not strictly in the same domain
Issue -
State: closed - Opened by MTB-nsartor almost 2 years ago
- 3 comments
Labels: enhancement
#290 - Collected links as metadata field?
Issue -
State: open - Opened by Amaimersion almost 2 years ago
- 3 comments
Labels: enhancement
#289 - Fix XPath expression in subtree
Issue -
State: open - Opened by adbar almost 2 years ago
- 1 comment
Labels: maintenance
#288 - Can't get include_images to include any images
Issue -
State: closed - Opened by boxabirds almost 2 years ago
- 7 comments
Labels: question
#286 - build(deps): bump inscriptis from 2.3.1 to 2.3.2
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
- 3 comments
Labels: dependencies
#285 - Hostname parameter metadata should not return top level tld with subdomains
Issue -
State: closed - Opened by andremacola almost 2 years ago
#284 - Improve title extraction by removing sitename suffix
Issue -
State: closed - Opened by andremacola almost 2 years ago
- 6 comments
Labels: enhancement
#283 - Remove unwanted html elements with regex or xpaths
Issue -
State: closed - Opened by andremacola almost 2 years ago
- 8 comments
Labels: question
#282 - feat: Add image urls to metadata
Pull Request -
State: closed - Opened by andremacola almost 2 years ago
- 12 comments
#281 - Add image urls to metadata
Issue -
State: closed - Opened by andremacola almost 2 years ago
- 1 comment
Labels: enhancement
#280 - setup: use faust-cchardet from 3.10 onwards
Pull Request -
State: closed - Opened by adbar almost 2 years ago
#279 - fix setup (2)
Pull Request -
State: closed - Opened by adbar almost 2 years ago
#278 - setup: try to fix actions
Pull Request -
State: closed - Opened by adbar almost 2 years ago
#277 - Fix setup for oldest and newest Python versions
Pull Request -
State: closed - Opened by adbar almost 2 years ago
- 1 comment
#276 - improved cli and gui
Pull Request -
State: closed - Opened by wu-seong almost 2 years ago
- 1 comment
#275 - Fix for failing tests
Pull Request -
State: closed - Opened by knit-bee almost 2 years ago
#274 - TEI: Nesting of <ab> elements
Pull Request -
State: closed - Opened by knit-bee almost 2 years ago
#273 - Remove double tags in XML output
Pull Request -
State: closed - Opened by knit-bee almost 2 years ago
- 1 comment
#272 - Extraction of Youtube iframes and img elements with links
Issue -
State: open - Opened by sampathmende almost 2 years ago
- 3 comments
Labels: enhancement
#271 - PytzUsageWarning: localize method no longer necessary
Issue -
State: closed - Opened by rwinterschlaf almost 2 years ago
- 2 comments
Labels: question
#270 - 403 for URL for Amazon
Issue -
State: closed - Opened by mirfan899 almost 2 years ago
- 1 comment
Labels: question
#269 - Fixes to Emoji Regexp
Pull Request -
State: closed - Opened by felipehertzer about 2 years ago
- 1 comment
#268 - Html extraction
Issue -
State: closed - Opened by slavaGanzin about 2 years ago
- 3 comments
Labels: question
#267 - author regexes: review ranges (#266)
Pull Request -
State: closed - Opened by adbar about 2 years ago
- 1 comment
#266 - Fix code scanning alert - Overly permissive regular expression range
Issue -
State: closed - Opened by adbar about 2 years ago
- 1 comment
#263 - Endless reading for link / timeout possible?
Issue -
State: closed - Opened by Rapid1898-code about 2 years ago
- 7 comments
Labels: enhancement
#261 - CLI arguments inconsistent: --inputfile and --inputdir
Issue -
State: closed - Opened by adbar about 2 years ago
- 1 comment
Labels: good first issue, up for grabs
#259 - Added the possibility to prune custom paths
Pull Request -
State: closed - Opened by HeLehm about 2 years ago
- 2 comments
#254 - TEI conformity: improve divs and element tails, fw → ab
Pull Request -
State: closed - Opened by knit-bee about 2 years ago
- 7 comments
#253 - TEI: Handle invalid siblings of <div>
Pull Request -
State: closed - Opened by knit-bee about 2 years ago
- 4 comments
#232 - Defer URL management to courlan.UrlStore (experimental)
Pull Request -
State: closed - Opened by adbar over 2 years ago
- 4 comments
#231 - Trafilatura appears to ignore <meta charset="...">
Issue -
State: closed - Opened by zackw over 2 years ago
- 3 comments
Labels: question, feedback
#229 - Keep orderedness information of lists
Issue -
State: closed - Opened by DavidNemeskey over 2 years ago
- 4 comments
Labels: feedback
#225 - Add argument to use archive.org as a backup in fetch_url()
Issue -
State: closed - Opened by vprelovac over 2 years ago
- 12 comments
Labels: enhancement, documentation
#224 - Add document language to metadata
Issue -
State: open - Opened by adbar over 2 years ago
- 6 comments
Labels: enhancement
#216 - Memory leaks
Issue -
State: closed - Opened by kinoute over 2 years ago
- 6 comments
Labels: bug
#215 - Question: JSON for Linking Data
Issue -
State: closed - Opened by Lucabenj over 2 years ago
- 6 comments
Labels: question, feedback
#202 - Celery error with v1.2.1: ValueError: signal only works in main thread
Issue -
State: closed - Opened by alex-bender over 2 years ago
- 17 comments
Labels: feedback
#197 - Added Coinbase article annotation
Pull Request -
State: closed - Opened by swetepete over 2 years ago
- 3 comments
#195 - Extend test coverage for json_metadata functions
Issue -
State: closed - Opened by adbar over 2 years ago
- 5 comments
Labels: feedback
#175 - Add include_video parameter (iframe elements are missing)
Issue -
State: open - Opened by fraseInc over 2 years ago
- 9 comments
Labels: enhancement
#166 - Issue with LXML on M1 / Apple arm64 platforms
Issue -
State: closed - Opened by naftalibeder almost 3 years ago
- 9 comments
Labels: bug, wontfix, documentation
#158 - xml extraction leads to <graphic> tags in the wrong place.
Issue -
State: closed - Opened by joschu almost 3 years ago
- 5 comments
Labels: bug
#151 - CLI: run as server
Issue -
State: closed - Opened by adbar almost 3 years ago
- 4 comments
Labels: enhancement
#148 - Interaction with internet archives (API and formats)
Issue -
State: closed - Opened by adbar almost 3 years ago
- 2 comments
Labels: enhancement
#147 - anchor issue
Issue -
State: closed - Opened by pieterhartel almost 3 years ago
- 5 comments
Labels: bug, wontfix
#122 - CLI: improve usability for large number of downloads
Issue -
State: closed - Opened by adbar about 3 years ago
Labels: enhancement
#116 - Investigate accuracy on Polish and Russian websites?
Issue -
State: closed - Opened by adbar about 3 years ago
- 1 comment
Labels: question
#113 - Unexpected lack of whitespace before/after ref tags in XML output
Issue -
State: closed - Opened by adri1wald about 3 years ago
- 4 comments
Labels: bug
#105 - Extract content from formats other than HTML: PDF, EPUB?
Issue -
State: closed - Opened by adbar over 3 years ago
- 9 comments
Labels: enhancement, feedback
#99 - Parse JSON-LD information and write heuristics to decide where to draw info from
Issue -
State: closed - Opened by adbar over 3 years ago
- 3 comments
Labels: enhancement
#89 - Graphic tag with no src attribute in XML result
Issue -
State: closed - Opened by phongtnit over 3 years ago
- 2 comments
Labels: bug
#85 - Are there any settings that allow us to make sure that the full article is scraped instead of just the initial part of it?
Issue -
State: closed - Opened by armsp over 3 years ago
- 8 comments
Labels: bug, wontfix
#80 - Teaser with link in article flow
Issue -
State: closed - Opened by adbar over 3 years ago
- 1 comment
Labels: enhancement
#57 - Is there a way to extract a top image from an article?
Issue -
State: closed - Opened by ArturasDruteika over 3 years ago
- 12 comments
Labels: bug
#53 - Bypass captchas/cookies/consent windows?
Issue -
State: closed - Opened by adbar almost 4 years ago
- 3 comments
Labels: feedback
#37 - Investigate potential speed-up with customized readability-lxml
Issue -
State: closed - Opened by adbar almost 4 years ago
- 1 comment
Labels: enhancement
#4 - List of smaller extraction bugs (text & metadata)
Issue -
State: open - Opened by adbar almost 5 years ago
- 30 comments
Labels: good first issue, up for grabs
#3 - Thoroughly implement and test duplicate detection
Issue -
State: closed - Opened by adbar almost 5 years ago
- 2 comments
Labels: enhancement