Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / adbar/trafilatura issues and pull requests
#435 - Improve SEO by adding sitemap to sphinx docs
Pull Request -
State: closed - Opened by tonyyanga about 1 year ago
- 3 comments
#434 - add htmldate extensive search to config
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#433 - trafilatura fails extracting
Issue -
State: closed - Opened by tejeshbhalla about 1 year ago
- 7 comments
Labels: question
#432 - Entire/majority content of these 2 sites being missed out
Issue -
State: open - Opened by alroythalus about 1 year ago
- 4 comments
Labels: enhancement
#431 - List items are being missed
Issue -
State: open - Opened by alroythalus about 1 year ago
- 9 comments
Labels: bug
#430 - Parts are getting missed out after using extract function
Issue -
State: open - Opened by alroythalus about 1 year ago
- 1 comment
Labels: enhancement
#429 - preserve space in certain elements
Pull Request -
State: closed - Opened by idoshamun about 1 year ago
- 20 comments
#428 - docs: update and extend
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#427 - crawl only sub-pages from an arbitrary URL?
Issue -
State: closed - Opened by pchalasani about 1 year ago
- 3 comments
Labels: question
#426 - Unable to extract text from a given site, TypeError: unhashable type: 'set'
Issue -
State: closed - Opened by noobistz about 1 year ago
- 5 comments
#425 - Test on Python 3.12 production release
Pull Request -
State: closed - Opened by cclauss about 1 year ago
- 6 comments
#424 - build(deps): bump trafilatura from 1.5.0 to 1.6.2
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 2 comments
Labels: dependencies
#423 - Inconsistent behavior on macOS
Issue -
State: closed - Opened by p-linnane about 1 year ago
- 3 comments
Labels: feedback
#422 - Multiple spaces within a text element are not supported
Issue -
State: closed - Opened by idoshamun about 1 year ago
- 5 comments
Labels: enhancement
#420 - Error when multiprocessing
Issue -
State: closed - Opened by fortyfourforty about 1 year ago
- 1 comment
#419 - docs: fix quickstart
Pull Request -
State: closed - Opened by sashkab about 1 year ago
- 1 comment
#417 - build(deps): bump resiliparse from 0.14.3 to 0.14.5
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 2 comments
Labels: dependencies
#416 - build(deps): bump news-please from 1.5.22 to 1.5.35
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 2 comments
Labels: dependencies
#415 - prepare v1.6.2
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#414 - added possibility to prune xPaths
Pull Request -
State: closed - Opened by HeLehm about 1 year ago
- 5 comments
#413 - Error installing trafilatura on playwright focal image
Issue -
State: open - Opened by jaekunchoi about 1 year ago
- 1 comment
Labels: question
#412 - Consider switching from lxml's clean_html for enhanced security (and possibly performance)
Issue -
State: closed - Opened by frenzymadness about 1 year ago
- 1 comment
Labels: enhancement
#411 - include_links breaks the extraction for https://news.ycombinator.com
Issue -
State: open - Opened by shivanker about 1 year ago
- 2 comments
Labels: bug
#410 - Returns horribly bad result for MSN page
Issue -
State: open - Opened by TheRabidWolverine about 1 year ago
- 1 comment
Labels: bug
#409 - Installation problem on Mac due to charset version mismatch
Issue -
State: closed - Opened by TheRabidWolverine about 1 year ago
- 1 comment
#408 - maintenance: simplify code
Pull Request -
State: closed - Opened by adbar about 1 year ago
- 1 comment
#407 - docs: fix typo in usage-python.rst
Pull Request -
State: closed - Opened by eltociear about 1 year ago
- 1 comment
#406 - Some language tidy-ups
Pull Request -
State: closed - Opened by marksmayo over 1 year ago
- 3 comments
#405 - Web API idea
Issue -
State: closed - Opened by clach04 over 1 year ago
- 4 comments
Labels: enhancement
#404 - Corrupted Markdown output when TXT+formatting
Issue -
State: closed - Opened by clach04 over 1 year ago
- 2 comments
Labels: bug
#403 - Use of Signal Prevents Multithreading on Linux
Issue -
State: closed - Opened by simplexx over 1 year ago
- 1 comment
#402 - Question about the title
Issue -
State: open - Opened by pieterhartel over 1 year ago
- 5 comments
Labels: question
#401 - improve code support
Pull Request -
State: closed - Opened by idoshamun over 1 year ago
- 20 comments
#400 - Empty h1 blocks non-empty h2
Issue -
State: open - Opened by pieterhartel over 1 year ago
- 1 comment
Labels: bug
#399 - build(deps): bump lxml from 4.9.2 to 4.9.3
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 2 comments
Labels: dependencies
#398 - build(deps): bump goose3 from 3.1.13 to 3.1.17
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 2 comments
Labels: dependencies
#397 - author metadata field is null for YouTube videos
Issue -
State: closed - Opened by basilioss over 1 year ago
- 2 comments
Labels: enhancement
#396 - `included_images` failed when trying to extract images in a table
Issue -
State: open - Opened by ChangyaoTian over 1 year ago
- 7 comments
Labels: bug
#395 - Redirecting https://twitter.com
Issue -
State: closed - Opened by proteusbr1 over 1 year ago
- 1 comment
Labels: question
#393 - fix: pinned LXML version for MacOS
Pull Request -
State: closed - Opened by adbar over 1 year ago
#392 - add checks to probing mode
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#391 - Is it possible to get the metadata with markdown format?
Issue -
State: closed - Opened by charleshan over 1 year ago
- 1 comment
Labels: enhancement
#390 - Code tags are not parsed properly
Issue -
State: closed - Opened by charleshan over 1 year ago
- 3 comments
Labels: question
#389 - courlan changes: adapt parameter and tests
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#388 - Image markdown not included during processing
Issue -
State: open - Opened by kianwilcox over 1 year ago
- 5 comments
Labels: bug
#387 - build(deps): bump goose3 from 3.1.13 to 3.1.16
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 2 comments
Labels: dependencies
#386 - build(deps): bump trafilatura from 1.5.0 to 1.6.1
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 2 comments
Labels: dependencies
#385 - Code example for Multi-Threaded downloads seems out of date
Issue -
State: closed - Opened by github-mickael-leclerc over 1 year ago
- 4 comments
Labels: documentation
#384 - remove signal from core and use on CLI only
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#383 - Fix JSON-LD list on sitename
Pull Request -
State: closed - Opened by felipehertzer over 1 year ago
- 2 comments
#382 - Check URLs passed to courlan functions `extract_links` and `fix_relative_urls`
Issue -
State: open - Opened by adbar over 1 year ago
- 1 comment
Labels: question
#381 - CLI: more robust processing with chunks
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#380 - setup: fix and update CI workflows
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#379 - Doesn't seem to work with recent charset-normalizer
Issue -
State: closed - Opened by Stevod over 1 year ago
- 2 comments
Labels: feedback
#378 - CLI: add option to probe for extractable content, more robust downloads and html2txt
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#377 - Convert relative URLs in links to absolute by default
Pull Request -
State: closed - Opened by feltcat over 1 year ago
- 7 comments
#376 - Option to convert relative links
Issue -
State: closed - Opened by feltcat over 1 year ago
- 1 comment
Labels: enhancement
#375 - XMLSyntaxError during conversion to XML output
Issue -
State: closed - Opened by fortyfourforty over 1 year ago
- 2 comments
Labels: bug
#374 - metadata extraction problem
Issue -
State: closed - Opened by fortyfourforty over 1 year ago
- 8 comments
#372 - improve code block support
Pull Request -
State: closed - Opened by idoshamun over 1 year ago
- 4 comments
#371 - prepare version 1.6.1
Pull Request -
State: closed - Opened by adbar over 1 year ago
#370 - more efficient HTML parsing code
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 4 comments
#369 - Function to use part of the heuristics on bare HTML fragments
Issue -
State: open - Opened by adbar over 1 year ago
Labels: enhancement
#368 - Improving JSON tests
Pull Request -
State: closed - Opened by felipehertzer over 1 year ago
- 2 comments
#367 - Gooey dependency seems unmaintained and broken
Issue -
State: open - Opened by tkapias over 1 year ago
- 1 comment
Labels: wontfix
#366 - More robust backup parsing
Issue -
State: closed - Opened by adbar over 1 year ago
- 1 comment
Labels: enhancement
#365 - metadata fixes: authors, JSON parser, Unicode
Pull Request -
State: closed - Opened by felipehertzer over 1 year ago
- 7 comments
#364 - docs roundup
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#363 - [#362] Fix metadata extraction w/o 'additionalName' field
Pull Request -
State: closed - Opened by awwitecki over 1 year ago
- 3 comments
#362 - Unable to extract metadata w/o authors `additionalName`
Issue -
State: closed - Opened by awwitecki over 1 year ago
#361 - build(deps): bump trafilatura from 1.5.0 to 1.6.0
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#360 - adopt latest courlan changes
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#359 - fix spider init
Pull Request -
State: closed - Opened by adbar over 1 year ago
#358 - Restrictions on Web Crawling
Issue -
State: closed - Opened by conceptofmind over 1 year ago
- 2 comments
Labels: bug
#357 - extraction: bypass for tables in figures (#301)
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#356 - minor extraction fixes
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#355 - [1.6.0] New Content Hashes
Issue -
State: closed - Opened by felipehertzer over 1 year ago
- 2 comments
Labels: documentation
#354 - Cannot extract Heading tags
Issue -
State: closed - Opened by fortyfourforty over 1 year ago
- 26 comments
Labels: bug
#353 - fix: relax constraints on spider tests
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#352 - simplify code for JSON metadata extraction
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#351 - Codeblock Markdown formatting is missing
Issue -
State: closed - Opened by niksite over 1 year ago
- 3 comments
Labels: enhancement
#350 - sitemaps: use class and simplify code structure
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#348 - prepare v1.6.0
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#347 - review logging levels
Pull Request -
State: closed - Opened by adbar over 1 year ago
#346 - Unnecessary Comments LOG INFO
Issue -
State: closed - Opened by andremacola over 1 year ago
- 1 comment
Labels: enhancement
#345 - upgrade dependencies to allow for urllib3 v2
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#344 - build(deps): bump beautifulsoup4 from 4.12.1 to 4.12.2
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#343 - build(deps): update urllib3 requirement from <2,>=1.26 to >=1.26,<3
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#342 - build(deps): bump goose3 from 3.1.13 to 3.1.14
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#341 - build(deps): bump news-please from 1.5.22 to 1.5.33
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#340 - settings: upper bound on links examined
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#339 - review url blacklisting
Pull Request -
State: closed - Opened by adbar over 1 year ago
#338 - CLI: more efficient downloads
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#337 - Simple HTML processing issue?
Issue -
State: closed - Opened by alroythalus over 1 year ago
- 4 comments
Labels: question
#336 - feeds & sitemaps: check domain similarity
Pull Request -
State: closed - Opened by adbar over 1 year ago
- 1 comment
#335 - Doesn't detect bullet points within tables
Issue -
State: closed - Opened by alroythalus over 1 year ago
- 5 comments
Labels: enhancement
#334 - Paras get broken up into fragments
Issue -
State: closed - Opened by alroythalus over 1 year ago
- 3 comments
#333 - Headers with classes don't get detected
Issue -
State: closed - Opened by alroythalus over 1 year ago
- 1 comment
#332 - feat: use proxy to extract data
Pull Request -
State: closed - Opened by andremacola over 1 year ago
- 7 comments
Labels: feedback
#331 - Update core.py
Pull Request -
State: closed - Opened by Korben00 over 1 year ago
- 3 comments
Labels: feedback