Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / mediacloud/metadata-lib issues and pull requests
#87 - MC metadata extraction investigation
Issue -
State: closed - Opened by pgulley 2 months ago
#86 - Assess tweaks to content extraction to remove headlines at end of article
Issue -
State: open - Opened by rahulbot 5 months ago
- 2 comments
Labels: enhancement
#85 - Update htmldate requirement from ==1.7.* to >=1.7,<1.9
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#84 - Update trafilatura requirement from <1.7,>=1.4 to >=1.4,<1.9
Pull Request -
State: closed - Opened by dependabot[bot] 6 months ago
- 1 comment
Labels: dependencies
#83 - Further tweaking of User-Agent string?
Issue -
State: closed - Opened by philbudne 7 months ago
- 3 comments
Labels: question
#82 - central storage for User-Agent to use across MC projects
Pull Request -
State: closed - Opened by rahulbot 8 months ago
- 1 comment
#81 - store MC user-agent for use by our other libraries
Issue -
State: closed - Opened by rahulbot 8 months ago
Labels: enhancement
#80 - Not capturing full article text
Issue -
State: closed - Opened by jaypinho 8 months ago
- 1 comment
Labels: wontfix
#79 - Update trafilatura requirement from <1.7,>=1.4 to >=1.4,<1.8
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies
#78 - Get automated release working
Issue -
State: closed - Opened by rahulbot 8 months ago
- 2 comments
#77 - ignore ports & handle IP domains in `normalize_url`
Pull Request -
State: closed - Opened by rahulbot 8 months ago
#76 - Update requirements
Pull Request -
State: closed - Opened by rahulbot 8 months ago
- 1 comment
#75 - Update htmldate requirement from ==1.6.* to >=1.6,<1.8
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 2 comments
Labels: dependencies
#74 - Fix title parsing failure (due to empty or whitespace title tag)
Pull Request -
State: closed - Opened by rahulbot 9 months ago
- 1 comment
#73 - mcmetadata.extract throwing AttributeErrors
Issue -
State: closed - Opened by philbudne 9 months ago
- 3 comments
Labels: bug
#72 - possible url normalization issues
Issue -
State: open - Opened by philbudne 9 months ago
- 1 comment
Labels: bug, question
#71 - Update static test fixtures
Pull Request -
State: closed - Opened by rahulbot 10 months ago
#70 - centralize url unique hash generation with helper method in this package
Pull Request -
State: closed - Opened by rahulbot 10 months ago
- 1 comment
#69 - improve CI test run reliability by using cached fixtures?
Issue -
State: closed - Opened by rahulbot 10 months ago
Labels: enhancement, question
#68 - allow capturing stats from individual extract calls
Pull Request -
State: closed - Opened by rahulbot 10 months ago
Labels: enhancement
#67 - May want to remove story source related query parameters!
Issue -
State: closed - Opened by philbudne 10 months ago
- 1 comment
#66 - update requirements file to latest
Pull Request -
State: closed - Opened by rahulbot 10 months ago
#65 - Small tweaks to handle whitespace in URLs
Pull Request -
State: closed - Opened by rahulbot 10 months ago
#64 - Support defaults and overrides in `extract`
Pull Request -
State: closed - Opened by rahulbot 10 months ago
#63 - support passing in a fallback publication date
Issue -
State: closed - Opened by rahulbot 10 months ago
- 2 comments
Labels: enhancement
#62 - Update htmldate requirement from ==1.5.* to >=1.5,<1.7
Pull Request -
State: closed - Opened by dependabot[bot] 10 months ago
- 2 comments
Labels: dependencies
#61 - Discuss possible enhancements to mcmetadata.extract
Issue -
State: closed - Opened by philbudne 10 months ago
- 2 comments
Labels: enhancement
#60 - Update dateparser requirement from ==1.1.* to >=1.1,<1.3
Pull Request -
State: closed - Opened by dependabot[bot] 11 months ago
- 2 comments
Labels: dependencies
#59 - Update tldextract requirement from ==3.6.* to >=3.6,<5.2
Pull Request -
State: closed - Opened by dependabot[bot] 11 months ago
- 2 comments
Labels: dependencies
#58 - Handling of URL parse failure
Issue -
State: closed - Opened by philbudne 12 months ago
Labels: bug
#57 - Update tldextract requirement from ==3.6.* to >=3.6,<5.1
Pull Request -
State: closed - Opened by dependabot[bot] 12 months ago
- 1 comment
Labels: dependencies
#56 - Update tldextract requirement from ==3.4.* to >=3.4,<3.7
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 1 comment
Labels: dependencies
#55 - Update tldextract requirement from ==3.4.* to >=3.4,<3.6
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 1 comment
Labels: dependencies
#54 - Update htmldate requirement from ==1.4.* to >=1.4,<1.6
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 1 comment
Labels: dependencies
#53 - Switched from cchardet to faust-chardet, as the former is unmaintained…
Pull Request -
State: closed - Opened by pgulley over 1 year ago
#52 - mcmetadata not type checked by mypy
Issue -
State: closed - Opened by philbudne over 1 year ago
- 2 comments
#51 - Update trafilatura requirement from ==1.4.* to >=1.4,<1.7
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies
#50 - update to latest version of trafilatura
Issue -
State: closed - Opened by rahulbot over 1 year ago
- 1 comment
Labels: enhancement, dependencies
#49 - Update trafilatura requirement from ==1.4.* to >=1.4,<1.6
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#48 - Update beautifulsoup4 requirement from ==4.11.* to >=4.11,<4.13
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
Labels: dependencies
#47 - fix bugs from PT integration
Pull Request -
State: closed - Opened by rahulbot over 1 year ago
#46 - addressing no nk error
Pull Request -
State: closed - Opened by pgulley over 1 year ago
- 1 comment
#45 - Crash because uri.query.params['nk'] can be None
Issue -
State: closed - Opened by vbanos over 1 year ago
- 2 comments
Labels: bug
#44 - Feature feed normalization
Pull Request -
State: closed - Opened by rahulbot almost 2 years ago
#43 - Add feed_url.py
Pull Request -
State: closed - Opened by philbudne almost 2 years ago
#42 - handle IP addresses better
Issue -
State: closed - Opened by rahulbot almost 2 years ago
- 1 comment
Labels: bug
#41 - Add a check to avoid TypeError
Issue -
State: closed - Opened by vbanos almost 2 years ago
- 1 comment
#40 - Update htmldate requirement from ==1.3.* to >=1.3,<1.5
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
Labels: dependencies
#39 - Update trafilatura requirement from ==1.3.* to >=1.3,<1.5
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
Labels: dependencies
#38 - Update tldextract requirement from ==3.3.* to >=3.3,<3.5
Pull Request -
State: closed - Opened by dependabot[bot] almost 2 years ago
Labels: dependencies
#37 - assess fasttext for language guessing speedup
Issue -
State: closed - Opened by rahulbot almost 2 years ago
- 1 comment
Labels: enhancement
#36 - upgrade dependencies
Issue -
State: closed - Opened by rahulbot almost 2 years ago
- 3 comments
Labels: enhancement
#35 - Fallback extractor
Pull Request -
State: closed - Opened by pgulley about 2 years ago
#34 - handle empty content with no-encoding from HTML
Issue -
State: closed - Opened by rahulbot about 2 years ago
Labels: bug
#33 - Unexpected AttributeError on extract
Issue -
State: closed - Opened by vbanos about 2 years ago
- 1 comment
Labels: bug
#32 - Improvement regarding content decoding/encoding
Issue -
State: open - Opened by vbanos about 2 years ago
Labels: enhancement
#31 - Bug in extract method
Issue -
State: closed - Opened by vbanos about 2 years ago
- 1 comment
Labels: bug
#30 - Use latest htmldate and pass datetime max_date instead of string
Pull Request -
State: closed - Opened by vbanos about 2 years ago
#29 - add in top image and other metadata
Pull Request -
State: closed - Opened by rahulbot about 2 years ago
- 2 comments
#28 - More efficient parameterized unit tests
Pull Request -
State: closed - Opened by vbanos about 2 years ago
- 1 comment
#27 - optimization on tag removal in readability-lxml extraction fallback
Issue -
State: closed - Opened by rahulbot about 2 years ago
Labels: enhancement
#26 - improve trafilatura defaults
Issue -
State: closed - Opened by rahulbot about 2 years ago
Labels: enhancement
#25 - create larger test set to compare results to main system data
Issue -
State: closed - Opened by rahulbot about 2 years ago
- 1 comment
Labels: enhancement
#24 - don't lowercase YouTube URLs for uniqueness hashing
Issue -
State: closed - Opened by rahulbot about 2 years ago
Labels: bug
#23 - limit dates in future?
Issue -
State: closed - Opened by rahulbot about 2 years ago
- 2 comments
Labels: enhancement
#22 - Masking very frequent date parsing exceptions
Issue -
State: closed - Opened by vbanos over 2 years ago
- 1 comment
#21 - Unhandled exception we got in production
Issue -
State: closed - Opened by vbanos over 2 years ago
- 4 comments
Labels: bug
#20 - centralize dependencies in one place
Issue -
State: closed - Opened by rahulbot over 2 years ago
#19 - You could also compile these regex in this method.
Issue -
State: closed - Opened by vbanos over 2 years ago
#18 - Use set instead of list for improved performance
Issue -
State: closed - Opened by vbanos over 2 years ago
#17 - You could compile this regex for better performance
Issue -
State: closed - Opened by vbanos over 2 years ago
#16 - Use Beautifulsoup4 with lxml parser for faster performance
Issue -
State: closed - Opened by vbanos over 2 years ago
#15 - Add cchardet dependency to speedup BeautifulSoup4
Issue -
State: closed - Opened by vbanos over 2 years ago
#14 - investigate URLs failing extraction
Issue -
State: closed - Opened by rahulbot over 2 years ago
- 2 comments
Labels: wontfix
#13 - justify content extractor priorities with data and testing
Issue -
State: closed - Opened by rahulbot over 2 years ago
- 3 comments
#12 - Feature quick improvements
Pull Request -
State: closed - Opened by rahulbot over 2 years ago
#11 - Stats for the success / failure of each extractor
Issue -
State: closed - Opened by vbanos over 2 years ago
#10 - Improve exception handling
Issue -
State: closed - Opened by vbanos over 2 years ago
- 1 comment
#9 - Compile regular expressions to improve performance
Issue -
State: closed - Opened by vbanos over 2 years ago
#8 - rename core branch from master to main
Issue -
State: closed - Opened by rahulbot over 2 years ago
- 1 comment
#7 - Prep for release to PyPi
Issue -
State: closed - Opened by rahulbot over 2 years ago
- 2 comments
#6 - Extract authors information when possible
Issue -
State: closed - Opened by ibnesayeed over 2 years ago
- 3 comments
Labels: enhancement
#5 - Building and installing cld2-cffi is failing
Issue -
State: closed - Opened by ibnesayeed over 2 years ago
- 2 comments
#4 - Extracting original domain from archived pages
Issue -
State: closed - Opened by ibnesayeed over 2 years ago
- 1 comment
#3 - Exception on non-news article pages
Issue -
State: closed - Opened by ibnesayeed over 2 years ago
#2 - switch language detection for now
Pull Request -
State: closed - Opened by rahulbot over 2 years ago
#1 - first pass at quickly integrating existing code
Pull Request -
State: closed - Opened by rahulbot over 2 years ago
- 1 comment