Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / grangier/python-goose issues and pull requests
#289 - docs: Fix a few typos
Pull Request - State: open - Opened by timgates42 about 3 years ago
#288 - Unable to execute the install script
Issue - State: closed - Opened by sudo-behappy about 3 years ago
#287 - Unable to use goose with Python 3
Issue - State: open - Opened by Ayokunle over 3 years ago
- 1 comment
#286 - Create new_file.md
Pull Request - State: open - Opened by BrotherOrange about 5 years ago
#285 - Installation error
Issue - State: open - Opened by pol690 over 5 years ago
- 2 comments
#284 - added support for HTTP and HTTPS proxy.
Pull Request - State: open - Opened by soundofsettling almost 6 years ago
#283 - Add support for HTTP and HTTPS proxies
Issue - State: open - Opened by soundofsettling almost 6 years ago
#282 - any paper or algorithm description about text extraction?
Issue - State: open - Opened by whqwill about 6 years ago
#281 - no result return and waiting
Issue - State: open - Opened by pigpeak about 6 years ago
- 3 comments
#280 - what's python's version
Issue - State: open - Opened by charlotte-ling over 6 years ago
- 1 comment
#279 - remove use-mirror as it is depriciated
Pull Request - State: closed - Opened by ravirnjn88 over 6 years ago
#278 - Goose is not extracting article whole text
Issue - State: open - Opened by AgoloAhmedElhady over 6 years ago
#277 - lots of temporary files in /tmp/goose
Issue - State: open - Opened by kingsaint almost 7 years ago
- 1 comment
#276 - Japanease functionality
Issue - State: open - Opened by asafcombo almost 7 years ago
#275 - Not parsing following articles.
Issue - State: open - Opened by thekgt almost 7 years ago
- 1 comment
#274 - Correct spelling mistakes.
Pull Request - State: open - Opened by EdwardBetts almost 7 years ago
#273 - PLEASE SUBMIT ISSUES TO GOOSE3
Issue - State: open - Opened by lababidi about 7 years ago
#272 - python-goose/goose/utils/encoding.py
Issue - State: open - Opened by marcelotournier about 7 years ago
#271 - ImportError: dynamic module does not define init function (init_imaging)
Issue - State: open - Opened by pratheepchowdhary over 7 years ago
#270 - Failed extraction from blogger post
Issue - State: open - Opened by piccolbo over 7 years ago
- 12 comments
#269 - Fix #191: infinite recursion on some pages
Pull Request - State: closed - Opened by androm3da over 7 years ago
- 1 comment
#268 - encoding error : input conversion failed due to input error, bytes 0xEC 0xD8 0xFD 0xFF
Issue - State: open - Opened by brookxs over 7 years ago
#267 - Allow custom search tags
Pull Request - State: closed - Opened by sproberts92 almost 8 years ago
- 1 comment
#266 - [extractors/title.py] None value for `site_name` in line 40
Issue - State: open - Opened by kbandla almost 8 years ago
#265 - Not working with ABC News and The Hill articles
Issue - State: closed - Opened by sarakhedr almost 8 years ago
#264 - ModuleNotFoundError: No module named 'urlparse'
Issue - State: open - Opened by ghost almost 8 years ago
- 2 comments
#263 - li tags in html not extracted
Issue - State: open - Opened by sparvind2000 about 8 years ago
- 2 comments
#262 - Problems Parsing Titles
Issue - State: open - Opened by grantdelozier over 8 years ago
- 1 comment
#261 - Travis Bugfix: No More `--use-mirrors` Option
Pull Request - State: open - Opened by mxamin over 8 years ago
#260 - Add Farsi (Persian) Language Support
Pull Request - State: open - Opened by mxamin over 8 years ago
- 1 comment
#259 - #258 - Handle None from opengraph title
Pull Request - State: closed - Opened by aniruddha-adhikary over 8 years ago
#258 - title from opengraph can return None
Issue - State: open - Opened by aniruddha-adhikary over 8 years ago
- 1 comment
#257 - Update content.py
Pull Request - State: open - Opened by abhigenie92 over 8 years ago
#256 - Link text is not included in cleaned text
Issue - State: open - Opened by smilledge over 8 years ago
#255 - Install should simply be 'pip install goose-extractor'?
Issue - State: open - Opened by andybak almost 9 years ago
- 2 comments
#254 - EXSLT link seems to have changed
Pull Request - State: closed - Opened by andreis almost 9 years ago
#253 - extracting image from the content in my db
Issue - State: open - Opened by lip365 almost 9 years ago
#252 - Goose fails in extracting articles from The New York Times
Issue - State: closed - Opened by manalsali about 9 years ago
- 5 comments
#251 - Title of project says "scrapping" but it's "scraping"
Issue - State: open - Opened by doda-zz about 9 years ago
#250 - og:image is not parsed correct if e.g. og:image:width exists on page
Issue - State: open - Opened by vonholst about 9 years ago
- 1 comment
#249 - Bug: Infinite crawling recursion on some pages
Issue - State: open - Opened by simonwjackson about 9 years ago
- 1 comment
#248 - Fixed unicode handling, Python 3 support, Request as network backend, better content root extraction and other awesome features
Pull Request - State: open - Opened by Lol4t0 about 9 years ago
- 4 comments
#247 - Fallback to 'http' as default url schema if needed
Pull Request - State: open - Opened by rastasheep about 9 years ago
#246 - Added Serbian stopwords
Pull Request - State: open - Opened by rastasheep about 9 years ago
#245 - Goose is not working on extracting data from Kissmetrics blog which have some meta tags present.
Issue - State: open - Opened by jijoy about 9 years ago
- 1 comment
#244 - Handle gzipped pages gracefully
Pull Request - State: closed - Opened by daTokenizer over 9 years ago
#243 - fix(stopwords-id.txt): changed to Lucene stopwords
Pull Request - State: open - Opened by luthfianto over 9 years ago
#242 - h1,h2...h6 not returned
Issue - State: open - Opened by tamimibrahim over 9 years ago
- 1 comment
#241 - Incompatible library version: _imaging.so requires version 13.0.0 or later, but libjpeg.8.dylib provides version 12.0.0
Issue - State: open - Opened by dbl001 over 9 years ago
- 1 comment
#240 - Non-obvious failure grabbing top_image
Issue - State: open - Opened by Slater-Victoroff over 9 years ago
- 2 comments
#239 - Not working on some urls
Pull Request - State: open - Opened by abhigenie92 over 9 years ago
#238 - HtmlFetcher does not handle gzip compression
Issue - State: open - Opened by kqr over 9 years ago
- 2 comments
#237 - add gzip deflation to HtmlFetcher
Pull Request - State: open - Opened by kqr over 9 years ago
- 1 comment
#236 - Forbes.com text extraction gives redundant date in some cases
Issue - State: open - Opened by ethan-hunt-007 over 9 years ago
- 1 comment
#235 - Published_Date extraction
Issue - State: open - Opened by kmehl over 9 years ago
#234 - Can't extract content from huffington post (?)
Issue - State: open - Opened by jice-lavocat over 9 years ago
- 2 comments
#233 - Why can't Goose extract these Chinese articles?
Issue - State: closed - Opened by motasay over 9 years ago
- 8 comments
#232 - cleaned_text doesn't work everytime for the same website
Issue - State: closed - Opened by kmehl over 9 years ago
- 1 comment
#231 - Read article content using goose retrieving nothing
Issue - State: open - Opened by abhigenie92 over 9 years ago
- 2 comments
#230 - IOError
Issue - State: open - Opened by abhigenie92 over 9 years ago
#229 - NY Times doesn't work
Pull Request - State: open - Opened by abhigenie92 over 9 years ago
- 1 comment
#228 - Hotfix for #219 - Missing real fix
Pull Request - State: open - Opened by jice-lavocat over 9 years ago
#227 - Can not get the image from a Chinese page even the text
Issue - State: open - Opened by SheldonWang3000 over 9 years ago
- 2 comments
#226 - Can not install on mac
Issue - State: open - Opened by 1a1a11a over 9 years ago
- 3 comments
#225 - fixing new york times content extraction failure
Pull Request - State: open - Opened by robmcdan over 9 years ago
- 1 comment
#224 - Goose fails on nytimes articles
Issue - State: open - Opened by lsemel over 9 years ago
- 2 comments
#223 - Russian articles are not extracted
Issue - State: open - Opened by szhem over 9 years ago
#222 - Turkish stopwords added
Pull Request - State: open - Opened by ufukk over 9 years ago
#221 - top_node algorithm? (test case included)
Issue - State: open - Opened by ThiemNguyen almost 10 years ago
- 2 comments
#220 - Add python 3 support
Pull Request - State: open - Opened by vetal4444 almost 10 years ago
- 11 comments
#219 - Link without domain.
Issue - State: open - Opened by warmspringwinds almost 10 years ago
- 2 comments
#218 - Dateline in articles
Issue - State: open - Opened by cvelascorivera almost 10 years ago
#217 - Og site_name issue
Issue - State: open - Opened by grangier almost 10 years ago
#216 - Getting a No Such File or Directory error
Issue - State: open - Opened by lsemel almost 10 years ago
- 1 comment
#215 - Algorithm used in goose ?
Issue - State: open - Opened by IndianShifu almost 10 years ago
- 2 comments
#214 - Type fix: Issue #204
Pull Request - State: closed - Opened by amalfra almost 10 years ago
#213 - No Text Extracted for articles from domain http://www.clarin.com
Issue - State: open - Opened by sathappanspm almost 10 years ago
- 1 comment
#212 - Clarification on how raw_html gets extracted
Issue - State: open - Opened by konradkonrad almost 10 years ago
#211 - Indonesian stopwords file contains too many other words than stopwords
Issue - State: open - Opened by luthfianto almost 10 years ago
#210 - Not getting any extracted text
Issue - State: open - Opened by peterswang almost 10 years ago
- 1 comment
#209 - Tidy README.rst
Pull Request - State: closed - Opened by StevenMaude almost 10 years ago
#208 - meta charset options support
Issue - State: closed - Opened by kmmbvnr almost 10 years ago
- 1 comment
#207 - More efficient title extraction and infinite recursion bug fix
Pull Request - State: open - Opened by slitayem almost 10 years ago
#206 - Maximum recursion depth exceeded
Issue - State: open - Opened by slitayem almost 10 years ago
#205 - More efficient title extraction and bugs fix
Pull Request - State: closed - Opened by slitayem almost 10 years ago
#204 - Spelling Error in documentation
Issue - State: open - Opened by ghost almost 10 years ago
#203 - Fix bug with site_name=None
Pull Request - State: closed - Opened by yprez almost 10 years ago
- 3 comments
#202 - provide a facility to get all text in a webpage
Issue - State: open - Opened by aqp almost 10 years ago
#197 - Fix title extraction if title is same as site_name
Pull Request - State: open - Opened by vetal4444 about 10 years ago
- 1 comment
#195 - Fix title cleaning
Pull Request - State: closed - Opened by slitayem about 10 years ago
- 2 comments
#194 - Error in title extractor
Issue - State: open - Opened by nargiza-sarkulova about 10 years ago
- 6 comments
#155 - Could not extract a Chinese html
Issue - State: closed - Opened by yetuweiba about 10 years ago
- 3 comments
#148 - Goose is non-functional in Python 3
Issue - State: open - Opened by fake-name over 10 years ago
- 13 comments
#138 - Timeout
Issue - State: closed - Opened by harikt over 10 years ago
- 2 comments
#78 - WindowsError: [Error 32] The process cannot access the file because it is being used by another process
Issue - State: closed - Opened by idf almost 11 years ago
- 17 comments
#64 - adding cookies support
Pull Request - State: closed - Opened by tgallant about 11 years ago
- 2 comments
#60 - article.cleaned_text is null
Issue - State: closed - Opened by harikt about 11 years ago
- 4 comments
#48 - Bad case for image extraction
Issue - State: open - Opened by stephenLee over 11 years ago
- 1 comment
#36 - cannot identify image file
Issue - State: closed - Opened by PriyeshV over 11 years ago
- 6 comments
#31 - Do we need to download images to file?
Issue - State: open - Opened by atlithorn over 11 years ago
- 24 comments