Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / GeneralNewsExtractor/GeneralNewsExtractor issues and pull requests

#132 - ignore

Issue - State: closed - Opened by adsecuritycenter 30 days ago
Labels: bug

#131 - Tencent (腾x讯) portal cannot be recognized

Issue - State: open - Opened by 34892002 4 months ago
Labels: bug

#128 - How can I tell where an image sits within the extracted text?

Issue - State: open - Opened by blackworlder about 1 year ago - 1 comment

#126 - update version info and setup.py file

Pull Request - State: closed - Opened by jindaxiang almost 2 years ago - 2 comments

#125 - Bump lxml from 4.6.3 to 4.7.1

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago
Labels: dependencies

#124 - Title extraction is cut short because of the | character

Issue - State: open - Opened by pgshow over 2 years ago
Labels: bug

#122 - Extraction fails on the last page of a news list when it contains only one item

Issue - State: open - Opened by DestroyLee over 2 years ago - 1 comment
Labels: bug

#121 - Chinese commas are all converted to English commas during extraction

Issue - State: closed - Opened by billgetjoy over 2 years ago - 3 comments
Labels: bug

#120 - The best content was missed

Issue - State: open - Opened by zlj-zz almost 3 years ago - 1 comment
Labels: bug

#119 - ListPageExtractor error

Issue - State: open - Opened by perpetually almost 3 years ago - 1 comment

#118 - Check for the <base> tag when joining relative URLs

Issue - State: open - Opened by kingname almost 3 years ago
Labels: enhancement

#117 - Bug in iter_node in version 0.2.6

Issue - State: closed - Opened by breeef almost 3 years ago - 1 comment
Labels: bug

#116 - Bump lxml from 4.6.2 to 4.6.3

Pull Request - State: closed - Opened by dependabot[bot] about 3 years ago - 1 comment
Labels: dependencies

#115 - Bump pyyaml from 5.3.1 to 5.4

Pull Request - State: closed - Opened by dependabot[bot] about 3 years ago - 1 comment
Labels: dependencies

#114 - Body text cannot be extracted from a government website's news page

Issue - State: closed - Opened by Shawn-fung over 3 years ago - 1 comment
Labels: bug

#113 - Error occurs when extracting a list

Issue - State: open - Opened by legend-zl over 3 years ago

#112 - Failed to crawl exchange news

Issue - State: closed - Opened by nickhuangxinyu over 3 years ago - 1 comment
Labels: bug

#111 - Bump lxml from 4.5.2 to 4.6.2

Pull Request - State: closed - Opened by dependabot[bot] over 3 years ago
Labels: dependencies

#110 - fix-html-py

Pull Request - State: open - Opened by Sunxiaoni over 3 years ago

#109 - Only the first paragraph is extracted on some pages

Issue - State: closed - Opened by shanliangLS over 3 years ago - 3 comments
Labels: bug

#108 - Incorrect body extraction for NetEase News and Tencent News

Issue - State: closed - Opened by changyt-1997 almost 4 years ago - 6 comments
Labels: bug

#107 - fix htag and title lcm bug

Pull Request - State: closed - Opened by kingname almost 4 years ago

#106 - Only one character is extracted for some titles

Issue - State: closed - Opened by shanliangLS almost 4 years ago - 3 comments
Labels: bug

#105 - Images cannot be extracted from WeChat Official Account articles

Issue - State: open - Opened by liangbaika almost 4 years ago - 3 comments
Labels: bug

#104 - Error raised when the extract method is used with an XPath configured for the body content

Issue - State: open - Opened by tranzwalle almost 4 years ago - 3 comments
Labels: bug

#103 - update change log and setup.py

Pull Request - State: closed - Opened by kingname almost 4 years ago

#102 - Fix extraction failure caused by date separators in titles

Pull Request - State: closed - Opened by kingname almost 4 years ago

#101 - A few titles cannot be parsed

Issue - State: closed - Opened by liangbaika almost 4 years ago - 3 comments
Labels: bug

#100 - Author matching returns empty when the author's name starts with a digit

Issue - State: open - Opened by ymj4023 almost 4 years ago
Labels: bug

#99 - Normalize characters in the source code

Pull Request - State: closed - Opened by kingname almost 4 years ago

#96 - Extracted body is abnormal: contains many characters from outside the body

Issue - State: closed - Opened by shishiwie about 4 years ago - 1 comment
Labels: bug

#95 - Abnormal body text returned for a web page

Issue - State: closed - Opened by shishiwie about 4 years ago - 1 comment
Labels: bug

#94 - GNE returns an abnormal body result: no body text, hyperlinks returned instead

Issue - State: closed - Opened by shishiwie about 4 years ago - 2 comments
Labels: bug

#93 - Remove footer tags

Pull Request - State: closed - Opened by kingname about 4 years ago

#92 - News body extraction fails after upgrading from version 0.1.8 to 0.2.3

Issue - State: closed - Opened by 46319943 about 4 years ago - 3 comments
Labels: bug

#91 - Body cannot be extracted after specifying body_xpath

Issue - State: closed - Opened by JerryChenn07 about 4 years ago - 6 comments
Labels: bug

#90 - Bad Case

Issue - State: open - Opened by kingname about 4 years ago
Labels: bug

#89 - Automatically parse list pages

Issue - State: open - Opened by kingname about 4 years ago
Labels: enhancement

#88 - useless_attr must now match exactly before a node is removed

Pull Request - State: closed - Opened by kingname about 4 years ago

#87 - Update documentation

Pull Request - State: closed - Opened by kingname over 4 years ago

#86 - Automatically extract news list pages

Pull Request - State: closed - Opened by kingname over 4 years ago

#85 - WeChat Official Account extraction is very inaccurate; also, can the body's tags be preserved?

Issue - State: open - Opened by lonycc over 4 years ago
Labels: bug

#83 - Greatly improve extraction speed

Pull Request - State: closed - Opened by kingname over 4 years ago

#82 - Fix documentation

Pull Request - State: closed - Opened by kingname over 4 years ago

#81 - Update version number

Pull Request - State: closed - Opened by kingname over 4 years ago

#80 - 20200606

Pull Request - State: closed - Opened by kingname over 4 years ago

#79 - Body extraction fails fairly often for Baijiahao articles

Issue - State: open - Opened by jiangchao123 over 4 years ago - 3 comments
Labels: bug

#78 - Another implementation for reference

Issue - State: open - Opened by baby5 over 4 years ago

#77 - About the paper you mentioned

Issue - State: closed - Opened by chrislinan over 4 years ago - 3 comments
Labels: bug

#76 - The first image in the body may fail to be extracted

Issue - State: open - Opened by kingname over 4 years ago - 1 comment
Labels: bug

#75 - feature: the source field should identify the root news organization's name

Issue - State: open - Opened by dawei101 over 4 years ago - 2 comments
Labels: bug

#74 - Develop unit test cases for GNE using Pytest

Issue - State: open - Opened by kingname over 4 years ago - 1 comment
Labels: help wanted

#73 - Title extraction logic should be adjusted

Issue - State: closed - Opened by asyncins over 4 years ago - 3 comments
Labels: bug

#72 - A new approach to extracting the title

Issue - State: closed - Opened by kingname over 4 years ago - 2 comments
Labels: enhancement

#71 - ⚙️ Fix user-defined XPath being invalidated by preprocessing

Pull Request - State: closed - Opened by kingname over 4 years ago

#70 - About the role of the standard deviation (calc_standard_deviation)

Issue - State: closed - Opened by hwg119 over 4 years ago - 1 comment

#69 - update online test page

Pull Request - State: closed - Opened by kingname over 4 years ago

#68 - 🔨 Add more test cases

Pull Request - State: closed - Opened by kingname over 4 years ago

#67 - Can extracted image links be kept in their original positions?

Issue - State: closed - Opened by JQ-K almost 5 years ago - 2 comments
Labels: bug

#66 - ✅ Remove a completed item from the TODO

Pull Request - State: closed - Opened by kingname almost 5 years ago

#65 - 📅 Try to extract the article's publish time from the HTML meta information

Pull Request - State: closed - Opened by kingname almost 5 years ago

#64 - Update documentation

Pull Request - State: closed - Opened by kingname almost 5 years ago

#63 - Allow targeted extraction of author and publish time

Pull Request - State: closed - Opened by kingname almost 5 years ago

#62 - Adapt to WeChat Official Accounts

Issue - State: closed - Opened by kingname almost 5 years ago
Labels: enhancement

#61 - gne cannot get the title when extracting 36kr news

Issue - State: closed - Opened by rottengeek almost 5 years ago - 4 comments
Labels: bug

#60 - Abnormal body extraction

Issue - State: closed - Opened by python-D almost 5 years ago - 1 comment
Labels: bug

#59 - Support for time formats

Issue - State: closed - Opened by rottengeek almost 5 years ago - 1 comment
Labels: bug

#58 - Body cannot be extracted from Baijiahao articles

Issue - State: closed - Opened by dedegs almost 5 years ago - 1 comment
Labels: bug

#57 - [Bug report] Only a short passage is extracted from The Paper (澎湃新闻)

Issue - State: closed - Opened by TTTPOB almost 5 years ago - 2 comments
Labels: bug

#56 - Develop

Pull Request - State: closed - Opened by kingname almost 5 years ago

#55 - Add test samples for Baijiahao and Xinhuanet

Pull Request - State: closed - Opened by kingname almost 5 years ago

#54 - Join extracted image URLs into absolute paths via a passed-in host parameter

Pull Request - State: closed - Opened by kingname almost 5 years ago

#53 - About extracting images in the body as full paths

Issue - State: closed - Opened by gyco almost 5 years ago - 1 comment
Labels: enhancement

#52 - Images on some pages cannot be recognized

Issue - State: closed - Opened by ego008 almost 5 years ago - 1 comment

#51 - Merge pull request #1 from kingname/master

Pull Request - State: closed - Opened by MICHAELZYY almost 5 years ago - 1 comment

#50 - Incorrect body extraction for Xinhuanet

Issue - State: closed - Opened by JingqiongWang almost 5 years ago - 1 comment

#49 - Write a page for testing GNE's results

Issue - State: closed - Opened by kingname almost 5 years ago - 2 comments
Labels: enhancement

#48 - Accuracy on foreign-language body text

Issue - State: open - Opened by Monkey-Y about 5 years ago - 1 comment
Labels: enhancement

#47 - Fix images not being retrieved from ifeng (凤凰网)

Pull Request - State: closed - Opened by kingname about 5 years ago

#46 - update readme

Pull Request - State: closed - Opened by kingname about 5 years ago

#45 - recover images

Pull Request - State: closed - Opened by kingname about 5 years ago

#44 - Update readme

Pull Request - State: closed - Opened by kingname about 5 years ago

#43 - Update version number and readme

Pull Request - State: closed - Opened by kingname about 5 years ago

#42 - Update USELESS_ATTR

Pull Request - State: closed - Opened by kingname about 5 years ago

#41 - remove empty p tag

Pull Request - State: closed - Opened by kingname about 5 years ago

#39 - Extracted content is duplicated

Issue - State: closed - Opened by Mr-iFan about 5 years ago - 5 comments

#37 - Change to post-order traversal

Pull Request - State: closed - Opened by phantomSuying about 5 years ago - 1 comment

#36 - The content position is wrong for the URL below

Issue - State: closed - Opened by debugksir about 5 years ago - 2 comments

#35 - Incorrect body extraction for some Sina News articles

Issue - State: closed - Opened by Rockyzsu about 5 years ago - 1 comment

#34 - Multiple authors joined with 、 cannot be extracted

Issue - State: closed - Opened by zhu733756 about 5 years ago - 4 comments
Labels: enhancement

#33 - Use dynamic programming to improve computation performance

Issue - State: open - Opened by kingname about 5 years ago - 1 comment
Labels: enhancement

#31 - [Discussion] Support for MD-type articles

Issue - State: open - Opened by x956606865 about 5 years ago - 3 comments

#29 - Incorrect content field extracted for Baijiahao

Issue - State: closed - Opened by siryang2006 about 5 years ago - 2 comments
Labels: bug

#28 - RuntimeWarning: invalid value encountered in log node_info['sbdi'])

Issue - State: open - Opened by siryang2006 about 5 years ago - 1 comment
Labels: invalid