Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / GeneralNewsExtractor/GeneralNewsExtractor issues and pull requests
#132 - ignore
Issue -
State: closed - Opened by adsecuritycenter 30 days ago
Labels: bug
#131 - Tencent portal pages cannot be recognized
Issue -
State: open - Opened by 34892002 4 months ago
Labels: bug
#129 - Extracting the body text of https://www.gov.cn/gongbao/content/2007/content_711039.htm gives unexpected results
Issue -
State: closed - Opened by linchen059 8 months ago
- 8 comments
Labels: bug
#128 - How can I tell where an image sits within the extracted text?
Issue -
State: open - Opened by blackworlder about 1 year ago
- 1 comment
#127 - The online demo page extracts body_html better than the code does: https://www.avira.com/en/blog/email-account-hacked
Issue -
State: open - Opened by zhixiuyisheng about 1 year ago
Labels: bug
#126 - update version info and setup.py file
Pull Request -
State: closed - Opened by jindaxiang almost 2 years ago
- 2 comments
#125 - Bump lxml from 4.6.3 to 4.7.1
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
Labels: dependencies
#124 - Extracted title is truncated because of the | character
Issue -
State: open - Opened by pgshow over 2 years ago
Labels: bug
#123 - Hello, how can I compute the distance between a node and the body text to further filter for the best date?
Issue -
State: open - Opened by wzf9 over 2 years ago
- 2 comments
#122 - Extraction fails on the last page of a news list when it contains only one item
Issue -
State: open - Opened by DestroyLee over 2 years ago
- 1 comment
Labels: bug
#121 - Chinese commas are all converted to English commas during extraction
Issue -
State: closed - Opened by billgetjoy over 2 years ago
- 3 comments
Labels: bug
#120 - The best content is missed
Issue -
State: open - Opened by zlj-zz almost 3 years ago
- 1 comment
Labels: bug
#119 - ListPageExtractor raises an error
Issue -
State: open - Opened by perpetually almost 3 years ago
- 1 comment
#118 - Check for the <base> tag when joining relative URLs
Issue -
State: open - Opened by kingname almost 3 years ago
Labels: enhancement
#117 - Bug in iter_node in version 0.2.6
Issue -
State: closed - Opened by breeef almost 3 years ago
- 1 comment
Labels: bug
#116 - Bump lxml from 4.6.2 to 4.6.3
Pull Request -
State: closed - Opened by dependabot[bot] about 3 years ago
- 1 comment
Labels: dependencies
#115 - Bump pyyaml from 5.3.1 to 5.4
Pull Request -
State: closed - Opened by dependabot[bot] about 3 years ago
- 1 comment
Labels: dependencies
#114 - Body text cannot be extracted from a government website's news page
Issue -
State: closed - Opened by Shawn-fung over 3 years ago
- 1 comment
Labels: bug
#113 - Error occurs when extracting a list
Issue -
State: open - Opened by legend-zl over 3 years ago
#112 - Failed to crawl exchange news
Issue -
State: closed - Opened by nickhuangxinyu over 3 years ago
- 1 comment
Labels: bug
#111 - Bump lxml from 4.5.2 to 4.6.2
Pull Request -
State: closed - Opened by dependabot[bot] over 3 years ago
Labels: dependencies
#110 - fix-html-py
Pull Request -
State: open - Opened by Sunxiaoni over 3 years ago
#109 - Only the first paragraph is extracted on some pages
Issue -
State: closed - Opened by shanliangLS over 3 years ago
- 3 comments
Labels: bug
#108 - Body text extraction is wrong for NetEase News and Tencent News
Issue -
State: closed - Opened by changyt-1997 almost 4 years ago
- 6 comments
Labels: bug
#107 - fix htag and title lcm bug
Pull Request -
State: closed - Opened by kingname almost 4 years ago
#106 - Only one character is extracted for some titles
Issue -
State: closed - Opened by shanliangLS almost 4 years ago
- 3 comments
Labels: bug
#105 - Images cannot be extracted from WeChat official account articles
Issue -
State: open - Opened by liangbaika almost 4 years ago
- 3 comments
Labels: bug
#104 - The extract method raises an error when an XPath for the body content is configured
Issue -
State: open - Opened by tranzwalle almost 4 years ago
- 3 comments
Labels: bug
#103 - update change log and setup.py
Pull Request -
State: closed - Opened by kingname almost 4 years ago
#102 - Fix extraction failure caused by date separators in the title
Pull Request -
State: closed - Opened by kingname almost 4 years ago
#101 - Some titles cannot be parsed
Issue -
State: closed - Opened by liangbaika almost 4 years ago
- 3 comments
Labels: bug
#100 - Author matching returns empty when the author's name starts with a digit
Issue -
State: open - Opened by ymj4023 almost 4 years ago
Labels: bug
#99 - Normalize characters in the source code
Pull Request -
State: closed - Opened by kingname almost 4 years ago
#98 - gne error: TypeError: Argument 'element' has incorrect type (expected lxml.etree._Element, got lxml.etree._ElementUnicodeResult)
Issue -
State: closed - Opened by ymj4023 about 4 years ago
- 1 comment
Labels: bug
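#97 - For some 快科技 (Kuai Keji) articles, only a middle paragraph of the body is extracted and the title is wrong; possibly related to child tags inside p tags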
Issue -
State: closed - Opened by zhs509 about 4 years ago
- 3 comments
#96 - Extracted body text is abnormal: it contains many characters from outside the body
Issue -
State: closed - Opened by shishiwie about 4 years ago
- 1 comment
Labels: bug
#95 - Abnormal body text returned for a web page
Issue -
State: closed - Opened by shishiwie about 4 years ago
- 1 comment
Labels: bug
#94 - GNE returns abnormal body text: no body, only hyperlinks
Issue -
State: closed - Opened by shishiwie about 4 years ago
- 2 comments
Labels: bug
#93 - Remove footer tags
Pull Request -
State: closed - Opened by kingname about 4 years ago
#92 - News body extraction breaks after upgrading from version 0.1.8 to 0.2.3
Issue -
State: closed - Opened by 46319943 about 4 years ago
- 3 comments
Labels: bug
#91 - Body text cannot be extracted after specifying body_xpath
Issue -
State: closed - Opened by JerryChenn07 about 4 years ago
- 6 comments
Labels: bug
#90 - Bad Case
Issue -
State: open - Opened by kingname about 4 years ago
Labels: bug
#89 - Automatically parse list pages
Issue -
State: open - Opened by kingname about 4 years ago
Labels: enhancement
#88 - useless_attr must now match exactly before a node is removed
Pull Request -
State: closed - Opened by kingname about 4 years ago
#87 - Update documentation
Pull Request -
State: closed - Opened by kingname over 4 years ago
#86 - Automatically extract news list pages
Pull Request -
State: closed - Opened by kingname over 4 years ago
#85 - Very inaccurate on WeChat official accounts; also, can the body tags be preserved?
Issue -
State: open - Opened by lonycc over 4 years ago
Labels: bug
#84 - Fix only the last node being extracted when extracting text from a tag
Pull Request -
State: closed - Opened by kingname over 4 years ago
#83 - Greatly improve extraction speed
Pull Request -
State: closed - Opened by kingname over 4 years ago
#82 - Fix documentation
Pull Request -
State: closed - Opened by kingname over 4 years ago
#81 - Update version number
Pull Request -
State: closed - Opened by kingname over 4 years ago
#80 - 20200606
Pull Request -
State: closed - Opened by kingname over 4 years ago
#79 - Body extraction fails fairly often on Baijiahao articles
Issue -
State: open - Opened by jiangchao123 over 4 years ago
- 3 comments
Labels: bug
#78 - Another implementation for reference
Issue -
State: open - Opened by baby5 over 4 years ago
#77 - About the paper you mentioned
Issue -
State: closed - Opened by chrislinan over 4 years ago
- 3 comments
Labels: bug
#76 - The first image in the body may fail to be extracted
Issue -
State: open - Opened by kingname over 4 years ago
- 1 comment
Labels: bug
#75 - feature: the source field should identify the name of the root news organization
Issue -
State: open - Opened by dawei101 over 4 years ago
- 2 comments
Labels: bug
#74 - Develop unit test cases for GNE using Pytest
Issue -
State: open - Opened by kingname over 4 years ago
- 1 comment
Labels: help wanted
#73 - Title extraction logic should be adjusted
Issue -
State: closed - Opened by asyncins over 4 years ago
- 3 comments
Labels: bug
#72 - A new approach to extracting the title
Issue -
State: closed - Opened by kingname over 4 years ago
- 2 comments
Labels: enhancement
#71 - ⚙️ Fix user-defined XPath being invalidated by preprocessing
Pull Request -
State: closed - Opened by kingname over 4 years ago
#70 - About the purpose of the standard deviation (calc_standard_deviation)
Issue -
State: closed - Opened by hwg119 over 4 years ago
- 1 comment
#69 - update online test page
Pull Request -
State: closed - Opened by kingname over 4 years ago
#68 - 🔨 Add more test cases
Pull Request -
State: closed - Opened by kingname over 4 years ago
#67 - Can extracted image links be kept in their original positions?
Issue -
State: closed - Opened by JQ-K almost 5 years ago
- 2 comments
Labels: bug
#66 - ✅ Remove a completed item from the TODO
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#65 - 📅 Try to extract the article's publish time from the HTML meta information
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#64 - Update documentation
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#63 - Allow targeted extraction of author and publish time
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#62 - Adapt to WeChat official accounts
Issue -
State: closed - Opened by kingname almost 5 years ago
Labels: enhancement
#61 - gne cannot get the title when extracting 36kr news
Issue -
State: closed - Opened by rottengeek almost 5 years ago
- 4 comments
Labels: bug
#60 - Body extraction is abnormal
Issue -
State: closed - Opened by python-D almost 5 years ago
- 1 comment
Labels: bug
#59 - Support for time formats
Issue -
State: closed - Opened by rottengeek almost 5 years ago
- 1 comment
Labels: bug
#58 - Body text of Baijiahao articles cannot be extracted
Issue -
State: closed - Opened by dedegs almost 5 years ago
- 1 comment
Labels: bug
#57 - [Bug report] Only a short passage is extracted from 澎湃新闻 (The Paper)
Issue -
State: closed - Opened by TTTPOB almost 5 years ago
- 2 comments
Labels: bug
#56 - Develop
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#55 - Add test samples for Baijiahao and Xinhuanet
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#54 - Join extracted image URLs into absolute paths via a host parameter
Pull Request -
State: closed - Opened by kingname almost 5 years ago
#53 - About extracting images in the body as full paths
Issue -
State: closed - Opened by gyco almost 5 years ago
- 1 comment
Labels: enhancement
#52 - Images on some pages cannot be recognized
Issue -
State: closed - Opened by ego008 almost 5 years ago
- 1 comment
#51 - Merge pull request #1 from kingname/master
Pull Request -
State: closed - Opened by MICHAELZYY almost 5 years ago
- 1 comment
#50 - Body extraction from Xinhuanet is incorrect
Issue -
State: closed - Opened by JingqiongWang almost 5 years ago
- 1 comment
#49 - Write a page for testing GNE's extraction results
Issue -
State: closed - Opened by kingname almost 5 years ago
- 2 comments
Labels: enhancement
#48 - Accuracy of body extraction for foreign-language text
Issue -
State: open - Opened by Monkey-Y about 5 years ago
- 1 comment
Labels: enhancement
#47 - Fix images not being retrieved from 凤凰网 (ifeng.com)
Pull Request -
State: closed - Opened by kingname about 5 years ago
#46 - update readme
Pull Request -
State: closed - Opened by kingname about 5 years ago
#45 - recover images
Pull Request -
State: closed - Opened by kingname about 5 years ago
#44 - Update readme
Pull Request -
State: closed - Opened by kingname about 5 years ago
#43 - Update version number and readme
Pull Request -
State: closed - Opened by kingname about 5 years ago
#42 - Update USELESS_ATTR
Pull Request -
State: closed - Opened by kingname about 5 years ago
#41 - remove empty p tag
Pull Request -
State: closed - Opened by kingname about 5 years ago
#39 - Extracted content is duplicated
Issue -
State: closed - Opened by Mr-iFan about 5 years ago
- 5 comments
#37 - Change to post-order traversal
Pull Request -
State: closed - Opened by phantomSuying about 5 years ago
- 1 comment
#36 - The content position is wrong for the URL below
Issue -
State: closed - Opened by debugksir about 5 years ago
- 2 comments
#35 - Body extraction is incorrect for some Sina News articles
Issue -
State: closed - Opened by Rockyzsu about 5 years ago
- 1 comment
#34 - Multiple authors joined with 、 cannot be extracted
Issue -
State: closed - Opened by zhu733756 about 5 years ago
- 4 comments
Labels: enhancement
#33 - Use dynamic programming to improve computational performance
Issue -
State: open - Opened by kingname about 5 years ago
- 1 comment
Labels: enhancement
#31 - [Discussion] Support for MD-type articles
Issue -
State: open - Opened by x956606865 about 5 years ago
- 3 comments
#29 - The content field is extracted incorrectly for Baijiahao
Issue -
State: closed - Opened by siryang2006 about 5 years ago
- 2 comments
Labels: bug
#28 - RuntimeWarning: invalid value encountered in log node_info['sbdi'])
Issue -
State: open - Opened by siryang2006 about 5 years ago
- 1 comment
Labels: invalid