An open API service for providing issue and pull request metadata for open source projects.

GitHub / CrawlScript/WebCollector issues and pull requests

#137 - refactor: design and implementation smells

Pull Request - State: open - Opened by bhavya844 over 1 year ago

#134 - Create TestNews.java

Pull Request - State: closed - Opened by HiIamHiep over 2 years ago

#133 - Bump jsoup from 1.11.3 to 1.15.3

Pull Request - State: open - Opened by dependabot[bot] almost 3 years ago
Labels: dependencies

#132 - Inefficient code detected in RegexRule.java

Issue - State: open - Opened by linci8210 almost 3 years ago - 1 comment

#131 - Bump mysql-connector-java from 5.1.46 to 8.0.28

Pull Request - State: open - Opened by dependabot[bot] about 3 years ago
Labels: dependencies

#129 - Bump gson from 2.8.5 to 2.8.9

Pull Request - State: open - Opened by dependabot[bot] about 3 years ago
Labels: dependencies

#128 - Bump jsoup from 1.11.3 to 1.14.2

Pull Request - State: closed - Opened by dependabot[bot] almost 4 years ago - 1 comment
Labels: dependencies

#127 - Bug in automatic news-time detection

Issue - State: closed - Opened by KTsama over 4 years ago - 1 comment

#126 - Folks, none of the official groups can be joined anymore; they all say they're full

Issue - State: closed - Opened by jiangqiang1996 over 4 years ago - 1 comment

#125 - How is the accuracy reported in the paper calculated?

Issue - State: open - Opened by fubicheng208 over 4 years ago

#124 - How do I handle a 307 response when accessing a link?

Issue - State: open - Opened by nikesb23 over 4 years ago

#123 - Bump junit from 4.12 to 4.13.1

Pull Request - State: open - Opened by dependabot[bot] almost 5 years ago
Labels: dependencies

#122 - Bump mysql-connector-java from 5.1.46 to 8.0.16

Pull Request - State: closed - Opened by dependabot[bot] about 5 years ago - 1 comment
Labels: dependencies

#121 - Remove logging

Pull Request - State: open - Opened by wangqifan over 5 years ago - 1 comment

#120 - Out-of-memory issue

Issue - State: open - Opened by wangqifan over 5 years ago

#118 - Should the hour part of the time-extraction regex be changed to [0-9]?

Issue - State: open - Opened by bigzhouj over 5 years ago

#116 - The computeInfo function in ContentExtractor throws a StackOverflowError

Issue - State: open - Opened by yanpeng over 5 years ago - 3 comments

#115 - Error when running the tutorial's source code for crawling CSDN blogs

Issue - State: open - Opened by dyn1721 over 5 years ago - 1 comment

#114 - Where is the distributed version?

Issue - State: open - Opened by xiaowenhuman over 5 years ago

#113 - How can expired HTTPS certificates be ignored in version 2.73-alpha?

Issue - State: open - Opened by hj287678654 over 5 years ago - 2 comments

#112 - Bump c3p0 from 0.9.5.2 to 0.9.5.4

Pull Request - State: open - Opened by dependabot[bot] over 5 years ago
Labels: dependencies

#110 - add unit tests for ContentExtractor

Pull Request - State: open - Opened by LordLRO almost 6 years ago

#109 - Could the log level for thrown exceptions be changed to warn or error?

Issue - State: open - Opened by xiejx618 almost 6 years ago

#108 - Subclassing BreadthCrawler: Chinese text in fetched pages is output garbled

Issue - State: open - Opened by linye271709915 almost 6 years ago - 2 comments

#107 - Add demo for selenium crawler with cookie

Pull Request - State: open - Opened by smallyunet almost 6 years ago - 3 comments

#104 - The release package includes a log4j configuration file that overrides users' own log4j configuration

Issue - State: closed - Opened by gaoxjin over 6 years ago - 3 comments

#102 - WebCollector discussion group

Issue - State: open - Opened by mdzz9527 over 6 years ago - 8 comments

#101 - Update README.md

Pull Request - State: open - Opened by x-otto-x almost 7 years ago

#100 - Update DemoCookieCrawler.java

Pull Request - State: closed - Opened by x-otto-x almost 7 years ago

#99 - Update README.md

Pull Request - State: open - Opened by x-otto-x almost 7 years ago

#98 - Update DemoCookieCrawler.java

Pull Request - State: closed - Opened by x-otto-x almost 7 years ago

#97 - Is the source code for the WebCollector-Hadoop version publicly available?

Issue - State: closed - Opened by coderf187 almost 7 years ago - 1 comment

#96 - Is there a discussion group?

Issue - State: open - Opened by liushaofeng89 almost 7 years ago - 2 comments

#95 - OkHttp ConnectionPool and Okio Watchdog do not seem to be shut down properly

Issue - State: open - Opened by lewiswu1209 almost 7 years ago - 4 comments

#94 - Can the depth be set so that crawling continues to the next round as long as there are links?

Issue - State: closed - Opened by hxq201300 almost 7 years ago - 1 comment

#93 - Setting the user agent (UA) has no effect in the new version

Issue - State: open - Opened by CNdarkmoon almost 7 years ago - 1 comment

#92 - Hello! LockTimeoutException

Issue - State: closed - Opened by simplecnst almost 7 years ago - 1 comment

#91 - How to tell when the crawler has finished

Issue - State: closed - Opened by djxhero almost 7 years ago - 1 comment

#90 - Redirects

Issue - State: closed - Opened by YYSpace almost 7 years ago - 4 comments

#89 - Hello, after adding about 70 seeds to RamCrawler, the results are inconsistent

Issue - State: closed - Opened by gaoda1234 almost 7 years ago - 3 comments

#88 - Can the stop method of the StrategyCrawler class stop the crawler immediately?

Issue - State: closed - Opened by BeQiang about 7 years ago - 1 comment

#87 - How can this framework be used to crawl data from mobile apps?

Issue - State: closed - Opened by x-otto-x about 7 years ago - 1 comment

#86 - NewsCrawler.java from the official website's setup tutorial throws an error

Issue - State: closed - Opened by MrKingHH about 7 years ago - 1 comment

#85 - Only some of the injected URLs are processed

Issue - State: closed - Opened by x-otto-x about 7 years ago - 4 comments

#82 - Bug with depth

Issue - State: closed - Opened by Aki1996 about 7 years ago - 1 comment

#81 - Question about IP proxies

Issue - State: closed - Opened by zhangzhengk about 7 years ago - 4 comments

#79 - An overly large depth can cause OOM

Pull Request - State: closed - Opened by carryxyh over 7 years ago - 1 comment

#78 - An excessively large depth causes an out-of-memory error

Issue - State: closed - Opened by carryxyh over 7 years ago - 9 comments

#77 - Getting 403 errors when crawling continuously: how can this be fixed?

Issue - State: closed - Opened by df8305909 over 7 years ago - 1 comment

#76 - Is it possible to re-crawl URLs?

Issue - State: closed - Opened by weiyinfu over 7 years ago - 1 comment

#75 - Problem with main-text extraction

Issue - State: closed - Opened by Kaneki-x over 7 years ago - 1 comment

#74 - Running multiple crawlers simultaneously

Issue - State: closed - Opened by ljc930611 over 7 years ago - 5 comments

#73 - Why can't image data be fetched with WebCollector version 2.7.1?

Issue - State: closed - Opened by kongbb1 almost 8 years ago - 1 comment

#72 - The setUserAgent() method in HttpRequest has no effect in version 2.70

Issue - State: closed - Opened by yanzuo1992 almost 8 years ago - 6 comments

#71 - Remove unused variables & optimize code

Pull Request - State: closed - Opened by feifeiiiiiiiiiii almost 8 years ago

#70 - Can the number of allocated threads be configured according to the type of seed?

Issue - State: closed - Opened by Janus-Xu almost 8 years ago - 1 comment

#69 - Source-code comments in the packaged release are garbled

Issue - State: closed - Opened by Janus-Xu almost 8 years ago - 1 comment

#68 - Is there a cluster solution and demo? Crawl speed in single-machine, single-thread tests is unsatisfactory

Issue - State: closed - Opened by Janus-Xu almost 8 years ago - 4 comments

#67 - Crawling all subpages (and their subpages) of a given URL

Issue - State: closed - Opened by yaoyuanyy almost 8 years ago - 3 comments

#66 - Enhance time extraction; add a .gitignore file

Pull Request - State: closed - Opened by imalec-huang almost 8 years ago

#65 - Bug in CrawlDatumFormater.class

Issue - State: closed - Opened by ljc930611 about 8 years ago

#64 - Problem adding follow-up tasks to CrawlDatums next

Issue - State: closed - Opened by ljc930611 about 8 years ago - 5 comments

#63 - Missing JAR files

Issue - State: closed - Opened by ljc930611 about 8 years ago - 4 comments

#62 - Adding URLs to CrawlDatums has no effect

Issue - State: closed - Opened by wuxiongliu1 over 8 years ago - 2 comments

#61 - Can multiple regex rules be added?

Issue - State: closed - Opened by wuxiongliu1 over 8 years ago - 1 comment

#60 - How can concatenated URLs be added to the URL queue in the visited method?

Issue - State: closed - Opened by wuxiongliu1 over 8 years ago - 1 comment

#59 - Which method should be used to return data when calling an API endpoint?

Issue - State: closed - Opened by wuxiongliu1 over 8 years ago - 1 comment

#58 - What is the current version in the Maven repository?

Issue - State: closed - Opened by wuxiongliu1 over 8 years ago - 1 comment

#57 - Question about regular expressions

Issue - State: closed - Opened by wuxiongliu1 over 8 years ago - 1 comment

#56 - Add serialization

Pull Request - State: closed - Opened by LinuxSuRen over 8 years ago

#55 - Fix broken headings in Markdown files

Pull Request - State: open - Opened by bryant1410 over 8 years ago

#54 - What format is the README in? It looks garbled

Issue - State: closed - Opened by LinuxSuRen over 8 years ago

#53 - delete the readme

Pull Request - State: open - Opened by Junjiu over 8 years ago

#52 - Can internal links in crawled pages be modified?

Issue - State: closed - Opened by zbcdj2008 over 8 years ago - 2 comments

#51 - Request for documentation of the field attributes used when crawled URL information is stored in BDB

Issue - State: open - Opened by newCheng over 8 years ago - 1 comment

#50 - The tutorial link is broken; can it be fixed?

Issue - State: closed - Opened by ahd763810566 over 8 years ago - 1 comment

#49 - Setting the timeout and the number of reconnection attempts

Issue - State: closed - Opened by loseyou over 8 years ago - 1 comment

#48 - Why is Mongo no longer under plugin in the new version?

Issue - State: open - Opened by bulletmarker over 8 years ago

#47 - Update ContentExtractor time parser

Pull Request - State: open - Opened by sundy-li almost 9 years ago

#46 - ContentExtractor time parsing is inaccurate

Issue - State: open - Opened by sundy-li almost 9 years ago

#45 - WebCollector 2.31 selector (select bug)

Issue - State: closed - Opened by stoneJava almost 9 years ago - 5 comments

#44 - java.lang.NoClassDefFoundError: org/openqa/selenium/htmlunit/HtmlUnitDriver

Issue - State: closed - Opened by huangwenyan1225 almost 9 years ago - 4 comments

#43 - Deep crawling: BerkeleyDB storage error, and memory is not released after crawling finishes

Issue - State: closed - Opened by 123yxp123 almost 9 years ago - 2 comments

#41 - Execution does not continue after crawlDatums.add(datum)

Issue - State: closed - Opened by chen28683 about 9 years ago - 1 comment

#39 - A very practical problem: crawling is restricted by websites' access-rate limits

Issue - State: open - Opened by eagle-1949 about 9 years ago - 5 comments

#37 - DemoDepthCrawler does not seem to work properly

Issue - State: closed - Opened by lewiswu1209 over 9 years ago - 3 comments

#36 - Issue with the WebCollector help documentation

Issue - State: closed - Opened by leiyz over 9 years ago - 2 comments

#35 - Does the latest version set cookies on redirect?

Issue - State: closed - Opened by zemochen over 9 years ago - 1 comment