GitHub / CrawlScript/WebCollector issues and pull requests
#137 - refactor: design and implementation smells
Pull Request -
State: open - Opened by bhavya844 over 1 year ago
#136 - Visited page returns a 502 error but still needs to be crawled; setting ExceptionUtils.fail(e) on the visit exception does not help. How can this be solved?
Issue -
State: open - Opened by Amnesiabht almost 2 years ago
- 1 comment
#134 - Create TestNews.java
Pull Request -
State: closed - Opened by HiIamHiep over 2 years ago
#133 - Bump jsoup from 1.11.3 to 1.15.3
Pull Request -
State: open - Opened by dependabot[bot] almost 3 years ago
Labels: dependencies
#132 - Inefficient code detected in RegexRule.java
Issue -
State: open - Opened by linci8210 almost 3 years ago
- 1 comment
#131 - Bump mysql-connector-java from 5.1.46 to 8.0.28
Pull Request -
State: open - Opened by dependabot[bot] about 3 years ago
Labels: dependencies
#130 - Content returned by ContentExtractor.getContentByUrl has no blank lines or other formatting
Issue -
State: open - Opened by AmberYang678 about 3 years ago
- 1 comment
#129 - Bump gson from 2.8.5 to 2.8.9
Pull Request -
State: open - Opened by dependabot[bot] about 3 years ago
Labels: dependencies
#128 - Bump jsoup from 1.11.3 to 1.14.2
Pull Request -
State: closed - Opened by dependabot[bot] almost 4 years ago
- 1 comment
Labels: dependencies
#127 - Bug in automatic detection of news time
Issue -
State: closed - Opened by KTsama over 4 years ago
- 1 comment
#126 - Cannot join any of the official groups; all of them report being full
Issue -
State: closed - Opened by jiangqiang1996 over 4 years ago
- 1 comment
#125 - How is the accuracy in the paper calculated?
Issue -
State: open - Opened by fubicheng208 over 4 years ago
#124 - How to handle a 307 response when visiting a link?
Issue -
State: open - Opened by nikesb23 over 4 years ago
#123 - Bump junit from 4.12 to 4.13.1
Pull Request -
State: open - Opened by dependabot[bot] almost 5 years ago
Labels: dependencies
#122 - Bump mysql-connector-java from 5.1.46 to 8.0.16
Pull Request -
State: closed - Opened by dependabot[bot] about 5 years ago
- 1 comment
Labels: dependencies
#121 - Remove logging
Pull Request -
State: open - Opened by wangqifan over 5 years ago
- 1 comment
#120 - Out of memory problem
Issue -
State: open - Opened by wangqifan over 5 years ago
#118 - Should the hour part of the time-extraction regex be changed to [0-9]?
Issue -
State: open - Opened by bigzhouj over 5 years ago
#117 - RocksDBException when running the CSDN crawling example code: Failed to create a directory: C:\code\weibocrawler\crawl\crawldb: The system cannot find the specified…
Issue -
State: open - Opened by jack13163 over 5 years ago
- 3 comments
#116 - The computeInfo function in ContentExtractor throws StackOverflowError
Issue -
State: open - Opened by yanpeng over 5 years ago
- 3 comments
#115 - Error when running the tutorial's source code for crawling CSDN blogs
Issue -
State: open - Opened by dyn1721 over 5 years ago
- 1 comment
#114 - Where can the distributed version be found?
Issue -
State: open - Opened by xiaowenhuman over 5 years ago
#113 - How to ignore expired HTTPS certificates in version 2.73-alpha?
Issue -
State: open - Opened by hj287678654 over 5 years ago
- 2 comments
#112 - Bump c3p0 from 0.9.5.2 to 0.9.5.4
Pull Request -
State: open - Opened by dependabot[bot] over 5 years ago
Labels: dependencies
#111 - How to resolve too many database connections from within the crawler?
Issue -
State: open - Opened by linye271709915 almost 6 years ago
#110 - add unit tests for ContentExtractor
Pull Request -
State: open - Opened by LordLRO almost 6 years ago
#109 - Could the log level for thrown exceptions be changed to warn or error?
Issue -
State: open - Opened by xiejx618 almost 6 years ago
#108 - Chinese text from fetched pages is garbled in the output when extending BreadthCrawler
Issue -
State: open - Opened by linye271709915 almost 6 years ago
- 2 comments
#107 - Add demo for selenium crawler with cookie
Pull Request -
State: open - Opened by smallyunet almost 6 years ago
- 3 comments
#106 - How to crawl data from front-end-rendered pages with webcollector?
Issue -
State: open - Opened by qiuqiu0802 almost 6 years ago
#104 - The release package contains a log4j configuration file that overrides users' own log4j configuration
Issue -
State: closed - Opened by gaoxjin over 6 years ago
- 3 comments
#103 - RocksDBException is always thrown after crawling for a while; cause unclear
Issue -
State: open - Opened by tanwubo over 6 years ago
- 2 comments
#102 - WebCollector discussion group
Issue -
State: open - Opened by mdzz9527 over 6 years ago
- 8 comments
#101 - Update README.md
Pull Request -
State: open - Opened by x-otto-x almost 7 years ago
#100 - Update DemoCookieCrawler.java
Pull Request -
State: closed - Opened by x-otto-x almost 7 years ago
#99 - Update README.md
Pull Request -
State: open - Opened by x-otto-x almost 7 years ago
#98 - Update DemoCookieCrawler.java
Pull Request -
State: closed - Opened by x-otto-x almost 7 years ago
#97 - Is the source code of the WebCollector-Hadoop version publicly available?
Issue -
State: closed - Opened by coderf187 almost 7 years ago
- 1 comment
#96 - Is there a related discussion group?
Issue -
State: open - Opened by liushaofeng89 almost 7 years ago
- 2 comments
#95 - OkHttp ConnectionPool and Okio Watchdog do not seem to be shut down properly
Issue -
State: open - Opened by lewiswu1209 almost 7 years ago
- 4 comments
#94 - Can the depth be set so that crawling continues as long as there are links?
Issue -
State: closed - Opened by hxq201300 almost 7 years ago
- 1 comment
#93 - Setting the UA has no effect in the new version
Issue -
State: open - Opened by CNdarkmoon almost 7 years ago
- 1 comment
#92 - Hello! LockTimeoutException
Issue -
State: closed - Opened by simplecnst almost 7 years ago
- 1 comment
#91 - How to determine when the crawler has finished?
Issue -
State: closed - Opened by djxhero almost 7 years ago
- 1 comment
#89 - Hello, RamCrawler with about 70 seeds added gives unstable results
Issue -
State: closed - Opened by gaoda1234 almost 7 years ago
- 3 comments
#88 - Can the stop method of the StrategyCrawler class stop the crawler immediately?
Issue -
State: closed - Opened by BeQiang about 7 years ago
- 1 comment
#87 - How to crawl mobile app data with this framework?
Issue -
State: closed - Opened by x-otto-x about 7 years ago
- 1 comment
#86 - NewsCrawler.java from the official website's configuration tutorial reports an error
Issue -
State: closed - Opened by MrKingHH about 7 years ago
- 1 comment
#85 - Only some of the injected URLs are executed
Issue -
State: closed - Opened by x-otto-x about 7 years ago
- 4 comments
#84 - BerkeleyDBReader reading the Berkeley seed history file returns fewer seed records, and the execution count is one lower
Issue -
State: closed - Opened by haixingmu about 7 years ago
- 1 comment
#83 - Exception when updating db, java.lang.InterruptedException,org.openqa.selenium.remote.UnreachableBrowserException: Error communicating with the remote browser. It may have died.
Issue -
State: closed - Opened by x-otto-x about 7 years ago
- 3 comments
#82 - Bug with depth
Issue -
State: closed - Opened by Aki1996 about 7 years ago
- 1 comment
#81 - Question about IP proxies
Issue -
State: closed - Opened by zhangzhengk about 7 years ago
- 4 comments
#80 - Config.MAX_EXECUTE_COUNT is set, but seeds that failed due to timeout do not seem to be fetched again. Why?
Issue -
State: closed - Opened by haixingmu over 7 years ago
- 6 comments
#79 - Too large a depth may cause OOM
Pull Request -
State: closed - Opened by carryxyh over 7 years ago
- 1 comment
#78 - Excessive depth causes out-of-memory errors
Issue -
State: closed - Opened by carryxyh over 7 years ago
- 9 comments
#77 - 403 responses when crawling continuously. How to solve this?
Issue -
State: closed - Opened by df8305909 over 7 years ago
- 1 comment
#76 - Is it possible to crawl the same URL repeatedly?
Issue -
State: closed - Opened by weiyinfu over 7 years ago
- 1 comment
#75 - Main content extraction problem
Issue -
State: closed - Opened by Kaneki-x over 7 years ago
- 1 comment
#74 - Running multiple crawlers at the same time
Issue -
State: closed - Opened by ljc930611 over 7 years ago
- 5 comments
#73 - Why can image data no longer be retrieved with WebCollector 2.7.1?
Issue -
State: closed - Opened by kongbb1 almost 8 years ago
- 1 comment
#72 - The setUserAgent() method in HttpRequest has no effect in version 2.70
Issue -
State: closed - Opened by yanzuo1992 almost 8 years ago
- 6 comments
#71 - Remove unused variables & optimize code
Pull Request -
State: closed - Opened by feifeiiiiiiiiiii almost 8 years ago
#70 - Can the number of allocated threads be configured according to the type of seed?
Issue -
State: closed - Opened by Janus-Xu almost 8 years ago
- 1 comment
#69 - Source code comments in the packaged release are garbled
Issue -
State: closed - Opened by Janus-Xu almost 8 years ago
- 1 comment
#68 - Is there a cluster solution and demo? Crawl speed in single-machine, single-thread tests is unsatisfactory
Issue -
State: closed - Opened by Janus-Xu almost 8 years ago
- 4 comments
#67 - Crawl all sub-pages and sub-sub-pages of a given URL
Issue -
State: closed - Opened by yaoyuanyy almost 8 years ago
- 3 comments
#66 - Enhance time extraction and add a gitignore file
Pull Request -
State: closed - Opened by imalec-huang almost 8 years ago
#65 - CrawlDatumFormater.class bug
Issue -
State: closed - Opened by ljc930611 about 8 years ago
#64 - Problem adding follow-up tasks to CrawlDatums next
Issue -
State: closed - Opened by ljc930611 about 8 years ago
- 5 comments
#63 - Missing jar packages
Issue -
State: closed - Opened by ljc930611 about 8 years ago
- 4 comments
#62 - Adding a URL to CrawlDatums has no effect
Issue -
State: closed - Opened by wuxiongliu1 over 8 years ago
- 2 comments
#61 - Can multiple regex rules be added?
Issue -
State: closed - Opened by wuxiongliu1 over 8 years ago
- 1 comment
#60 - How to put a concatenated URL into the URL queue from the visited method?
Issue -
State: closed - Opened by wuxiongliu1 over 8 years ago
- 1 comment
#59 - Which method should be used to get the returned data when calling an API endpoint?
Issue -
State: closed - Opened by wuxiongliu1 over 8 years ago
- 1 comment
#58 - What is the current version in the Maven repository?
Issue -
State: closed - Opened by wuxiongliu1 over 8 years ago
- 1 comment
#57 - Question about regex
Issue -
State: closed - Opened by wuxiongliu1 over 8 years ago
- 1 comment
#56 - Add serialization
Pull Request -
State: closed - Opened by LinuxSuRen over 8 years ago
#55 - Fix broken headings in Markdown files
Pull Request -
State: open - Opened by bryant1410 over 8 years ago
#54 - What format is the readme in? It appears garbled
Issue -
State: closed - Opened by LinuxSuRen over 8 years ago
#53 - delete the readme
Pull Request -
State: open - Opened by Junjiu over 8 years ago
#52 - Can internal links in crawled pages be modified?
Issue -
State: closed - Opened by zbcdj2008 over 8 years ago
- 2 comments
#51 - Request for documentation of the field attributes of crawled URL information stored in BDB
Issue -
State: open - Opened by newCheng over 8 years ago
- 1 comment
#50 - The tutorial link is dead; can it be fixed?
Issue -
State: closed - Opened by ahd763810566 over 8 years ago
- 1 comment
#49 - Setting the timeout and the number of reconnection attempts
Issue -
State: closed - Opened by loseyou over 8 years ago
- 1 comment
#48 - Why is Mongo no longer under plugin in the new version?
Issue -
State: open - Opened by bulletmarker over 8 years ago
#47 - Update ContentExtractor time parser
Pull Request -
State: open - Opened by sundy-li almost 9 years ago
#46 - ContentExtractor time parsing is inaccurate
Issue -
State: open - Opened by sundy-li almost 9 years ago
#45 - WebCollector 2.31 selector (select bug)
Issue -
State: closed - Opened by stoneJava almost 9 years ago
- 5 comments
#44 - java.lang.NoClassDefFoundError: org/openqa/selenium/htmlunit/HtmlUnitDriver
Issue -
State: closed - Opened by huangwenyan1225 almost 9 years ago
- 4 comments
#43 - Deep crawl: berkeleydb storage error, and memory is not released after crawling completes
Issue -
State: closed - Opened by 123yxp123 almost 9 years ago
- 2 comments
#42 - Exception in thread "main" java.lang.NoClassDefFoundError: com/sleepycat/je/EnvironmentConfig
Issue -
State: open - Opened by AceLee39 about 9 years ago
- 1 comment
#41 - Execution does not continue after crawlDatums.add(datum)
Issue -
State: closed - Opened by chen28683 about 9 years ago
- 1 comment
#40 - The crawler's start refers to crawl depth; the name is misleading and, without reading the code, suggests starting a thread
Issue -
State: closed - Opened by eagle-1949 about 9 years ago
- 1 comment
#39 - A practical problem: crawling is subject to the site's access rate limits
Issue -
State: open - Opened by eagle-1949 about 9 years ago
- 5 comments
#38 - Parsing certain pages causes a deadlock, such as the pages listed in the description
Issue -
State: open - Opened by titibaba about 9 years ago
#37 - DemoDepthCrawler does not seem to work properly
Issue -
State: closed - Opened by lewiswu1209 over 9 years ago
- 3 comments
#36 - webcollector help documentation problem
Issue -
State: closed - Opened by leiyz over 9 years ago
- 2 comments
#35 - Does the latest version set cookies on redirect?
Issue -
State: closed - Opened by zemochen over 9 years ago
- 1 comment