Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / archivesunleashed/aut issues and pull requests
#556 - `s3a` URLs don't work as in documentation
Issue -
State: open - Opened by acruise 9 months ago
- 1 comment
Labels: bug
#555 - Update Apache Commons Compress dependency.
Pull Request -
State: closed - Opened by ruebot 9 months ago
- 1 comment
#554 - Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 1 comment
Labels: dependencies
#553 - Bump org.apache.tika:tika-core from 1.23 to 1.28.3
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#552 - Bump org.apache.spark:spark-core_2.12 from 3.0.1 to 3.3.3
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#551 - Bump snappy-java from 1.1.7.3 to 1.1.10.1
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#550 - Bump guava from 29.0-jre to 32.0.0-jre
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#549 - Bump spark-core_2.12 from 3.0.1 to 3.4.0
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 1 comment
Labels: dependencies
#548 - Add scalafix and remove unused imports.
Pull Request -
State: closed - Opened by ruebot about 2 years ago
- 1 comment
#547 - Last modified headers
Pull Request -
State: closed - Opened by ruebot about 2 years ago
- 5 comments
#546 - Include last modified date for a resource
Issue -
State: closed - Opened by ruebot about 2 years ago
- 2 comments
Labels: Scala, feature, DataFrames, App
#545 - Use YYYYMMDD for crawl_date for DomainGraphExtractor.
Pull Request -
State: closed - Opened by ruebot about 2 years ago
- 1 comment
#544 - DomainGraph should use YYYYMMDD not YYYYMMDDHHMMSS
Issue -
State: closed - Opened by ruebot about 2 years ago
Labels: bug, in progress, DataFrames, App
#543 - Bump jsoup from 1.14.2 to 1.15.3
Pull Request -
State: closed - Opened by dependabot[bot] about 2 years ago
- 1 comment
Labels: dependencies
#542 - org.apache.tika.mime.MimeTypeException: Invalid media type name: application/rss+xml lang=utf-8
Issue -
State: closed - Opened by ruebot over 2 years ago
- 1 comment
Labels: bug
#541 - Add ARCH text files derivatives.
Pull Request -
State: closed - Opened by ruebot over 2 years ago
- 1 comment
#540 - Add ARCH text files derivatives
Issue -
State: closed - Opened by ruebot over 2 years ago
Labels: feature
#539 - Make webpages() consistent across aut and ARCH.
Pull Request -
State: closed - Opened by ruebot over 2 years ago
- 4 comments
#538 - Remove http headers, and html on webpages()
Issue -
State: closed - Opened by ruebot over 2 years ago
- 1 comment
Labels: bug, enhancement, DataFrames
#537 - Update README
Pull Request -
State: closed - Opened by ruebot over 2 years ago
- 1 comment
#536 - Fix codecov GitHub action.
Pull Request -
State: closed - Opened by ruebot over 2 years ago
- 1 comment
#535 - Bump commons-compress from 1.14 to 1.21
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
- 1 comment
Labels: dependencies
#534 - Add domain column to webpages()
Issue -
State: closed - Opened by ruebot over 2 years ago
Labels: enhancement
#533 - Remove Java w/arc processing, and replace it with Sparkling.
Pull Request -
State: closed - Opened by ruebot over 2 years ago
- 8 comments
#532 - Discard date RDD filter only takes a single string, not a list of strings.
Issue -
State: closed - Opened by ruebot over 2 years ago
Labels: bug
#531 - Bump jackson-databind from 2.10.0 to 2.12.6.1
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
- 2 comments
Labels: dependencies
#530 - Bump hadoop-common from 2.7.4 to 3.2.3
Pull Request -
State: closed - Opened by dependabot[bot] over 2 years ago
- 2 comments
Labels: dependencies
#529 - java.lang.RuntimeException: Unsupported literal type class scala.collection.immutable.Set$Set1 Set(liberal.ca)
Issue -
State: closed - Opened by JakeBickUKGWA over 2 years ago
- 1 comment
#528 - Bump hadoop-common from 2.7.4 to 2.10.1
Pull Request -
State: closed - Opened by dependabot[bot] almost 3 years ago
- 1 comment
Labels: dependencies
#527 - Bump xercesImpl from 2.12.0 to 2.12.2
Pull Request -
State: closed - Opened by dependabot[bot] almost 3 years ago
- 1 comment
Labels: dependencies
#526 - Change crawl_date format to YYYYMMDDHHMMSS, update hasDate filter.
Pull Request -
State: closed - Opened by ruebot almost 3 years ago
- 2 comments
#525 - Include timestamp in crawl date
Issue -
State: closed - Opened by ruebot almost 3 years ago
Labels: enhancement
#524 - Replace scala-uri library from ExtractDomain.
Pull Request -
State: closed - Opened by ruebot about 3 years ago
#523 - Issue 522
Pull Request -
State: closed - Opened by ruebot about 3 years ago
#522 - Scaladocs haven't been created since 0.90.0 release
Issue -
State: closed - Opened by ruebot about 3 years ago
Labels: bug
#521 - Replace scala-uri library from ExtractDomain and just parse public_suffix_list.dat
Issue -
State: closed - Opened by ruebot about 3 years ago
- 1 comment
Labels: enhancement, clean-up
#520 - Update ExtractDomain to extract apex domains.
Pull Request -
State: closed - Opened by ruebot about 3 years ago
- 8 comments
#519 - ExtractDomains returns non-Apex Domains
Issue -
State: closed - Opened by ruebot about 3 years ago
- 2 comments
Labels: bug
#518 - Bump jsoup from 1.13.1 to 1.14.2
Pull Request -
State: closed - Opened by dependabot[bot] over 3 years ago
- 1 comment
Labels: dependencies
#517 - Filter or filedesc and dns records from arcs.
Pull Request -
State: closed - Opened by ruebot over 3 years ago
- 1 comment
#516 - ARC file name appearing in `url` list
Issue -
State: closed - Opened by ianmilligan1 over 3 years ago
Labels: bug
#515 - Handle wget WARC-Target-URI formatting.
Pull Request -
State: closed - Opened by ruebot over 3 years ago
- 2 comments
#514 - WARC-Target-URI in Wget warc files is not parsed properly
Issue -
State: closed - Opened by javieraespinosa over 3 years ago
- 1 comment
Labels: bug
#513 - Add missing crawl_date column to binary information jobs.
Pull Request -
State: closed - Opened by ruebot over 3 years ago
- 1 comment
#512 - crawl_date is not included on binary information jobs when documentation says it is
Issue -
State: closed - Opened by ruebot over 3 years ago
Labels: bug
#511 - Update jsoup to 1.13.1
Pull Request -
State: closed - Opened by ruebot over 3 years ago
- 1 comment
#510 - ars-cloud compatibility with aut and Java 11
Pull Request -
State: closed - Opened by ruebot almost 4 years ago
- 1 comment
#509 - Update required Scala version to 2.12
Issue -
State: closed - Opened by ruebot almost 4 years ago
- 1 comment
Labels: invalid
#508 - Update to Spark 3.0.1
Pull Request -
State: closed - Opened by ruebot about 4 years ago
- 1 comment
#507 - Replace TravisCI with GitHub Actions.
Pull Request -
State: closed - Opened by ruebot about 4 years ago
- 1 comment
#506 - Migrate CI infrastructure from TravisCI to GitHub Action
Issue -
State: closed - Opened by ruebot about 4 years ago
#505 - Bump junit from 4.12 to 4.13.1
Pull Request -
State: closed - Opened by dependabot[bot] about 4 years ago
- 1 comment
Labels: dependencies
#504 - Fix relative links extraction
Pull Request -
State: closed - Opened by yxzhu16 about 4 years ago
- 1 comment
#503 - Remove .keepValidPages() on .all() Python implementation.
Pull Request -
State: closed - Opened by ruebot about 4 years ago
#502 - Python implementation of .all() has .keepValidPages() incorrectly applied to it
Issue -
State: closed - Opened by ruebot about 4 years ago
Labels: bug, Python
#501 - Extract hyperlinks from wayback machine
Issue -
State: closed - Opened by yxzhu16 about 4 years ago
- 3 comments
Labels: bug, Scala
#500 - Updates read.me to include citation section
Pull Request -
State: closed - Opened by SamFritz about 4 years ago
- 3 comments
#499 - Remove tf project; resolves #498.
Pull Request -
State: closed - Opened by ruebot about 4 years ago
- 1 comment
#498 - Split tf into its own repo
Issue -
State: closed - Opened by ruebot about 4 years ago
- 1 comment
Labels: clean-up
#497 - Update Read.me w/ citation information
Issue -
State: closed - Opened by SamFritz about 4 years ago
- 4 comments
Labels: question, documentation
#496 - Set the upper limit of WARC content length to half of Integer.MAX_VALUE
Pull Request -
State: closed - Opened by adamyy about 4 years ago
- 1 comment
#495 - Release 0.80.0 JAR produces error; built 0.80.1 fatjar built on repo works
Issue -
State: closed - Opened by ianmilligan1 over 4 years ago
- 2 comments
Labels: bug
#494 - Replace Java ARC/WARC record processing library
Issue -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
Labels: enhancement, Java
#493 - Extract gzip data from transfer-encoded WARC
Issue -
State: closed - Opened by ianmilligan1 over 4 years ago
- 1 comment
Labels: bug, feature
#492 - ARC reader string vs int error on record length
Issue -
State: closed - Opened by ruebot over 4 years ago
- 2 comments
Labels: bug, Java
#491 - Hadoop 3.2 support
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 2 comments
#490 - Change master branch to main branch
Issue -
State: closed - Opened by ruebot over 4 years ago
- 4 comments
#489 - Add Python formatter GitHub Action.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
#488 - GitHub action - Run isort and black on Python code
Issue -
State: closed - Opened by ruebot over 4 years ago
Labels: Python, clean-up
#487 - Add scalafmt GitHub action and apply it to scala code.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
#486 - Add scalafmt GitHub action
Issue -
State: closed - Opened by ruebot over 4 years ago
Labels: Scala, clean-up
#485 - Add Google Java Formatter as an action, and apply it.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
#484 - Add Google Java Formatter as a GitHub action
Issue -
State: closed - Opened by ruebot over 4 years ago
Labels: Java, clean-up
#483 - Packages build is often broken - should we support it?
Issue -
State: closed - Opened by ruebot over 4 years ago
- 5 comments
Labels: discussion
#482 - Add Python implementation of SaveBytes.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 2 comments
#481 - Bump xercesImpl from 2.11.0 to 2.12.0
Pull Request -
State: closed - Opened by dependabot[bot] over 4 years ago
- 1 comment
Labels: dependencies
#480 - [Skip Travis] Trim README down given aut.docs.archivesunleashed.org
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
#479 - Remove RDD suffixes on file, class, and object names.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 2 comments
#478 - Implement SaveToDisk in Python
Issue -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
Labels: PySpark, Python, DataFrames
#477 - PEP8 Python app method names.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
#476 - Broken link in documentation
Issue -
State: closed - Opened by sepastian over 4 years ago
- 6 comments
Labels: documentation
#475 - Move Python UDF methods out of their own class.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 1 comment
#474 - Add DataFrame udf tests.
Pull Request -
State: closed - Opened by ruebot over 4 years ago
- 3 comments
#473 - Improve udfs/package.scala test coverage
Issue -
State: closed - Opened by ruebot over 4 years ago
Labels: tests, Scala
#468 - PEP8 Naming - UDFs, App method names, DataFrame names, and filters.
Issue -
State: closed - Opened by ruebot over 4 years ago
- 10 comments
Labels: discussion
#467 - Python UDFs - class or not?
Issue -
State: closed - Opened by ruebot over 4 years ago
- 5 comments
Labels: discussion
#428 - Encoding management
Issue -
State: closed - Opened by alxdrdelaporte over 4 years ago
- 11 comments
Labels: question
#411 - Research, test, and benchmark jwarc integration
Issue -
State: closed - Opened by ruebot almost 5 years ago
- 1 comment
Labels: wontfix, Java
#385 - Spark 3.0.0 + Hadoop 3.2 + Java 11 support.
Pull Request -
State: closed - Opened by ruebot about 5 years ago
- 2 comments
#375 - Spark 3.0.0 + Java 11 support.
Pull Request -
State: closed - Opened by ruebot about 5 years ago
- 11 comments
#371 - Convert RecordLoader.loadArchives to a Spark Data Source
Issue -
State: closed - Opened by ruebot about 5 years ago
- 4 comments
Labels: wontfix
#356 - Java 11 support
Issue -
State: closed - Opened by ruebot over 5 years ago
- 7 comments
Labels: Java
#329 - Upgrade to Hadoop 3.x
Issue -
State: closed - Opened by jrwiebe over 5 years ago
- 7 comments
Labels: wontfix
#317 - Heap space!! java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:3332)
Issue -
State: open - Opened by ruebot over 5 years ago
- 21 comments
Labels: bug
#308 - Strategy to deal with conflict between application and Spark distribution dependencies
Issue -
State: closed - Opened by jrwiebe almost 6 years ago
- 6 comments
#262 - Improve CommandLineApp.scala test coverage
Issue -
State: closed - Opened by ruebot over 6 years ago
- 1 comment
Labels: tests, RA-Task
#261 - Improve ExtractBoilerpipeText.scala test coverage
Issue -
State: closed - Opened by ruebot over 6 years ago
- 1 comment
Labels: tests, RA-Task
#260 - Improve ArchiveRecord.scala test coverage
Issue -
State: closed - Opened by ruebot over 6 years ago
Labels: tests, RA-Task
#247 - Method to perform finer-grained selection of ARCs and WARCs
Issue -
State: closed - Opened by lintool over 6 years ago
- 6 comments
Labels: enhancement, RA-Task, in progress
#182 - Unit testing for RecordLoader
Issue -
State: closed - Opened by greebie over 6 years ago
- 1 comment
Labels: tests