Ecosyste.ms: Issues

An open API service providing issue and pull request metadata for open source projects.

GitHub / archivesunleashed/aut issues and pull requests

#556 - `s3a` URLs don't work as in documentation

Issue - State: open - Opened by acruise 9 months ago - 1 comment
Labels: bug

#555 - Update Apache Commons Compress dependency.

Pull Request - State: closed - Opened by ruebot 9 months ago - 1 comment

#554 - Bump org.xerial.snappy:snappy-java from 1.1.10.1 to 1.1.10.4

Pull Request - State: closed - Opened by dependabot[bot] about 1 year ago - 1 comment
Labels: dependencies

#553 - Bump org.apache.tika:tika-core from 1.23 to 1.28.3

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#552 - Bump org.apache.spark:spark-core_2.12 from 3.0.1 to 3.3.3

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#551 - Bump snappy-java from 1.1.7.3 to 1.1.10.1

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#550 - Bump guava from 29.0-jre to 32.0.0-jre

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#549 - Bump spark-core_2.12 from 3.0.1 to 3.4.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies

#548 - Add scalafix and remove unused imports.

Pull Request - State: closed - Opened by ruebot about 2 years ago - 1 comment

#547 - Last modified headers

Pull Request - State: closed - Opened by ruebot about 2 years ago - 5 comments

#546 - Include last modified date for a resource

Issue - State: closed - Opened by ruebot about 2 years ago - 2 comments
Labels: Scala, feature, DataFrames, App

#545 - Use YYYYMMDD for crawl_date for DomainGraphExtractor.

Pull Request - State: closed - Opened by ruebot about 2 years ago - 1 comment

#544 - DomainGraph should use YYYYMMDD not YYYYMMDDHHMMSS

Issue - State: closed - Opened by ruebot about 2 years ago
Labels: bug, in progress, DataFrames, App

#543 - Bump jsoup from 1.14.2 to 1.15.3

Pull Request - State: closed - Opened by dependabot[bot] about 2 years ago - 1 comment
Labels: dependencies

#542 - org.apache.tika.mime.MimeTypeException: Invalid media type name: application/rss+xml lang=utf-8

Issue - State: closed - Opened by ruebot over 2 years ago - 1 comment
Labels: bug

#541 - Add ARCH text files derivatives.

Pull Request - State: closed - Opened by ruebot over 2 years ago - 1 comment

#540 - Add ARCH text files derivatives

Issue - State: closed - Opened by ruebot over 2 years ago
Labels: feature

#539 - Make webpages() consistent across aut and ARCH.

Pull Request - State: closed - Opened by ruebot over 2 years ago - 4 comments

#538 - Remove http headers, and html on webpages()

Issue - State: closed - Opened by ruebot over 2 years ago - 1 comment
Labels: bug, enhancement, DataFrames

#537 - Update README

Pull Request - State: closed - Opened by ruebot over 2 years ago - 1 comment

#536 - Fix codecov GitHub action.

Pull Request - State: closed - Opened by ruebot over 2 years ago - 1 comment

#535 - Bump commons-compress from 1.14 to 1.21

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago - 1 comment
Labels: dependencies

#534 - Add domain column to webpages()

Issue - State: closed - Opened by ruebot over 2 years ago
Labels: enhancement

#533 - Remove Java w/arc processing, and replace it with Sparkling.

Pull Request - State: closed - Opened by ruebot over 2 years ago - 8 comments

#532 - Discard date RDD filter only takes a single string, not a list of strings.

Issue - State: closed - Opened by ruebot over 2 years ago
Labels: bug

#531 - Bump jackson-databind from 2.10.0 to 2.12.6.1

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago - 2 comments
Labels: dependencies

#530 - Bump hadoop-common from 2.7.4 to 3.2.3

Pull Request - State: closed - Opened by dependabot[bot] over 2 years ago - 2 comments
Labels: dependencies

#528 - Bump hadoop-common from 2.7.4 to 2.10.1

Pull Request - State: closed - Opened by dependabot[bot] almost 3 years ago - 1 comment
Labels: dependencies

#527 - Bump xercesImpl from 2.12.0 to 2.12.2

Pull Request - State: closed - Opened by dependabot[bot] almost 3 years ago - 1 comment
Labels: dependencies

#526 - Change crawl_date format to YYYYMMDDHHMMSS, update hasDate filter.

Pull Request - State: closed - Opened by ruebot almost 3 years ago - 2 comments

#525 - Include timestamp in crawl date

Issue - State: closed - Opened by ruebot almost 3 years ago
Labels: enhancement

#524 - Replace scala-uri library from ExtractDomain.

Pull Request - State: closed - Opened by ruebot about 3 years ago

#523 - Issue 522

Pull Request - State: closed - Opened by ruebot about 3 years ago

#522 - Scaladocs haven't been created since 0.90.0 release

Issue - State: closed - Opened by ruebot about 3 years ago
Labels: bug

#521 - Replace scala-uri library from ExtractDomain and just parse public_suffix_list.dat

Issue - State: closed - Opened by ruebot about 3 years ago - 1 comment
Labels: enhancement, clean-up

#520 - Update ExtractDomain to extract apex domains.

Pull Request - State: closed - Opened by ruebot about 3 years ago - 8 comments

#519 - ExtractDomains returns non-Apex Domains

Issue - State: closed - Opened by ruebot about 3 years ago - 2 comments
Labels: bug

#518 - Bump jsoup from 1.13.1 to 1.14.2

Pull Request - State: closed - Opened by dependabot[bot] over 3 years ago - 1 comment
Labels: dependencies

#517 - Filter or filedesc and dns records from arcs.

Pull Request - State: closed - Opened by ruebot over 3 years ago - 1 comment

#516 - ARC file name appearing in `url` list

Issue - State: closed - Opened by ianmilligan1 over 3 years ago
Labels: bug

#515 - Handle wget WARC-Target-URI formatting.

Pull Request - State: closed - Opened by ruebot over 3 years ago - 2 comments

#514 - WARC-Target-URI in Wget warc files is not parsed properly

Issue - State: closed - Opened by javieraespinosa over 3 years ago - 1 comment
Labels: bug

#513 - Add missing crawl_date column to binary information jobs.

Pull Request - State: closed - Opened by ruebot over 3 years ago - 1 comment

#511 - Update jsoup to 1.13.1

Pull Request - State: closed - Opened by ruebot over 3 years ago - 1 comment

#510 - ars-cloud compatibility with aut and Java 11

Pull Request - State: closed - Opened by ruebot almost 4 years ago - 1 comment

#509 - Update required Scala version to 2.12

Issue - State: closed - Opened by ruebot almost 4 years ago - 1 comment
Labels: invalid

#508 - Update to Spark 3.0.1

Pull Request - State: closed - Opened by ruebot about 4 years ago - 1 comment

#507 - Replace TravisCI with GitHub Actions.

Pull Request - State: closed - Opened by ruebot about 4 years ago - 1 comment

#506 - Migrate CI infrastructure from TravisCI to GitHub Action

Issue - State: closed - Opened by ruebot about 4 years ago

#505 - Bump junit from 4.12 to 4.13.1

Pull Request - State: closed - Opened by dependabot[bot] about 4 years ago - 1 comment
Labels: dependencies

#504 - Fix relative links extraction

Pull Request - State: closed - Opened by yxzhu16 about 4 years ago - 1 comment

#503 - Remove .keepValidPages() on .all() Python implementation.

Pull Request - State: closed - Opened by ruebot about 4 years ago

#502 - Python implementation of .all() has .keepValidPages() incorrectly applied to it

Issue - State: closed - Opened by ruebot about 4 years ago
Labels: bug, Python

#501 - Extract hyperlinks from wayback machine

Issue - State: closed - Opened by yxzhu16 about 4 years ago - 3 comments
Labels: bug, Scala

#500 - Updates read.me to include citation section

Pull Request - State: closed - Opened by SamFritz about 4 years ago - 3 comments

#499 - Remove tf project; resolves #498.

Pull Request - State: closed - Opened by ruebot about 4 years ago - 1 comment

#498 - Split tf into its own repo

Issue - State: closed - Opened by ruebot about 4 years ago - 1 comment
Labels: clean-up

#497 - Update Read.me w/ citation information

Issue - State: closed - Opened by SamFritz about 4 years ago - 4 comments
Labels: question, documentation

#496 - Set the upper limit of WARC content length to half of Integer.MAX_VALUE

Pull Request - State: closed - Opened by adamyy about 4 years ago - 1 comment

#495 - Release 0.80.0 JAR produces error; built 0.80.1 fatjar built on repo works

Issue - State: closed - Opened by ianmilligan1 over 4 years ago - 2 comments
Labels: bug

#494 - Replace Java ARC/WARC record processing library

Issue - State: closed - Opened by ruebot over 4 years ago - 1 comment
Labels: enhancement, Java

#493 - Extract gzip data from transfer-encoded WARC

Issue - State: closed - Opened by ianmilligan1 over 4 years ago - 1 comment
Labels: bug, feature

#492 - ARC reader string vs int error on record length

Issue - State: closed - Opened by ruebot over 4 years ago - 2 comments
Labels: bug, Java

#491 - Hadoop 3.2 support

Pull Request - State: closed - Opened by ruebot over 4 years ago - 2 comments

#490 - Change master branch to main branch

Issue - State: closed - Opened by ruebot over 4 years ago - 4 comments

#489 - Add Python formatter GitHub Action.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 1 comment

#488 - GitHub action - Run isort and black on Python code

Issue - State: closed - Opened by ruebot over 4 years ago
Labels: Python, clean-up

#487 - Add scalafmt GitHub action and apply it to scala code.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 1 comment

#486 - Add scalafmt GitHub action

Issue - State: closed - Opened by ruebot over 4 years ago
Labels: Scala, clean-up

#485 - Add Google Java Formatter as an action, and apply it.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 1 comment

#484 - Add Google Java Formatter as a GitHub action

Issue - State: closed - Opened by ruebot over 4 years ago
Labels: Java, clean-up

#483 - Packages build is often broken - should we support it?

Issue - State: closed - Opened by ruebot over 4 years ago - 5 comments
Labels: discussion

#482 - Add Python implementation of SaveBytes.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 2 comments

#481 - Bump xercesImpl from 2.11.0 to 2.12.0

Pull Request - State: closed - Opened by dependabot[bot] over 4 years ago - 1 comment
Labels: dependencies

#480 - [Skip Travis] Trim README down given aut.docs.archivesunleashed.org

Pull Request - State: closed - Opened by ruebot over 4 years ago - 1 comment

#479 - Remove RDD suffixes on file, class, and object names.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 2 comments

#478 - Implement SaveToDisk in Python

Issue - State: closed - Opened by ruebot over 4 years ago - 1 comment
Labels: PySpark, Python, DataFrames

#477 - PEP8 Python app method names.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 1 comment

#476 - Broken link in documentation

Issue - State: closed - Opened by sepastian over 4 years ago - 6 comments
Labels: documentation

#475 - Move Python UDF methods out of their own class.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 1 comment

#474 - Add DataFrame udf tests.

Pull Request - State: closed - Opened by ruebot over 4 years ago - 3 comments

#473 - Improve udfs/package.scala test coverage

Issue - State: closed - Opened by ruebot over 4 years ago
Labels: tests, Scala

#468 - PEP8 Naming - UDFs, App method names, DataFrame names, and filters.

Issue - State: closed - Opened by ruebot over 4 years ago - 10 comments
Labels: discussion

#467 - Python UDFs - class or not?

Issue - State: closed - Opened by ruebot over 4 years ago - 5 comments
Labels: discussion

#428 - Encoding management

Issue - State: closed - Opened by alxdrdelaporte over 4 years ago - 11 comments
Labels: question

#411 - Research, test, and benchmark jwarc integration

Issue - State: closed - Opened by ruebot almost 5 years ago - 1 comment
Labels: wontfix, Java

#385 - Spark 3.0.0 + Hadoop 3.2 + Java 11 support.

Pull Request - State: closed - Opened by ruebot about 5 years ago - 2 comments

#375 - Spark 3.0.0 + Java 11 support.

Pull Request - State: closed - Opened by ruebot about 5 years ago - 11 comments

#371 - Convert RecordLoader.loadArchives to a Spark Data Source

Issue - State: closed - Opened by ruebot about 5 years ago - 4 comments
Labels: wontfix

#356 - Java 11 support

Issue - State: closed - Opened by ruebot over 5 years ago - 7 comments
Labels: Java

#329 - Upgrade to Hadoop 3.x

Issue - State: closed - Opened by jrwiebe over 5 years ago - 7 comments
Labels: wontfix

#262 - Improve CommandLineApp.scala test coverage

Issue - State: closed - Opened by ruebot over 6 years ago - 1 comment
Labels: tests, RA-Task

#261 - Improve ExtractBoilerpipeText.scala test coverage

Issue - State: closed - Opened by ruebot over 6 years ago - 1 comment
Labels: tests, RA-Task

#260 - Improve ArchiveRecord.scala test coverage

Issue - State: closed - Opened by ruebot over 6 years ago
Labels: tests, RA-Task

#247 - Method to perform finer-grained selection of ARCs and WARCs

Issue - State: closed - Opened by lintool over 6 years ago - 6 comments
Labels: enhancement, RA-Task, in progress

#182 - Unit testing for RecordLoader

Issue - State: closed - Opened by greebie over 6 years ago - 1 comment
Labels: tests