Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / HTTPArchive/data-pipeline issues and pull requests

#169 - Bump apache-beam[gcp] from 2.43.0 to 2.44.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies, python

#167 - Bump actions/upload-artifact from 3.1.1 to 3.1.2

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies, github_actions

#166 - Bump github/super-linter from 4.9.7 to 4.10.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies, github_actions

#162 - Add Dataflow flex templates and GCP Workflows

Pull Request - State: closed - Opened by giancarloaf over 1 year ago - 3 comments

#161 - Update har manifest instructions

Pull Request - State: closed - Opened by giancarloaf over 1 year ago - 1 comment

#160 - Investigate why the Dataflow jobs stall on the remaining HARs

Issue - State: closed - Opened by rviscomi over 1 year ago - 1 comment
Labels: bug

#159 - Add example flex template commands to the README

Pull Request - State: closed - Opened by rviscomi over 1 year ago - 1 comment

#158 - Bump black from 22.10.0 to 22.12.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies, python

#157 - Add HAR manifest generation steps to README

Pull Request - State: closed - Opened by giancarloaf over 1 year ago - 1 comment

#156 - Fix home page filtering for summary requests

Pull Request - State: closed - Opened by giancarloaf over 1 year ago - 2 comments

#155 - Run November BigQuery pipeline

Issue - State: closed - Opened by rviscomi over 1 year ago - 9 comments

#154 - Bump apache-beam[gcp] from 2.41.0 to 2.43.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies, python

#153 - Too many pages in `summary_requests`

Issue - State: closed - Opened by rviscomi over 1 year ago - 1 comment
Labels: bug

#152 - Combined data pipeline failed to process October dataset

Issue - State: closed - Opened by rviscomi over 1 year ago
Labels: bug

#151 - Bump actions/upload-artifact from 3.1.0 to 3.1.1

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies, github_actions

#150 - Create auto updating `sample_data` queries

Issue - State: open - Opened by tunetheweb over 1 year ago - 5 comments

#149 - The new schema and cost concerns for users

Issue - State: open - Opened by tunetheweb over 1 year ago - 4 comments

#148 - Blink feature tables haven't been updated

Issue - State: closed - Opened by tunetheweb over 1 year ago - 1 comment
Labels: bug

#147 - Bump apache-beam[gcp] from 2.41.0 to 2.42.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies, python

#146 - Bump ewjoachim/python-coverage-comment-action from 2.1.0 to 3.0.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies, github_actions

#145 - Bump black from 22.8.0 to 22.10.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 1 comment
Labels: dependencies, python

#144 - Bump github/super-linter from 4.9.6 to 4.9.7

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago - 1 comment
Labels: dependencies, github_actions

#143 - Explore integrating with Cloudflare's Domain Intelligence API

Issue - State: open - Opened by rviscomi almost 2 years ago - 1 comment

#142 - Investigate views on `all` dataset

Issue - State: open - Opened by tunetheweb almost 2 years ago - 6 comments

#141 - Convert the `latest` dataset to views

Issue - State: open - Opened by rviscomi almost 2 years ago - 15 comments
Labels: enhancement

#140 - Improve test coverage of summary pipeline

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago - 1 comment

#139 - Account for redirects when selecting secondary page URL

Issue - State: closed - Opened by rviscomi almost 2 years ago - 2 comments
Labels: bug

#137 - Bump black from 22.6.0 to 22.8.0

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago - 1 comment
Labels: dependencies, python

#136 - Bug/backfill fixes

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago - 1 comment

#135 - Bump apache-beam[gcp] from 2.40.0 to 2.41.0

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago - 2 comments
Labels: dependencies, python

#134 - Improve test coverage

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago - 1 comment

#133 - August "combined" tables contain duplicates

Issue - State: closed - Opened by rviscomi almost 2 years ago - 5 comments
Labels: bug

#132 - Add python code coverage badge to README.md

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago - 1 comment

#131 - Add python code coverage

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago - 1 comment

#130 - Add unit tests for beam functionality

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago

#129 - Subscription for "releases" at the end of a pipeline run

Issue - State: open - Opened by giancarloaf almost 2 years ago - 1 comment

#128 - Pub/Sub Bottleneck

Issue - State: closed - Opened by pmeenan almost 2 years ago - 1 comment

#127 - Assess secondary page quality

Issue - State: open - Opened by rviscomi almost 2 years ago - 1 comment
Labels: question

#126 - Fix `doctype` - stringify output

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago - 1 comment
Labels: bug

#125 - Bump github/super-linter from 4.9.5 to 4.9.6

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago
Labels: dependencies, github_actions

#124 - Summary statistics for BigQuery `all` tables

Issue - State: open - Opened by giancarloaf almost 2 years ago

#123 - Fix parsed_css root page field

Pull Request - State: closed - Opened by rviscomi almost 2 years ago

#122 - Bump github/super-linter from 4.9.4 to 4.9.5

Pull Request - State: closed - Opened by dependabot[bot] almost 2 years ago
Labels: dependencies, github_actions

#121 - Deprecate streaming pipeline

Pull Request - State: closed - Opened by giancarloaf almost 2 years ago

#120 - Set up mechanism for triggering the batch pipeline

Issue - State: closed - Opened by rviscomi almost 2 years ago - 1 comment
Labels: enhancement

#119 - Switch data pipeline defaults from streaming to batch

Issue - State: closed - Opened by rviscomi almost 2 years ago
Labels: enhancement

#118 - Standard library of custom BigQuery functions

Issue - State: open - Opened by rviscomi almost 2 years ago
Labels: enhancement

#117 - Clean up intermediary load job tables on BQ

Issue - State: closed - Opened by rviscomi almost 2 years ago - 2 comments
Labels: bug

#116 - Secondary pages marked as root pages in parsed CSS table

Issue - State: closed - Opened by rviscomi almost 2 years ago - 1 comment
Labels: bug

#115 - Pipe parsed_css custom metric data into BQ

Pull Request - State: closed - Opened by rviscomi about 2 years ago

#114 - Improved documentation for the reports

Issue - State: open - Opened by Themanwithoutaplan about 2 years ago - 1 comment

#113 - Crawlid to be continued after EOL of batching

Issue - State: open - Opened by Themanwithoutaplan about 2 years ago

#112 - Page summary reports contain duplicates

Issue - State: closed - Opened by Themanwithoutaplan about 2 years ago - 4 comments
Labels: bug

#111 - Combined pipeline fixes from July 2022 crawl

Pull Request - State: closed - Opened by giancarloaf about 2 years ago

#109 - `All` pipeline improvements

Pull Request - State: closed - Opened by giancarloaf about 2 years ago - 2 comments

#108 - Reformat to avoid linter errors

Pull Request - State: closed - Opened by tunetheweb about 2 years ago

#107 - Bump black from 22.3.0 to 22.6.0

Pull Request - State: closed - Opened by dependabot[bot] about 2 years ago
Labels: dependencies, python

#106 - Add pip to dependabot.yml

Pull Request - State: closed - Opened by giancarloaf about 2 years ago

#105 - Update non-summary partitioning

Pull Request - State: closed - Opened by giancarloaf about 2 years ago - 4 comments

#104 - Bump beam SDK to v2.40

Pull Request - State: closed - Opened by giancarloaf about 2 years ago

#103 - Exceeded rate limits: too many api requests per user per method for this user_method

Issue - State: closed - Opened by rviscomi about 2 years ago - 4 comments
Labels: bug

#102 - Error: The Dataflow job may be impacted by insufficient Pub/Sub quota

Issue - State: closed - Opened by rviscomi about 2 years ago - 3 comments
Labels: bug

#101 - Ensure JSON pop has a default

Pull Request - State: closed - Opened by rviscomi about 2 years ago - 1 comment

#98 - Prepare for the July 2022 crawl

Issue - State: closed - Opened by rviscomi about 2 years ago - 1 comment

#97 - Python CSS parser in combined pipeline

Pull Request - State: closed - Opened by rviscomi about 2 years ago

#96 - Python CSS parser

Pull Request - State: closed - Opened by rviscomi about 2 years ago - 1 comment

#94 - Investigate why the crawl stalled on the remaining 300k HARs in June 2022

Issue - State: closed - Opened by rviscomi about 2 years ago - 2 comments
Labels: bug

#93 - Keep secondary page data separate from home page data (for now)

Issue - State: closed - Opened by rviscomi about 2 years ago

#92 - Eliminate the need to run batch pipelines

Issue - State: closed - Opened by rviscomi about 2 years ago
Labels: enhancement

#91 - Combine summary and non-summary pipelines

Pull Request - State: closed - Opened by giancarloaf about 2 years ago - 5 comments
Labels: enhancement

#66 - Add new image formats and change typ to type

Pull Request - State: closed - Opened by tunetheweb about 2 years ago - 1 comment

#46 - Interacting on websites

Issue - State: open - Opened by nrllh about 2 years ago - 15 comments

#43 - Discrepancies between experimental and legacy summary_pages pipelines

Issue - State: closed - Opened by rviscomi about 2 years ago - 13 comments
Labels: bug

#38 - Combine Dataflow pipelines

Issue - State: closed - Opened by rviscomi over 2 years ago - 1 comment

#18 - Pre-parse CSS in Dataflow before writing to BigQuery

Issue - State: closed - Opened by rviscomi over 2 years ago - 1 comment

#15 - Reorganize the BigQuery datasets to be more efficient

Issue - State: open - Opened by rviscomi over 2 years ago - 13 comments

#8 - Achieve 100% test coverage for all new pipeline code

Issue - State: closed - Opened by rviscomi over 2 years ago - 2 comments

#6 - Document how the new GCP pipeline works

Issue - State: open - Opened by rviscomi over 2 years ago - 1 comment

#3 - Add the ability to monitor each stage of the GCP pipeline

Issue - State: closed - Opened by rviscomi over 2 years ago - 8 comments