Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / bigscience-workshop/metadata issues and pull requests

#198 - Prefix special tokens

Pull Request - State: open - Opened by jordiclive over 1 year ago

#197 - feat: create a merged vld set of designated metadata

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#196 - Train june

Pull Request - State: open - Opened by jordiclive over 1 year ago

#195 - feat: enable generation_length_text

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#194 - feat: add local special tokens for HTML

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#193 - fix: avoid the situation of inf loss * weight → nan

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#192 - Eval loop

Pull Request - State: open - Opened by jordiclive over 1 year ago

#191 - Update tokenizer in evaluation

Pull Request - State: closed - Opened by manandey over 1 year ago

#190 - Set logits of entity related special tokens to -infinity

Pull Request - State: closed - Opened by manandey over 1 year ago

#189 - Configurable html sample rate, and without metadata same context option.

Pull Request - State: closed - Opened by jordiclive over 1 year ago - 1 comment

#188 - Update metadata_utils.py

Pull Request - State: open - Opened by jordiclive over 1 year ago

#187 - Update metadata_utils.py

Pull Request - State: open - Opened by jordiclive over 1 year ago

#186 - fix: args and defaults

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#185 - feat: update eval script

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#184 - fix issue with embedding size too small

Pull Request - State: closed - Opened by jordiclive over 1 year ago

#183 - feat: v2.yaml for the resampled training set

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#182 - Patch 3

Pull Request - State: closed - Opened by tianjianjiang over 1 year ago

#181 - Add prompting baseline eval

Pull Request - State: closed - Opened by ppommer over 1 year ago

#180 - Fix stuff and add new readme

Pull Request - State: closed - Opened by cccntu over 1 year ago

#179 - Changes for eval

Pull Request - State: closed - Opened by Muennighoff almost 2 years ago

#178 - Add loss plotting

Pull Request - State: closed - Opened by Muennighoff almost 2 years ago

#177 - debug code (WIP)

Pull Request - State: open - Opened by cccntu almost 2 years ago

#176 - Update train.py

Pull Request - State: closed - Opened by Muennighoff almost 2 years ago

#175 - Fix mask bug

Pull Request - State: closed - Opened by cccntu almost 2 years ago

#174 - Fix code quality tests

Pull Request - State: closed - Opened by manandey almost 2 years ago

#173 - Add CM3 loss

Pull Request - State: open - Opened by masoudjs almost 2 years ago - 2 comments

#172 - test evaluation script

Pull Request - State: closed - Opened by cccntu almost 2 years ago

#171 - evaluation script debugging

Pull Request - State: closed - Opened by cccntu almost 2 years ago

#170 - ci: pin Python, Ubuntu, & GH Action versions

Pull Request - State: closed - Opened by tianjianjiang almost 2 years ago - 1 comment

#169 - Fix eval script

Pull Request - State: closed - Opened by ppommer almost 2 years ago - 2 comments

#168 - Minor updates in evaluation script

Pull Request - State: closed - Opened by manandey about 2 years ago

#167 - fix: ms-timestamp conversion

Pull Request - State: closed - Opened by tianjianjiang about 2 years ago

#166 - build: upgrade transformers for a consistent version of huggingface_hub

Pull Request - State: closed - Opened by tianjianjiang about 2 years ago
Labels: bug

#165 - Add evaluation pipeline

Pull Request - State: closed - Opened by ppommer about 2 years ago - 4 comments

#164 - Updates

Pull Request - State: closed - Opened by cccntu about 2 years ago - 2 comments

#163 - Fix file list

Pull Request - State: closed - Opened by cccntu about 2 years ago

#162 - Refactor a util function

Pull Request - State: closed - Opened by cccntu over 2 years ago

#161 - add special tokens for entities

Pull Request - State: closed - Opened by manandey over 2 years ago - 1 comment

#159 - fix: timestamp precision

Pull Request - State: closed - Opened by tianjianjiang over 2 years ago
Labels: bug

#158 - Fix streaming mode

Pull Request - State: closed - Opened by cccntu over 2 years ago

#157 - add new configs for entity_paragraph

Pull Request - State: closed - Opened by manandey over 2 years ago - 1 comment

#156 - new configuration for HTML tags

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#155 - Finalizing for big training

Pull Request - State: closed - Opened by cccntu over 2 years ago - 2 comments

#154 - Revert "Filter examples by `num_chars` to include in a batch (#137)"

Pull Request - State: closed - Opened by manandey over 2 years ago

#153 - Add separate EntityParagraph processor

Pull Request - State: closed - Opened by manandey over 2 years ago

#152 - Adapt code to use new data format

Pull Request - State: closed - Opened by cccntu over 2 years ago - 3 comments
Labels: #dataset

#151 - feat: clean up website desc.

Issue - State: closed - Opened by tianjianjiang over 2 years ago
Labels: #dataset

#150 - feat: tag clean website desc., entity paragraph, and title

Pull Request - State: closed - Opened by tianjianjiang over 2 years ago - 1 comment
Labels: #dataset

#149 - feat: add paragraph-entity metadata

Issue - State: closed - Opened by tianjianjiang over 2 years ago

#148 - feat: add title

Issue - State: closed - Opened by tianjianjiang over 2 years ago - 1 comment
Labels: #dataset

#147 - Post processing website desc

Pull Request - State: closed - Opened by shanyas10 over 2 years ago - 1 comment

#146 - add example that build a dataset

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#145 - feat: add paragraphs

Pull Request - State: closed - Opened by tianjianjiang over 2 years ago

#144 - Entity at paragraph level

Pull Request - State: closed - Opened by manandey over 2 years ago
Labels: #dataset

#143 - Additional changes to test the entities extraction

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#142 - [WIP] entities extraction tentative 2

Pull Request - State: closed - Opened by SaulLu over 2 years ago

#141 - Create evaluation_utils.py

Pull Request - State: closed - Opened by shanyas10 over 2 years ago

#140 - test gpt2-xl

Pull Request - State: closed - Opened by cccntu over 2 years ago - 3 comments

#139 - build: sync setup.py defined dependencies and fix broken ones

Pull Request - State: closed - Opened by tianjianjiang over 2 years ago - 1 comment
Labels: bug

#138 - Common simple eval function to calculate ppl

Issue - State: open - Opened by shanyas10 almost 3 years ago

#137 - Filter examples by `num_chars` to include in a batch

Pull Request - State: closed - Opened by manandey almost 3 years ago - 7 comments

#136 - Fix accelerate not using multi-GPU

Pull Request - State: closed - Opened by cccntu almost 3 years ago - 1 comment

#135 - Update add_metadata.py

Pull Request - State: closed - Opened by manandey almost 3 years ago - 1 comment

#134 - `Title` preprocessor

Pull Request - State: closed - Opened by manandey almost 3 years ago

#133 - Add `title` metadata processor

Pull Request - State: closed - Opened by manandey almost 3 years ago

#132 - build: pin datasets to 1.17.0

Pull Request - State: closed - Opened by tianjianjiang almost 3 years ago
Labels: bug

#131 - Add script to convert the dataset in compressed jsonlines files

Pull Request - State: closed - Opened by SaulLu almost 3 years ago

#130 - build: bump nltk to 3.6.7 for security and performance

Pull Request - State: closed - Opened by tianjianjiang almost 3 years ago
Labels: bug

#128 - feat: mark paragraphs by metadata-html #125

Pull Request - State: closed - Opened by tianjianjiang almost 3 years ago
Labels: #paragraph_extraction

#127 - Add filters to `HtmlProcessor`

Pull Request - State: closed - Opened by SaulLu almost 3 years ago - 1 comment

#126 - Remove entity description

Pull Request - State: closed - Opened by manandey almost 3 years ago - 1 comment

#125 - feat: HTML scanner for text content & content sectioning elements → segment paragraphs

Issue - State: closed - Opened by tianjianjiang almost 3 years ago
Labels: #paragraph_extraction

#124 - Create Dataset with metadata

Issue - State: open - Opened by SaulLu almost 3 years ago
Labels: #dataset, Epic

#123 - Add fp16, multi-GPU training script (toy dataset)

Pull Request - State: closed - Opened by cccntu almost 3 years ago

#110 - Which HTML tags should be used during training?

Issue - State: closed - Opened by norakassner almost 3 years ago - 1 comment
Labels: duplicate

#108 - Evaluation bias

Issue - State: open - Opened by norakassner almost 3 years ago - 1 comment

#103 - Add code to sampling multiple metadata

Pull Request - State: closed - Opened by cccntu almost 3 years ago - 1 comment

#99 - Evaluation toxicity for website description and data source

Issue - State: open - Opened by norakassner almost 3 years ago - 1 comment

#98 - data analysis: website description (quality and yield)

Issue - State: open - Opened by norakassner almost 3 years ago

#97 - Start joint training

Issue - State: open - Opened by norakassner almost 3 years ago

#96 - eval hyperparameters: occupied tokens

Issue - State: open - Opened by norakassner almost 3 years ago

#95 - entity tagging speedup

Issue - State: closed - Opened by norakassner almost 3 years ago - 1 comment

#94 - estimate amount of data

Issue - State: open - Opened by norakassner almost 3 years ago

#93 - eval hyperparameters: amount of metadata

Issue - State: open - Opened by norakassner almost 3 years ago

#92 - method to sample global metadata

Issue - State: open - Opened by norakassner almost 3 years ago

#91 - method to sample local metadata

Issue - State: open - Opened by norakassner almost 3 years ago

#90 - explore hyperparameters:

Issue - State: open - Opened by norakassner almost 3 years ago

#89 - simple zero-shot eval function: time stamps

Issue - State: open - Opened by norakassner almost 3 years ago

#88 - simple zero-shot eval function: website description

Issue - State: open - Opened by norakassner almost 3 years ago - 1 comment

#87 - simple zero-shot eval function: datasource

Issue - State: open - Opened by norakassner almost 3 years ago

#86 - simple zero-shot eval function: entity tags

Issue - State: open - Opened by norakassner almost 3 years ago

#85 - simple zero-shot eval function: HTML tags

Issue - State: open - Opened by norakassner almost 3 years ago

#84 - simple zero-shot eval function: generation length

Issue - State: open - Opened by norakassner almost 3 years ago - 1 comment

#83 - Handle the comment specific type not recognized by pyarrow

Pull Request - State: closed - Opened by SaulLu almost 3 years ago - 1 comment

#82 - Change torch version + make it optional

Pull Request - State: closed - Opened by SaulLu almost 3 years ago

#81 - update: generation length and datasource

Pull Request - State: closed - Opened by chkla almost 3 years ago

#80 - Update entity-tags preprocessing code to speed up the process

Pull Request - State: closed - Opened by manandey almost 3 years ago