Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/tokenizers issues and pull requests

#1325 - Having problem with loading pretrained tokenizer.

Issue - State: closed - Opened by bkhanal-11 about 1 year ago - 1 comment

#1324 - tokenizers 0.13.3 build issue in ubuntu 22.04 rust 1.17.1

Issue - State: closed - Opened by HappyCodingLover about 1 year ago - 2 comments

#1323 - Breaking changes in a minor version update.

Issue - State: open - Opened by jafioti about 1 year ago - 1 comment

#1323 - Breaking changes in a minor version update.

Issue - State: open - Opened by jafioti about 1 year ago - 2 comments
Labels: Stale

#1323 - Breaking changes in a minor version update.

Issue - State: closed - Opened by jafioti about 1 year ago - 2 comments
Labels: Stale

#1322 - pyo3: update to 0.19

Pull Request - State: closed - Opened by mikelui about 1 year ago - 3 comments

#1321 - Fix stride condition.

Pull Request - State: closed - Opened by Narsil about 1 year ago - 1 comment

#1320 - Release all at once for simplicity.

Pull Request - State: closed - Opened by Narsil about 1 year ago - 1 comment

#1319 - 0.13.4.rc1

Pull Request - State: closed - Opened by Narsil about 1 year ago - 3 comments

#1318 - CD backports

Pull Request - State: closed - Opened by chris-ha458 about 1 year ago - 7 comments

#1317 - Derive clone for TrainerWrapper

Pull Request - State: closed - Opened by jonatanklosko about 1 year ago - 2 comments

#1316 - Add `expect()` for disabling truncation

Pull Request - State: closed - Opened by boyleconnor about 1 year ago - 5 comments

#1313 - BpeTrainer is too slow compared to YouTokenToMe. Why?

Issue - State: closed - Opened by saberiato about 1 year ago - 13 comments
Labels: Stale

#1312 - Check that processors add the number of tokens they say they will

Pull Request - State: closed - Opened by boyleconnor about 1 year ago - 6 comments
Labels: Stale

#1311 - sentencepiece and importing with .from_spm do not encode the same way

Issue - State: closed - Opened by kellymarchisio about 1 year ago - 7 comments
Labels: Stale

#1310 - How to train BPE tokenizer with multiple CPU

Issue - State: closed - Opened by voidmagic about 1 year ago - 2 comments

#1309 - Added tokens not getting encoded in gpt2 tokenizer

Issue - State: closed - Opened by jaykasundra2 about 1 year ago - 4 comments

#1308 - Handle when precompiled charsmap is empty

Pull Request - State: closed - Opened by kellymarchisio about 1 year ago - 1 comment

#1306 - Give error when initializing tokenizer with too high stride

Pull Request - State: closed - Opened by boyleconnor about 1 year ago - 2 comments

#1305 - Get the eos_token and eos_token_id in Rust

Issue - State: closed - Opened by junsoo999 about 1 year ago - 1 comment

#1304 - Possible infinite loop in Encoding merge

Issue - State: closed - Opened by timkaas about 1 year ago - 2 comments
Labels: Stale

#1303 - Single warning for holes.

Pull Request - State: closed - Opened by Narsil about 1 year ago - 2 comments

#1302 - feat: Added CITATION.cff.

Pull Request - State: closed - Opened by SamuelLarkin about 1 year ago - 2 comments

#1301 - Loading and Hosting pre-trained spm SentencePieceUnigram tokenizer

Issue - State: closed - Opened by meliksahturker about 1 year ago - 3 comments

#1300 - Incorrect offsets with add_prefix_space=True

Issue - State: closed - Opened by david-waterworth about 1 year ago - 7 comments
Labels: Stale

#1299 - Bump word-wrap from 1.2.3 to 1.2.4 in /bindings/node

Pull Request - State: closed - Opened by dependabot[bot] about 1 year ago - 1 comment
Labels: dependencies, javascript

#1298 - Is there a way to supress this warning?

Issue - State: closed - Opened by shatz01 about 1 year ago - 7 comments

#1296 - Fixing clippy warnings on 1.71.

Pull Request - State: closed - Opened by Narsil about 1 year ago - 1 comment

#1295 - import Tuple from typing

Pull Request - State: closed - Opened by kellymarchisio about 1 year ago - 2 comments

#1294 - fix 'Tuple' is not defined at sentencepiece_unigram.py

Pull Request - State: closed - Opened by fenglui about 1 year ago - 1 comment

#1293 - Bump semver from 5.7.1 to 5.7.2 in /bindings/node

Pull Request - State: closed - Opened by dependabot[bot] about 1 year ago - 1 comment
Labels: dependencies, javascript, Stale

#1292 - Update path name: master -> main

Pull Request - State: closed - Opened by bact about 1 year ago - 1 comment

#1291 - Bump tough-cookie from 4.0.0 to 4.1.3 in /bindings/node

Pull Request - State: closed - Opened by dependabot[bot] about 1 year ago
Labels: dependencies, javascript

#1290 - Make correct padding for text generation with GPT-NEO

Issue - State: closed - Opened by junoriosity about 1 year ago - 3 comments

#1289 - revise type specification

Pull Request - State: closed - Opened by hiroshi-matsuda-rit about 1 year ago - 1 comment

#1286 - Question: phoneme tokenizer?

Issue - State: closed - Opened by pfeatherstone about 1 year ago - 1 comment

#1285 - Compile fails when feature "onig" is off.

Issue - State: closed - Opened by jafioti about 1 year ago - 1 comment

#1284 - DataCollatorForLanguageModeling call of tokenizer.pad causes crash

Issue - State: closed - Opened by condor-cp over 1 year ago - 2 comments

#1283 - Bart custom tokenizer

Issue - State: closed - Opened by BakingBrains over 1 year ago - 5 comments
Labels: Stale

#1282 - [Question] When is "Ġ" added in the pre-tokenization step?

Issue - State: closed - Opened by hkvision over 1 year ago - 2 comments

#1281 - Incorrect treatment of special tokens added prior to training

Issue - State: closed - Opened by mwaskom over 1 year ago - 10 comments
Labels: Stale

#1280 - Seeding a tokenizer with initial tokens

Issue - State: closed - Opened by BramVanroy over 1 year ago - 3 comments
Labels: Stale

#1278 - update tokenizer vocab

Issue - State: closed - Opened by XDeepAzure over 1 year ago - 3 comments
Labels: Stale

#1277 - Additional special tokens re-added after calling `train_new_from_iterator`.

Issue - State: closed - Opened by Kinyugo over 1 year ago - 8 comments
Labels: Stale

#1276 - pre_tokenizers/byte_level.rs lacking docs

Issue - State: closed - Opened by steventrouble over 1 year ago - 2 comments
Labels: Stale

#1275 - Improve error for truncation with too high stride

Pull Request - State: closed - Opened by boyleconnor over 1 year ago - 1 comment

#1274 - Include Extra Whitespace after Decode

Issue - State: closed - Opened by karan-dalal over 1 year ago - 2 comments

#1273 - [doc build] Use secrets

Pull Request - State: closed - Opened by mishig25 over 1 year ago - 1 comment

#1272 - Update README.md - Broken link

Pull Request - State: closed - Opened by sbhavani over 1 year ago

#1271 - Update Cargo.toml for bindings

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 2 comments
Labels: Stale

#1270 - don't update added tokens map if token already in vocab

Pull Request - State: closed - Opened by ArthurZucker over 1 year ago - 2 comments

#1269 - Error message for too-high stride could be much clearer

Issue - State: closed - Opened by boyleconnor over 1 year ago - 4 comments

#1268 - Fixing broken link.

Pull Request - State: closed - Opened by Narsil over 1 year ago - 1 comment

#1267 - broken link at the documentation of tokenizers/quicktour

Issue - State: closed - Opened by youkaichao over 1 year ago - 1 comment

#1266 - Update Cargo.toml

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 2 comments

#1265 - how to use train_new_from_iterator() attribute with facebook/bart-base model tokenizer

Issue - State: closed - Opened by Saitaruntulabandula over 1 year ago - 3 comments
Labels: Stale

#1264 - fix documentation regarding regex

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 6 comments

#1263 - [Bug]? how does the tokenizer encode the special tokens?

Issue - State: closed - Opened by vpegasus over 1 year ago - 3 comments

#1262 - Parallelize (pretokenization)

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 2 comments
Labels: Stale

#1261 - Documentation Update Request: tokenizers.normalizers.Replace and tokenizers.Regex

Issue - State: closed - Opened by piquan over 1 year ago - 4 comments
Labels: Stale

#1260 - question about python test result of add_special_token function in tokenizer

Issue - State: closed - Opened by DamonsJ over 1 year ago - 2 comments

#1259 - Path to implementing --max_sentence_length

Issue - State: closed - Opened by chris-ha458 over 1 year ago - 7 comments

#1257 - Update unigram/trainer.rs

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 1 comment

#1256 - Update all GH Actions with dependency on actions/checkout

Pull Request - State: closed - Opened by mfuntowicz over 1 year ago - 1 comment

#1254 - Using vendored ssl instead of manylinux one

Pull Request - State: closed - Opened by Narsil over 1 year ago - 2 comments

#1253 - How to install this repo from source code

Issue - State: closed - Opened by zheyuye over 1 year ago - 3 comments

#1252 - Openssl version 1.0.1e detected in the whl file

Issue - State: closed - Opened by zhzhping over 1 year ago - 2 comments
Labels: Stale

#1251 - Makes `decode` and `decode_batch` work on borrowed content.

Pull Request - State: closed - Opened by mfuntowicz over 1 year ago - 1 comment

#1250 - [Bug] Inconsistent removal of leading and trailing whitespace for Metaspace pretokenizers

Issue - State: closed - Opened by xenova over 1 year ago - 11 comments
Labels: Stale

#1249 - conda

Issue - State: closed - Opened by egilber over 1 year ago

#1248 - Replace deprecated command with environment file

Pull Request - State: closed - Opened by jongwooo over 1 year ago - 1 comment
Labels: Stale

#1247 - Replace deprecated `set-output` command with environment file

Issue - State: closed - Opened by jongwooo over 1 year ago - 1 comment
Labels: Stale

#1246 - Entire rewrite of node bindings.

Pull Request - State: closed - Opened by Narsil over 1 year ago - 5 comments

#1245 - Update Cargo.toml rust edition

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 3 comments

#1244 - fix unigram.rs test_sample()

Pull Request - State: closed - Opened by chris-ha458 over 1 year ago - 1 comment

#1243 - Bug with tokenizer.add_tokens()

Issue - State: closed - Opened by liujuncn over 1 year ago - 8 comments
Labels: Stale

#1242 - Fail trying to find sentence-transformer

Issue - State: closed - Opened by rasrov over 1 year ago - 3 comments

#1241 - AutoTokenizer is prepending something else compared to GPT2Tokenizer

Issue - State: closed - Opened by kruthay over 1 year ago - 3 comments

#1240 - Tokenizer Performance Comparision between the Python and the Rust version.

Issue - State: closed - Opened by songkq over 1 year ago - 4 comments

#1239 - Garbage value in Bertweet-large tokenizer's max_len_sentences_pair

Issue - State: closed - Opened by adithya8 over 1 year ago - 3 comments

#1238 - Creating a tokenizer for flaubert in rust

Issue - State: closed - Opened by larochef over 1 year ago - 8 comments
Labels: Stale

#1237 - LlamaTokenizer adds space when decoding `<s>`

Issue - State: closed - Opened by lizelive over 1 year ago - 5 comments

#1236 - Is SentencePieceBPETokenizer officially supported?

Issue - State: closed - Opened by keunwoochoi over 1 year ago - 9 comments

#1235 - Patched

Pull Request - State: closed - Opened by Narsil over 1 year ago - 1 comment

#1233 - Fixing padding_left sequence_ids.

Pull Request - State: closed - Opened by Narsil over 1 year ago - 3 comments

#1232 - Make tokenizer return information about case of the word

Issue - State: closed - Opened by maiiabocharova over 1 year ago - 2 comments

#1231 - Npm error during tokenizers installation

Issue - State: closed - Opened by Barbariskaa over 1 year ago - 2 comments