Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/tokenizers issues and pull requests
#1325 - Having problem with loading pretrained tokenizer.
Issue -
State: closed - Opened by bkhanal-11 about 1 year ago
- 1 comment
#1324 - tokenizers 0.13.3 build issue in ubuntu 22.04 rust 1.17.1
Issue -
State: closed - Opened by HappyCodingLover about 1 year ago
- 2 comments
#1323 - Breaking changes in a minor version update.
Issue -
State: closed - Opened by jafioti about 1 year ago
- 2 comments
Labels: Stale
#1322 - pyo3: update to 0.19
Pull Request -
State: closed - Opened by mikelui about 1 year ago
- 3 comments
#1321 - Fix stride condition.
Pull Request -
State: closed - Opened by Narsil about 1 year ago
- 1 comment
#1320 - Release all at once for simplicity.
Pull Request -
State: closed - Opened by Narsil about 1 year ago
- 1 comment
#1319 - 0.13.4.rc1
Pull Request -
State: closed - Opened by Narsil about 1 year ago
- 3 comments
#1318 - CD backports
Pull Request -
State: closed - Opened by chris-ha458 about 1 year ago
- 7 comments
#1317 - Derive clone for TrainerWrapper
Pull Request -
State: closed - Opened by jonatanklosko about 1 year ago
- 2 comments
#1316 - Add `expect()` for disabling truncation
Pull Request -
State: closed - Opened by boyleconnor about 1 year ago
- 5 comments
#1315 - Unable to Import BertWordPieceTokenizer from "tokenizers" Package
Issue -
State: closed - Opened by hanxk about 1 year ago
#1314 - `processor.added_tokens()` and the actual number of tokens added by `processor.process_encodings()` are defined separately, but there is nothing checking that they align
Issue -
State: closed - Opened by boyleconnor about 1 year ago
- 3 comments
Labels: Stale
#1313 - BpeTrainer is too slow compared to YouTokenToMe. Why?
Issue -
State: closed - Opened by saberiato about 1 year ago
- 13 comments
Labels: Stale
#1312 - Check that processors add the number of tokens they say they will
Pull Request -
State: closed - Opened by boyleconnor about 1 year ago
- 6 comments
Labels: Stale
#1311 - sentencepiece and importing with .from_spm do not encode the same way
Issue -
State: closed - Opened by kellymarchisio about 1 year ago
- 7 comments
Labels: Stale
#1310 - How to train BPE tokenizer with multiple CPU
Issue -
State: closed - Opened by voidmagic about 1 year ago
- 2 comments
#1309 - Added tokens not getting encoded in gpt2 tokenizer
Issue -
State: closed - Opened by jaykasundra2 about 1 year ago
- 4 comments
#1308 - Handle when precompiled charsmap is empty
Pull Request -
State: closed - Opened by kellymarchisio about 1 year ago
- 1 comment
#1307 - Exception: Error while attempting to build Precompiled normalizer: Cannot parse precompiled_charsmap
Issue -
State: closed - Opened by kellymarchisio about 1 year ago
- 1 comment
#1306 - Give error when initializing tokenizer with too high stride
Pull Request -
State: closed - Opened by boyleconnor about 1 year ago
- 2 comments
#1305 - Get the eos_token and eos_token_id in Rust
Issue -
State: closed - Opened by junsoo999 about 1 year ago
- 1 comment
#1304 - Possible infinite loop in Encoding merge
Issue -
State: closed - Opened by timkaas about 1 year ago
- 2 comments
Labels: Stale
#1303 - Single warning for holes.
Pull Request -
State: closed - Opened by Narsil about 1 year ago
- 2 comments
#1302 - feat: Added CITATION.cff.
Pull Request -
State: closed - Opened by SamuelLarkin about 1 year ago
- 2 comments
#1301 - Loading and Hosting pre-trained spm SentencePieceUnigram tokenizer
Issue -
State: closed - Opened by meliksahturker about 1 year ago
- 3 comments
#1300 - Incorrect offsets with add_prefix_space=True
Issue -
State: closed - Opened by david-waterworth about 1 year ago
- 7 comments
Labels: Stale
#1299 - Bump word-wrap from 1.2.3 to 1.2.4 in /bindings/node
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 1 comment
Labels: dependencies, javascript
#1298 - Is there a way to supress this warning?
Issue -
State: closed - Opened by shatz01 about 1 year ago
- 7 comments
#1297 - Error when loading tokenizer from a file: data did not match any variant of untagged enum ModelWrapper
Issue -
State: closed - Opened by delgermurun about 1 year ago
- 3 comments
#1296 - Fixing clippy warnings on 1.71.
Pull Request -
State: closed - Opened by Narsil about 1 year ago
- 1 comment
#1295 - import Tuple from typing
Pull Request -
State: closed - Opened by kellymarchisio about 1 year ago
- 2 comments
#1294 - fix 'Tuple' is not defined at sentencepiece_unigram.py
Pull Request -
State: closed - Opened by fenglui about 1 year ago
- 1 comment
#1293 - Bump semver from 5.7.1 to 5.7.2 in /bindings/node
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
- 1 comment
Labels: dependencies, javascript, Stale
#1292 - Update path name: master -> main
Pull Request -
State: closed - Opened by bact about 1 year ago
- 1 comment
#1291 - Bump tough-cookie from 4.0.0 to 4.1.3 in /bindings/node
Pull Request -
State: closed - Opened by dependabot[bot] about 1 year ago
Labels: dependencies, javascript
#1290 - Make correct padding for text generation with GPT-NEO
Issue -
State: closed - Opened by junoriosity about 1 year ago
- 3 comments
#1289 - revise type specification
Pull Request -
State: closed - Opened by hiroshi-matsuda-rit about 1 year ago
- 1 comment
#1288 - Revise type hint in SentencePieceUnigramTokenizer.__init()__
Pull Request -
State: closed - Opened by hiroshi-matsuda-rit about 1 year ago
#1287 - Question: Dose tokenizers.Tokenizer has pad() method like transformers PretrainedTokenizer?
Issue -
State: closed - Opened by ZeguanXiao about 1 year ago
- 2 comments
#1286 - Question: phoneme tokenizer?
Issue -
State: closed - Opened by pfeatherstone about 1 year ago
- 1 comment
#1285 - Compile fails when feature "onig" is off.
Issue -
State: closed - Opened by jafioti about 1 year ago
- 1 comment
#1284 - DataCollatorForLanguageModeling call of tokenizer.pad causes crash
Issue -
State: closed - Opened by condor-cp over 1 year ago
- 2 comments
#1283 - Bart custom tokenizer
Issue -
State: closed - Opened by BakingBrains over 1 year ago
- 5 comments
Labels: Stale
#1282 - [Question] When is "Ġ" added in the pre-tokenization step?
Issue -
State: closed - Opened by hkvision over 1 year ago
- 2 comments
#1281 - Incorrect treatment of special tokens added prior to training
Issue -
State: closed - Opened by mwaskom over 1 year ago
- 10 comments
Labels: Stale
#1280 - Seeding a tokenizer with initial tokens
Issue -
State: closed - Opened by BramVanroy over 1 year ago
- 3 comments
Labels: Stale
#1279 - Something went wrong about the result of the vocab extracted from a test example
Issue -
State: closed - Opened by wangze09 over 1 year ago
- 2 comments
#1278 - update tokenizer vocab
Issue -
State: closed - Opened by XDeepAzure over 1 year ago
- 3 comments
Labels: Stale
#1277 - Additional special tokens re-added after calling `train_new_from_iterator`.
Issue -
State: closed - Opened by Kinyugo over 1 year ago
- 8 comments
Labels: Stale
#1276 - pre_tokenizers/byte_level.rs lacking docs
Issue -
State: closed - Opened by steventrouble over 1 year ago
- 2 comments
Labels: Stale
#1275 - Improve error for truncation with too high stride
Pull Request -
State: closed - Opened by boyleconnor over 1 year ago
- 1 comment
#1274 - Include Extra Whitespace after Decode
Issue -
State: closed - Opened by karan-dalal over 1 year ago
- 2 comments
#1273 - [doc build] Use secrets
Pull Request -
State: closed - Opened by mishig25 over 1 year ago
- 1 comment
#1272 - Update README.md - Broken link
Pull Request -
State: closed - Opened by sbhavani over 1 year ago
#1271 - Update Cargo.toml for bindings
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 2 comments
Labels: Stale
#1270 - don't update added tokens map if token already in vocab
Pull Request -
State: closed - Opened by ArthurZucker over 1 year ago
- 2 comments
#1269 - Error message for too-high stride could be much clearer
Issue -
State: closed - Opened by boyleconnor over 1 year ago
- 4 comments
#1268 - Fixing broken link.
Pull Request -
State: closed - Opened by Narsil over 1 year ago
- 1 comment
#1267 - broken link at the documentation of tokenizers/quicktour
Issue -
State: closed - Opened by youkaichao over 1 year ago
- 1 comment
#1266 - Update Cargo.toml
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 2 comments
#1265 - how to use train_new_from_iterator() attribute with facebook/bart-base model tokenizer
Issue -
State: closed - Opened by Saitaruntulabandula over 1 year ago
- 3 comments
Labels: Stale
#1264 - fix documentation regarding regex
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 6 comments
#1263 - [Bug]? how does the tokenizer encode the special tokens?
Issue -
State: closed - Opened by vpegasus over 1 year ago
- 3 comments
#1262 - Parallelize (pretokenization)
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 2 comments
Labels: Stale
#1261 - Documentation Update Request: tokenizers.normalizers.Replace and tokenizers.Regex
Issue -
State: closed - Opened by piquan over 1 year ago
- 4 comments
Labels: Stale
#1260 - question about python test result of add_special_token function in tokenizer
Issue -
State: closed - Opened by DamonsJ over 1 year ago
- 2 comments
#1259 - Path to implementing --max_sentence_length
Issue -
State: closed - Opened by chris-ha458 over 1 year ago
- 7 comments
#1257 - Update unigram/trainer.rs
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 1 comment
#1256 - Update all GH Actions with dependency on actions/checkout
Pull Request -
State: closed - Opened by mfuntowicz over 1 year ago
- 1 comment
#1254 - Using vendored ssl instead of manylinux one
Pull Request -
State: closed - Opened by Narsil over 1 year ago
- 2 comments
#1253 - How to install this repo from source code
Issue -
State: closed - Opened by zheyuye over 1 year ago
- 3 comments
#1252 - Openssl version 1.0.1e detected in the whl file
Issue -
State: closed - Opened by zhzhping over 1 year ago
- 2 comments
Labels: Stale
#1251 - Makes `decode` and `decode_batch` work on borrowed content.
Pull Request -
State: closed - Opened by mfuntowicz over 1 year ago
- 1 comment
#1250 - [Bug] Inconsistent removal of leading and trailing whitespace for Metaspace pretokenizers
Issue -
State: closed - Opened by xenova over 1 year ago
- 11 comments
Labels: Stale
#1249 - conda
Issue -
State: closed - Opened by egilber over 1 year ago
#1248 - Replace deprecated command with environment file
Pull Request -
State: closed - Opened by jongwooo over 1 year ago
- 1 comment
Labels: Stale
#1247 - Replace deprecated `set-output` command with environment file
Issue -
State: closed - Opened by jongwooo over 1 year ago
- 1 comment
Labels: Stale
#1246 - Entire rewrite of node bindings.
Pull Request -
State: closed - Opened by Narsil over 1 year ago
- 5 comments
#1245 - Update Cargo.toml rust edition
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 3 comments
#1244 - fix unigram.rs test_sample()
Pull Request -
State: closed - Opened by chris-ha458 over 1 year ago
- 1 comment
#1243 - Bug with tokenizer.add_tokens()
Issue -
State: closed - Opened by liujuncn over 1 year ago
- 8 comments
Labels: Stale
#1242 - Fail trying to find sentence-transformer
Issue -
State: closed - Opened by rasrov over 1 year ago
- 3 comments
#1241 - AutoTokenizer is prepending something else compared to GPT2Tokenizer
Issue -
State: closed - Opened by kruthay over 1 year ago
- 3 comments
#1240 - Tokenizer Performance Comparision between the Python and the Rust version.
Issue -
State: closed - Opened by songkq over 1 year ago
- 4 comments
#1239 - Garbage value in Bertweet-large tokenizer's max_len_sentences_pair
Issue -
State: closed - Opened by adithya8 over 1 year ago
- 3 comments
#1238 - Creating a tokenizer for flaubert in rust
Issue -
State: closed - Opened by larochef over 1 year ago
- 8 comments
Labels: Stale
#1237 - LlamaTokenizer adds space when decoding `<s>`
Issue -
State: closed - Opened by lizelive over 1 year ago
- 5 comments
#1236 - Is SentencePieceBPETokenizer officially supported?
Issue -
State: closed - Opened by keunwoochoi over 1 year ago
- 9 comments
#1235 - Patched
Pull Request -
State: closed - Opened by Narsil over 1 year ago
- 1 comment
#1234 - [bug] BPE `roberta-large-mnli` saved with `.save_pretrained()` incorrectly sets `byte_fallback` to false (should be true)
Issue -
State: closed - Opened by xenova over 1 year ago
- 8 comments
#1233 - Fixing padding_left sequence_ids.
Pull Request -
State: closed - Opened by Narsil over 1 year ago
- 3 comments
#1232 - Make tokenizer return information about case of the word
Issue -
State: closed - Opened by maiiabocharova over 1 year ago
- 2 comments
#1231 - Npm error during tokenizers installation
Issue -
State: closed - Opened by Barbariskaa over 1 year ago
- 2 comments