Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/tokenizers issues and pull requests
#1526 - Link to download the training text in `docs/source/quicktour.rst` is broken
Issue -
State: closed - Opened by 14jdelap 6 months ago
- 6 comments
Labels: Stale
#1525 - How to write custom Wordpiece class?
Issue -
State: closed - Opened by xinyinan9527 6 months ago
- 3 comments
Labels: Stale
#1524 - Convert huggingface tokenizer into sentencepiece format
Issue -
State: closed - Opened by RRaphaell 6 months ago
- 3 comments
Labels: Stale
#1523 - ❓Get stats (e.g. counts) about the merged pairs
Issue -
State: closed - Opened by pietrolesci 7 months ago
- 3 comments
Labels: Stale
#1522 - Error: Cannot find module 'tokenizers/bindings/tokenizer'
Issue -
State: open - Opened by meichangsu1 7 months ago
#1522 - Error: Cannot find module 'tokenizers/bindings/tokenizer'
Issue -
State: closed - Opened by meichangsu1 7 months ago
- 1 comment
Labels: Stale
#1521 - remove enforcement of non special when adding tokens
Pull Request -
State: closed - Opened by ArthurZucker 7 months ago
- 2 comments
#1520 - Why are 'unknown' tokens randomly added to my tokenized input?
Issue -
State: closed - Opened by tshmak 7 months ago
- 2 comments
#1519 - Why the tokenizer is slower than tiktoken?
Issue -
State: open - Opened by BigBinnie 7 months ago
- 8 comments
#1519 - Why the tokenizer is slower than tiktoken?
Issue -
State: open - Opened by BigBinnie 7 months ago
- 5 comments
#1518 - Loading `tokenizer.model` with Rust API
Issue -
State: open - Opened by EricLBuehler 7 months ago
- 5 comments
#1518 - Loading `tokenizer.model` with Rust API
Issue -
State: closed - Opened by EricLBuehler 7 months ago
- 11 comments
Labels: Stale
#1518 - Loading `tokenizer.model` with Rust API
Issue -
State: open - Opened by EricLBuehler 7 months ago
- 10 comments
#1518 - Loading `tokenizer.model` with Rust API
Issue -
State: open - Opened by EricLBuehler 7 months ago
- 7 comments
#1517 - Llama3 tokenizer with Incorrect offset_mapping
Issue -
State: open - Opened by justin-shao 7 months ago
- 2 comments
Labels: Stale
#1517 - Llama3 tokenizer with Incorrect offset_mapping
Issue -
State: open - Opened by justin-shao 7 months ago
#1517 - Llama3 tokenizer with Incorrect offset_mapping
Issue -
State: closed - Opened by justin-shao 7 months ago
- 3 comments
Labels: Stale
#1516 - Tokens Removed from Trained Custom BPE Tokenizer
Issue -
State: closed - Opened by rteehas 7 months ago
#1515 - UnigramTrainer: byte_fallback is false.
Issue -
State: open - Opened by Moddus 7 months ago
- 3 comments
Labels: Feature Request, training
#1515 - UnigramTrainer: byte_fallback is false.
Issue -
State: open - Opened by Moddus 7 months ago
- 4 comments
Labels: Feature Request, training
#1514 - BPE Trainer doesn't respect the `vocab_size` parameter when dataset size is increased
Issue -
State: closed - Opened by Abhinay1997 7 months ago
- 3 comments
Labels: Stale
#1514 - BPE Trainer doesn't respect the `vocab_size` parameter when dataset size is increased
Issue -
State: open - Opened by Abhinay1997 7 months ago
- 1 comment
#1514 - BPE Trainer doesn't respect the `vocab_size` parameter when dataset size is increased
Issue -
State: open - Opened by Abhinay1997 7 months ago
- 2 comments
Labels: Stale
#1513 - [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder
Pull Request -
State: open - Opened by Narsil 7 months ago
- 2 comments
#1513 - [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder
Pull Request -
State: closed - Opened by Narsil 7 months ago
- 6 comments
#1512 - Breaking changes in v0.19.1 for tiktoken/llama3
Issue -
State: closed - Opened by sanderland 7 months ago
- 7 comments
Labels: Stale
#1511 - Fix "dictionnary" typo
Pull Request -
State: open - Opened by nprisbrey 7 months ago
#1511 - Fix "dictionnary" typo
Pull Request -
State: closed - Opened by nprisbrey 7 months ago
- 3 comments
#1510 - change conditional compilation for regex libraries
Pull Request -
State: open - Opened by semaraugusto 7 months ago
#1510 - change conditional compilation for regex libraries
Pull Request -
State: closed - Opened by semaraugusto 7 months ago
- 1 comment
Labels: Stale
#1509 - Cross-compilation fails for custom target
Issue -
State: closed - Opened by semaraugusto 7 months ago
- 3 comments
Labels: Stale
#1509 - Cross-compilation fails for custom target
Issue -
State: closed - Opened by semaraugusto 7 months ago
- 1 comment
Labels: Stale
#1508 - Add `.editorconfig` and `rustfmt.toml` for Consistent Code Formatting
Pull Request -
State: closed - Opened by tal7aouy 7 months ago
- 1 comment
Labels: Stale
#1508 - Add `.editorconfig` and `rustfmt.toml` for Consistent Code Formatting
Pull Request -
State: open - Opened by tal7aouy 7 months ago
#1507 - Treatment of hyphenated words
Issue -
State: closed - Opened by rattle99 7 months ago
- 2 comments
Labels: Stale
#1507 - Treatment of hyphenated words
Issue -
State: closed - Opened by rattle99 7 months ago
- 1 comment
Labels: Stale
#1507 - Treatment of hyphenated words
Issue -
State: closed - Opened by rattle99 7 months ago
- 7 comments
Labels: Stale
#1506 - Python Binding: Tokenizer.from_file() cannot parse JSON file of tokens
Issue -
State: closed - Opened by dwash96 7 months ago
- 1 comment
#1506 - Python Binding: Tokenizer.from_file() cannot parse JSON file of tokens
Issue -
State: closed - Opened by dwash96 7 months ago
- 2 comments
#1505 - Failing to build bindings with 0.19.1
Issue -
State: closed - Opened by bryteise 7 months ago
- 7 comments
Labels: Stale
#1505 - Failing to build bindings with 0.19.1
Issue -
State: closed - Opened by bryteise 7 months ago
- 6 comments
Labels: Stale
#1505 - Failing to build bindings with 0.19.1
Issue -
State: open - Opened by bryteise 7 months ago
- 1 comment
#1504 - add serialization for `ignore_merges`
Pull Request -
State: closed - Opened by ArthurZucker 7 months ago
- 1 comment
#1503 - corrected typo in the documentations for pre-tokenizers
Pull Request -
State: closed - Opened by GorkaUrbizu 7 months ago
Labels: Stale
#1502 - offline installation
Issue -
State: closed - Opened by HankLiu10 7 months ago
- 3 comments
Labels: Stale
#1501 - Extended vocab tokenizer merging text into a single string without spaces while decoding
Issue -
State: closed - Opened by savanth14 7 months ago
- 4 comments
Labels: Stale
#1500 - Issue in installing rudalle on google colab, !pip install rudalle
Issue -
State: closed - Opened by deepanshh786 7 months ago
- 2 comments
Labels: Stale
#1499 - Fixing doc.
Pull Request -
State: closed - Opened by Narsil 7 months ago
- 1 comment
#1498 - Bumping all versions 3 times (ty transformers :) )
Pull Request -
State: closed - Opened by Narsil 7 months ago
- 1 comment
#1497 - Remove 3.13 (potential undefined behavior.)
Pull Request -
State: closed - Opened by Narsil 7 months ago
- 1 comment
#1496 - StripAccents doesn't work
Issue -
State: closed - Opened by NivinaNull 7 months ago
- 1 comment
Labels: Stale
#1495 - LLamaTokenizer with `use_fast=True` / and `use_fast=False` causing memory leak when used with multiprocessing / `dataset.map(num_proc)`
Issue -
State: open - Opened by michaelfeil 7 months ago
- 6 comments
#1495 - LLamaTokenizer with `use_fast=True` / and `use_fast=False` causing memory leak when used with multiprocessing / `dataset.map(num_proc)`
Issue -
State: open - Opened by michaelfeil 7 months ago
- 7 comments
#1495 - LLamaTokenizer with `use_fast=True` / and `use_fast=False` causing memory leak when used with multiprocessing / `dataset.map(num_proc)`
Issue -
State: open - Opened by michaelfeil 7 months ago
- 12 comments
#1494 - PyO3 0.21.
Pull Request -
State: closed - Opened by Narsil 7 months ago
- 1 comment
#1493 - Add more support for tiktoken based tokenizers
Pull Request -
State: closed - Opened by ArthurZucker 7 months ago
- 1 comment
#1492 - Fix unsoundness in `tokenizers::utils::parallelism`
Pull Request -
State: closed - Opened by albertsgarde 7 months ago
- 4 comments
#1491 - Unsound use of unsafe in `src/utils/parallelism.rs`
Issue -
State: closed - Opened by albertsgarde 7 months ago
- 1 comment
Labels: Stale
#1490 - Deepseeker model completely loses performance after using tokenizer.add_tokens(special_tokens)
Issue -
State: closed - Opened by bin123apple 7 months ago
- 2 comments
Labels: Stale
#1489 - Discrepancy Between GitHub Release and NPM Package Version & Missing Dependencies
Issue -
State: closed - Opened by superBertBerg 7 months ago
- 5 comments
Labels: Stale
#1489 - Discrepancy Between GitHub Release and NPM Package Version & Missing Dependencies
Issue -
State: open - Opened by superBertBerg 7 months ago
- 4 comments
#1488 - Fix data directory for test
Pull Request -
State: closed - Opened by atupone 7 months ago
- 1 comment
Labels: Stale
#1487 - Is it possible to pass a tokenizer from Python into Rust?
Issue -
State: closed - Opened by albertsgarde 8 months ago
- 2 comments
Labels: Stale
#1486 - Fix Strip decoder doc comment
Pull Request -
State: closed - Opened by jacklee1792 8 months ago
Labels: Stale
#1485 - error: casting `&T` to `&mut T` is undefined behavior
Issue -
State: closed - Opened by Jipok 8 months ago
- 10 comments
Labels: Stale
#1484 - Candidate release
Pull Request -
State: closed - Opened by ArthurZucker 8 months ago
- 1 comment
#1483 - fix: change var name from `vocab` to `vocab_file`
Pull Request -
State: closed - Opened by shenxiangzhuang 8 months ago
Labels: Stale
#1482 - fix: typo
Pull Request -
State: closed - Opened by shenxiangzhuang 8 months ago
Labels: Stale
#1481 - `BertWordPieceTokenizer` not saving with `sep_token` marked
Issue -
State: open - Opened by AngledLuffa 8 months ago
- 1 comment
#1481 - `BertWordPieceTokenizer` not saving with `sep_token` marked
Issue -
State: closed - Opened by AngledLuffa 8 months ago
- 2 comments
#1480 - tokenizers-linux-x64-musl is not found when running inside node apline docker
Issue -
State: closed - Opened by madhurjya-acko 8 months ago
- 2 comments
Labels: Stale
#1479 - Bump express from 4.18.1 to 4.19.2 in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 2 comments
Labels: dependencies, javascript, Stale
#1479 - Bump express from 4.18.1 to 4.19.2 in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: open - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies, javascript, Stale
#1478 - Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 2 comments
Labels: dependencies, javascript, Stale
#1478 - Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: open - Opened by dependabot[bot] 8 months ago
- 1 comment
Labels: dependencies, javascript, Stale
#1477 - `cargo build` fails for python bindings when `--locked` is passed for `v0.15.1` and `v0.15.2`
Issue -
State: closed - Opened by CobaltCause 8 months ago
- 3 comments
Labels: Stale
#1477 - `cargo build` fails for python bindings when `--locked` is passed for `v0.15.1` and `v0.15.2`
Issue -
State: closed - Opened by CobaltCause 8 months ago
- 4 comments
Labels: Stale
#1476 - Refactor metaspace
Pull Request -
State: closed - Opened by ArthurZucker 8 months ago
- 7 comments
#1475 - Issue merging across whitespaces
Issue -
State: closed - Opened by henrycharlesworth 8 months ago
- 2 comments
Labels: Stale
#1474 - BPE Decoder cleanup option
Issue -
State: closed - Opened by w-zygmuntowicz 8 months ago
- 2 comments
Labels: Stale
#1473 - Assign `<unusedXX>` tokens with `special_tokens` without growing vocab size
Issue -
State: closed - Opened by jacobwjs 8 months ago
- 6 comments
Labels: Stale, Feature Request, planned
#1472 - Bump follow-redirects from 1.15.4 to 1.15.6 in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: closed - Opened by dependabot[bot] 8 months ago
- 2 comments
Labels: dependencies, javascript, Stale
#1471 - Train tokenizer on integer lists, not strings
Issue -
State: closed - Opened by rteehas 8 months ago
- 7 comments
Labels: Stale
#1470 - Tokens display issues
Issue -
State: closed - Opened by jordane95 8 months ago
- 2 comments
Labels: Stale