Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / huggingface/tokenizers issues and pull requests
#1638 - Fix off-by-one error in tokenizer::normalizer::Range::len
Pull Request -
State: open - Opened by rlanday about 20 hours ago
#1637 - Adding tokens to a tokenizer with subword support?
Issue -
State: open - Opened by noamgat 2 days ago
#1636 - NormalizedString.clear() broken?
Issue -
State: open - Opened by lkurlandski 5 days ago
- 1 comment
Labels: bug
#1635 - Adding many AddedTokens makes loading a tokenizer extremely slow.
Issue -
State: open - Opened by stephantul 5 days ago
#1634 - Cannot inject custom PreTokenizer into Tokenizer
Issue -
State: open - Opened by Old-Shatterhand 6 days ago
- 6 comments
#1633 - README.md contains non-functional code
Issue -
State: open - Opened by ahenkes1 11 days ago
- 2 comments
#1632 - style: simplify string formatting for readability
Pull Request -
State: open - Opened by hamirmahal 12 days ago
#1631 - Bump send and express in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: open - Opened by dependabot[bot] 13 days ago
Labels: dependencies, javascript
#1630 - Bump serve-static and express in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: open - Opened by dependabot[bot] 13 days ago
Labels: dependencies, javascript
#1629 - Bump body-parser and express in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: open - Opened by dependabot[bot] 14 days ago
Labels: dependencies, javascript
#1628 - Access utf-8 byte sequence for each token
Issue -
State: open - Opened by DanielHesslow 21 days ago
- 2 comments
#1627 - Rust: How to handle models with `precompiled_charsmap = null`
Issue -
State: open - Opened by kallebysantos 26 days ago
- 1 comment
#1626 - Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows
Pull Request -
State: open - Opened by dependabot[bot] 26 days ago
- 1 comment
Labels: dependencies, github_actions
#1625 - Tokenizer Quickstart Tutorial: Broken Links
Issue -
State: open - Opened by SinaMostafanejad 26 days ago
#1624 - Special token gets tokenized while training tokenizer from scratch
Issue -
State: open - Opened by LalchandPandia 28 days ago
- 1 comment
#1623 - STATUS_ENTRYPOINT_NOT_FOUND
Issue -
State: open - Opened by impurity-dev 28 days ago
- 1 comment
#1622 - Bump webpack from 5.76.0 to 5.94.0 in /tokenizers/examples/unstable_wasm/www
Pull Request -
State: open - Opened by dependabot[bot] about 1 month ago
- 1 comment
Labels: dependencies, javascript
#1621 - Arg name correction: auth_token -> token
Pull Request -
State: open - Opened by rravenel about 1 month ago
- 3 comments
#1620 - PreTrainedTokenizerFast `char_to_token` `token_to_char` not working as expected
Issue -
State: open - Opened by yonigottesman about 2 months ago
- 5 comments
Labels: bug
#1619 - ModuleNotFoundError: No module named 'tokenizers.tokenizers'
Issue -
State: open - Opened by jpferraro1 about 1 month ago
- 6 comments
#1618 - [WIP] free speed/mem optimizations with ahash, dary_heap, and compact_str
Pull Request -
State: open - Opened by mjbommar about 1 month ago
#1617 - 🚨 breaking: Fix training with special tokens
Pull Request -
State: open - Opened by ArthurZucker about 1 month ago
- 2 comments
#1616 - BPE trainer ignoring special tokens.
Issue -
State: open - Opened by henrycharlesworth about 1 month ago
- 3 comments
#1615 - .NET bindings
Issue -
State: open - Opened by sappho192 about 2 months ago
#1614 - Can I use SentencePieceBPETokenizer to replace google/sentencepiece?
Issue -
State: closed - Opened by npuichigo about 2 months ago
- 6 comments
#1613 - Space after unnormalized token is added when `use_fast=True` for Llama tokenizer
Issue -
State: open - Opened by Butanium about 2 months ago
- 10 comments
#1612 - `RefMutContainer` is unsound
Issue -
State: open - Opened by CheaterCodes about 2 months ago
- 3 comments
#1611 - [test-infra] Enable Codecov for tokenizers
Issue -
State: open - Opened by hvaara about 2 months ago
#1610 - fix benchmark file link
Pull Request -
State: closed - Opened by 152334H about 2 months ago
- 1 comment
#1609 - Token ID Out of Range & Indexing Assertion Errors During Training
Issue -
State: closed - Opened by haseebrj17 about 2 months ago
- 4 comments
#1608 - Update README.md
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1607 - Fix CI
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1606 - Candidate release
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1605 - Fast regex
Pull Request -
State: open - Opened by ArthurZucker about 2 months ago
- 1 comment
#1604 - Tests + Deserialization improvement for normalizers.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1603 - add deserialize for pre tokenizers
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1602 - Fix strip python type
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1601 - Support for Golang now or support a cli for other languages?
Issue -
State: open - Opened by xuxiaoxia96 about 2 months ago
- 2 comments
#1600 - Add test normalizers
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1599 - Improve decoder deserialization
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1598 - Adding a few tests for decoder deserialization.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1597 - Add-legacy-tests
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1596 - ValueError: The following `model_kwargs` are not used by the model: ['num_beans'] (note: typos in the generate arguments will also show up in this list)[06/Aug/2024 14:34:35]
Issue -
State: open - Opened by navdeep8990 about 2 months ago
- 1 comment
#1595 - Better serialization error
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1594 - Adding some serialization testing around the wrapper.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1593 - Fixing release CI strict (taken from safetensors).
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1592 - Better serialization and deserialization error
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1591 - Fix doc about split
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 1 comment
#1590 - Support `None` to reset pre_tokenizers and normalizers, and index sequences
Pull Request -
State: closed - Opened by ArthurZucker about 2 months ago
- 2 comments
#1589 - Recursive ellipsis for serde_pyo3
Pull Request -
State: closed - Opened by EricLBuehler about 2 months ago
- 2 comments
#1588 - Using serde (serde_pyo3) to get __str__ and __repr__ easily.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1587 - Perf improvement 16% by removing offsets.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1586 - Enable fancy regex
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1585 - Tiny improvement
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1584 - Fixing benchmark2.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 2 comments
#1583 - Fixing the benchmark.
Pull Request -
State: closed - Opened by Narsil about 2 months ago
- 1 comment
#1582 - Add benchmark vs tiktoken
Pull Request -
State: closed - Opened by Narsil 2 months ago
- 1 comment
#1581 - [building on windows] onig_sys/oniguruma two or more data types in declaration specifiers
Issue -
State: open - Opened by louis030195 2 months ago
- 2 comments
#1580 - Fix clippy + feature test management.
Pull Request -
State: closed - Opened by Narsil 2 months ago
- 1 comment
#1579 - Risk of global variable memory leaks when calling train_from_iterator
Issue -
State: open - Opened by Yikai-Liao 2 months ago
- 1 comment
Labels: Stale
#1578 - return pytorch tensors like in transformers?
Issue -
State: closed - Opened by PaulLerner 2 months ago
- 5 comments
#1577 - `train_from_iterator` out of memory on WMT14 `de` dataset
Issue -
State: closed - Opened by Kami-chanw 2 months ago
- 2 comments
#1576 - Issue with `SentencePieceUnigramTokenizer` Handling Unknown Tokens
Issue -
State: open - Opened by Munikumar09 2 months ago
- 1 comment
#1575 - apply_chat_template api usage consult
Issue -
State: open - Opened by FanZhang91 2 months ago
#1574 - Use pyo3 smd v0.21
Pull Request -
State: closed - Opened by EricLBuehler 2 months ago
- 1 comment
#1573 - Truncation performs slowly. Tokenizer firstly encodes long sequence and then truncates it.
Issue -
State: open - Opened by galtimur 2 months ago
- 2 comments
Labels: Feature Request
#1572 - BPE Split pretokenization rule is not reflected in the vocabulary
Issue -
State: closed - Opened by meliksahturker 2 months ago
- 2 comments
#1571 - Bump spm_precompiled to 0.1.3
Pull Request -
State: closed - Opened by MikeIvanichev 3 months ago
- 4 comments
#1570 - [Feature] support Assign token to update the content of a token
Pull Request -
State: open - Opened by ArthurZucker 3 months ago
- 3 comments
Labels: Stale
#1569 - Revert "[BREAKING CHANGE] Ignore added_tokens (both special and not) …
Pull Request -
State: closed - Opened by ArthurZucker 3 months ago
- 1 comment
#1568 - [Fix metaspace prepending scheme] ⛓️💥⛓️💥
Pull Request -
State: closed - Opened by ArthurZucker 3 months ago
- 1 comment
Labels: Stale
#1567 - Tokenizer.from_bytes() not available in python bindings
Issue -
State: closed - Opened by RamvigneshPasupathy 3 months ago
- 4 comments
Labels: Stale, Feature Request
#1566 - Custom fast PreTokenizer, ported via PyO3 to Python
Issue -
State: open - Opened by vandrw 3 months ago
- 2 comments