Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / huggingface/tokenizers issues and pull requests

#1526 - Link to download the training text in `docs/source/quicktour.rst` is broken

Issue - State: closed - Opened by 14jdelap 6 months ago - 6 comments
Labels: Stale

#1525 - How to write custom Wordpiece class?

Issue - State: closed - Opened by xinyinan9527 6 months ago - 3 comments
Labels: Stale

#1524 - Convert huggingface tokenizer into sentencepiece format

Issue - State: closed - Opened by RRaphaell 6 months ago - 3 comments
Labels: Stale

#1523 - ❓Get stats (e.g. counts) about the merged pairs

Issue - State: closed - Opened by pietrolesci 7 months ago - 3 comments
Labels: Stale

#1522 - Error: Cannot find module 'tokenizers/bindings/tokenizer'

Issue - State: closed - Opened by meichangsu1 7 months ago - 1 comment
Labels: Stale

#1521 - remove enforcement of non special when adding tokens

Pull Request - State: closed - Opened by ArthurZucker 7 months ago - 2 comments

#1520 - Why are 'unknown' tokens randomly added to my tokenized input?

Issue - State: closed - Opened by tshmak 7 months ago - 2 comments

#1519 - Why the tokenizer is slower than tiktoken?

Issue - State: open - Opened by BigBinnie 7 months ago - 8 comments

#1519 - Why the tokenizer is slower than tiktoken?

Issue - State: open - Opened by BigBinnie 7 months ago - 5 comments

#1518 - Loading `tokenizer.model` with Rust API

Issue - State: open - Opened by EricLBuehler 7 months ago - 5 comments

#1518 - Loading `tokenizer.model` with Rust API

Issue - State: closed - Opened by EricLBuehler 7 months ago - 11 comments
Labels: Stale

#1518 - Loading `tokenizer.model` with Rust API

Issue - State: open - Opened by EricLBuehler 7 months ago - 10 comments

#1518 - Loading `tokenizer.model` with Rust API

Issue - State: open - Opened by EricLBuehler 7 months ago - 7 comments

#1517 - Llama3 tokenizer with Incorrect offset_mapping

Issue - State: open - Opened by justin-shao 7 months ago - 2 comments
Labels: Stale

#1517 - Llama3 tokenizer with Incorrect offset_mapping

Issue - State: open - Opened by justin-shao 7 months ago

#1517 - Llama3 tokenizer with Incorrect offset_mapping

Issue - State: closed - Opened by justin-shao 7 months ago - 3 comments
Labels: Stale

#1516 - Tokens Removed from Trained Custom BPE Tokenizer

Issue - State: closed - Opened by rteehas 7 months ago

#1515 - UnigramTrainer: byte_fallback is false.

Issue - State: open - Opened by Moddus 7 months ago - 3 comments
Labels: Feature Request, training

#1515 - UnigramTrainer: byte_fallback is false.

Issue - State: open - Opened by Moddus 7 months ago - 4 comments
Labels: Feature Request, training

#1514 - BPE Trainer doesn't respect the `vocab_size` parameter when dataset size is increased

Issue - State: closed - Opened by Abhinay1997 7 months ago - 3 comments
Labels: Stale

#1514 - BPE Trainer doesn't respect the `vocab_size` parameter when dataset size is increased

Issue - State: open - Opened by Abhinay1997 7 months ago - 2 comments
Labels: Stale

#1513 - [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder

Pull Request - State: open - Opened by Narsil 7 months ago - 2 comments

#1513 - [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder

Pull Request - State: closed - Opened by Narsil 7 months ago - 6 comments

#1512 - Breaking changes in v0.19.1 for tiktoken/llama3

Issue - State: closed - Opened by sanderland 7 months ago - 7 comments
Labels: Stale

#1511 - Fix "dictionnary" typo

Pull Request - State: open - Opened by nprisbrey 7 months ago

#1511 - Fix "dictionnary" typo

Pull Request - State: closed - Opened by nprisbrey 7 months ago - 3 comments

#1510 - change conditional compilation for regex libraries

Pull Request - State: open - Opened by semaraugusto 7 months ago

#1510 - change conditional compilation for regex libraries

Pull Request - State: closed - Opened by semaraugusto 7 months ago - 1 comment
Labels: Stale

#1509 - Cross-compilation fails for custom target

Issue - State: closed - Opened by semaraugusto 7 months ago - 3 comments
Labels: Stale

#1509 - Cross-compilation fails for custom target

Issue - State: closed - Opened by semaraugusto 7 months ago - 1 comment
Labels: Stale

#1508 - Add `.editorconfig` and `rustfmt.toml` for Consistent Code Formatting

Pull Request - State: closed - Opened by tal7aouy 7 months ago - 1 comment
Labels: Stale

#1507 - Treatment of hyphenated words

Issue - State: closed - Opened by rattle99 7 months ago - 2 comments
Labels: Stale

#1507 - Treatment of hyphenated words

Issue - State: closed - Opened by rattle99 7 months ago - 1 comment
Labels: Stale

#1507 - Treatment of hyphenated words

Issue - State: closed - Opened by rattle99 7 months ago - 7 comments
Labels: Stale

#1506 - Python Binding: Tokenizer.from_file() cannot parse JSON file of tokens

Issue - State: closed - Opened by dwash96 7 months ago - 1 comment

#1506 - Python Binding: Tokenizer.from_file() cannot parse JSON file of tokens

Issue - State: closed - Opened by dwash96 7 months ago - 2 comments

#1505 - Failing to build bindings with 0.19.1

Issue - State: closed - Opened by bryteise 7 months ago - 7 comments
Labels: Stale

#1505 - Failing to build bindings with 0.19.1

Issue - State: closed - Opened by bryteise 7 months ago - 6 comments
Labels: Stale

#1505 - Failing to build bindings with 0.19.1

Issue - State: open - Opened by bryteise 7 months ago - 1 comment

#1504 - add serialization for `ignore_merges`

Pull Request - State: closed - Opened by ArthurZucker 7 months ago - 1 comment

#1503 - corrected typo in the documentations for pre-tokenizers

Pull Request - State: closed - Opened by GorkaUrbizu 7 months ago
Labels: Stale

#1502 - offline installation

Issue - State: closed - Opened by HankLiu10 7 months ago - 3 comments
Labels: Stale

#1501 - Extended vocab tokenizer merging text into a single string without spaces while decoding

Issue - State: closed - Opened by savanth14 7 months ago - 4 comments
Labels: Stale

#1500 - Issue in installing rudalle on google colab, !pip install rudalle

Issue - State: closed - Opened by deepanshh786 7 months ago - 2 comments
Labels: Stale

#1499 - Fixing doc.

Pull Request - State: closed - Opened by Narsil 7 months ago - 1 comment

#1498 - Bumping all versions 3 times (ty transformers :) )

Pull Request - State: closed - Opened by Narsil 7 months ago - 1 comment

#1497 - Remove 3.13 (potential undefined behavior.)

Pull Request - State: closed - Opened by Narsil 7 months ago - 1 comment

#1496 - StripAccents doesn't work

Issue - State: closed - Opened by NivinaNull 7 months ago - 1 comment
Labels: Stale

#1494 - PyO3 0.21.

Pull Request - State: closed - Opened by Narsil 7 months ago - 1 comment

#1493 - Add more support for tiktoken based tokenizers

Pull Request - State: closed - Opened by ArthurZucker 7 months ago - 1 comment

#1492 - Fix unsoundness in `tokenizers::utils::parallelism`

Pull Request - State: closed - Opened by albertsgarde 7 months ago - 4 comments

#1491 - Unsound use of unsafe in `src/utils/parallelism.rs`

Issue - State: closed - Opened by albertsgarde 7 months ago - 1 comment
Labels: Stale

#1489 - Discrepancy Between GitHub Release and NPM Package Version & Missing Dependencies

Issue - State: closed - Opened by superBertBerg 7 months ago - 5 comments
Labels: Stale

#1488 - Fix data directory for test

Pull Request - State: closed - Opened by atupone 7 months ago - 1 comment
Labels: Stale

#1487 - Is it possible to pass a tokenizer from Python into Rust?

Issue - State: closed - Opened by albertsgarde 8 months ago - 2 comments
Labels: Stale

#1486 - Fix Strip decoder doc comment

Pull Request - State: closed - Opened by jacklee1792 8 months ago
Labels: Stale

#1485 - error: casting `&T` to `&mut T` is undefined behavior

Issue - State: closed - Opened by Jipok 8 months ago - 10 comments
Labels: Stale

#1484 - Candidate release

Pull Request - State: closed - Opened by ArthurZucker 8 months ago - 1 comment

#1483 - fix: change var name from `vocab` to `vocab_file`

Pull Request - State: closed - Opened by shenxiangzhuang 8 months ago
Labels: Stale

#1482 - fix: typo

Pull Request - State: closed - Opened by shenxiangzhuang 8 months ago
Labels: Stale

#1481 - `BertWordPieceTokenizer` not saving with `sep_token` marked

Issue - State: open - Opened by AngledLuffa 8 months ago - 1 comment

#1481 - `BertWordPieceTokenizer` not saving with `sep_token` marked

Issue - State: closed - Opened by AngledLuffa 8 months ago - 2 comments

#1480 - tokenizers-linux-x64-musl is not found when running inside node apline docker

Issue - State: closed - Opened by madhurjya-acko 8 months ago - 2 comments
Labels: Stale

#1479 - Bump express from 4.18.1 to 4.19.2 in /tokenizers/examples/unstable_wasm/www

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 2 comments
Labels: dependencies, javascript, Stale

#1479 - Bump express from 4.18.1 to 4.19.2 in /tokenizers/examples/unstable_wasm/www

Pull Request - State: open - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies, javascript, Stale

#1478 - Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /tokenizers/examples/unstable_wasm/www

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 2 comments
Labels: dependencies, javascript, Stale

#1478 - Bump webpack-dev-middleware from 5.3.3 to 5.3.4 in /tokenizers/examples/unstable_wasm/www

Pull Request - State: open - Opened by dependabot[bot] 8 months ago - 1 comment
Labels: dependencies, javascript, Stale

#1476 - Refactor metaspace

Pull Request - State: closed - Opened by ArthurZucker 8 months ago - 7 comments

#1475 - Issue merging across whitespaces

Issue - State: closed - Opened by henrycharlesworth 8 months ago - 2 comments
Labels: Stale

#1474 - BPE Decoder cleanup option

Issue - State: closed - Opened by w-zygmuntowicz 8 months ago - 2 comments
Labels: Stale

#1473 - Assign `<unusedXX>` tokens with `special_tokens` without growing vocab size

Issue - State: closed - Opened by jacobwjs 8 months ago - 6 comments
Labels: Stale, Feature Request, planned

#1472 - Bump follow-redirects from 1.15.4 to 1.15.6 in /tokenizers/examples/unstable_wasm/www

Pull Request - State: closed - Opened by dependabot[bot] 8 months ago - 2 comments
Labels: dependencies, javascript, Stale

#1471 - Train tokenizer on integer lists, not strings

Issue - State: closed - Opened by rteehas 8 months ago - 7 comments
Labels: Stale

#1470 - Tokens display issues

Issue - State: closed - Opened by jordane95 8 months ago - 2 comments
Labels: Stale