Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / bhavnicksm/chonkie issues and pull requests
#82 - Bump version to v0.2.2 for release
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#81 - Expose the seperation delim for simple multilingual chunking
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#80 - [Fix] Unify dataclasses under a types.py for ease
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#79 - [FEAT] Add "auto" threshold configuration via Statistical analysis in SemanticChunker + minor fixes
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
Labels: enhancement
#78 - Add support for BaseRefinery and OverlapRefinery + minor changes
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#77 - [FEAT] Add BaseRefinery and OverlapRefinery support
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#76 - [FEAT]Regarding the implementation principles of SemanticChunker and some flexibility requirements
Issue -
State: closed - Opened by RemixaWorld 3 months ago
- 5 comments
Labels: enhancement
#75 - Update the docs path to docs.chonkie.ai
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#74 - docs: update README.md
Pull Request -
State: open - Opened by eltociear 3 months ago
#73 - [BUG] WordChunker's `chunk_batch` function fail
Issue -
State: open - Opened by kime541200 3 months ago
- 1 comment
Labels: bug
#72 - Add TEVL to speed-up sentence chunking
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#71 - Add TEVL to speed up sentence chunker
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#70 - [Fix] Allow for functions as token_counters in BaseChunkers
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#69 - Add support for automated testing with Github Actions
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#68 - Add `min_chunk_size` to SDPMChunker + Lint codebase with ruff + minor changes
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
Labels: chore
#67 - [BUG] example code for WordChunker is not working
Issue -
State: closed - Opened by mozz85 3 months ago
- 3 comments
Labels: bug
#66 - Added automated testing using Github Actions
Pull Request -
State: closed - Opened by pratyushmittal 3 months ago
- 3 comments
#65 - Fixed similarity_percentile with sdpm chunker + added test
Pull Request -
State: closed - Opened by pratyushmittal 3 months ago
- 4 comments
#64 - [BUG] EmbeddingsRegistry custom tokenizer does not work
Issue -
State: open - Opened by rsharma-autessa 3 months ago
- 2 comments
Labels: bug
#63 - [Update] Change default embedding model in SemanticChunkers
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#62 - [Update] Bump version to 0.2.1.post1 and require Python 3.9 or higher
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#61 - [help] failed on SemanticChunker's example
Issue -
State: closed - Opened by mozz85 3 months ago
- 14 comments
Labels: bug
#60 - [Refactor] Add min_chunk_size parameter to SemanticChunker and SentenceChunker
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#59 - [BUG] SDPM & Semantic Chunking Example not working
Issue -
State: closed - Opened by regstuff 3 months ago
- 2 comments
Labels: bug
#58 - [Fix] Add fix for #55
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#57 - [Fix] AutoEmbeddings not loading `all-minilm-l6-v2` but loads `All-MiniLM-L6-V2`
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#56 - Update DOCS.md - fixed embeddings path after recent change
Pull Request -
State: closed - Opened by pratyushmittal 3 months ago
- 3 comments
#55 - [BUG] Newlines are not removed after pre-processing in SemanticChunker
Issue -
State: closed - Opened by Pringled 3 months ago
- 3 comments
Labels: bug
#54 - [Refactor] Optimize similarity calculation by using np.divide for imp…
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#53 - [Fix] Refactor WordChunker, SentenceChunker pre-chunk splitting for reconstruction tests + minor changes
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#52 - [Fix] Token counts from Tokenizers and Transformers adding special tokens
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
Labels: enhancement
#51 - [fix] Reorganize optional dependencies in pyproject.toml: rename 'sem…
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#50 - [DISC] Benchmarking Chonkie Mega-Thread
Issue -
State: open - Opened by bhavnicksm 3 months ago
- 1 comment
Labels: documentation, enhancement
#49 - [FEAT] Add support for Model2VecEmbeddings + Switch default embeddings to Model2VecEmbeddings
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
Labels: enhancement
#48 - Reconstruction Test
Pull Request -
State: closed - Opened by mrmps 3 months ago
- 3 comments
#47 - [DOCS] Add info about initial embeddings support and how to add custom embeddings
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#46 - Add initial OpenAIEmbeddings support to Chonkie ✨
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#45 - Refactor BaseChunker, SemanticChunker and SDPMChunker to support BaseEmbeddings
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#44 - [FEAT] Add SentenceTransformerEmbeddings, EmbeddingsRegistry and AutoEmbeddings provider support
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
Labels: enhancement
#43 - [DISC] Improving Documentation
Issue -
State: open - Opened by bhavnicksm 3 months ago
- 4 comments
Labels: documentation, enhancement, help wanted
#42 - [BUG] Chunkers failing the test of recronstruction
Issue -
State: closed - Opened by mrmps 3 months ago
- 6 comments
Labels: bug
#41 - [FEAT] - Add model2vec embedding models
Pull Request -
State: closed - Opened by sky-2002 3 months ago
- 15 comments
Labels: enhancement
#40 - [FEAT] Min chunk size (for semantic chunkers)
Issue -
State: closed - Opened by kbarendrecht 3 months ago
- 2 comments
Labels: enhancement
#39 - [FEAT] Add async support to SDPMChunker and to SemanticChunker
Issue -
State: open - Opened by rodion-m 3 months ago
- 7 comments
Labels: enhancement
#38 - [FEAT] Add an ability to use OpenAI / VoyageAI / Cohere embeddings with SDPMChunker via LiteLLM
Issue -
State: open - Opened by rodion-m 3 months ago
- 5 comments
Labels: enhancement
#37 - [BUG] start_index and end_index inaccurate for repetitive text chunks
Issue -
State: open - Opened by bhavnicksm 3 months ago
Labels: bug
#36 - [FEAT] Allow configuring backend for Sentence_Transformers (e.g. ONNX, openVINO)
Issue -
State: closed - Opened by kbarendrecht 3 months ago
- 3 comments
Labels: enhancement
#35 - Bump version to 0.2.0.post1 in pyproject.toml and __init__.py
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#34 - Use `__slots__` instead of `slots=True` for python3.9 support
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#33 - [BUG] TypeError: dataclass() got an unexpected keyword argument 'slots'
Issue -
State: closed - Opened by AgentT30 3 months ago
- 2 comments
Labels: bug
#32 - Major Update: Fix bugs + Update docs + Add slots to dataclasses + update word & sentence splitting logic + minor changes
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#31 - [BUG]pyo3_runtime.PanicException: no entry found for key
Issue -
State: closed - Opened by wbbeyourself 3 months ago
- 4 comments
Labels: bug
#30 - [DOCS] Fix typo for import tokenizer in quick start example
Pull Request -
State: closed - Opened by jasonacox 3 months ago
- 1 comment
Labels: documentation
#29 - [BUG] Fix the start_index and end_index to point to character indices, not token indices
Pull Request -
State: closed - Opened by mrmps 3 months ago
- 2 comments
Labels: bug
#28 - Add initial batching support via `chunk_batch` fn + update DOCS
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#27 - Update dependency version of SentenceTransformer to at least 2.3.0
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#26 - [BUG]AttributeError: 'SentenceTransformer' object has no attribute 'similarity'
Issue -
State: closed - Opened by heweapon 3 months ago
- 6 comments
Labels: bug
#25 - ImportError: cannot import name 'tokenizer' from 'tokenizers' (/usr/local/lib/python3.10/site-packages/tokenizers/__init__.py)
Issue -
State: closed - Opened by abchbx 3 months ago
- 1 comment
#24 - fix: tokenizer mismatch for `SemanticChunker` + Add BaseEmbeddings
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#23 - Can I load offline tokenizers in it?
Issue -
State: open - Opened by a136214808 3 months ago
- 2 comments
Labels: bug
#22 - Update README.md + minor updates
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#21 - Remove Spacy dependency from 'sentence' install + Add FAQ to DOCS.md
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#20 - Remove Spacy dependency from Chonkie
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#19 - Add FastEmbed Support for Embedding Generation/Inference
Issue -
State: closed - Opened by adithya-s-k 3 months ago
- 5 comments
Labels: enhancement
#18 - `TokenChunker` does not support multiple inputs
Issue -
State: closed - Opened by not-lain 3 months ago
- 5 comments
Labels: bug, enhancement
#17 - Update README.md + fix DOCS.md typo
Pull Request -
State: closed - Opened by bhavnicksm 3 months ago
#16 - Incorrect import in Docs, SDPMChunker reference
Issue -
State: closed - Opened by Om-Alve 3 months ago
- 1 comment