Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / chonkie-ai/chonkie issues and pull requests

#72 - Add TEVL to speed-up sentence chunking

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#71 - Add TEVL to speed up sentence chunker

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#70 - [Fix] Allow for functions as token_counters in BaseChunkers

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#69 - Add support for automated testing with Github Actions

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#68 - Add `min_chunk_size` to SDPMChunker + Lint codebase with ruff + minor changes

Pull Request - State: closed - Opened by bhavnicksm 3 months ago
Labels: chore

#67 - [BUG] example code for WordChunker is not working

Issue - State: closed - Opened by mozz85 3 months ago - 3 comments
Labels: bug

#66 - Added automated testing using Github Actions

Pull Request - State: closed - Opened by pratyushmittal 3 months ago - 3 comments

#65 - Fixed similarity_percentile with sdpm chunker + added test

Pull Request - State: closed - Opened by pratyushmittal 3 months ago - 5 comments

#64 - [BUG] EmbeddingsRegistry custom tokenizer does not work

Issue - State: closed - Opened by rsharma-autessa 3 months ago - 6 comments
Labels: bug

#63 - [Update] Change default embedding model in SemanticChunkers

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#61 - [help] failed on SemanticChunker's example

Issue - State: closed - Opened by mozz85 3 months ago - 14 comments
Labels: bug

#59 - [BUG] SDPM & Semantic Chunking Example not working

Issue - State: closed - Opened by regstuff 3 months ago - 2 comments
Labels: bug

#58 - [Fix] Add fix for #55

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#56 - Update DOCS.md - fixed embeddings path after recent change

Pull Request - State: closed - Opened by pratyushmittal 3 months ago - 3 comments

#55 - [BUG] Newlines are not removed after pre-processing in SemanticChunker

Issue - State: closed - Opened by Pringled 3 months ago - 3 comments
Labels: bug

#52 - [Fix] Token counts from Tokenizers and Transformers adding special tokens

Pull Request - State: closed - Opened by bhavnicksm 3 months ago
Labels: enhancement

#50 - [DISC] Benchmarking Chonkie Mega-Thread

Issue - State: open - Opened by bhavnicksm 3 months ago - 2 comments
Labels: documentation, enhancement

#49 - [FEAT] Add support for Model2VecEmbeddings + Switch default embeddings to Model2VecEmbeddings

Pull Request - State: closed - Opened by bhavnicksm 3 months ago
Labels: enhancement

#48 - Reconstruction Test

Pull Request - State: closed - Opened by mrmps 3 months ago - 3 comments

#46 - Add initial OpenAIEmbeddings support to Chonkie ✨

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#43 - [DISC] Improving Documentation

Issue - State: closed - Opened by bhavnicksm 3 months ago - 5 comments
Labels: documentation, enhancement, help wanted, in progress

#42 - [BUG] Chunkers failing the test of recronstruction

Issue - State: closed - Opened by mrmps 3 months ago - 7 comments
Labels: bug

#41 - [FEAT] - Add model2vec embedding models

Pull Request - State: closed - Opened by sky-2002 3 months ago - 15 comments
Labels: enhancement

#40 - [FEAT] Min chunk size (for semantic chunkers)

Issue - State: closed - Opened by kbarendrecht 3 months ago - 2 comments
Labels: enhancement

#39 - [FEAT] Add async support to SDPMChunker and to SemanticChunker

Issue - State: open - Opened by rodion-m 3 months ago - 7 comments
Labels: enhancement

#38 - [FEAT] Add an ability to use OpenAI / VoyageAI / Cohere embeddings with SDPMChunker via LiteLLM

Issue - State: open - Opened by rodion-m 3 months ago - 5 comments
Labels: enhancement

#37 - [BUG] start_index and end_index inaccurate for repetitive text chunks

Issue - State: closed - Opened by bhavnicksm 3 months ago - 1 comment
Labels: bug

#36 - [FEAT] Allow configuring backend for Sentence_Transformers (e.g. ONNX, openVINO)

Issue - State: closed - Opened by kbarendrecht 3 months ago - 3 comments
Labels: enhancement

#35 - Bump version to 0.2.0.post1 in pyproject.toml and __init__.py

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#34 - Use `__slots__` instead of `slots=True` for python3.9 support

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#33 - [BUG] TypeError: dataclass() got an unexpected keyword argument 'slots'

Issue - State: closed - Opened by AgentT30 3 months ago - 2 comments
Labels: bug

#31 - [BUG]pyo3_runtime.PanicException: no entry found for key

Issue - State: closed - Opened by wbbeyourself 3 months ago - 4 comments
Labels: bug

#30 - [DOCS] Fix typo for import tokenizer in quick start example

Pull Request - State: closed - Opened by jasonacox 3 months ago - 1 comment
Labels: documentation

#29 - [BUG] Fix the start_index and end_index to point to character indices, not token indices

Pull Request - State: closed - Opened by mrmps 3 months ago - 2 comments
Labels: bug

#28 - Add initial batching support via `chunk_batch` fn + update DOCS

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#26 - [BUG]AttributeError: 'SentenceTransformer' object has no attribute 'similarity'

Issue - State: closed - Opened by heweapon 3 months ago - 6 comments
Labels: bug

#23 - Can I load offline tokenizers in it?

Issue - State: closed - Opened by a136214808 3 months ago - 3 comments
Labels: bug

#22 - Update README.md + minor updates

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#20 - Remove Spacy dependency from Chonkie

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#19 - Add FastEmbed Support for Embedding Generation/Inference

Issue - State: closed - Opened by adithya-s-k 3 months ago - 5 comments
Labels: enhancement

#18 - `TokenChunker` does not support multiple inputs

Issue - State: closed - Opened by not-lain 3 months ago - 5 comments
Labels: bug, enhancement

#17 - Update README.md + fix DOCS.md typo

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#16 - Incorrect import in Docs, SDPMChunker reference

Issue - State: closed - Opened by Om-Alve 3 months ago - 1 comment

#14 - Development

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#13 - Run Black + Isort + beautify the code a bit

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#11 - Bump version to 0.1.1 in pyproject.toml and __init__.py

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#10 - Update README.md

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#7 - Update README.md + remove .github action

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#5 - Update README.md

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#4 - Add support for Transformers and TikToken

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#3 - v0.0.1a8

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#2 - Update Logo (for PyPI) + Update README.md + Fix packaging bug

Pull Request - State: closed - Opened by bhavnicksm 3 months ago

#1 - v0.0.1a4

Pull Request - State: closed - Opened by bhavnicksm 3 months ago