Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / chonkie-ai/chonkie issues and pull requests

#155 - [FEAT] Add support for callable `token_counter` as input for rule-based Chunkers

Pull Request - State: closed - Opened by bhavnicksm 10 days ago - 1 comment

#153 - [FEAT] Support Ollama Embeddings

Pull Request - State: open - Opened by Udayk02 16 days ago

#152 - RecursiveRules integration into OverlapRefinery

Pull Request - State: open - Opened by Sankgreall 18 days ago

#151 - [BUG] SentenceChunker throws IndexError

Issue - State: open - Opened by YashRajOjha-Draup 19 days ago - 1 comment
Labels: bug

#150 - [FEAT] Support RecursiveRules (in reverse) for OverlapRefiner

Issue - State: open - Opened by Sankgreall 20 days ago - 3 comments
Labels: enhancement

#149 - Fix __repr__ to output all fields

Pull Request - State: open - Opened by shreyashnigam 23 days ago - 1 comment

#148 - [FEAT] Advanced regex based parsing + XML+ chunk metadata

Issue - State: open - Opened by finnschwall 24 days ago - 2 comments
Labels: enhancement

#147 - [FEAT] Support `return_type` with `texts` output type

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago
Labels: enhancement

#146 - [FEAT] Support `return_type` as `texts` for direct text handling

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago
Labels: enhancement

#145 - [DOCS] Benchmarking update

Pull Request - State: closed - Opened by shreyashnigam about 1 month ago - 1 comment

#144 - [FEAT] Support regular expressions for splitting in RecursiveChunker

Issue - State: open - Opened by sophiehenning about 1 month ago - 3 comments
Labels: enhancement

#143 - [chore] Bump up the package version to v0.4.1

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#142 - Support Ollama Embeddings

Issue - State: closed - Opened by bhavnicksm about 1 month ago

#141 - [FIX] Minor fixes + Stylistic enhancements for TQDM and Multiprocessing

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago - 1 comment

#140 - Replace dead discord link with infinite lifetime

Pull Request - State: closed - Opened by shreyashnigam about 1 month ago - 1 comment

#139 - [FEAT] Add support for OllamaEmbeddings in Chonkie

Issue - State: open - Opened by chenzf11 about 1 month ago - 1 comment
Labels: enhancement

#138 - [FEAT] Add TQDM progress bars for `chunk_batch` + Update README.md

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago - 1 comment

#137 - [FIX] Handle edge case for RecursiveChunker (#131)

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#136 - [fix] High `chunk_overlap` causes last chunk to be entirely redundant

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#135 - [DOCS] Update readme intro to match docs.

Pull Request - State: closed - Opened by shreyashnigam about 1 month ago - 1 comment

#134 - [FIX] Remove tests for Py3.8 — Incompatible for support

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago - 1 comment

#133 - [FEAT] Add support for Python 3.8

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago - 1 comment

#132 - [FIX] `start_index` incorrect when `chunk_overlap` is not 0 (#116)

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#130 - [FEAT] Support Cohere Embeddings for SemanticChunker and SDPMChunker #118

Pull Request - State: open - Opened by Udayk02 about 1 month ago - 5 comments

#129 - Feature Implementation: Cohere Embeddings Support #118

Pull Request - State: closed - Opened by Udayk02 about 1 month ago

#127 - [FIX] Support class methods as `token_counter` objects for `CustomEmbeddings` (#92)

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago
Labels: bug

#126 - [FIX] #116: Incorrect`start_index` when `chunk_overlap` is not 0

Pull Request - State: closed - Opened by Udayk02 about 1 month ago - 12 comments

#125 - [DOCS] Token Chunking Docs Should Mention Batch Size

Issue - State: open - Opened by shreyashnigam about 1 month ago
Labels: bug, documentation

#124 - [BUG] No argument `batch_size` in sentence chunker

Issue - State: closed - Opened by shreyashnigam about 1 month ago - 1 comment
Labels: bug

#123 - [RFC] 🦛 Roadmap for Q1 2025

Issue - State: open - Opened by shreyashnigam about 1 month ago - 2 comments
Labels: roadmaps

#122 - Update CONTRIBUTING.md with first issue hyperlink

Pull Request - State: closed - Opened by shreyashnigam about 1 month ago - 1 comment

#121 - [fix] CI: reports were not being uploaded to Codecov

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#119 - Add CONTRIBUTING.md, update issue templates, CI, Codecov and more...

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#118 - [FEAT] Support Cohere Embeddings for SemanticChunker and SDPMChunker

Issue - State: open - Opened by bhavnicksm about 1 month ago - 3 comments
Labels: enhancement, good first issue

#117 - [FEAT] Add support for code file chunking

Issue - State: open - Opened by bvqbao about 1 month ago - 1 comment
Labels: enhancement

#116 - [BUG] Am I doing something wrong here? Possible issue with start and end index?

Issue - State: closed - Opened by KamarulAdha about 1 month ago - 15 comments
Labels: bug, good first issue, in progress

#115 - [FEAT] Add `tqdm` progressbars as an optional parameter for `chunk_batch` in chunkers

Issue - State: closed - Opened by bhavnicksm about 1 month ago - 1 comment
Labels: enhancement

#114 - [chore] Bump version to "v0.4.0" + minor change

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#112 - [fix] #106: Missing last sentence in the SemanticChunker

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#111 - Patch SemanticChunker._group_sentences_window() to add last sentence …

Pull Request - State: closed - Opened by philipchung about 1 month ago - 2 comments

#108 - [FEAT] Add support for RecursiveChunking + minor fixes

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#107 - Add initial support for Recursive Chunking (`RecursiveChunker`)

Pull Request - State: closed - Opened by bhavnicksm about 1 month ago

#106 - [BUG] SemanticChunker and SDPMChunker Truncates Last Sentence

Issue - State: closed - Opened by philipchung about 1 month ago - 2 comments
Labels: bug, in progress

#105 - [FEAT] Add support for Contextual Chunking from Anthropic

Issue - State: open - Opened by bhavnicksm about 2 months ago
Labels: enhancement

#104 - [BUG] embedding_model is not a valid embedding model', 'Please install the `semantic` extra to use this feature'

Issue - State: closed - Opened by universewill about 2 months ago - 4 comments
Labels: bug, question, in progress

#103 - [Minor] Add Discord badge to README for community engagement

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#102 - [fix] Docstrings in SemanticChunker should include **kwargs

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#101 - [fix] Add LateChunker support to chunker and module exports

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#100 - Update version to 0.3.0 in pyproject.toml and __init__.py

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#98 - [FEAT] Add LateChunker

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#97 - Add initial support for Late Chunking

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#95 - [BUG] SentenceChunker token counts are randomly off by one token

Issue - State: closed - Opened by bhavnicksm about 2 months ago - 1 comment
Labels: bug, in progress

#94 - BGE-M3 custom embeddings always have the same number of chunks between Semantic and SDPM Chunker

Issue - State: closed - Opened by armsp about 2 months ago - 6 comments
Labels: bug, in progress

#93 - Semantic Similarity does not work - got an unexpected keyword argument 'similarity_threshold'

Issue - State: closed - Opened by armsp about 2 months ago - 4 comments
Labels: bug, documentation, in progress

#92 - BaseEmbeddings.embed should be treated as callable

Issue - State: closed - Opened by ascendant512 about 2 months ago - 4 comments
Labels: bug

#91 - [FEAT] Support Hierarchial Chunking with Semantic Chunking as a secondary

Issue - State: open - Opened by theoden8 about 2 months ago - 3 comments
Labels: enhancement

#90 - [Fix] WordChunker chunk_batch fail

Pull Request - State: closed - Opened by sky-2002 about 2 months ago - 3 comments

#87 - [Fix] #37: Incorrect indexing when repitition is present in the text

Pull Request - State: closed - Opened by bhavnicksm about 2 months ago

#86 - [Fix]:add initial_sentences param and fix custom tokenizer does not work

Pull Request - State: open - Opened by ljhssga about 2 months ago - 2 comments

#86 - [Fix]:add initial_sentences param and fix custom tokenizer does not work

Pull Request - State: closed - Opened by ljhssga about 2 months ago - 3 comments

#85 - How to set "trust_remote_code=True" for SentenceTransformerEmbedding?

Issue - State: closed - Opened by anhhct 2 months ago - 3 comments
Labels: bug, good first issue, in progress

#85 - How to set "trust_remote_code=True" for SentenceTransformerEmbedding?

Issue - State: open - Opened by anhhct 2 months ago - 2 comments
Labels: bug, good first issue, in progress

#84 - [BUG] TokenChunker Batch_chunking gives wrong end_index

Issue - State: closed - Opened by CharlesMoslonka 2 months ago - 2 comments
Labels: bug, in progress

#84 - [BUG] TokenChunker Batch_chunking gives wrong end_index

Issue - State: open - Opened by CharlesMoslonka 2 months ago - 1 comment
Labels: bug, in progress

#83 - [BUG] Semantic Chunks are to Big

Issue - State: closed - Opened by aribornstein 2 months ago - 3 comments
Labels: bug

#83 - [BUG] Semantic Chunks are to Big

Issue - State: closed - Opened by aribornstein 2 months ago - 4 comments
Labels: bug

#82 - Bump version to v0.2.2 for release

Pull Request - State: closed - Opened by bhavnicksm 2 months ago

#81 - Expose the seperation delim for simple multilingual chunking

Pull Request - State: closed - Opened by bhavnicksm 2 months ago

#80 - [Fix] Unify dataclasses under a types.py for ease

Pull Request - State: closed - Opened by bhavnicksm 2 months ago

#78 - Add support for BaseRefinery and OverlapRefinery + minor changes

Pull Request - State: closed - Opened by bhavnicksm 2 months ago

#77 - [FEAT] Add BaseRefinery and OverlapRefinery support

Pull Request - State: closed - Opened by bhavnicksm 2 months ago

#76 - [FEAT]Regarding the implementation principles of SemanticChunker and some flexibility requirements

Issue - State: closed - Opened by RemixaWorld 2 months ago - 5 comments
Labels: enhancement

#75 - Update the docs path to docs.chonkie.ai

Pull Request - State: closed - Opened by bhavnicksm 2 months ago

#74 - docs: update README.md

Pull Request - State: closed - Opened by eltociear 2 months ago - 1 comment

#73 - [BUG] WordChunker's `chunk_batch` function fail

Issue - State: open - Opened by kime541200 2 months ago - 1 comment
Labels: bug, in progress

#73 - [BUG] WordChunker's `chunk_batch` function fail

Issue - State: closed - Opened by kime541200 2 months ago - 2 comments
Labels: bug, in progress

#72 - Add TEVL to speed-up sentence chunking

Pull Request - State: closed - Opened by bhavnicksm 2 months ago