Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / chonkie-ai/chonkie issues and pull requests
#156 - [DOCS] Update Benchmarks - Include Wikipedia-100k and Wikipedia-500k run timings
Pull Request -
State: closed - Opened by bhavnicksm 10 days ago
#155 - [FEAT] Add support for callable `token_counter` as input for rule-based Chunkers
Pull Request -
State: closed - Opened by bhavnicksm 10 days ago
- 1 comment
#154 - [Feat] Add LiteLLMEmbeddings - Support SemanticChunking through LiteLLM
Pull Request -
State: open - Opened by Dhan996 14 days ago
#153 - [FEAT] Support Ollama Embeddings
Pull Request -
State: open - Opened by Udayk02 16 days ago
#152 - RecursiveRules integration into OverlapRefinery
Pull Request -
State: open - Opened by Sankgreall 18 days ago
#151 - [BUG] SentenceChunker throws IndexError
Issue -
State: open - Opened by YashRajOjha-Draup 19 days ago
- 1 comment
Labels: bug
#150 - [FEAT] Support RecursiveRules (in reverse) for OverlapRefiner
Issue -
State: open - Opened by Sankgreall 20 days ago
- 3 comments
Labels: enhancement
#149 - Fix __repr__ to output all fields
Pull Request -
State: open - Opened by shreyashnigam 23 days ago
- 1 comment
#148 - [FEAT] Advanced regex based parsing + XML+ chunk metadata
Issue -
State: open - Opened by finnschwall 24 days ago
- 2 comments
Labels: enhancement
#147 - [FEAT] Support `return_type` with `texts` output type
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
Labels: enhancement
#146 - [FEAT] Support `return_type` as `texts` for direct text handling
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
Labels: enhancement
#145 - [DOCS] Benchmarking update
Pull Request -
State: closed - Opened by shreyashnigam about 1 month ago
- 1 comment
#144 - [FEAT] Support regular expressions for splitting in RecursiveChunker
Issue -
State: open - Opened by sophiehenning about 1 month ago
- 3 comments
Labels: enhancement
#143 - [chore] Bump up the package version to v0.4.1
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#142 - Support Ollama Embeddings
Issue -
State: closed - Opened by bhavnicksm about 1 month ago
#141 - [FIX] Minor fixes + Stylistic enhancements for TQDM and Multiprocessing
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
- 1 comment
#140 - Replace dead discord link with infinite lifetime
Pull Request -
State: closed - Opened by shreyashnigam about 1 month ago
- 1 comment
#139 - [FEAT] Add support for OllamaEmbeddings in Chonkie
Issue -
State: open - Opened by chenzf11 about 1 month ago
- 1 comment
Labels: enhancement
#138 - [FEAT] Add TQDM progress bars for `chunk_batch` + Update README.md
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
- 1 comment
#137 - [FIX] Handle edge case for RecursiveChunker (#131)
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#136 - [fix] High `chunk_overlap` causes last chunk to be entirely redundant
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#135 - [DOCS] Update readme intro to match docs.
Pull Request -
State: closed - Opened by shreyashnigam about 1 month ago
- 1 comment
#134 - [FIX] Remove tests for Py3.8 — Incompatible for support
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
- 1 comment
#133 - [FEAT] Add support for Python 3.8
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
- 1 comment
#132 - [FIX] `start_index` incorrect when `chunk_overlap` is not 0 (#116)
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#131 - [BUG] Recursive Chunker Fails If The Substring Between Two `\n` Is Less Than `min_characters_per_chunk`
Issue -
State: closed - Opened by shreyashnigam about 1 month ago
- 1 comment
Labels: bug
#130 - [FEAT] Support Cohere Embeddings for SemanticChunker and SDPMChunker #118
Pull Request -
State: open - Opened by Udayk02 about 1 month ago
- 5 comments
#129 - Feature Implementation: Cohere Embeddings Support #118
Pull Request -
State: closed - Opened by Udayk02 about 1 month ago
#128 - [Fix] Add fix for #92: Support `class.method` as a Tokenizer for `CustomEmbedding` +. minor changes
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#127 - [FIX] Support class methods as `token_counter` objects for `CustomEmbeddings` (#92)
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
Labels: bug
#126 - [FIX] #116: Incorrect`start_index` when `chunk_overlap` is not 0
Pull Request -
State: closed - Opened by Udayk02 about 1 month ago
- 12 comments
#125 - [DOCS] Token Chunking Docs Should Mention Batch Size
Issue -
State: open - Opened by shreyashnigam about 1 month ago
Labels: bug, documentation
#124 - [BUG] No argument `batch_size` in sentence chunker
Issue -
State: closed - Opened by shreyashnigam about 1 month ago
- 1 comment
Labels: bug
#123 - [RFC] 🦛 Roadmap for Q1 2025
Issue -
State: open - Opened by shreyashnigam about 1 month ago
- 2 comments
Labels: roadmaps
#122 - Update CONTRIBUTING.md with first issue hyperlink
Pull Request -
State: closed - Opened by shreyashnigam about 1 month ago
- 1 comment
#121 - [fix] CI: reports were not being uploaded to Codecov
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#120 - [FEAT] Add TQDM to default installs + CONTRIBUTING.md + other minor updates
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#119 - Add CONTRIBUTING.md, update issue templates, CI, Codecov and more...
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#118 - [FEAT] Support Cohere Embeddings for SemanticChunker and SDPMChunker
Issue -
State: open - Opened by bhavnicksm about 1 month ago
- 3 comments
Labels: enhancement, good first issue
#117 - [FEAT] Add support for code file chunking
Issue -
State: open - Opened by bvqbao about 1 month ago
- 1 comment
Labels: enhancement
#116 - [BUG] Am I doing something wrong here? Possible issue with start and end index?
Issue -
State: closed - Opened by KamarulAdha about 1 month ago
- 15 comments
Labels: bug, good first issue, in progress
#115 - [FEAT] Add `tqdm` progressbars as an optional parameter for `chunk_batch` in chunkers
Issue -
State: closed - Opened by bhavnicksm about 1 month ago
- 1 comment
Labels: enhancement
#114 - [chore] Bump version to "v0.4.0" + minor change
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#113 - [fix] Add fix for #106: Reconstruction tests for SemanticChunker failing, missing last sentence
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#112 - [fix] #106: Missing last sentence in the SemanticChunker
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#111 - Patch SemanticChunker._group_sentences_window() to add last sentence …
Pull Request -
State: closed - Opened by philipchung about 1 month ago
- 2 comments
#110 - [fix] Correct the start and end indices for TokenChunker in Batch mode (#84)
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#109 - [fix] Correct the start and end indices for TokenChunker in Batch mode (#84)
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#108 - [FEAT] Add support for RecursiveChunking + minor fixes
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#107 - Add initial support for Recursive Chunking (`RecursiveChunker`)
Pull Request -
State: closed - Opened by bhavnicksm about 1 month ago
#106 - [BUG] SemanticChunker and SDPMChunker Truncates Last Sentence
Issue -
State: closed - Opened by philipchung about 1 month ago
- 2 comments
Labels: bug, in progress
#105 - [FEAT] Add support for Contextual Chunking from Anthropic
Issue -
State: open - Opened by bhavnicksm about 2 months ago
Labels: enhancement
#104 - [BUG] embedding_model is not a valid embedding model', 'Please install the `semantic` extra to use this feature'
Issue -
State: closed - Opened by universewill about 2 months ago
- 4 comments
Labels: bug, question, in progress
#103 - [Minor] Add Discord badge to README for community engagement
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#102 - [fix] Docstrings in SemanticChunker should include **kwargs
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#101 - [fix] Add LateChunker support to chunker and module exports
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#100 - Update version to 0.3.0 in pyproject.toml and __init__.py
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#99 - [FIX] Update outdated package versions + set max limit to numpy to v2.2 (buggy)
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#98 - [FEAT] Add LateChunker
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#97 - Add initial support for Late Chunking
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#96 - [FIX] MEGA Bug Fix PR: Fix WordChunker batching, Fix SentenceChunker token counts, Initialization + more
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#95 - [BUG] SentenceChunker token counts are randomly off by one token
Issue -
State: closed - Opened by bhavnicksm about 2 months ago
- 1 comment
Labels: bug, in progress
#94 - BGE-M3 custom embeddings always have the same number of chunks between Semantic and SDPM Chunker
Issue -
State: closed - Opened by armsp about 2 months ago
- 6 comments
Labels: bug, in progress
#93 - Semantic Similarity does not work - got an unexpected keyword argument 'similarity_threshold'
Issue -
State: closed - Opened by armsp about 2 months ago
- 4 comments
Labels: bug, documentation, in progress
#92 - BaseEmbeddings.embed should be treated as callable
Issue -
State: closed - Opened by ascendant512 about 2 months ago
- 4 comments
Labels: bug
#91 - [FEAT] Support Hierarchial Chunking with Semantic Chunking as a secondary
Issue -
State: open - Opened by theoden8 about 2 months ago
- 3 comments
Labels: enhancement
#90 - [Fix] WordChunker chunk_batch fail
Pull Request -
State: closed - Opened by sky-2002 about 2 months ago
- 3 comments
#89 - [Fix] #88: SemanticChunker raises UnboundLocalError: local variable 'threshold' referenced before assignment
Pull Request -
State: closed - Opened by arpesenti about 2 months ago
- 1 comment
#88 - [BUG] SemanticChunker raises UnboundLocalError: local variable 'threshold' referenced before assignment
Issue -
State: closed - Opened by arpesenti about 2 months ago
- 4 comments
Labels: bug
#87 - [Fix] #37: Incorrect indexing when repitition is present in the text
Pull Request -
State: closed - Opened by bhavnicksm about 2 months ago
#86 - [Fix]:add initial_sentences param and fix custom tokenizer does not work
Pull Request -
State: open - Opened by ljhssga about 2 months ago
- 2 comments
#86 - [Fix]:add initial_sentences param and fix custom tokenizer does not work
Pull Request -
State: closed - Opened by ljhssga about 2 months ago
- 3 comments
#85 - How to set "trust_remote_code=True" for SentenceTransformerEmbedding?
Issue -
State: closed - Opened by anhhct 2 months ago
- 3 comments
Labels: bug, good first issue, in progress
#85 - How to set "trust_remote_code=True" for SentenceTransformerEmbedding?
Issue -
State: open - Opened by anhhct 2 months ago
- 2 comments
Labels: bug, good first issue, in progress
#84 - [BUG] TokenChunker Batch_chunking gives wrong end_index
Issue -
State: closed - Opened by CharlesMoslonka 2 months ago
- 2 comments
Labels: bug, in progress
#84 - [BUG] TokenChunker Batch_chunking gives wrong end_index
Issue -
State: open - Opened by CharlesMoslonka 2 months ago
- 1 comment
Labels: bug, in progress
#83 - [BUG] Semantic Chunks are to Big
Issue -
State: closed - Opened by aribornstein 2 months ago
- 3 comments
Labels: bug
#83 - [BUG] Semantic Chunks are to Big
Issue -
State: closed - Opened by aribornstein 2 months ago
- 4 comments
Labels: bug
#82 - Bump version to v0.2.2 for release
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
#81 - Expose the seperation delim for simple multilingual chunking
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
#80 - [Fix] Unify dataclasses under a types.py for ease
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
#79 - [FEAT] Add "auto" threshold configuration via Statistical analysis in SemanticChunker + minor fixes
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
Labels: enhancement
#78 - Add support for BaseRefinery and OverlapRefinery + minor changes
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
#77 - [FEAT] Add BaseRefinery and OverlapRefinery support
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
#76 - [FEAT]Regarding the implementation principles of SemanticChunker and some flexibility requirements
Issue -
State: closed - Opened by RemixaWorld 2 months ago
- 5 comments
Labels: enhancement
#75 - Update the docs path to docs.chonkie.ai
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago
#74 - docs: update README.md
Pull Request -
State: closed - Opened by eltociear 2 months ago
- 1 comment
#73 - [BUG] WordChunker's `chunk_batch` function fail
Issue -
State: open - Opened by kime541200 2 months ago
- 1 comment
Labels: bug, in progress
#73 - [BUG] WordChunker's `chunk_batch` function fail
Issue -
State: closed - Opened by kime541200 2 months ago
- 2 comments
Labels: bug, in progress
#72 - Add TEVL to speed-up sentence chunking
Pull Request -
State: closed - Opened by bhavnicksm 2 months ago