nltk/nltk issues and pull requests

#3243 - Questions about Copilot + Open Source Software Hierarchy

Issue - State: closed - Opened by liaochris 6 months ago

#3242 - Formatted code with black=24.3.0 in codebase

Pull Request - State: closed - Opened by jeslinpjames 6 months ago - 2 comments
Labels: corpus, tagger, parsing, stem/lemma

#3241 - UTF-8 codec can't decode byte 0xe9 in position 122

Issue - State: open - Opened by ikrammohamdi 7 months ago

#3240 - Add support for disabling the sorting and list creation for WordNet object relation methods

Pull Request - State: open - Opened by bryant1410 7 months ago - 14 comments
Labels: corpus

#3239 - stem accuracy

Issue - State: closed - Opened by Moustafa1Rizk1 7 months ago - 2 comments

#3238 - It would be nice to have a mapping from arpabet to IPA for the cmudict

Issue - State: open - Opened by fcbond 7 months ago - 3 comments

#3237 - ci: bump action versions

Pull Request - State: closed - Opened by purificant 7 months ago
Labels: CI

#3236 - Best NLTK books

Issue - State: open - Opened by StepHaze 7 months ago

#3235 - Reversed y labels in dispersion_plot

Issue - State: open - Opened by kvmilos 7 months ago - 2 comments

#3234 - I want to write a Python script that checks the words in my Italian text files against an Italian dictionary; please help

Issue - State: closed - Opened by imtiaz231 7 months ago - 1 comment

#3233 - 10x Faster Levenshtein Distances

Pull Request - State: open - Opened by ashvardanian 7 months ago - 3 comments
Labels: metrics

#3232 - Fraction object creation fails with extra kwargs in bleu_score.py

Issue - State: closed - Opened by destroy-lonely 8 months ago - 2 comments

#3231 - Make WordNet's synset relations available from the lemmas

Pull Request - State: closed - Opened by ekaf 8 months ago - 2 comments
Labels: corpus

#3230 - Fix #3124- bug with PickleCorpusView raising UnicodeDecodeError

Pull Request - State: closed - Opened by Ubadub 8 months ago - 3 comments
Labels: corpus

#3229 - Add reference to entropy implementation used

Pull Request - State: closed - Opened by mbauwens 8 months ago - 3 comments
Labels: language-model

#3228 - module 'nltk' has no attribute 'data'

Issue - State: closed - Opened by peronc 9 months ago - 2 comments

#3227 - A potential edge case for WordNetLemmatizer.lemmatize()

Issue - State: closed - Opened by bowenyi-umich 9 months ago - 1 comment

#3226 - import error with numpy 1.24.4

Issue - State: closed - Opened by mcdominik 9 months ago - 3 comments

#3225 - Avoid recursive suffix stripping in wordnet morphy

Pull Request - State: closed - Opened by ekaf 9 months ago - 3 comments
Labels: corpus, stem/lemma

#3224 - fix for word_tokenize() Failing to Split English Contractions When Followed by [\t\n\f\r]

Pull Request - State: closed - Opened by Higgs32584 9 months ago - 9 comments
Labels: tokenizer

#3222 - Implement vocabulary introduction for texttiling

Pull Request - State: open - Opened by Syzygy2048 9 months ago
Labels: tokenizer

#3221 - add workaround for cache sometimes not being restored correctly on macos

Pull Request - State: closed - Opened by purificant 9 months ago
Labels: CI

#3220 - Not able to download the NLTK data module (python as well as manual download)

Issue - State: closed - Opened by subhra-ranjan-padhy 9 months ago - 2 comments

#3219 - upgrade automated code checks, part 2

Pull Request - State: closed - Opened by purificant 10 months ago - 1 comment
Labels: corpus, tokenizer, tagger, parsing, stem/lemma, classifier, GUI, twitter, cluster, metrics, internals

#3218 - Silence verbose warnings in closure

Pull Request - State: closed - Opened by ekaf 10 months ago - 6 comments

#3217 - upgrade automated code checks

Pull Request - State: closed - Opened by purificant 10 months ago - 1 comment
Labels: classifier

#3216 - sunset python 3.7

Pull Request - State: closed - Opened by purificant 10 months ago
Labels: CI

#3215 - quickfix syntax / typo

Pull Request - State: closed - Opened by purificant 10 months ago - 1 comment
Labels: metrics

#3214 - ci: update labeler to v5, change config file to new format

Pull Request - State: closed - Opened by purificant 10 months ago
Labels: CI

#3213 - ci: update actions

Pull Request - State: closed - Opened by purificant 10 months ago
Labels: CI

#3212 - Dispersion Plot was not populating in correct order on Y axis. I have corrected that order. Please use the below code in dispersion.py file.

Issue - State: closed - Opened by DS3006 10 months ago - 2 comments

#3211 - KneserNeyInterpolated has problem with OOV words during testing and perplexity is always inf

Issue - State: open - Opened by nilinykh 10 months ago - 7 comments

#3210 - `TreebankWordDetokenizer().detokenize()` introduces unexpected spaces before periods.

Issue - State: open - Opened by Alnusjaponica 10 months ago

#3209 - Refactor LanguageModel class, adding split functionality and unit tests

Pull Request - State: open - Opened by venkat1924 10 months ago

#3208 - Tokenizer punkt zip file sometimes does not unpackage

Issue - State: open - Opened by ryonsteele 10 months ago

#3207 - fix: enable py 3.12 in ci and fix error in bleu calculation

Pull Request - State: closed - Opened by k4black 11 months ago - 17 comments
Labels: CI

#3206 - Bug in nltk.draw.dispersion_plot with nltk 3.8.1, matplotlib-base 3.8.0, matplotlib-inline 0.1.6 and numpy 1.26

Issue - State: closed - Opened by m-d-grunnill 11 months ago - 2 comments

#3205 - Prevent crash on BLEU if weights are np array

Pull Request - State: closed - Opened by tomaarsen 11 months ago

#3204 - `corpus_bleu` function does not catch all the exceptions when calling `weights[0][0]`

Issue - State: closed - Opened by zhaochenyang20 11 months ago - 3 comments

#3203 - Make sure that we invoke all the intended regex patterns in ToktokTokenizer...

Pull Request - State: closed - Opened by alexrudnick 11 months ago - 3 comments
Labels: tokenizer

#3202 - ToktokTokenizer doesn't call one of the included replacement patterns and thus doesn't tokenize some punctuation, like opening guillemets

Issue - State: closed - Opened by alexrudnick 11 months ago - 1 comment

#3201 - fix broken link to the Coding Horror blog post in CONTRIBUTING.md

Pull Request - State: closed - Opened by alexrudnick 11 months ago - 1 comment

#3200 - Import of Trie fails in mwe.py

Issue - State: closed - Opened by passionate-zebracorn 11 months ago - 1 comment

#3199 - Fix dunning log likelihood ValueError

Pull Request - State: closed - Opened by vivekkalyan 11 months ago - 1 comment
Labels: tokenizer

#3198 - NLTK is tagging "hi" and "hello" as nouns.

Issue - State: closed - Opened by RishitAtwal 11 months ago - 4 comments

#3197 - NLTK thinks `turn` is a noun when it should be a verb.

Issue - State: closed - Opened by alf1e 11 months ago - 1 comment

#3196 - Problems Running Examples Starting with Babelize

Issue - State: closed - Opened by mdebellis 11 months ago - 1 comment

#3195 - Add a function for splitting combined words.

Issue - State: open - Opened by wxz 12 months ago

#3194 - Unable to download Stopwords and also unable to access stopwords zip file manually.

Issue - State: closed - Opened by mdabdulrahman 12 months ago - 2 comments

#3193 - Add support for a `sort` argument in WordNet methods

Issue - State: closed - Opened by bryant1410 12 months ago - 22 comments
Labels: enhancement

#3192 - Trouble with installation importing nltk

Issue - State: closed - Opened by davidam 12 months ago - 1 comment

#3191 - Potential Regex Denial of Service (ReDoS)

Issue - State: open - Opened by ready-research almost 1 year ago

#3190 - minor fix for wordnet lemmatization pos param documentation

Pull Request - State: closed - Opened by sharpblade4 about 1 year ago - 1 comment
Labels: stem/lemma

#3189 - word_tokenize() Failed to Split English Contractions When Followed by [\t\n\f\r]

Issue - State: closed - Opened by donglihe-hub about 1 year ago - 3 comments

#3188 - Update Penn POS descriptions in chunkparser_app.py

Pull Request - State: closed - Opened by nathanjmcdougall about 1 year ago - 1 comment
Labels: GUI

#3187 - Cannot download punkt

Issue - State: open - Opened by NIRA02525 about 1 year ago - 6 comments

#3186 - Missing English words in words()

Issue - State: open - Opened by BaGRoS about 1 year ago - 4 comments

#3185 - Download somehow blocked

Issue - State: closed - Opened by sjkoelle about 1 year ago - 1 comment

#3184 - In CoreNLPParser, how can I get output as different formats, e.g., 'wordsAndTags' or 'typedDependencies'

Issue - State: open - Opened by Lopa07 about 1 year ago

#3183 - Refactoring

Pull Request - State: closed - Opened by tosemml about 1 year ago - 2 comments
Labels: corpus, classifier, metrics

#3182 - Formatargspec Warning in import line

Issue - State: open - Opened by nvenkatcivil about 1 year ago

#3181 - edit_distance_align() in distance.py gives wrong alignment path when substitution_cost is greater than 2

Issue - State: open - Opened by yzhaoinuw about 1 year ago

#3180 - Bug on edit distance align

Pull Request - State: open - Opened by yzhaoinuw about 1 year ago
Labels: metrics

#3179 - Incorrect documentation in nltk.stem.lancaster.LancasterStemmer class

Issue - State: open - Opened by Smeetp1234 about 1 year ago

#3178 - LEPOR: A machine translation evaluation metric

Pull Request - State: open - Opened by ulhaqi12 about 1 year ago - 11 comments
Labels: enhancement, nice idea, translate

#3177 - I tried everything and still I get: [nltk_data] Error loading taggers: Package 'taggers' not found in [nltk_data] index

Issue - State: closed - Opened by venturaEffect about 1 year ago - 24 comments
Labels: installation
