tesseract-ocr/langdata_lstm issues and pull requests

#61 - Tatar language data quality issues

Issue - State: open - Opened by rsabirov 4 months ago

#60 - NO fas.unicharset and fas.xheights file for Persian Language

Issue - State: open - Opened by AinazRafiei 9 months ago - 6 comments

#59 - Rename frk -> deu_latf (ISO 639-3, ISO 15924)

Pull Request - State: closed - Opened by stweil 12 months ago - 15 comments

#57 - grc letters with dot below

Issue - State: open - Opened by nisbet-hubbard about 1 year ago

#56 - θ in Greek book font rendered as swash form

Issue - State: open - Opened by nisbet-hubbard about 1 year ago - 2 comments
Labels: enhancement

#55 - Missing GREEK LUNATE SIGMA SYMBOL in grc and script/Greek models

Issue - State: open - Opened by nisbet-hubbard about 1 year ago - 4 comments
Labels: bug

#54 - Slight modification in Bodhi for incorporating a few unique characters in Drenjongke

Issue - State: open - Opened by bloodgroup-cplusplus over 1 year ago

#53 - Adding Additional Fonts for bodhi and dzongkha

Pull Request - State: open - Opened by bloodgroup-cplusplus over 1 year ago

#52 - Adding additional language Denjongke (sikkimese bhutia) to tesseract language dataset

Issue - State: closed - Opened by bloodgroup-cplusplus over 1 year ago - 3 comments
Labels: enhancement

#51 - Armenian letter և missing in hye language - confirmation

Issue - State: closed - Opened by reneclais over 1 year ago - 1 comment

#50 - Armenian.traineddata contains the missing character, so I suggest to try that model.

Issue - State: closed - Opened by reneclais over 1 year ago - 2 comments
Labels: question

#49 - Missed letter in the hye.traineddata

Issue - State: open - Opened by reneclais over 1 year ago - 3 comments

#48 - English traineddata file does not contain the '±' character?

Issue - State: open - Opened by Furtifk over 2 years ago - 7 comments

#47 - Bontot janda

Issue - State: closed - Opened by Awiemanja over 2 years ago
Labels: invalid

#46 - Add Shan language data

Pull Request - State: open - Opened by ronaldaug over 3 years ago - 2 comments

#45 - Training data should include bullet-like characters

Issue - State: open - Opened by wollmers over 3 years ago

#44 - Added unicharset file to Akkadian language

Pull Request - State: closed - Opened by wincentbalin over 3 years ago - 1 comment

#43 - Update deu.unicharset

Pull Request - State: closed - Opened by OttoKerner over 3 years ago - 3 comments

#42 - Missing some Thai numbers in Thai language (tha)

Issue - State: open - Opened by crossknight almost 4 years ago

#41 - Inherited.unicharset built by copying lines from existing unicharsets

Pull Request - State: open - Opened by Shreeshrii about 4 years ago - 1 comment

#40 - how to train this files to get .traineddata

Issue - State: closed - Opened by josef821 about 4 years ago - 3 comments

#39 - Update asm.wordlist

Pull Request - State: open - Opened by hjkgithub over 4 years ago - 3 comments

#38 - Alternative way to download langdata_lstm master file instead from github

Issue - State: closed - Opened by timjin520 over 4 years ago - 11 comments
Labels: question

#37 - wrong default mapping of some Romanian diacritics

Issue - State: open - Opened by latrau about 7 years ago - 6 comments

#36 - Missing support for Coptic script

Issue - State: open - Opened by stweil almost 5 years ago - 1 comment
Labels: enhancement

#35 - Update desired_characters for fin model

Pull Request - State: open - Opened by jmokoistinen almost 5 years ago

#34 - Update dan/desired_characters based on the Swedish one

Pull Request - State: closed - Opened by poizan42 about 5 years ago - 1 comment

#33 - Add support for Shan language (shn)

Issue - State: closed - Opened by ronaldaug about 5 years ago - 8 comments
Labels: enhancement

#32 - Support for New Reiwa Era Character ㋿ in Japanese

Issue - State: open - Opened by prateek4sep over 5 years ago - 1 comment

#31 - Tesseract fails to detect letters Å and å in Finnish language.

Issue - State: open - Opened by jmokoistinen over 5 years ago - 4 comments

#30 - Add the "@" character please to the list of desired characters

Pull Request - State: closed - Opened by Furtifk about 5 years ago - 2 comments
Labels: enhancement

#29 - Danish traineddata file doesn't include the "@" character

Issue - State: open - Opened by Furtifk about 5 years ago - 9 comments
Labels: bug, enhancement, help wanted

#28 - Trailing spaces on line 27 of eng.punc

Issue - State: open - Opened by juliangilbey over 5 years ago - 4 comments
Labels: question

#27 - Please use more fonts for training Uyghur

Issue - State: open - Opened by gheyret over 5 years ago

#26 - Normalize unicode in texts

Pull Request - State: closed - Opened by stweil over 5 years ago

#25 - Duplicate fonts names in okfonts

Issue - State: closed - Opened by amitdo over 5 years ago - 2 comments
Labels: enhancement

#24 - Please add description for repo - Suggested Text:

Issue - State: open - Opened by Shreeshrii over 5 years ago

#23 - Partially revert commit 02cc8f028532367dd44ba5fb3cbb6ac0bf73d6ad

Pull Request - State: closed - Opened by stweil over 5 years ago - 2 comments

#22 - error related to script data during training

Issue - State: closed - Opened by Shreeshrii over 5 years ago - 9 comments

#21 - Add Apache license file

Pull Request - State: closed - Opened by stweil over 5 years ago - 1 comment

#20 - Fix langdata config for Chinese, Japanese and German

Pull Request - State: closed - Opened by stweil almost 6 years ago - 1 comment

#19 - Move script data to new script subdirectory

Pull Request - State: closed - Opened by stweil almost 6 years ago - 2 comments

#18 - rename kur to kur_ara

Pull Request - State: closed - Opened by Shreeshrii almost 6 years ago - 4 comments

#17 - Apparently Lao\Lao.unicharset Has Uncommitted Changes

Issue - State: closed - Opened by ColdWinterWind about 6 years ago - 1 comment

#16 - tessedit_ocr_engine_mode 1 for san (Sanskrit language, Devanagari script)

Pull Request - State: closed - Opened by Shreeshrii about 6 years ago - 1 comment

#15 - tessedit_ocr_engine_mode 1 for nep (Nepali language, Devanagari script)

Pull Request - State: closed - Opened by Shreeshrii about 6 years ago

#14 - tessedit_ocr_engine_mode 1 for mar (Marathi language, Devanagari script)

Pull Request - State: closed - Opened by Shreeshrii about 6 years ago

#13 - tessedit_ocr_engine_mode 1 for hin (Hindi language, Devanagari script)

Pull Request - State: closed - Opened by Shreeshrii about 6 years ago

#12 - fix unicharset errors

Pull Request - State: closed - Opened by Timilehin about 6 years ago

#11 - update yoruba unicharset

Pull Request - State: closed - Opened by Timilehin about 6 years ago

#10 - improve yoruba training data quality

Pull Request - State: closed - Opened by Timilehin about 6 years ago

#9 - Should we update swe.training_text if new characters are added to desired_characters ?

Issue - State: open - Opened by aslamy about 6 years ago - 1 comment

#8 - update yoruba desired characters

Pull Request - State: closed - Opened by Timilehin about 6 years ago

#7 - tesseract 4.00 Trainging_text iteration failed to respond tp page 3402

Issue - State: open - Opened by YiWenFY about 6 years ago

#6 - Arabic training text is only 80 lines

Issue - State: open - Opened by Shreeshrii about 6 years ago - 2 comments

#5 - Added special characters to swedish desired_characters file

Pull Request - State: closed - Opened by aslamy about 6 years ago - 1 comment

#4 - Missing many special characters in desired_characters file (Swedish)

Issue - State: open - Opened by aslamy about 6 years ago - 9 comments

#3 - Is it possible to add few pre-1918 Russian characters to RUS language files?

Issue - State: open - Opened by alexei-kouprianov over 6 years ago - 4 comments

#2 - change kur_ara to kmr - Kurdish in Latin script - Kurmanji

Pull Request - State: closed - Opened by Shreeshrii over 6 years ago

#1 - Wordlists and training texts contain lots of errors

Issue - State: open - Opened by stweil over 6 years ago - 16 comments
