GitHub / bigcode-project/bigcode-analysis issues and pull requests
#46 - Update README.md
Pull Request -
State: closed - Opened by christiancopeland about 2 years ago
#45 - Update README.md
Pull Request -
State: closed - Opened by christiancopeland about 2 years ago
#43 - download dataset from kaggle
Pull Request -
State: open - Opened by xu3kev over 2 years ago
- 1 comment
#42 - Pull Requests
Pull Request -
State: open - Opened by loubnabnl over 2 years ago
#41 - kaggle dataset
Pull Request -
State: open - Opened by loubnabnl over 2 years ago
#40 - Stackoverflow processing
Pull Request -
State: open - Opened by loubnabnl over 2 years ago
#39 - [WIP] textbooks filtering
Pull Request -
State: open - Opened by loubnabnl over 2 years ago
#38 - [WIP] code reviews dataset
Pull Request -
State: open - Opened by loubnabnl over 2 years ago
#37 - Add pdf of Miro board of MozFest
Pull Request -
State: closed - Opened by harm-devries over 2 years ago
#36 - Chinchilla analysis
Pull Request -
State: closed - Opened by harm-devries almost 3 years ago
#35 - add scaling laws notebook
Pull Request -
State: closed - Opened by lvwerra almost 3 years ago
- 4 comments
#34 - Data inspection
Pull Request -
State: closed - Opened by harm-devries about 3 years ago
#33 - add github issues analysis notebook
Pull Request -
State: closed - Opened by loubnabnl about 3 years ago
#32 - Add unimax exploration notebook
Pull Request -
State: closed - Opened by harm-devries about 3 years ago
- 1 comment
#31 - Issues language identifier
Pull Request -
State: closed - Opened by Muhtasham about 3 years ago
#30 - Minhash Improvement
Pull Request -
State: closed - Opened by ChenghaoMou about 3 years ago
- 1 comment
#29 - add kenlm experiment
Pull Request -
State: closed - Opened by lvwerra over 3 years ago
#28 - update readmes of filtering methods
Pull Request -
State: closed - Opened by loubnabnl over 3 years ago
#27 - add code preprocessing and comment to code notebook
Pull Request -
State: closed - Opened by loubnabnl over 3 years ago
#26 - Email regex modified
Pull Request -
State: closed - Opened by paulovn over 3 years ago
#25 - add PII detection pipeline and analysis notebooks
Pull Request -
State: closed - Opened by loubnabnl over 3 years ago
#24 - Use detect-secrets to scan secrets (WIP)
Pull Request -
State: closed - Opened by liyongsea over 3 years ago
- 2 comments
#23 - Evaluate CodeGen on safe and all-license dataset
Issue -
State: closed - Opened by harm-devries over 3 years ago
- 3 comments
Labels: good first issue
#22 - MQA experiments on AWS SageMaker Lab
Pull Request -
State: closed - Opened by ocramz over 3 years ago
- 4 comments
#21 - requirements uses the right branch of transformers
Pull Request -
State: closed - Opened by ocramz over 3 years ago
#20 - cannot import AttentionType from gpt2
Issue -
State: closed - Opened by ocramz over 3 years ago
#19 - [Decontamination] Add readme and instructions to run substring decontamination
Issue -
State: closed - Opened by RaymondLi0 over 3 years ago
- 1 comment
#18 - update readme and requirements
Pull Request -
State: closed - Opened by ChenghaoMou over 3 years ago
#17 - Reorganize data analysis folder and update readmess
Pull Request -
State: closed - Opened by loubnabnl over 3 years ago
#16 - add subtsring decontamination
Pull Request -
State: closed - Opened by RaymondLi0 over 3 years ago
#15 - github scraping speed limit
Issue -
State: open - Opened by bigximik over 3 years ago
#14 - Add decontamination code
Pull Request -
State: closed - Opened by ChenghaoMou over 3 years ago
- 3 comments
#13 - Decontamination
Issue -
State: closed - Opened by ChenghaoMou over 3 years ago
- 9 comments
#12 - Broken link
Issue -
State: closed - Opened by Sleepyhead01 over 3 years ago
- 1 comment
#11 - Adding alternative minhash script
Pull Request -
State: closed - Opened by ChenghaoMou over 3 years ago
- 15 comments
#10 - [Near Deduplication] Tokenization
Issue -
State: open - Opened by ChenghaoMou over 3 years ago
- 2 comments
#9 - [Near Deduplication] Post processing
Issue -
State: open - Opened by ChenghaoMou over 3 years ago
#8 - [Exact Substring Deduplication] Analysis
Issue -
State: open - Opened by ChenghaoMou over 3 years ago
- 1 comment
#7 - [Near Deduplication] Benchmark
Issue -
State: open - Opened by ChenghaoMou over 3 years ago
- 2 comments
#6 - Create CONTRIBUTING.md
Pull Request -
State: closed - Opened by lvwerra over 3 years ago
#5 - Add filtering to the near deduplicated safe dataset
Issue -
State: closed - Opened by loubnabnl over 3 years ago
- 1 comment
Labels: data curation
#4 - Multi query experiments
Pull Request -
State: closed - Opened by bigximik over 3 years ago
#3 - Reorganize bigcode-data-analysis repository
Issue -
State: closed - Opened by loubnabnl over 3 years ago
- 1 comment
Labels: documentation, enhancement
#2 - Rename model names on HF hub
Issue -
State: closed - Opened by harm-devries over 3 years ago
- 1 comment
Labels: documentation, enhancement
#1 - Upload github dataset with license column
Issue -
State: closed - Opened by harm-devries over 3 years ago
Labels: enhancement, data curation