GitHub / NVIDIA/KAI-scheduler issues and pull requests
#469 - add queue controller operand
Pull Request -
State: open - Opened by enoodle about 11 hours ago
#468 - added configurable plugins hub for podgrouper using interface and RegisterPlugins
Pull Request -
State: open - Opened by natasharomm about 12 hours ago
#466 - fix: simplify name to admission
Pull Request -
State: open - Opened by SiorMeir about 15 hours ago
#465 - adding admission operand
Pull Request -
State: closed - Opened by enoodle about 16 hours ago
- 1 comment
#462 - Time aware fairness: prometheus time decay
Pull Request -
State: open - Opened by itsomri 1 day ago
#461 - Track enhancement for optimizing job order reflection
Issue -
State: open - Opened by singh1203 1 day ago
Labels: enhancement
#460 - If a podgroup has scheduled pod and a required topology constraint, t…
Pull Request -
State: open - Opened by davidLif 3 days ago
#457 - Fixed a bug where workload with subgroups would not consider addition…
Pull Request -
State: closed - Opened by romanbaron 3 days ago
- 3 comments
#456 - test: expand test window and add logs
Pull Request -
State: open - Opened by SiorMeir 3 days ago
#455 - using values from constants.go in options
Pull Request -
State: closed - Opened by enoodle 4 days ago
- 1 comment
#453 - Fix changelog v0.8.3
Pull Request -
State: closed - Opened by itsomri 4 days ago
#451 - move crds into helm chart
Pull Request -
State: closed - Opened by enoodle 4 days ago
- 5 comments
#450 - removing unused arguments from admission
Pull Request -
State: closed - Opened by enoodle 4 days ago
- 1 comment
#448 - Bump k8s.io/kubernetes from 1.32.6 to 1.32.7
Pull Request -
State: closed - Opened by dependabot[bot] 6 days ago
Labels: dependencies, go
#445 - Add job order reflection plugin with HTTP endpoint
Pull Request -
State: closed - Opened by singh1203 7 days ago
- 9 comments
#443 - Not using summarized MinAvailable for PodGroup, but look per SubGroup
Pull Request -
State: closed - Opened by omer-dayan 8 days ago
- 1 comment
#442 - Revert skip on pr ci md v0.6
Pull Request -
State: closed - Opened by itsomri 8 days ago
#440 - Skip on-pr workflow on .md file changes
Pull Request -
State: open - Opened by itsomri 9 days ago
#438 - Removed gpu operator as dependency
Pull Request -
State: closed - Opened by itsomri 9 days ago
- 1 comment
#437 - Remove gpu operator dependency v0.6
Pull Request -
State: closed - Opened by itsomri 9 days ago
#436 - Fixed changelog v0.8.2
Pull Request -
State: closed - Opened by itsomri 9 days ago
#435 - Time aware fairness usage: prometheus client
Pull Request -
State: open - Opened by itsomri 9 days ago
#434 - Roman/subgroups e2e
Pull Request -
State: closed - Opened by romanbaron 10 days ago
- 1 comment
#433 - Update gpu operator import
Pull Request -
State: open - Opened by itsomri 10 days ago
#432 - v0.4 - cherry pick mig and memory fixes
Pull Request -
State: open - Opened by enoodle 10 days ago
#431 - update changelog
Pull Request -
State: closed - Opened by enoodle 10 days ago
#430 - v0.6 - cherry pick mig and memory fixes
Pull Request -
State: closed - Opened by enoodle 10 days ago
#429 - Roman/subgroup pod status cache
Pull Request -
State: closed - Opened by romanbaron 10 days ago
- 2 comments
#427 - Extending hierarchical podgroup structure to support multiple levels o…
Pull Request -
State: open - Opened by romanbaron 11 days ago
#426 - Default unlimited queue upon first installation
Issue -
State: open - Opened by romanbaron 11 days ago
Labels: enhancement, help wanted
#425 - add kai config crd
Pull Request -
State: closed - Opened by enoodle 12 days ago
- 1 comment
#422 - Fix: incorrect scheduling decision and calculation when using MIG
Pull Request -
State: closed - Opened by hello2mao 13 days ago
- 2 comments
#421 - fix: status updater deleted pod group race
Pull Request -
State: closed - Opened by enoodle 13 days ago
- 1 comment
#420 - IsInferencePreemptible cleanup
Pull Request -
State: closed - Opened by romanbaron 14 days ago
- 1 comment
#415 - Added prometheus resource usage client
Pull Request -
State: closed - Opened by itsomri 14 days ago
- 1 comment
#413 - 0.6 - fix: leader election use nodepool value for lock name
Pull Request -
State: open - Opened by enoodle 14 days ago
#412 - Basic topology e2e test
Pull Request -
State: closed - Opened by davidLif 14 days ago
- 1 comment
#410 - Added a new scheduler flag . When enabled, a DisruptionTarget conditi…
Pull Request -
State: closed - Opened by romanbaron 15 days ago
- 1 comment
#409 - fix: on-pr.yaml trim 0 from git rev
Pull Request -
State: closed - Opened by enoodle 15 days ago
#405 - Changelog prep for v0.8.0
Pull Request -
State: closed - Opened by romanbaron 16 days ago
#403 - Fixed getNumOfTasksToAllocatePerSubGroup so it will account tasks wit…
Pull Request -
State: closed - Opened by romanbaron 17 days ago
- 2 comments
#402 - High availability support
Issue -
State: closed - Opened by romanbaron 18 days ago
Labels: enhancement
#400 - Ephemeral-Storage in MaxNodePoolResources Uses Raw Bytes (×1000 Off) in Error Message
Issue -
State: open - Opened by JosefNagelschmidt 20 days ago
Labels: bug
#398 - v0.6 - add runtime class support
Pull Request -
State: closed - Opened by enoodle 20 days ago
#395 - Updated oauth2
Pull Request -
State: open - Opened by itsomri 20 days ago
#394 - Update golang.org/x/oauth2 to v0.28.0
Pull Request -
State: closed - Opened by itsomri 20 days ago
#393 - Allow reclaiming with lower utilization ratio (#374)
Pull Request -
State: open - Opened by enoodle 20 days ago
#392 - v0.6 - reclaimable util ratio
Pull Request -
State: open - Opened by enoodle 20 days ago
#387 - Fixed unnecessary string evaluations
Pull Request -
State: open - Opened by itsomri 23 days ago
#386 - Time aware fairness client plugins
Pull Request -
State: closed - Opened by itsomri 23 days ago
- 6 comments
#384 - Set PodGroup before adding tasks to job info
Pull Request -
State: open - Opened by romanbaron 24 days ago
#383 - fix: humanize message and add tests
Pull Request -
State: closed - Opened by SiorMeir 24 days ago
- 3 comments
#382 - Fixed eviction_info tests
Pull Request -
State: closed - Opened by romanbaron 25 days ago
#381 - Added subgroups support to stale gang eviction action
Pull Request -
State: closed - Opened by romanbaron 25 days ago
- 3 comments
#380 - support runtimeClasses
Pull Request -
State: open - Opened by enoodle 26 days ago
#379 - Deprecate `isInferencePreemptible` flag and related configuration
Pull Request -
State: open - Opened by singh1203 26 days ago
#378 - KAI scheduler not working with AWS EKS Auto Mode
Issue -
State: open - Opened by msaeedevops 27 days ago
#377 - Adjusted grove podGrouper plugin to use subgroups
Pull Request -
State: open - Opened by romanbaron 27 days ago
#376 - fix: use global.securityContext in crd-upgrader
Pull Request -
State: closed - Opened by simoncampion 27 days ago
- 2 comments
#375 - Add topology plugin prePredicate unit tests
Pull Request -
State: closed - Opened by davidLif 28 days ago
#374 - Allow reclaiming with lower utilization ratio
Pull Request -
State: closed - Opened by enoodle 28 days ago
- 3 comments
#373 - RuntimeClass Overhead resources and Scheduling constraints are not considered
Issue -
State: closed - Opened by enoodle 28 days ago
Labels: bug
#372 - Refactor subg
Pull Request -
State: open - Opened by omer-dayan 28 days ago
#369 - Fix topology plugin typos
Pull Request -
State: open - Opened by davidLif 29 days ago
#368 - Roman/subgroups eviction info
Pull Request -
State: closed - Opened by romanbaron 29 days ago
- 4 comments
#367 - Utilize SubGroupOrderFn in allocation_info.go for partial allocation …
Pull Request -
State: closed - Opened by romanbaron 30 days ago
- 3 comments
#366 - Few renames in topology_plugin_job_filtering.go
Pull Request -
State: open - Opened by romanbaron 30 days ago
#363 - Add unit test for createResourceReservationPod to cover runtimeClassName and metadata
Pull Request -
State: open - Opened by singh1203 30 days ago
#362 - Add unit test for `createResourceReservationPod`
Issue -
State: open - Opened by singh1203 30 days ago
Labels: enhancement
#360 - Explicitly set runtimeClassName to `nvidia` for GPU reservation pods
Pull Request -
State: closed - Opened by singh1203 about 1 month ago
- 2 comments
#359 - [WIP] Min Member default subgroup
Pull Request -
State: open - Opened by omer-dayan about 1 month ago
#358 - Metrics db requirements
Pull Request -
State: closed - Opened by itsomri about 1 month ago
#357 - Added SubGroupOrder plugin
Pull Request -
State: closed - Opened by romanbaron about 1 month ago
- 2 comments
#356 - predicate and nodeOrder for the topology plugin + topology result caching - topology job filter PRs part 4
Pull Request -
State: closed - Opened by davidLif about 1 month ago
- 2 comments
#355 - Register predicate and nodeOrder functions for the topology plugin - topology job filter PRs part 4
Pull Request -
State: closed - Opened by davidLif about 1 month ago
#354 - Given topology domains AllocatablePods, calculate the domains to be used for job allocation - topology job filter PRs part 3
Pull Request -
State: closed - Opened by davidLif about 1 month ago
- 3 comments
#352 - Topology structs updates - topology job filter PRs part 1
Pull Request -
State: open - Opened by davidLif about 1 month ago
#351 - Added SubGroups support in allocation_info.go and jobSolver
Pull Request -
State: open - Opened by romanbaron about 1 month ago
#350 - Siormeir/feat-create-admission-webhooks-service-phase-3
Pull Request -
State: closed - Opened by SiorMeir about 1 month ago
- 6 comments
#347 - Take SubGroups into consideration when checking if job is ready for s…
Pull Request -
State: closed - Opened by romanbaron about 1 month ago
- 2 comments
#346 - Fair share changelog
Pull Request -
State: open - Opened by itsomri about 1 month ago
#345 - Refactored DeleteTaskInfo
Pull Request -
State: closed - Opened by romanbaron about 1 month ago
- 1 comment
#344 - Natasha/queue metrics fixes
Pull Request -
State: open - Opened by natasharomm about 1 month ago
#342 - Added SubGroups to PodGroup CRD and SubGroupInfo to PodGroupInfo
Pull Request -
State: open - Opened by romanbaron about 1 month ago
#341 - Small refactoring in job_info, preparing for SubGroups
Pull Request -
State: open - Opened by romanbaron about 1 month ago
#340 - NVIDIA runtime class in cluster
Issue -
State: open - Opened by romanbaron about 1 month ago
Labels: enhancement
#336 - cherry pick - validate priorityclass exists in podgrouper
Pull Request -
State: closed - Opened by natasharomm about 1 month ago
#334 - [WIP] filter job by topology - step 1
Pull Request -
State: open - Opened by davidLif about 1 month ago
#333 - Updated the oauth2 dependency due to GHSA-6v2p-p543-phr9 - v0.6
Pull Request -
State: closed - Opened by davidLif about 1 month ago
#332 - Updated the oauth2 dependency due to GHSA-6v2p-p543-phr9
Pull Request -
State: closed - Opened by davidLif about 1 month ago
#330 - Natasha/validate priority exists for podgroup
Pull Request -
State: closed - Opened by natasharomm about 1 month ago
- 3 comments
#329 - Removed PDBs from snapshot
Pull Request -
State: open - Opened by romanbaron about 2 months ago
#327 - 0.6 - cherry pick fix resources checks
Pull Request -
State: open - Opened by enoodle about 2 months ago
#323 - Time aware fairness proposal - tests
Pull Request -
State: open - Opened by itsomri about 2 months ago
#321 - design priority based fair share distribution
Pull Request -
State: closed - Opened by enoodle about 2 months ago
#319 - v0.4 cherrypick of https://github.com/NVIDIA/KAI-Scheduler/pull/313
Pull Request -
State: closed - Opened by davidLif about 2 months ago
#318 - v0.6 cherrypick of https://github.com/NVIDIA/KAI-Scheduler/pull/313
Pull Request -
State: closed - Opened by davidLif about 2 months ago
#316 - Remove unnecessary deepcopy to optimize performance
Pull Request -
State: closed - Opened by hello2mao about 2 months ago
- 2 comments
#315 - [Performance] Scheduling completes in 2–3 seconds on large clusters
Issue -
State: open - Opened by hello2mao about 2 months ago
#314 - Added rbac for Grove PodCliqueScalingGroup
Pull Request -
State: open - Opened by sanjaychatterjee about 2 months ago