Solution Architect, AI Infrastructure & Private Cloud in Sofia, Bulgaria
Job role insights
Date posted: April 15, 2026
Closing date: April 15, 2027
Hiring location: Sofia, Bulgaria
Career level: Senior
Qualification: Bachelor’s Degree, Master’s Degree
Experience: 10+ Years
Gender: m/f/d
Description
Required Skills:
1. HPC & AI Infrastructure
· Extensive knowledge of HPC technologies and workload schedulers such as Slurm and/or Altair PBS Pro.
· Proficiency and hands-on experience with HPC cluster management tools, including HPE Performance Cluster Manager (HPCM) and/or NVIDIA Base Command Manager.
· Solid understanding of high-speed networking technologies, such as InfiniBand (including NVIDIA/Mellanox stacks) and Ethernet, and of performance tuning of HPC components.
2. Containerization & Orchestration
· Extensive hands-on experience with containerization technologies such as Docker, Podman, and Singularity.
· Proficiency with at least two container orchestration platforms: CNCF Kubernetes, Red Hat OpenShift, SUSE Rancher (RKE/K3s), Canonical Charmed Kubernetes.
· Strong understanding of GPU technologies, including the NVIDIA GPU Operator for Kubernetes-based environments and DCGM (Data Center GPU Manager) for GPU health and performance monitoring.
3. Operating Systems & Virtualization
· Extensive experience in Linux system administration, including package management, boot process troubleshooting, performance tuning, and network configuration.
· Proficient with multiple Linux distributions, with hands-on expertise in at least two of the following: RHEL, SLES, and Ubuntu.
· Experience with virtualization technologies, including KVM and OpenShift Virtualization, for deploying and managing virtualized workloads in hybrid cloud environments.
4. Cloud, DevOps & MLOps
· Solid understanding of hybrid cloud architectures and experience working with major cloud platforms in conjunction with on-premises infrastructure.
· Familiarity with DevOps practices, including CI/CD pipelines, infrastructure as code (IaC), and microservices-based application delivery.
· Experience integrating and operationalizing open-source AI/ML tools and frameworks, supporting the full model lifecycle from development to deployment.
· Good understanding of cloud-native security, observability, and compliance frameworks, ensuring secure and reliable AI/ML operations at scale.
5. Networking & Protocols
· Strong understanding of core networking principles, including DNS, TCP/IP, routing, and load balancing, essential for designing resilient and scalable infrastructure.
· Working knowledge of key storage and file-access protocols, such as S3, NFS, and SMB/CIFS, for data access, transfer, and integration across hybrid environments.
6. Programming & Automation
· Proficiency in scripting or programming languages such as Python and Bash.
· Experience automating infrastructure and AI workflows.
7. Soft Skills & Leadership
· Excellent problem-solving, analytical thinking, and communication skills for engaging both technical and non-technical stakeholders.
· Proven ability to lead complex technical projects from requirements gathering through architecture, design, and delivery.
· Strong business acumen with the ability to align technical solutions with client challenges and objectives.
Qualifications:
· Bachelor’s or master’s degree in Computer Science, Information Technology, or a related field.
· Professional certifications in AI infrastructure, containers, and Kubernetes are highly desirable, such as RHCSA, RHCE, CNCF certifications (CKA, CKAD, CKS), and NVIDIA-Certified Associate: AI Infrastructure and Operations.
· Typically 8–10 years of hands-on experience architecting and implementing HPC, AI/ML, and container platform solutions within hybrid or private cloud environments, with a strong focus on scalability, performance, and enterprise integration.
