DevOps Production Engineer

RHINO HEALTH

Software Engineering

Tel Aviv-Yafo, Israel

Posted on May 29, 2026

About Rhino Federated Computing

Rhino Federated Computing solves one of the biggest challenges in AI: seamlessly connecting siloed data through federated computing. The Rhino Federated Computing Platform (Rhino FCP) serves as the ‘data collaboration tech stack’, extending from providing computing resources to data preparation & discoverability, to model development & monitoring — all in a secure, privacy-preserving environment.

To do this, Rhino FCP offers a flexible architecture (multi-cloud and on-prem hardware), end-to-end data management workflows (multimodal data, schema definition, harmonization, and visualization), and privacy-enhancing technologies (e.g., differential privacy). It also allows secure deployment of custom code and third-party applications via persistent data pipelines.

Rhino is trusted by more than 60 leading organizations worldwide — including 14 of Newsweek’s “Best Smart Hospitals” and top 20 global biopharma companies — and is leveraging this foundation for financial services, ecommerce, and beyond.

The company is headquartered in Boston, with its main R&D center in Tel Aviv.

About the Role

The Production Engineer will play a key role in ensuring the reliability, performance, and operational excellence of Rhino’s Federated Computing Platform (Rhino FCP). This distributed infrastructure supports cutting-edge AI/ML research and development across highly regulated industries, including healthcare, finance, and life sciences, by enabling secure, privacy-preserving data collaboration worldwide.

You will be responsible for maintaining and improving production environments deployed both in the cloud and behind customer firewalls. You will work closely with Platform, Backend, and Product teams to ensure systems remain stable, observable, and highly available.

This role focuses heavily on operational ownership, production monitoring, troubleshooting complex environments, incident response, and improving deployment reliability and operational tooling. It is ideal for someone who enjoys solving production challenges, improving system reliability, and building operational excellence in fast-moving environments.

Key Responsibilities

Production Operations & Reliability:
Maintain and support production environments across customer deployments and centralized cloud services, ensuring high availability and operational stability.

Monitoring and Observability:
Develop, improve, and maintain monitoring, alerting, and logging systems to proactively identify issues and improve visibility across distributed systems.

Incident Response and Troubleshooting:
Investigate, troubleshoot, and resolve complex infrastructure and application issues across cloud and on-premises environments, participating in incident management and root cause analysis.

Deployment Management:
Manage and support production deployments, upgrades, and maintenance activities across geographically distributed customer environments.

Operational Excellence:
Identify operational bottlenecks and continuously improve reliability, scalability, automation, and support processes.

Collaboration Across Teams:
Work closely with Backend, DevOps, and Product Engineering teams to support new features, improve operational readiness, and ensure smooth production adoption.

Automation and Tooling:
Contribute to internal tooling and automation efforts that reduce manual operational work and improve deployment and support efficiency.

Preferred Skills

Candidates should have 3–5 years of professional experience covering a mix of the following:

  • 3–5 years of experience in Production Engineering, DevOps, SRE, or similar operational roles
  • Experience working with cloud environments (AWS and/or GCP preferred)
  • Experience operating Kubernetes-based environments
  • Strong Linux administration and troubleshooting skills
  • Experience with Docker and containerized workloads
  • Experience with Python and scripting for automation
  • Experience with Infrastructure-as-Code and configuration management tools (Terraform, Ansible, or similar)
  • Experience implementing and maintaining monitoring and observability systems (Prometheus/VictoriaMetrics, Grafana, alerting systems, logging pipelines, etc.)
  • Experience working with CI/CD tools such as GitHub Actions
  • Experience troubleshooting networking-related issues in distributed environments
  • Familiarity with GitOps concepts and tools (ArgoCD is an advantage)
  • Strong debugging and problem-solving abilities
  • Comfortable working in dynamic startup environments

Advantages

  • Experience supporting AI/ML products or platforms
  • Experience operating distributed systems
  • Experience supporting customer-facing production systems
  • Experience working with security-focused environments or privacy-sensitive workloads
  • Experience with VPN technologies, mTLS, ingress systems, or service networking concepts

The role is open to candidates who are based in Israel (hybrid work environment).