Merritt Krakowitzer

About

In my current capacity as Head of Platform Engineering and Gaming Infrastructure, I lead a team of talented engineers and guide the strategic roadmap for developing and deploying robust, scalable infrastructure, as well as building self-service platforms that empower developers, reduce bottlenecks, and boost productivity. My role entails promoting a culture of innovation and efficiency, implementing industry best practices, and fostering collaboration between development and operations teams. I strive to balance strategic objectives with tactical needs, ensuring our platforms are reliable, secure, and agile enough to accommodate the fast-paced demands of the digital gaming landscape.

My main areas of focus have been Infrastructure as Code, Test-Driven Infrastructure, Open Source technologies, and the incorporation of software engineering practices in an operations environment, with a keen interest in the evolving DevOps landscape and in collaborating with developers. These diverse experiences laid a strong foundation that propelled me into leadership roles.

Looking ahead, I am keen on exploring opportunities that will enable me to apply my platform engineering expertise and leadership skills at a strategic level. I am particularly interested in Senior Engineering Leadership roles where I can influence the technological direction of an organization and oversee large-scale implementation of robust and scalable infrastructure.

Skills

Leadership

Platform Strategy
Servant Leadership
Innovation
Collaboration
Decision Making
Communication
Team building

Architecture

Infrastructure as code
GitOps
Infrastructure Design and Architecture
SaaS/PaaS/IaaS

Languages, Operating Systems & Tools

linux (RHEL, CentOS, Alma)
unix (Solaris, AIX)
git
bash
go
python
puppet
ansible
terraform

Container Orchestration

kubernetes
docker
FreeBSD jails
Solaris zones

Platform Development & Administration

GitHub
prometheus
HAProxy
nginx
apache
tomcat
Cloudflare
logz.io

Data Management

PostgreSQL
Elasticsearch
Cassandra

Containers & Cloud

kubernetes
Docker
AWS
Triton Datacenter

Recent Projects

Cloudflare Implementation in Response to Security Threats

Steered a team through a critical, time-sensitive project to boost the security, performance, and reliability of our web infrastructure in response to escalating attacks on our customer base. The team implemented Cloudflare's suite of services, including the Web Application Firewall (WAF), Rate Limiting, Bot Management and CDN. I used my leadership skills and platform engineering know-how to guide the team towards a swift and successful implementation, with seamless integration into our existing systems and minimal disruption to users. This strategic response not only significantly fortified our digital security but also improved website performance and overall user experience.

Unified Platform

The aim of this project was to empower our software engineering teams with self-service capabilities. The architecture, leveraging Kubernetes, can be deployed on AWS, on Azure and on-premise, demonstrating our commitment to providing a productised, flexible, environment-agnostic platform. We have integrated our deployments with GitHub Actions and Terraform Cloud, catering to varied business needs and enabling consistency and reliability across our environments. Our focus on platform engineering as a product-centric discipline has ultimately redefined our workflow, enabling greater speed and autonomy for developers while ensuring a robust, scalable and secure platform for our multi-tenant SaaS product and for customers who require single-tenant environments.

Migration of Jenkins and GitLab to GitHub

In a significant shift, we successfully migrated from Jenkins to GitHub Actions and transitioned from GitLab to GitHub, bringing all our version control and CI/CD pipelines onto a single cloud-hosted platform. This consolidated approach not only enhanced team collaboration and streamlined processes, but it also eliminated the overhead of managing and maintaining self-hosted solutions. With this, we were able to reallocate resources and focus on our core platform, resulting in accelerated development cycles and improved productivity. The move underlined our commitment to leveraging cloud services to deliver efficient, scalable, and reliable solutions.

Experience

Allwyn Lottery Solutions (formerly Camelot Lottery Solutions)

Head of Platform Engineering and Gaming Infrastructure - January 2020 - Present

In my current role, I champion an innovative approach in which we treat platform engineering as a product team. In this paradigm, our platform is not just an infrastructural element but a product in itself, developed and managed to deliver value both to our internal customers (our developers and other stakeholders) and to our external customers’ production gaming environments in America and Europe.

My leadership style is characterised by strategic decision-making with a focus on thinking long term, whilst fostering an atmosphere of innovation in a safe-to-fail environment.

I believe in leading by example, navigating challenges, and driving technological initiatives with a visionary approach.

Lead DevOps Engineer - May 2016 - December 2019

My role was multifaceted and centred around operating the production gaming infrastructure and delivering projects for the Irish National Lottery.

I led a small team of engineers, facilitating daily Kanban standups, and providing mentorship to team members.

Key projects included:

Designing and implementing an on-premise bare-metal container platform.
Creating self-service infrastructure and application deployments for all our customers' gaming infrastructure using Rundeck, Ansible and Terraform.
Transitioning our monitoring platform from Nagios to Prometheus.

Worked with the engineering teams advising them on best practices and improvements required to scale our applications in production.

Collaborated closely with our software engineering teams in the UK and Greece to resolve production incidents and improve our systems’ operability.

I wrote a number of custom tools and applications in Python and Go. Examples include:

A Prometheus exporter for collecting custom application metrics.
A Prometheus exporter for collecting metrics of our GitHub Actions runners. Source code available here
A Prometheus exporter for collecting metrics of our retail and terminal infrastructure.
A Prometheus exporter for collecting Cloudflare metrics.

May 2016 - Present

First National Bank

Senior Systems Engineer - HomeLoans - July 2015 - February 2016

My primary role was to automate the infrastructure and application components of the continuous delivery pipeline, while ensuring the highest quality deliverables in collaboration with the development team.

My responsibilities included working alongside developers to meet their infrastructure needs and assisting in troubleshooting both application and infrastructure issues. I was integral in automating software deployments via Bamboo and Puppet in collaboration with the release team.

I ensured the quality of our infrastructure codebase through regular code reviews and verifying sufficient test coverage. A key aspect of my role was the integration of our codebase with the development toolchain, comprising Stash, Jira, and Bamboo.

As part of my efforts to share knowledge and promote best practices, I held weekly training sessions with the operations team. We discussed a variety of topics such as infrastructure-as-code with Puppet, test-driven infrastructure with RSpec/serverspec, and containerization technologies.

In managing the Puppet infrastructure, I was responsible for its installation and configuration, implementing best practices for our Puppet codebase, and ensuring adequate monitoring and metrics with tools such as Zabbix, Nagios, Elasticsearch, and Kibana. My role, therefore, embodied a balance of technical expertise, collaboration, and mentorship, all aimed towards delivering quality products.

Senior Technical Specialist - Technical Services Division - November 2011 - June 2015

My work focused on Infrastructure as Code (IaC) and configuration management while providing mentorship to junior and intermediate system administrators.

Key achievements included the implementation of an automated Linux server rollout process using Cobbler, as well as deploying a Puppet-based configuration management system for over 2000 Linux nodes.

I developed custom modules for First National Bank, handling a wide array of configurations including security standards, PCI, SSH, Apache, FileSystem ACLs, Postfix, PostgreSQL, Puppet infrastructure, name resolution, AD authentication, and software like Jira, Crucible, Confluence, Stash, Bamboo, and Maven.

I also deployed community modules for managing functionalities such as host-based firewalls, log rotation, NTP, kernel parameters, nginx, MySQL, PuppetDB, and Tomcat, among others.

My role included scripting to automate tasks such as VM creation via the VMware API, monitoring, database refreshes, software installations, and ensuring compliance with baseline standards. This automation allowed us to deploy hundreds of VMs within minutes, pre-configured and ready for use.

I implemented Kanban within the Linux team and handled daily tasks such as documentation, troubleshooting, and performance tuning. I also developed a solution to automate the refresh of the Human Resources department’s Oracle database and application servers, reducing the process time from 3-5 days to under an hour.

My dedication and innovation were recognized through several awards, including the FNB Core Technology Solutions Service award in 2013, an FNB Innovation award in 2014, and runner-up for the FNB Core Technology Solutions Maverick award in 2013 and 2014.

November 2011 - February 2016

Liberty Life Insurance

Unix Manager

I led a small, highly focused team in a demanding, high-profile insurance industry setting. We were entrusted with the upkeep of over 180 servers, expected to deliver 99.999% uptime across development, pre-production, and production environments. Our team prioritised process-driven workflows and automation, consistently leveraging tools to streamline our work.

Key projects included designing an automated rollout procedure for our Solaris servers based on the Sun JumpStart Enterprise Toolkit, utilising custom Perl scripts. The process ensured uniformity in installing, customising, securing, and configuring storage and applications on Solaris servers before customer handover.

I led the migration of Liberty Life’s production servers from end-of-life Fujitsu PrimePower 1500s and Sun E25Ks to three Fujitsu M9000 servers running 30 domains. This migration leveraged our automated rollout procedure.

Implementing Puppet for data centre automation and configuration management was another critical project, enabling consistent configuration across 180 physical Solaris servers and 350 Solaris containers (zones).

In the realm of monitoring, I implemented Nagios, an open-source tool, with over 100 custom scripts, and utilised Cacti, another open-source tool, for capacity planning with custom scripts graphing various OS metrics.

Process documentation was crucial to our operation, establishing standards and procedures to enhance team productivity. I also implemented rigorous security standards that exceeded audit requirements using Solaris tools such as RBAC, BART, IPFilter, and Solaris Auditing.

I wrote a Python-based password management system that handled daily OS password changes and stored the encrypted passwords in a database, giving authorised users access when required.

My role also involved providing senior management with monthly reports covering incidents, problems, project status, and capacity management, while also mentoring intermediate system administrators.

November 2008 - October 2011

Faritec

UNIX Availability Consultant

August 2007 – October 2008

AL Indigo

Sun Solaris Support Analyst - July 2006 – July 2007

Team Leader - South African Post Office - January 2005 – July 2006

Team Leader - AL Indigo - PSIRA - January 2003 – January 2005

July 2003 – July 2007

Fashaf

Systems Administrator

July 2000 – December 2002

Buxton Clothing

Network Administrator

July 1999 – July 2000

Philips S.A.

In-house support

December 1998 – July 1999

Certifications

Certified Kubernetes Administrator (CKA)

In Progress

Red Hat Enterprise Performance Tuning - RH442

2014

Red Hat Certificate of Expertise in Server Hardening

2014

Red Hat Certified Virtualization Administrator (RHCVA)

2012

Standard Bank Global Foundation Leadership Programme

2010

Red Hat Certified Engineer (RHCE)

2001 & 2011

SCSA (Sun Certified System Administrator)

2006

SCNA (Sun Certified Network Administrator)

2006

Linux Professional Institute Level

2005