Staff Software Engineer - Site Reliability and Observability
Who We Are
Teraswitch is on a mission to provide the highest performance, lowest latency bare metal servers in the world. With 20 datacenter locations around the world, Teraswitch has served thousands of customers across 185 countries with our solutions. Founded by Brendan Mannella, Teraswitch is one of the largest privately-held infrastructure companies in the world.
The Job
The Site Reliability Engineer (SRE) is a software engineer responsible for ensuring the reliability, scalability, and performance of software systems. The role includes:
System Monitoring and Troubleshooting: Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents.
Automation and Infrastructure: Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring.
Performance Optimization: Analyzing system performance, identifying bottlenecks, and implementing optimizations to improve the efficiency and scalability of software systems.
Incident Response and Root Cause Analysis: Responding to incidents, conducting root cause analysis, and implementing corrective actions to prevent similar incidents in the future.
Collaboration with Development Teams: Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into the software design and implementation.
Continuous Improvement: Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems.
Develop Systems for Internal Developers: Identifying areas of the Software Development Lifecycle that can be improved to reduce cognitive overhead for developers, and helping them stay on the happy path towards developing sustainable, reliable, and resilient software using industry-standard practices.
What You'll Do
Implement a scalable, reliable, and secure SRE and observability platform to monitor the health of our production systems and provide a holistic view of the environment.
Deliver tools/software to improve the reliability, scalability and operability of services.
Collaborate with engineering teams to analyze and provide input on architecture, infrastructure resources, and observability to achieve reliability and scalability goals.
Serve as a technical leader for key initiatives across the organization, identify potential issues and opportunities, and lead teams to architect the next generation of reliability software.
Deliver impact by building software that helps maintain reliability on our backend and frontend systems.
Improve best practices through developing technical implementations that solve multiple developer and business needs.
Participate in a 24/7 on-call rotation for critical systems.
Your Skills & Abilities (Required Qualifications)
7+ years of hands-on SRE experience (software development, systems monitoring) with software development experience (Java, Go, Python).
Experience building and operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring, defining alerts, writing runbooks, establishing dashboards, etc.
Experience with monitoring and logging tools such as Grafana, Loki, Logstash, ClickHouse, etc.
Experience with owning and maintaining software including the SDLC and deployment.
Strong working knowledge of Docker, Kubernetes, Terraform, and Chef or Ansible.
Experience troubleshooting production applications and driving mitigation and remediation.
BS/MS in Computer Science/Engineering preferred.
Compensation and Benefits
Along with competitive pay, as a full-time Teraswitch employee you are eligible for the following benefits from day one of hire:
Health, Dental and Vision Insurance
401k with company profit sharing
Flex PTO and 11 Company Paid Holidays