Project description
Monitoring as a Platform is the first critical step toward a self-managed infrastructure. It includes capabilities such as real-time monitoring, intelligent networks, self-healing, and IoT, aimed at improving productivity, organizational agility, and employee experience.
Responsibilities
* Develop and maintain architectural blueprints and high-level system designs.
* Define the overall architecture and set the technical direction for the organization.
* Translate business requirements into feasible, scalable, and robust technical solutions.
* Evaluate the client’s proposed technologies and assess their fit within the overall architecture.
* Recommend appropriate tools, platforms, and architectural approaches.
* Collaborate with engineering, product, and business teams to ensure the architecture meets functional and non-functional requirements.
* Provide architectural guidance and mentorship to engineering teams.
* Ensure that systems meet enterprise standards for scalability, performance, reliability, and security.
* Oversee monitoring, observability, and operational health of systems.
* Architect systems leveraging AWS cloud services.
* Integrate AI/ML infrastructure components and automation frameworks into the solution architecture.
* Lead the design and optimization of ELK-based logging systems.
* Guide the implementation of monitoring and visualization platforms such as Grafana.
* Oversee infrastructure-as-code automation using Ansible and Terraform.
* Support containerization and orchestration using Docker and Kubernetes.
Skills
Must have
* 15+ years of experience in designing and documenting architectural blueprints.
* 5+ years specifically in solution architecture or similar roles.
* Knowledge and practical use of enterprise architecture frameworks.
* Expertise in scalable system design and distributed systems.
* Strong expertise in AWS cloud architecture and services.
* Proficiency with Terraform and Ansible.
* Advanced knowledge of monitoring and observability tooling.
* Proficiency in the ELK stack (Elasticsearch, Logstash, Kibana), including deployment and optimization.
* Experience with Grafana for dashboards and visualization.
* Experience with AI/ML infrastructure and automation frameworks.
* Must have: experience with traditional machine learning (mathematics and statistics, core ML algorithms, learning paradigms, model evaluation and validation, data preparation, deployment and MLOps, experiment tracking and reproducibility; tools and frameworks: Python libraries; ML platforms: TensorFlow, PyTorch).
* Hands-on experience with Docker and Kubernetes.
* Excellent problem-solving skills.
* Strong attention to detail.
* Ability to collaborate across teams and communicate architectural decisions clearly.