Embedded Functional Test Software Engineer

Hyderabad, India
Full Time
Experienced

Summary

We are seeking talented Functional Test Software Engineers with embedded systems experience to join our Hyderabad, India team, which focuses on functionally verifying our ML-optimized SW/HW solutions. In this role, you will write test plans and develop software in our automation framework to validate high-speed I/O subsystems, and perform system-level testing of our solutions with ML workloads. A background in ML hardware technologies, RDMA, the Linux kernel and server I/O is highly desired.

Roles and Responsibilities:

  • Write comprehensive test plans that functionally verify components of our solution based on HW and SW architectural specifications
  • Develop software to exercise all test cases for each component
  • Write verification libraries for fabric communication services, network interfaces, GPU, storage, and other server-based I/O components
  • Write applications, libraries and kernel modules that stress I/O technology capabilities, including RDMA NIC, NCCL, CUDA and NVLink GPU technologies
  • Develop test libraries in Python, C and C++
  • Develop software that integrates with Bazel-based build and test environments
  • Develop low-level SW applications to test I/O performance of next-gen compute systems
  • Debug complex system issues in customer use cases
  • Assist other team members with developing test plans and writing verification software

Desired Knowledge and Skill Set:

  • Strong coding skills in multiple languages such as Python, C and C++ 
  • Good knowledge of TCP/IP, RoCE and other networking protocols
  • Knowledge of general packet flow pipelines in silicon
  • Hands-on experience with ML collective communication and CUDA programming
  • Hands-on experience with ML frameworks such as PyTorch and TensorFlow
  • Background in Linux device drivers, memory management, network communication libraries and low-level I/O performance
  • Detailed understanding of server components and applicable drivers for CPUs, memory, GPUs, networking devices and storage
  • Experience building out test framework infrastructure such as equipment provisioning, Linux system configuration, traffic generators, statistics monitoring, reporting and data capture
  • Knowledge of configuration and monitoring protocols and tools such as gRPC, gNMI, SNMP, REST, SSH, Prometheus and Grafana
  • Background in highly optimized CI/CD environments
  • Proficiency in Git and Docker
  • Linux systems knowledge
  • 5+ years of software development / QA experience working closely with hardware

About Us 

Enfabrica is on a mission to revolutionize AI compute systems and infrastructure at scale through the development of superior-scaling networking silicon and software, which we call the Accelerated Compute Fabric. Founded and led by an executive team assembled from first-class semiconductor and distributed systems/software companies throughout the industry, Enfabrica sets itself apart from other startups with a very strong engineering pedigree, a proven track record of delivering, deploying and scaling products in data center production environments, and significant investor support for its ambitious journey. With its differentiated approach to solving the I/O bottlenecks in distributed AI and accelerated compute clusters, Enfabrica is unleashing the revolution in next-gen computing fabrics.
