Senior Lead Engineer - Generative AI Infrastructure (Remote-Eligible)
Company: Capital One
Location: Newark
Posted on: May 1, 2024
Job Description:
NYC 299 Park Avenue (22957), United States of America, New York, New York
Senior Lead Engineer - Generative AI Infrastructure (Remote-Eligible)

Our mission at Capital One is to create
trustworthy, reliable and human-in-the-loop AI systems, changing
banking for good. For years, Capital One has been leading the
industry in using machine learning to create real-time,
intelligent, automated customer experiences. From informing
customers about unusual charges to answering their questions in
real time, our applications of AI & ML are bringing humanity and
simplicity to banking. Because of our investments in public cloud
infrastructure and machine learning platforms, we are now uniquely
positioned to harness the power of AI. We are committed to building
world-class applied science and engineering teams and continue our
industry-leading capabilities with breakthrough product experiences
and scalable, high-performance AI infrastructure. At Capital One,
you will help bring the transformative power of emerging AI
capabilities to reimagine how we serve our customers and businesses
who have come to love the products and services we build.

We are
looking for an experienced Sr. Lead Engineer, Generative AI
Infrastructure to help us build the foundations of our AI
capabilities. You will work on a wide range of initiatives, whether
that's building large-scale distributed training clusters, or
deploying LLMs on GPU instances for real-time applications and
decisioning systems, or supporting cutting-edge AI research and
development, all in our public cloud infrastructure. You will work
closely with our cloud and container infrastructure teams as well
as our world-class team of AI researchers to design and implement
key capabilities.

Examples of projects you will work on:
- Deploy a thousand-node training cluster optimizing storage and
  networking stack, with tightly coupled training pipelines to take
  advantage of multiple parallelism strategies, in our public cloud.
- Design and build fault-tolerant infrastructure to support
  long-running large-scale training tasks reliably despite failure of
  individual nodes, using containers and check-pointing libraries.
- Design and build run-time infrastructure for serving large ML
  models such as LLMs and FMs in our public cloud.
- Build infrastructure for deploying search indexes and embeddings in
  vector databases that will work closely with the rest of our
  capabilities.

Capital One is
open to hiring a Remote Employee for this opportunity.

Basic Qualifications:
- Bachelor's degree in Computer Science, Computer Engineering or a
  technical field
- At least 8 years of experience designing and building
  data-intensive solutions using distributed computing
- At least 8 years of experience programming with Python, Go, Scala,
  or Java
- At least 1 year of experience with HPCs, vector embedding, or
  semantic search technologies
- At least 1 year of experience building, scaling, and optimizing
  training or inferencing systems for deep neural networks

Preferred Qualifications:
- Master's or Doctoral degree in Computer Science, Computer
  Engineering, Electrical Engineering, Mathematics, or a similar
  field.
- Background in machine learning with experience in large scale
  training and deployment of deep neural nets and/or transformer
  architectures.
- Experience with machine learning frameworks such as TensorFlow or
  PyTorch, Lightning, MosaicML etc.
- Ability to move fast in an environment with ambiguity at times, and
  with competing priorities and deadlines.
- Experience at tech and product-driven companies/startups preferred.
- Ability to iterate rapidly with researchers and engineers to
  improve a product experience while building the foundational
  capabilities.
- Familiarity with deploying large neural network models in demanding
  production environments.
- Experience with building GPU clusters in the public cloud with
  tightly-coupled storage and networking.

Capital One will consider sponsoring a new qualified applicant for
employment authorization for this position.

The minimum and maximum
full-time annual salaries for this role are listed below, by
location. Please note that this salary information is solely for
candidates hired to perform work within one of these locations, and
refers to the amount Capital One is willing to pay at the time of
this posting. Salaries for part-time roles will be prorated based
upon the agreed-upon number of hours to be regularly worked.

- New York City (Hybrid On-Site): $234,700 - $267,900 for Sr. Lead
  Machine Learning Engineer
- San Francisco, California (Hybrid On-Site): $248,700 - $283,800 for
  Sr. Lead Machine Learning Engineer
- Remote (Regardless of Location): $198,900 - $227,000 for Sr. Lead
  Machine Learning Engineer

Candidates hired to work in
other locations will be subject to the pay range associated with
that location, and the actual annualized salary amount offered to
any candidate at the time of hire will be reflected solely in the
candidate's offer letter. This role is also eligible to earn
performance-based incentive compensation, which may include cash
bonus(es) and/or long-term incentives (LTI). Incentives could be
discretionary or non-discretionary depending on the plan. Capital
One offers a comprehensive, competitive, and inclusive set of
health, financial and other benefits that support your total
well-being. Learn more at the Capital One Careers website.
Eligibility varies based on full or part-time status, exempt or
non-exempt status, and management level. This role is expected to
accept applications for a minimum of 5 business days. No agencies
please. Capital One is an equal opportunity employer committed to
diversity and inclusion in the workplace. All qualified applicants
will receive consideration for employment without regard to sex
(including pregnancy, childbirth or related medical conditions),
race, color, age, national origin, religion, disability, genetic
information, marital status, sexual orientation, gender identity,
gender reassignment, citizenship, immigration status, protected
veteran status, or any other basis prohibited under applicable
federal, state or local law. Capital One promotes a drug-free
workplace. Capital One will consider for employment qualified
applicants with a criminal history in a manner consistent with the
requirements of applicable laws regarding criminal background
inquiries, including, to the extent applicable, Article 23-A of the
New York Correction Law; San Francisco, California Police Code
Article 49, Sections 4901-4920; New York City's Fair Chance Act;
Philadelphia's Fair Criminal Records Screening Act; and other
applicable federal, state, and local laws and regulations regarding
criminal background inquiries. If you have visited our website in
search of information on employment opportunities or to apply for a
position, and you require an accommodation, please contact Capital
One Recruiting at 1-800-304-9102 or via email at
RecruitingAccommodation@capitalone.com. All information you provide
will be kept confidential and will be used only to the extent
required to provide needed reasonable accommodations. For technical
support or questions about Capital One's recruiting process, please
send an email to Careers@capitalone.com. Capital One does not
provide, endorse nor guarantee and is not liable for third-party
products, services, educational tools or other information
available through this site. Capital One Financial is made up of
several different entities. Please note that any position posted in
Canada is for Capital One Canada, any position posted in the United
Kingdom is for Capital One Europe and any position posted in the
Philippines is for Capital One Philippines Service Corp.
(COPSSC).