Overview

Technical standards—shared expectations for definitions, performance of products and systems, and testing—underpin nearly all aspects of technology and manufacturing globally. Evaluations check whether those expectations are being met through tests, measurements, audits, and certifications. Together, standards and evaluations help ensure that technologies are safe, reliable, and compatible as they scale across sectors and borders.

Their impact often goes unnoticed but is ever-present: firehoses from different fire departments can connect to the same hydrants, credit cards work at ATMs worldwide, blood tests produce reliable results across different labs, and electrical outlets deliver consistent voltage. All of this works because organizations defined shared technical specifications and verified compliance with them. The International Organization for Standardization alone has published over 25,000 international standards, covering everything from screw thread dimensions to AI risk management.

Standards and evaluations play a central role in emerging technology governance. For AI, standards help define terminology, set performance and safety expectations, and establish testing methodologies, while evaluations assess whether systems meet those expectations in practice. For biotechnology, standards govern everything from how labs handle dangerous pathogens to how experimental results are documented and reproduced, while evaluations verify that facilities and researchers meet those requirements. In both fields, foundational questions like what “safe” or “trustworthy” means in measurable terms are still being worked out.

This guide explains how the US government develops and uses technical standards and evaluations and how they intersect with emerging technology policy, outlines key institutions and processes, and discusses considerations and opportunities for working in this space.

Why does the government use technical standards and evaluations?

Government involvement in technical standards serves several overlapping purposes:

  1. Reducing coordination costs and enabling interoperability. Without shared technical language, government agencies, companies, and other organizations risk building incompatible systems. Standards provide common definitions, data formats, measurement methods, and interfaces that let organizations coordinate more effectively, so that each program does not have to invent its own bespoke solution.
  2. Making other policy tools enforceable. Regulations, procurement requirements, and grant conditions often need technical specificity, which standards can help supply. Rather than developing their own technical requirements from scratch, regulators frequently rely on existing standards.
  3. Building trust in products and systems. When technologies are novel and vendor claims are hard to verify, buyers, regulators, and the public need a basis for confidence. Evaluations (e.g. tests, audits, benchmarks) can verify performance and safety and provide a basis for comparing products and systems. The National Highway Traffic Safety Administration (NHTSA), for example, evaluates vehicle safety using standardized testing protocols that produce comparable evidence across manufacturers.
  4. Filling gaps where regulation is premature or impractical. For emerging technologies, regulation may lag behind development. Standards and evaluations can shape industry practice in the interim, influencing behavior without the force of law. This is especially relevant for AI and biotechnology, where agencies like the National Institute of Standards and Technology (NIST) have published voluntary risk management frameworks that are widely adopted despite carrying no legal mandate. Although technically voluntary, such standards can become effectively binding when agencies reference them in procurement requirements, grant conditions, or regulatory guidance: organizations must then comply to win contracts, receive funding, or satisfy regulators. When effective, these voluntary standards can also lay the groundwork for future regulation by establishing the technical foundations that rules eventually build on.

Technical standards and evaluations basics 

What are technical standards?

A technical standard is a shared set of requirements, definitions, or specifications that guide how a product, process, or system should perform, communicate, or be measured. Standards let multiple organizations build compatible systems and let buyers and regulators compare like with like.

Standards come in several forms, including:

  • Terminology and definition standards align meaning for key terms so that different institutions are talking about the same thing.
  • Performance standards set measurable targets (e.g. accuracy, latency, reliability) and acceptable thresholds.
  • Safety and risk standards specify hazard controls, safety margins, and risk management practices.
  • Data and interoperability standards define data formats and interfaces so that systems can exchange information reliably.
  • Measurement and test standards define how evaluators should measure performance, safety, and uncertainty.

In practice, many standards blend several of these functions: a single standard might define terminology, set performance thresholds, and specify the test methods used to verify compliance. 

While the term “standards” formally refers to consensus documents produced by accredited standards development organizations (SDOs) through structured, multi-year processes involving industry, government, and international partners, many use the term to describe a broader set of instruments, including agency-published frameworks, federal guidance, evaluation protocols, and best practices.

This guide covers both, but the distinction matters: formal consensus standards carry wide legitimacy and durability but can take years to develop, while frameworks, best practices, and evaluation approaches can be updated much more quickly. That speed is an important advantage in fast-moving fields like AI, where the technology may evolve far faster than any multi-year consensus process can track.

What are evaluations?

An evaluation is a structured process for generating evidence about whether a system, product, or practice meets a standard or other stated claim. Depending on the domain, evaluations can test performance, safety, robustness, security, compliance, or other characteristics.

Evaluations typically rely on some combination of:

  • Test protocols: Step-by-step procedures that define inputs, conditions, and scoring rules (e.g. crash test protocols used to rate vehicle safety).
  • Benchmarks: Curated tasks or scenarios designed to test a specific capability or risk (e.g. standardized test sets used to compare AI system performance).
  • Reference data and materials: Shared baselines that ensure measurements are comparable across different evaluators and labs.
  • Reporting requirements: Documentation standards that support reproducibility and allow others to verify results.
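As a rough illustration of how these pieces fit together, the Python sketch below combines fixed inputs and a scoring rule (the test protocol), a small curated task list (the benchmark), shared reference answers (the reference data), and a structured report (the reporting requirement). Everything in it is hypothetical: the three tasks, the case-insensitive exact-match scoring rule, the 0.9 pass threshold, and the report fields are assumptions chosen for illustration, not taken from any real evaluation program. Hashing the task list is one simple way, assumed here, to let other evaluators confirm they ran the identical test.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BenchmarkItem:
    prompt: str    # fixed input defined by the (hypothetical) test protocol
    expected: str  # shared reference answer used by every evaluator


# Hypothetical benchmark: a tiny curated task set, for illustration only.
BENCHMARK = [
    BenchmarkItem("2 + 2", "4"),
    BenchmarkItem("capital of France", "Paris"),
    BenchmarkItem("boiling point of water at sea level in C", "100"),
]


def score(output: str, expected: str) -> bool:
    """Assumed scoring rule: case-insensitive exact match after trimming whitespace."""
    return output.strip().lower() == expected.strip().lower()


def benchmark_fingerprint(items: list) -> str:
    """SHA-256 of the task list, so other labs can confirm they used identical data."""
    payload = json.dumps([[item.prompt, item.expected] for item in items])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_evaluation(system: Callable[[str], str], items: list,
                   threshold: float = 0.9) -> dict:
    """Run every item through the system, score it, and return a report."""
    passed = [score(system(item.prompt), item.expected) for item in items]
    accuracy = sum(passed) / len(items)
    return {
        "benchmark_sha256": benchmark_fingerprint(items),  # supports reproducibility
        "n_items": len(items),
        "accuracy": accuracy,
        "threshold": threshold,  # fixed before testing, not after seeing results
        "meets_threshold": accuracy >= threshold,
    }


def toy_system(prompt: str) -> str:
    """Hypothetical system under test: knows two answers, guesses on the rest."""
    answers = {"2 + 2": "4", "capital of France": "paris"}
    return answers.get(prompt, "unknown")


if __name__ == "__main__":
    report = run_evaluation(toy_system, BENCHMARK)
    print(json.dumps(report, indent=2))
```

With the toy system, two of the three answers match, so the report records an accuracy of two-thirds and a meets_threshold value of false. Fixing the scoring rule and threshold in advance, rather than choosing them after seeing results, is what makes scores comparable across systems and evaluators.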

In many government settings, evaluation results gain force through conformity assessment—the processes that make verification credible, repeatable, and usable for decision-making. Conformity assessment typically works in three layers:

  • Testing and audits generate evidence about whether a system meets requirements.
  • Certification packages that evidence into a formal signal that buyers and regulators can act on.
  • Accreditation verifies that the labs and certifiers performing these checks are themselves competent and consistent.

The lifecycle of standards and evals

Most standards and evals work follows roughly the same pathway. If you understand this chain, you can usually “place” any institution, document, or job in the ecosystem.

  1. Problem identification: An agency, industry group, or standards committee identifies a coordination, safety, security, or quality problem that needs a shared solution.
  2. Standard development: A working group defines requirements, interfaces, and measurement or test methods to address the problem.
  3. Evaluation: A lab, assessor, or internal team tests whether systems meet the relevant requirements or claims.
  4. Adoption: A decision-maker ties compliance or results to a consequential decision (e.g. procurement, regulation, certification, grant conditions, or licensing).
  5. Revision: Adoption changes behavior over time; failures and edge cases trigger updates to the standard.

Who makes standards?

Standards can originate directly from government agencies, from non-governmental standards development organizations (SDOs), or, commonly, from a combination of the two, in which agencies develop technical foundations that SDOs then incorporate into formal consensus standards.

Below are some of the key institutions for emerging technology policy.

International and multilateral bodies:

  • International Organization for Standardization (ISO): A global, non-governmental organization that develops voluntary consensus standards across nearly every industrial and technological domain, with over 25,000 active standards. ISO standards shape global expectations for quality, safety, and interoperability.
  • International Electrotechnical Commission (IEC): Works alongside ISO on standards for electrical, electronic, and related technologies, including joint AI and cybersecurity standards.
  • International Telecommunication Union (ITU): A UN agency that develops standards for information and communication technologies, with growing work on AI and digital infrastructure.

US coordinating bodies:

  • American National Standards Institute (ANSI): A non-governmental organization that coordinates the US voluntary standards system. ANSI doesn't usually write standards itself; rather, it accredits US standards developers and represents US interests in ISO and IEC.[1]

US government:

  • National Institute of Standards and Technology (NIST): A federal agency under the Department of Commerce that develops measurement methods, reference materials, test protocols, and evaluation frameworks. NIST is not a regulatory agency, but its work frequently becomes the technical backbone for agency programs, procurement requirements, and regulatory guidance.
  • Other federal agencies: The Food and Drug Administration (FDA), the Department of Defense (DOD), the Centers for Disease Control and Prevention (CDC), and others also create and adopt standards in their domains. Their roles are covered in more detail in the sections below.

Industry and professional standards bodies:

  • IEEE Standards Association, HL7, and hundreds of other sector-specific organizations develop voluntary consensus standards through committees that include industry, government, and academic participants. Key bodies for AI and biosecurity are listed in the emerging technology section below.

How US agencies adopt and use standards

In US practice, agencies use standards in policymaking and procurement in several common ways:

  • Incorporating standards into a regulation, either by reference (using the standard as-is) or as a starting point when drafting new requirements (e.g. OSHA incorporates ANSI standards for protective equipment directly into its workplace safety rules, meaning employers must comply with those standards as a legal requirement).
  • Treating compliance with a standard as an accepted way to satisfy a broader regulatory requirement[2] (e.g. FDA allows medical device manufacturers to demonstrate safety by conforming to recognized consensus standards, rather than independently proving each requirement from scratch).
  • Relying on widespread industry adoption of a standard, while keeping formal regulation available if voluntary compliance proves insufficient (e.g. NIST’s Cybersecurity Framework was widely adopted by critical infrastructure operators voluntarily before elements of it were referenced in binding requirements).

Agencies also reference standards in contract language and vendor qualification requirements, meaning companies must meet referenced standards to win or keep federal contracts. This can effectively create mandatory requirements without new regulation. For example, DOD’s Cybersecurity Maturity Model Certification (CMMC) requires defense contractors to meet specific cybersecurity standards to be eligible for contracts.

Several commitments and directives reinforce government use of standards. The WTO Technical Barriers to Trade Agreement instructs countries to base technical regulations on international standards when those standards meet domestic needs. In the US, the National Technology Transfer and Advancement Act and OMB Circular A-119 direct agencies to use voluntary consensus standards whenever feasible, rather than writing unique government standards from scratch. According to the ISO, the US federal government is the largest single creator and user of specifications and standards, with more than 44,000 statutes, technical regulations, or purchasing specifications.

Beyond regulation and procurement, standards and evaluations also intersect with other major policy tools:

  • Federal R&D funding: Agencies can tie grants to shared performance metrics, standardized test protocols, or agreed reporting formats, shaping both what gets funded and how research is conducted and measured.
  • Trade and geopolitics: International standards can function as technical barriers to trade. When a country adopts a standard as a market requirement, foreign companies that don’t meet it face higher costs to enter that market, even without tariffs or explicit restrictions. This enables countries that lead in setting standards to shape global markets in their favor.

Together, these channels explain why standards work often produces outsized downstream effects without new laws or regulations.

Standards & evaluations for emerging technology

Emerging technologies create substantial uncertainty for standard-setting: rapid iteration, unclear failure modes, dual-use risks, and weak consensus on what constitutes “safe” or “effective”. Traditional standards processes, built for mature industries where the science is settled, often struggle to keep pace. In response, US agencies have leaned on a few practical approaches: publishing voluntary risk-management frameworks and guidance, developing measurement and test methods, building evaluation capacity within government, and using procurement and funding conditions to encourage adoption.

AI

AI standards work is more fragmented than standards work in mature engineering fields. No single body controls AI standard-setting end to end, and many "standards-like" documents take the form of risk frameworks, evaluation protocols and best practices, or benchmarks. In recent years, NIST has played a central coordinating role in the US by publishing risk-management frameworks and profiles, developing measurement and evaluation methods and best practices, and contributing technical expertise to broader standards efforts.

NIST's Center for AI Standards and Innovation (CAISI) leads federal AI evaluations, establishes voluntary agreements with frontier AI developers for pre-deployment testing, and coordinates with DOD, DHS, and the intelligence community on national security-related AI assessments.[3] In February 2026, CAISI launched the AI Agent Standards Initiative to develop standards for agentic AI systems. NIST's Information Technology Laboratory (ITL) also contributes to domestic and international standards initiatives.

Beyond NIST, several non-governmental and international organizations play key roles in AI standards, evaluation, and governance (illustrative examples, not comprehensive):

Below is a timeline of key federal and international actions shaping how AI systems are standardized, evaluated, and governed, along with a compilation of major AI-related standards by issue area.

Biosecurity and life sciences

Biosecurity relies on a patchwork of lab practices, clinical standards, safety engineering requirements, and sector-specific guidance rather than a single unified standard setter. In practice, biosafety and biosecurity work often turns on operational protocols, facility and equipment requirements, and verification methods that labs, hospitals, and manufacturers can implement and auditors can check.

The closest thing to a unifying baseline is the CDC and NIH's Biosafety in Microbiological and Biomedical Laboratories (BMBL), which defines biosafety levels, risk assessment methods, and recommended practices for laboratories working with hazardous biological agents. The BMBL is advisory, not regulatory, but it functions as a de facto national standard: institutions incorporate its recommendations into their own policies, and federal agencies reference it in grant conditions and oversight programs. When an institutional biosafety committee evaluates a proposed experiment or a federal inspector reviews a BSL-3 facility, the BMBL is typically the benchmark they're working from.

Beyond lab safety, biosecurity standards also govern dual-use research oversight, synthetic biology screening, and diagnostic testing. The NIH Guidelines for Research Involving Recombinant or Synthetic Nucleic Acid Molecules set requirements for institutions receiving NIH funding, enforced through institutional biosafety committees. For clinical and diagnostic laboratories, Clinical and Laboratory Standards Institute (CLSI) standards and FDA-recognized consensus standards establish the technical requirements that labs must meet to ensure accurate, reproducible results. Bio standards are also increasingly relevant to pandemic resilience in the built environment, where indoor air quality standards for infection prevention (such as ASHRAE Standard 241, the first standard specifically focused on control of infectious aerosols) set ventilation and filtration requirements for schools, hospitals, and other public spaces.

Unlike AI, where standards work is still defining what to measure, biosecurity standards often build on decades of established science, with the main policy challenges centering on keeping oversight current as capabilities advance (particularly in synthetic biology and gain-of-function research) and extending coverage to non-federally funded work.

Key non-governmental and international organizations in biosecurity standards include:

Below is a timeline of key federal and international actions shaping biosecurity and life sciences standards and oversight, along with a compilation of major bio-related standards by issue area.

Why (not) work on technical standards & evaluations?

Standards and evaluations can shape technology faster than formal regulation. Here’s why this work matters for policy impact and career development.

The case for impact

When the US government creates, adopts, or references standards, it can steer technology development without passing new legislation or standing up large-scale funding programs. Standards and evaluations sit upstream of regulation, procurement, and funding, so small technical choices can propagate widely.

  • Early influence on technology design: When technologies move quickly, rulemaking often lags. Standards let agencies shape private-sector design choices before products scale, by publishing guidance, building evaluation programs, or referencing consensus standards in procurement. Because standards can specify what systems must achieve (e.g. accuracy thresholds or safety benchmarks) rather than dictate how they must work, they preserve room for innovation while advancing public objectives. Performance standards for vehicle fuel efficiency, for example, set targets without prescribing engine designs.
  • Reduced fragmentation: When multiple companies and standards bodies push incompatible definitions, metrics, or test methods, the result is confusion, duplicated compliance costs, and the risk that a single dominant player locks in its preferred approach as the de facto standard. Government-backed baselines can preempt this.
  • Trust through evidence: Public evaluations and shared test protocols make claims about safety, reliability, or transparency more verifiable. NHTSA’s vehicle crash-test ratings, for example, give consumers a credible, independent basis for comparing safety.
  • International reach: US-origin standards frequently get adopted or adapted by other governments, international buyers, and multilateral bodies. Countries that lead in developing standards for emerging technologies can shape global markets around their domestic industry’s approach, which is one reason US participation in international standards bodies is frequently framed as a national competitiveness issue.

The case for professional growth

Standards and evaluation work builds technical depth and broad policy fluency simultaneously, with skills that transfer to regulatory roles, federal R&D program management, procurement, and private-sector compliance and safety teams.

  • Drafting the specifications, not just advising on them: Unlike most policy roles, standards work involves writing the actual measurement methods, thresholds, and test protocols that can become binding requirements. This develops an unusually concrete form of technical policy expertise.
  • Early vantage point: Standards work is where key terms get defined, metrics get chosen, and “good enough” gets quantified, often before regulators or procurement offices lock in downstream requirements. This gives practitioners early visibility into how a technology area is likely to be governed.
  • Cross-sector relationships: Standards work often requires structured engagement with industry, academia, and other agencies through committees, workshops, and public comment cycles. The professional network this builds spans sectors in a way that few other government roles offer.

Limitations

  • Voluntary by default: Standards don't bind unless agencies embed them in regulation, procurement requirements, grant conditions, or accreditation programs. Absent such a mechanism, adoption depends on stakeholders seeing enough value to comply voluntarily.
  • Political contestation: Standards development is shaped by the interests and power dynamics of its participants. Industry participants lobby for specifications that reduce their compliance costs, concentrated technical expertise can skew what looks “feasible” even without bad intent, countries compete for influence over international standards to advantage their domestic industries, and advocacy groups push for more stringent requirements.
  • Update lag: Fast-moving technologies can outpace consensus processes, leaving standards outdated shortly after publication. Detailed standards take longer to develop, which creates a persistent tension between precision and timeliness, especially for AI and other emerging technologies. Even in relatively mature fields like cybersecurity risk management, NIST standards can take five to ten years to develop or update. 
  • Measurement gaps: Standards and evaluations tend to measure what is technically straightforward to test, not necessarily what matters most for safety or public welfare. When standards set measurable targets, organizations naturally optimize for those metrics, sometimes at the expense of harder-to-measure properties. An AI model might score well on a benchmark without being safe or reliable in deployment; a lab might pass an inspection checklist without addressing its most serious operational risks.
  • Consensus tradeoffs: Standards developed through broad consensus can reflect what is widely achievable rather than what is most ambitious. The need for universal applicability can push requirements toward a lowest-common-denominator baseline, though agencies like NIST also conduct measurement research that feeds into more demanding future standards.

How does the USG engage with standards? Who’s involved? 

US government agencies engage with standards and evaluations in three main ways. The distinction matters because each mode involves different actors, carries different legal weight, and offers different leverage points for people working in the space.

  1. Agencies creating standards themselves: Some agencies develop technical standards directly, such as NIST’s measurement methods or FDA’s device classification protocols. The resulting standards can be voluntary (e.g. the AI Risk Management Framework) or carry binding force (e.g. EPA emissions standards), depending on the agency’s statutory authority.
  2. Agencies participating in external standard-setting bodies: Most global standards are developed not by governments but by organizations like ISO and ANSI-accredited technical committees. In these settings, agencies participate as one voice among industry, academic, and civil society representatives rather than as the decision-maker, which means influence depends on showing up consistently and contributing technical expertise.
  3. Agencies or legislators adopting or referencing external standards: Rather than writing technical requirements from scratch, agencies and lawmakers frequently incorporate existing consensus standards into regulations, procurement rules, or grant conditions. This is the primary mechanism through which voluntary standards acquire legal force, and it means that work done in external standards bodies (mode 2) can eventually become binding through government action.

Agencies creating standards themselves

The process for creating technical standards varies by agency, type of standard, and whether the result is mandatory or voluntary. At a conceptual level, most agency-led processes include these steps:

  1. Identify the technical need: The agency identifies the need for a new or updated standard, whether triggered by a regular review cycle, a statutory mandate, new scientific evidence, or an incident that exposes gaps. This typically starts with program offices, technical divisions, and senior leadership.
  2. Convene subject-matter experts: The agency assembles technical expertise to inform the standard’s scope and content, through workshops, requests for information (RFIs), advisory committee meetings, or interagency working groups. Participants typically include federal scientists and engineers, external experts from academia or industry, interagency partners, and advisory boards.
  3. Draft technical content: Internal teams—agency technical staff, policy analysts, legal counsel, and sometimes contractors or federally funded research and development centers—draft the standard, including definitions, technical requirements, testing methods, and data expectations. Drafts typically draw on existing research, operational experience, and relevant external standards.
  4. Seek broader stakeholder input (when applicable): Agencies may release draft standards for public comment or hold meetings to gather feedback. Some agencies, like NIST, do this routinely; others, like CDC, do so selectively. Input comes from industry groups, think tanks, laboratories, researchers, nonprofits, state and local partners, and standards development organizations.
  5. Revise and finalize: The agency incorporates feedback, reconciles conflicting input, and verifies internal consistency. Final documents undergo legal review and leadership approval before publication.
  6. Implement and update: The agency communicates the new standard to the relevant audience (e.g. regulated entities, federal programs) and revises it over time as science advances, new risks emerge, or operational experience reveals gaps. Program offices, compliance teams, and technical staff manage this ongoing cycle.

Agencies participating in external (non-governmental) standard-setting bodies

Most technical standards are developed not by governments but by non-governmental standards development organizations. The National Technology Transfer and Advancement Act and OMB Circular A-119 direct federal agencies to rely on voluntary consensus standards whenever feasible, so agencies have strong incentives to shape those standards from within. Agencies can’t control the process, but they can participate by:

  • Nominating subject-matter experts to serve on technical committees
  • Voting on draft standards through formal ballot processes
  • Contributing research, test data, or technical evidence
  • Attending working groups or plenary meetings
  • Establishing or participating in ANSI-accredited US Technical Advisory Groups (TAGs), which develop consensus US positions for international standards negotiations

Agencies or legislators adopting or referencing external standards

External standards become part of federal policy when they are incorporated into regulation, procurement requirements, guidance documents, or evaluation and accreditation programs. Adoption allows the government to use well-established technical expectations without developing every standard internally.

Agencies adopt or reference external standards in several ways:

  • Regulation: Agencies write legally binding rules that require compliance with specific external standards. Incorporation by reference allows regulators to give detailed technical material legal force without reprinting it in the regulatory text itself.
  • Procurement and contracting: Acquisition programs reference external standards in Requests for Proposals (RFPs), Statements of Work (SOWs), and contract clauses. Contractors must meet the specified standards for their products or services to be accepted.
  • Guidance or recommended practices: Agencies issue nonbinding guidance that recommends the use of external standards. While not legally enforceable, these recommendations influence practice across regulated industries and can shape expectations in advance of formal regulation.
  • Evaluation, certification, and accreditation programs: Agencies rely on external standards in conformity assessment processes. Many federal certification, accreditation, and testing programs require compliance with recognized consensus standards to demonstrate safety, quality, or performance.

Working on standards & evals: types of roles and career opportunities 

Federal standards & evaluations staff
  • Responsibilities: Develop, maintain, or update technical standards; run public comment processes; coordinate with SDOs; conduct risk or performance assessments; translate science into technical guidance
  • Typical background (for full-time roles): Bachelor's degree for junior roles; advanced technical training for specialized roles; experience in scientific, engineering, or policy analysis
  • Security clearance: Sometimes required (for national security-adjacent roles)
  • Location: Washington, DC; Gaithersburg, MD (NIST campus); some roles at national labs or field offices
  • Career guides & opportunities: Executive Branch, NIST, FDA, CDC, National Labs and FFRDCs

Acquisition & procurement staff
  • Responsibilities: Integrate standards into solicitations and contracts; evaluate vendor compliance; work with test and evaluation offices; ensure systems meet international or domestic standards requirements
  • Typical background (for full-time roles): BA/MA; engineering, procurement, or systems background helpful; program management experience
  • Security clearance: Sometimes required (for national security-adjacent roles)
  • Location: Primarily Washington, DC; also military installations and agency field offices across the US
  • Career guides & opportunities: Executive Branch, DOD, DHS

Think tank researchers or advocates
  • Responsibilities: Conduct policy research and analysis; submit comments on proposed standards and participate in standard-setting processes; develop recommendations; advocate for policy changes; engage with policymakers and media
  • Typical background (for full-time roles): BA or MA for junior roles; MA/JD/PhD for mid-career/senior roles; subject matter expertise; experience in policy analysis or communications
  • Security clearance: Rarely required
  • Location: Primarily Washington, DC; some in other major cities or remote
  • Career guides & opportunities: Working in think tanks (+ fellowships, think tanks working on emerging tech policy, & resources)

Congressional staff
  • Responsibilities: Support members and committees in overseeing standards-related agencies (particularly NIST); shape legislation that affects standards policy, including authorization and appropriations for federal measurement and standards programs; prepare hearings on standards-related topics; engage with NIST, SDOs, industry, and other stakeholders
  • Typical background (for full-time roles): BA for junior roles; BA/MA/JD for mid-career/senior roles; strong communication skills. Prior Hill experience matters more than formal credentials for senior roles; fellowships can help bypass this requirement.
  • Security clearance: Rarely required (e.g. some Armed Services or Intelligence committee staff)
  • Location: Washington, DC
  • Career guides & opportunities: Working in Congress (+ internships, fellowships, & full-time roles)

Multilateral standards organizations
  • Responsibilities: Support development of international consensus standards; coordinate technical committees; facilitate cross-country negotiations; align standards with global regulatory, safety, and interoperability needs; coordinate US participation through national standards bodies like ANSI
  • Typical background (for full-time roles): BA/MA for policy or coordination roles; STEM or engineering background for technical positions; experience with standards, international policy, or technical writing
  • Security clearance: Not required
  • Location: Geneva (ISO, IEC, ITU); remote and US-based roles at ANSI and related institutions
  • Career guides & opportunities: Multilateral governance careers; career pages for ISO, ANSI, IEC, ITU

Industry and professional standards organizations[7]
  • Responsibilities: Manage standards committees; coordinate voluntary consensus processes; draft and maintain technical specifications; engage with companies, researchers, and government liaisons; track emerging technology needs in sectors such as AI, health IT, aerospace, and biotech
  • Typical background (for full-time roles): BA/MA for policy, program, or coordination roles; STEM or engineering background for technical committee support; experience with industry standards or applied research
  • Security clearance: Not required
  • Location: Across the US, with hubs in DC, New York, and Boston
  • Career guides & opportunities: Career pages for IEEE, ASTM, HL7, SAE, AAMI; early-career standards coordination roles at major SDOs

Third-party evaluation organizations
  • Responsibilities: Evaluate technologies against safety, performance, or risk criteria; develop evaluation methodologies, benchmarks, and test protocols; publish findings that inform standards development and policy decisions; collaborate with government agencies and standards bodies on evaluation design
  • Typical background (for full-time roles): Advanced degree (MA/PhD) in computer science, statistics, or a relevant technical field; research experience in machine learning, measurement science, or experimental design; familiarity with benchmark development and evaluation methodology
  • Security clearance: Sometimes required (for national security-related evaluations)
  • Location: Washington, DC; San Francisco Bay Area; remote (varies by organization)
  • Career guides & opportunities: Career pages for METR, Apollo Research, MLCommons
Conformity assessment & testing staffConduct product testing, inspection, or certification against published standards; perform laboratory accreditation assessments; support organizations seeking ISO, industry, or government-recognized certification; evaluate whether products, systems, or processes meet specified technical requirements; document and report test results for regulatory or procurement purposesBA/MA in engineering, science, or quality management; laboratory experience or auditing credentials (e.g. ISO lead auditor certification) helpful; familiarity with relevant testing standards and conformity assessment proceduresRarely required; some defense or national security testing roles may require clearanceAcross the US; testing and certification organizations operate nationally, with concentrations near manufacturing hubs and federal agency locationsCareer pages for UL Solutions, Intertek, BSI, TÜV; ANSI National Accreditation Board (ANAB) lists accredited certification bodies

Preparing for standards & evals roles

  • Building technical depth in a relevant domain. Most standards roles require enough subject-matter expertise to evaluate technical claims, assess tradeoffs in standard design, and engage credibly with engineers, scientists, and industry representatives. Graduate training in a STEM field, law, or a related discipline is common at agencies like NIST, though not always required for policy-focused positions at think tanks or in Congress.
    • For AI evaluation roles specifically, technical expertise in machine learning methods, statistical measurement, and experimental design is valuable, and many evaluation organizations draw heavily on candidates with graduate-level research experience (e.g. a PhD or master’s degree in computer science, statistics, or a related field).
  • Learning how standards processes work. Familiarize yourself with how standards are developed, adopted, and referenced, including the roles of organizations like NIST, ANSI, ISO, and sector-specific standards development organizations. Some organizations offer in-depth public resources on standards and evals, including UPenn’s course series and ANSI’s overview of the US standards system (see more below). Reading recent Federal Register notices related to standards, such as requests for comment on NIST frameworks, can also help you understand how agencies engage with the public during standards development.
  • Participating in standards development. Many standards committees are open to individual participants, including early-career professionals. Joining or even observing a technical committee at an organization like IEEE, ASTM, or an ANSI-accredited body gives you firsthand exposure to consensus processes, stakeholder negotiation, and technical drafting. 
  • Gaining experience in measurement, testing, or evaluation. Hands-on experience with test design, data collection, benchmarking, or conformity assessment is valuable, whether through lab work, quality assurance roles, or research assistantships. If you’re in a technical field, look for opportunities to work on evaluation protocols, contribute to benchmark development, or support compliance testing. NIST and national labs often hire students and postdocs for measurement-focused work, and MLCommons runs a Rising Stars initiative for recent PhD grads.
    • For AI, contributing to open evaluation infrastructure—such as developing or maintaining benchmarks on platforms like Hugging Face, or participating in shared evaluation challenges run by organizations like MLCommons—can build practical experience and visibility in the evaluation community.
  • Completing a relevant fellowship or internship. Several programs place early-career professionals in standards-adjacent roles: the AAAS Science & Technology Policy Fellowship places scientists and engineers in federal agencies (including NIST and other standards-relevant offices), and NIST’s own internship and postdoctoral programs offer direct exposure to measurement science and standards development. Congressional fellowships and internships can also provide relevant experience for those interested in the legislative side of standards policy.
  • Engaging with the standards and evaluation community. Think tanks, professional societies, and advocacy organizations that work on standards policy regularly host public events, publish research, and accept public comments. Tracking organizations like ANSI, ASTM, IEEE, and NIST, as well as think tanks covering standards-related topics, can help you understand current debates and build professional relationships. Attending events like ANSI’s annual World Standards Day, NIST workshops, or SDO committee meetings can expand your network.
  • Writing or publishing on standards-related topics. Demonstrating that you can analyze and communicate about standards and evals, whether through policy briefs, blog posts, academic papers, or public comments, signals both technical credibility and policy fluency. Submitting a comment on a NIST draft framework or writing an analysis of a proposed standard for a policy outlet are practical ways to build a portfolio.
    • For evaluation roles, publishing work on measurement or benchmarking at venues such as NeurIPS (including its Datasets and Benchmarks track), ICML, and FAccT can signal technical credibility to both government and independent evaluation organizations.

Appendices: Day-in-the-life 

Further reading 

Footnotes