G. Tourassi
Oak Ridge National Laboratory, United States
M. Shankar
Oak Ridge National Laboratory, United States
F. Wang
Oak Ridge National Laboratory, United States
The past three decades have witnessed the widespread adoption of high-performance computing (HPC) as an essential tool in the advancement of science. From accelerating critical research in the wake of a global pandemic to climate modelling and national security, HPC has become integral to the most cutting-edge scientific research around the world and across application domains. The global competition to debut the next fastest supercomputer keeps pushing the field forward. Meanwhile, increasing capabilities are bringing hallmarks of science fiction, such as artificial intelligence (AI), into daily life. However, the tremendous power of new computing systems comes with growing concern about equitable access to these resources and about their impact on supporting a thriving workforce.
The mission of the Department of Energy (DOE) Office of Science (SC) is “to deliver scientific discoveries and major scientific tools to transform our understanding of nature and advance the energy, economic and national security of the United States.” The SC budget supports over 25 000 researchers at more than 300 institutions and across the 17 DOE national laboratories, in addition to 27 open-access experimental and computing user facilities (DOE, 2022).
To increase HPC capabilities in the United States, Congress passed the Department of Energy High-End Computing Revitalization Act of 2004 (DOE, 2022), which called for leadership computing systems. These high-end computing systems are among the most advanced in the world. They are operated by DOE and available for use by researchers in industry, institutions of higher education, national laboratories and other federal agencies.
As one of DOE’s leadership computing facilities, the Oak Ridge Leadership Computing Facility (OLCF) (OLCF, 2022) has consistently operated some of the nation’s top supercomputers. OLCF has helped cement computation as the third pillar of scientific discovery by enabling breakthroughs in basic and applied sciences, including many notable contributions in energy efficiency, climate change and medical research. OLCF has provided cutting-edge supercomputing capability and pioneering ideas, operating a Top 10 supercomputer on the Top 500 list every year since its establishment in 2005. The leading OLCF systems of the last decade – Titan and Summit – both debuted as the world’s fastest computers.
At 200 petaflops, the IBM Summit supercomputer is the OLCF’s flagship system. Launched in 2018, Summit delivers eight times the computational performance of the OLCF’s previous Cray XK7 Titan supercomputer while using only 4 608 nodes, a fraction of Titan’s 18 688. With the debut of the new HPE Cray EX Frontier supercomputer, OLCF will house the nation’s first exascale system, capable of more than 1.5 exaflops and able to solve calculations up to 50 times faster than today’s fastest supercomputers.
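A rough, back-of-the-envelope calculation, using only the figures quoted above (and approximating Titan’s aggregate performance as one-eighth of Summit’s), illustrates how much of this gain comes from per-node capability rather than node count:

```python
# Illustrative arithmetic only, based on the figures quoted in the text:
# Summit: ~200 petaflops over 4 608 nodes; Titan approximated as 1/8 of
# Summit's aggregate performance (~25 petaflops) over 18 688 nodes.
summit_pf, summit_nodes = 200.0, 4_608
titan_pf, titan_nodes = summit_pf / 8, 18_688

summit_per_node = summit_pf / summit_nodes  # ~0.043 petaflops (43 teraflops)
titan_per_node = titan_pf / titan_nodes     # ~0.0013 petaflops (1.3 teraflops)

print(f"Summit per node: {summit_per_node * 1000:.1f} teraflops")
print(f"Titan per node:  {titan_per_node * 1000:.1f} teraflops")
print(f"Per-node gain:   ~{summit_per_node / titan_per_node:.0f}x")
```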
As a Leadership Computing Facility, OLCF aims to provide world-class computational resources and specialised services to researchers around the world working on the most computationally intensive global challenges in science and engineering. Time on DOE’s Leadership Computing Systems is managed through two competitive allocation programmes: INCITE (Innovative and Novel Computational Impact on Theory and Experiment) (ALCF, 2022) and ALCC (ASCR Leadership Computing Challenge) (ALCC, 2022). Requests typically exceed the available resources by a factor of three to five, so selection is competitive and based on peer review. Allocations of computing cycles are typically 100 times greater than those routinely available in university, laboratory, and industrial scientific and engineering environments.
With the rapid explosion of data-intensive science, OLCF has experienced increasing demand to support advanced scientific workflows; incorporate AI; and run rich analytics coupled with simulation software to derive valuable insights from extreme-scale experimental and observational data.
Both the National Strategic Computing Initiative (White House, 2015) and the American AI Initiative (Parker, 11 June 2020) called for a cohesive, multi-agency, strategic vision to empower and maintain scientific leadership in the United States, as well as fuel innovations in all sectors of its economy. Since then, the landscape of HPC and AI has been changing rapidly. For example, in addition to hundreds of millions of US dollars already invested by DOE in HPC and leadership computing, more than a dozen AI institutes have been established in more than 40 states. Each has unique strengths and focus areas spanning all aspects related to AI – from fundamental methods to human-AI interaction, augmentation and collaboration.
In the European Union, with strong political endorsement, member nations have taken a concerted approach towards AI, emphasising the use of AI for good and for all (European Commission, 2018). In addition to research and development (R&D) excellence, the European Commission is paying particular attention to trustworthy AI in its recent proposal for AI regulation (European Commission, 2021). Furthermore, in combination with the European Processor Initiative project, the European Union has built its own roadmap to support the convergence of extreme-scale computing, big data and AI (Kovač et al., 2022).
The infusion of enormous capital is enabling and driving R&D innovation, unleashing resources, removing barriers and training a new generation of AI-ready workers. However, it is becoming apparent that both AI resources and AI talent are highly concentrated. This could put disadvantaged groups at risk, especially in developing countries and resource-strapped universities.
The jury is still out as to whether AI is a transformational force for developing nations or a disruptive force that widens the gap between rich and poor countries (Alonso, Kothari and Rehman, 2 December 2020). One thing is clear though: there is an insatiable demand for compute power and data storage, which are increasingly intertwined and becoming integral to ground-breaking scientific discoveries.
As part of their business strategy, cloud providers such as Google (Colab) and Microsoft (Azure) offer free allocations of computing resources. These offerings partially enable AI access to go from nothing to something, but they come with notable limitations. For example, to maintain maximal scheduling flexibility, Colab resources are neither guaranteed nor unlimited. Even with a paid tier such as Colab Pro, access to the graphics processing unit (GPU) – the workhorse for AI computations – may be limited to a relatively old-generation GPU and a 24-hour running time. Although this is common practice, such policies are limiting for even moderate scientific and technical R&D. These limitations highlight key gaps but also opportunities for technology and policy advances.
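As a concrete illustration of this point, a user can inspect which accelerator, if any, a free or paid session has actually been allocated. The minimal sketch below assumes a notebook environment with PyTorch installed; the GPU model reported will vary with availability.

```python
# Minimal sketch: report which GPU (if any) the current session was given.
# Assumes PyTorch is installed, as in typical Colab-style notebooks.
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"Allocated GPU: {name} ({mem_gb:.1f} GB)")
else:
    print("No GPU allocated for this session; computations fall back to the CPU.")
```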
Interest in AI and the economic potential of incorporating AI tools and methods in the commercial sector have led major corporations to develop software and purpose-built hardware for AI. Tools such as TensorFlow (originating at Google) and PyTorch (originating at Facebook) have been released to the open-source community. This, in turn, has accelerated the use and adoption of AI methods in a variety of industries, academic settings and major science laboratories.
Dissemination of these tools has been accompanied by dramatic growth in technical publications in the field of AI and by a vast array of educational materials available online around the world. This adoption and growth, however, are constrained by the availability of the computing resources and high-quality datasets that are the basis for AI.
There are two main areas where systematic approaches led by nations at the forefront of this field can help alleviate computing and data availability constraints: the technology and policy spheres.
In the technology sphere, computing infrastructure and software availability could be stewarded so that they support open science. The open-source ecosystem is a thriving home for these tools and capabilities. However, curating best practices and applications that can be shared in a rapidly changing field is critical if the global community is to benefit from emerging advances. The know-how for scaling applications up – crucial to serious AI campaigns – cannot remain the purview of a few major commercial entities.
Nationally funded laboratories and their computing infrastructures, in collaboration with industry and academia, could nurture and support AI ecosystems for tertiary educational entities and partner countries. This is especially useful for those entities and countries that lack resources or are only beginning to build core competencies in this field. Step-up guides, in accessible tutorial form, will be needed to take users from basic skills to scalable data and software management. These would enable students and practitioners to begin on their personal computers or small-scale cloud resources, advance to larger cloud or institutional-scale resources, and then move on to national-scale resources. Such tools and capabilities, if shared openly, would enable a broader community of countries to gain from national investments.
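To make the step-up path concrete, the sketch below is a minimal illustration (assuming PyTorch and synthetic data, not a reference implementation) of a tiny training loop that runs unchanged on a laptop CPU, a single cloud GPU or a larger institutional node; scaling further to multi-node systems would layer tools such as torch.distributed on top of the same core loop.

```python
# Minimal "step-up" sketch: the same training loop runs on a laptop CPU,
# a free cloud GPU or an institutional node, selecting the best available
# device automatically. Synthetic data stands in for real scientific inputs.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tiny synthetic regression problem (1 024 samples, 16 features).
x = torch.randn(1024, 16, device=device)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(1024, 1, device=device)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimiser.step()

print(f"Trained on {device}; final loss = {loss.item():.4f}")
```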
The policy sphere concerns the sharing of resources, training, outcomes and guidelines. Countries at the forefront of the field, including the United States and EU leaders, may collaborate on policy frameworks to make resources available in a shared pool for deserving entities. Major commercial providers today offer computing grants to academic institutions. This model could be expanded to share computing resources and frameworks, potentially across all OECD countries. Such sharing can provide a stepping-stone for nascent and growing initiatives, while also preventing reinvention and providing secondary benefits such as workforce development and rapid knowledge dissemination. The field itself will benefit from common offerings enabling reproducibility, ethical use and environment-conscious AI deployments.
AI has emerged as a central enabler to many existing and emerging scientific efforts. Furthermore, its rapid adoption has shown great promise across a wide array of domains – from health care and transportation to manufacturing and cybersecurity. Since their inception, the Leadership Computing Facilities have served as a strategic reserve to support open science. During the recent COVID-19 pandemic, DOE computing facilities played a central role in advancing the biomedical foundations needed for an accelerated response. The systems supported computationally intensive activities, including large AI-driven scientific campaigns (HPC Consortium, 2022). Leadership computing facilities dedicated to open science proved to be a unique asset. They leveraged their deep expertise in deploying and efficiently managing computing resources. At the same time, they built interdisciplinary teams to address some of the most critical data and computing problems associated with emerging scientific needs.
In the ever-expanding computing ecosystem, HPC will remain a critical building block. This is especially true for large-scale scientific campaigns that depend on interleaving large-scale modelling and simulation with AI. Still, since AI is a data-hungry endeavour, access to high-quality data will be as critical as access to compute resources. New capabilities and policies are needed to integrate leadership-class computing systems into distributed data ecosystems. This process will help accelerate scientific advances and ensure equity and democratisation of the resources.
ALCC (2022), “ASCR Leadership Computing Challenge”, webpage, https://science.osti.gov/ascr/Facilities/Accessing-ASCR-Facilities/ALCC (accessed 23 November 2022).
ALCF (2022), “INCITE Program”, webpage, www.alcf.anl.gov/science/incite-allocation-program (accessed 23 November 2022).
Alonso, C., S. Kothari and S. Rehman (2 December 2020), “How artificial intelligence could widen the gap between rich and poor nations”, IMF blog, https://blogs.imf.org/2020/12/02/how-artificial-intelligence-could-widen-the-gap-between-rich-and-poor-nations.
DOE (2022), “National Laboratories”, webpage, www.energy.gov/national-laboratories (accessed 23 November 2022).
European Commission (2021), “Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts”, 24 April, SEC(2021) 167 final, SWD(2021) 84 final, SWD(2021) 85 final, European Commission, Brussels, https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206&from=EN.
European Commission (2018), “Artificial Intelligence for Europe”, Communication from the Commission, Brussels, 25 April, SWD(2018) 137 final, European Commission, Brussels, https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52018DC0237&from=EN.
HPC Consortium (2022), “Who We Are”, webpage, https://covid19-hpc-consortium.org (accessed 23 November 2022).
Kovač, M. et al. (2022), “European processor initiative: Europe's approach to exascale computing”, in HPC, Big Data, and AI Convergence Towards Exascale, CRC Press, Boca Raton, FL.
OLCF (2022), Oak Ridge Leadership Computing Facility website, www.olcf.ornl.gov (accessed 23 November 2022).
Parker, L. (11 June 2020), “The American AI Initiative: The U.S. Strategy for leadership in artificial intelligence”, OECD.AI Policy Observatory blog.
White House (2015), “Executive Order – ‘Creating a National Strategic Computing Initiative’”, 29 July, Press Release, White House, Washington, DC, https://obamawhitehouse.archives.gov/the-press-office/2015/07/29/executive-order-creating-national-strategic-computing-initiative.