G. Tourassi
Oak Ridge National Laboratory, United States
M. Shankar
Oak Ridge National Laboratory, United States
F. Wang
Oak Ridge National Laboratory, United States
The past three decades have witnessed the widespread adoption of high-performance computing (HPC) as an essential tool in the advancement of science. From accelerating critical research in the wake of a global pandemic to climate modelling and national security, HPC has become integral to the most cutting-edge scientific research around the world and across application domains. The global competition to debut the next fastest supercomputer keeps pushing the field forward. Meanwhile, increasing capabilities are bringing hallmarks of science fiction, such as artificial intelligence (AI), into daily life. However, the tremendous power of new computing systems comes with growing concern about equitable access to these resources and about their impact on supporting a thriving workforce.
The mission of the Department of Energy (DOE) Office of Science (SC) is “to deliver scientific discoveries and major scientific tools to transform our understanding of nature and advance the energy, economic and national security of the United States.” The SC budget supports over 25 000 researchers at more than 300 institutions and across the 17 DOE national laboratories, in addition to 27 open-access experimental and computing user facilities (DOE, 2022).
To increase HPC capabilities in the United States, Congress passed the Department of Energy High-End Computing Revitalization Act of 2004 (DOE, 2022), which called for leadership computing systems. These high-end computing systems are among the most advanced in the world. They are operated by DOE and available for use by researchers in industry, institutions of higher education, national laboratories and other federal agencies.
As one of DOE’s leadership computing facilities, the Oak Ridge Leadership Computing Facility (OLCF) (OLCF, 2022) has consistently operated some of the nation’s top supercomputers. OLCF has helped cement computation as the third pillar of scientific discovery by enabling breakthroughs in basic and applied sciences, including many notable contributions in energy efficiency, climate change and medical research. OLCF has provided cutting-edge supercomputing capability and pioneering ideas, operating a Top 10 supercomputer on the Top 500 list every year since its establishment in 2005. The leading OLCF systems of the last decade – Titan and Summit – both debuted as the world’s fastest computers.
At 200 petaflops, the IBM Summit supercomputer is the OLCF’s flagship system. Launched in 2018, Summit delivers eight times the computational performance of the OLCF’s previous Cray XK7 Titan supercomputer while using only 4 608 nodes, a fraction of Titan’s 18 688. With the debut of the new HPE Cray EX Frontier supercomputer, OLCF will house the nation’s first exascale system, capable of more than 1.5 exaflops and able to solve calculations up to 50 times faster than today’s fastest supercomputers.
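A rough, back-of-the-envelope calculation, using only the figures quoted above (and approximating Titan’s aggregate performance as one-eighth of Summit’s), illustrates how much of this gain comes from per-node capability rather than node count:

```python
# Illustrative arithmetic only, based on the figures quoted in the text:
# Summit: ~200 petaflops over 4 608 nodes; Titan approximated as 1/8 of
# Summit's aggregate performance (~25 petaflops) over 18 688 nodes.
summit_pf, summit_nodes = 200.0, 4_608
titan_pf, titan_nodes = summit_pf / 8, 18_688

summit_per_node = summit_pf / summit_nodes  # ~0.043 petaflops (43 teraflops)
titan_per_node = titan_pf / titan_nodes     # ~0.0013 petaflops (1.3 teraflops)

print(f"Summit per node: {summit_per_node * 1000:.1f} teraflops")
print(f"Titan per node:  {titan_per_node * 1000:.1f} teraflops")
print(f"Per-node gain:   ~{summit_per_node / titan_per_node:.0f}x")
```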
As a Leadership Computing Facility, OLCF aims to provide world-class computational resources and specialised services to researchers around the world working on the most computationally intensive global challenges in science and engineering. Time on DOE’s Leadership Computing Systems is managed through two competitive allocation programmes: INCITE (Innovative and Novel Computational Impact on Theory and Experiment) (ALCF, 2022) and ALCC (ASCR Leadership Computing Challenge) (ALCC, 2022). Requests typically exceed the available resources by a factor of three to five, so selection is competitive and based on peer review. Allocations of computing cycles are typically 100 times greater than those routinely available in university, laboratory, and industrial scientific and engineering environments.
With the rapid explosion of data-intensive science, OLCF has experienced increasing demand to support advanced scientific workflows; incorporate AI; and run rich analytics coupled with simulation software to derive valuable insights from extreme-scale experimental and observational data.
Both the National Strategic Computing Initiative (White House, 2015) and the American AI Initiative (Parker, 11 June 2020) called for a cohesive, multi-agency, strategic vision to empower and maintain scientific leadership in the United States, as well as fuel innovations in all sectors of its economy. Since then, the landscape of HPC and AI has been changing rapidly. For example, in addition to hundreds of millions of US dollars already invested by DOE in HPC and leadership computing, more than a dozen AI institutes have been established in more than 40 states. Each has unique strengths and focus areas spanning all aspects related to AI – from fundamental methods to human-AI interaction, augmentation and collaboration.
In the European Union, with strong political endorsement, member nations have taken a concerted approach towards AI, emphasising the use of AI for good and for all (European Commission, 2018). In addition to research and development (R&D) excellence, the European Commission is paying particular attention to trustworthy AI in its recent proposal for AI regulation (European Commission, 2021). Furthermore, in combination with the European Processor Initiative project, the European Union has built its own roadmap to support the convergence of extreme-scale computing, big data and AI (Kovač et al., 2022).
The infusion of enormous capital is enabling and driving R&D innovation, unleashing resources, removing barriers and training a new generation of AI-ready workers. However, it is becoming apparent that both AI resources and AI talent are highly concentrated. This could put disadvantaged groups at risk, especially in developing countries and resource-strapped universities.
The jury is still out as to whether AI is a transformational force for developing nations or a disruptive force that widens the gap between rich and poor countries (Alonso, Kothari and Rehman, 2 December 2020). One thing is clear though: there is an insatiable demand for compute power and data storage, which are increasingly intertwined and becoming integral to ground-breaking scientific discoveries.
As part of their business strategy, cloud providers such as Google (Colab) and Microsoft (Azure) offer free allocations of computing resources. These offerings partially enable AI access to go from nothing to something, but they come with notable limitations. For example, to maintain maximal scheduling flexibility, Colab resources are neither guaranteed nor unlimited. Even with a paid tier such as Colab Pro, access to the graphics processing unit (GPU) – the workhorse for AI computations – may be limited to a relatively old-generation GPU and a 24-hour running time. Although this is common practice, such policies are limiting for even moderate scientific and technical R&D. These limitations highlight key gaps but also opportunities for technology and policy advances.
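As a concrete illustration of this point, a user can inspect which accelerator, if any, a free or paid session has actually been allocated. The minimal sketch below assumes a notebook environment with PyTorch installed; the GPU model reported will vary with availability.

```python
# Minimal sketch: report which GPU (if any) the current session was given.
# Assumes PyTorch is installed, as in typical Colab-style notebooks.
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"Allocated GPU: {name} ({mem_gb:.1f} GB)")
else:
    print("No GPU allocated for this session; computations fall back to the CPU.")
```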
Interest in AI and the economic potential of incorporating AI tools and methods in the commercial sector have led major corporations to develop software and purpose-built hardware for AI. Tools such as TensorFlow (originating at Google) and PyTorch (originating at Facebook) have been released to the open-source community. This, in turn, has accelerated the use and adoption of AI methods in a variety of industries, academic settings and major science laboratories.
Dissemination of these tools has been accompanied by dramatic growth in technical publications in the field of AI and by a vast array of educational materials available online around the world. This adoption and growth, however, are constrained by the availability of the computing resources and high-quality datasets that are the basis for AI.
There are two main areas where systematic approaches led by nations at the forefront of this field can help alleviate computing and data availability constraints: the technology and policy spheres.
In the technology sphere, computing infrastructure and software availability could be stewarded so that they support open science. The open-source ecosystem is a thriving home for these tools and capabilities. However, curating best practices and applications that can be shared in a rapidly changing field is critical if the global community is to benefit from emerging advances. The know-how for scaling applications up – crucial to serious AI campaigns – cannot remain the purview of a few major commercial entities.
Nationally funded laboratories and their computing infrastructures, in collaboration with industry and academia, could nurture and support AI ecosystems for tertiary educational entities and partner countries. This is especially useful for those entities and countries that lack resources or are only beginning to build core competencies in this field. Step-up guides, in accessible tutorial form, will be needed to take users from basic skills to scalable data and software management. These would enable students and practitioners to begin on their personal computers or small-scale cloud resources, advance to larger cloud or institutional-scale resources, and then move on to national-scale resources. Such tools and capabilities, if shared openly, would enable a broader community of countries to gain from national investments.
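To make the step-up path concrete, the sketch below is a minimal illustration (assuming PyTorch and synthetic data, not a reference implementation) of a tiny training loop that runs unchanged on a laptop CPU, a single cloud GPU or a larger institutional node; scaling further to multi-node systems would layer tools such as torch.distributed on top of the same core loop.

```python
# Minimal "step-up" sketch: the same training loop runs on a laptop CPU,
# a free cloud GPU or an institutional node, selecting the best available
# device automatically. Synthetic data stands in for real scientific inputs.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Tiny synthetic regression problem (1 024 samples, 16 features).
x = torch.randn(1024, 16, device=device)
y = x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(1024, 1, device=device)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(200):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimiser.step()

print(f"Trained on {device}; final loss = {loss.item():.4f}")
```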
The policy sphere concerns the sharing of resources, training, outcomes and guidelines. Countries at the forefront of the field, including the United States and EU leaders, may collaborate on policy frameworks to make resources available in a shared pool for deserving entities. Major commercial providers today offer computing grants to academic institutions. This model could be expanded to share computing resources and frameworks, potentially across all OECD countries. Such sharing can provide a stepping-stone for nascent and growing initiatives, while also preventing reinvention and providing secondary benefits such as workforce development and rapid knowledge dissemination. The field itself will benefit from common offerings enabling reproducibility, ethical use and environment-conscious AI deployments.
AI has emerged as a central enabler to many existing and emerging scientific efforts. Furthermore, its rapid adoption has shown great promise across a wide array of domains – from health care and transportation to manufacturing and cybersecurity. Since their inception, the Leadership Computing Facilities have served as a strategic reserve to support open science. During the recent COVID-19 pandemic, DOE computing facilities played a central role in advancing the biomedical foundations needed for an accelerated response. The systems supported computationally intensive activities, including large AI-driven scientific campaigns (HPC Consortium, 2022). Leadership computing facilities dedicated to open science proved to be a unique asset. They leveraged their deep expertise in deploying and efficiently managing computing resources. At the same time, they built interdisciplinary teams to address some of the most critical data and computing problems associated with emerging scientific needs.
In the ever-expanding computing ecosystem, HPC will remain a critical building block. This is especially true for large-scale scientific campaigns that depend on interleaving large-scale modelling and simulation with AI. Still, since AI is a data-hungry endeavour, access to high-quality data will be as critical as access to compute resources. New capabilities and policies are needed to integrate leadership-class computing systems into distributed data ecosystems. This process will help accelerate scientific advances and ensure equity and democratisation of the resources.
ALCC (2022), “ASCR Leadership Computing Challenge”, webpage, https://science.osti.gov/ascr/Facilities/Accessing-ASCR-Facilities/ALCC (accessed 23 November 2022).
ALCF (2022), “INCITE Program”, webpage, www.alcf.anl.gov/science/incite-allocation-program (accessed 23 November 2022).
Alonso, C., S. Kothari and S. Rehman (2 December 2020), “How artificial intelligence could widen the gap between rich and poor nations”, IMF blog, https://blogs.imf.org/2020/12/02/how-artificial-intelligence-could-widen-the-gap-between-rich-and-poor-nations.
DOE (2022), “National Laboratories”, webpage, www.energy.gov/national-laboratories (accessed 23 November 2022).
European Commission (2021), “Proposal for a regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts”, 24 April, SEC(2021) 167 final, SWD(2021) 84 final, SWD(2021) 85 final, European Commission, Brussels, https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52021PC0206&from=EN.
European Commission (2018), “Artificial Intelligence for Europe”, Communication from the Commission, Brussels, 25 April, SWD(2018) 137 final, European Commission, Brussels, https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:52018DC0237&from=EN.
HPC Consortium (2022), “Who We Are”, webpage, https://covid19-hpc-consortium.org (accessed 23 November 2022).
Kovač, M. et al. (2022), “European processor initiative: Europe's approach to exascale computing”, in HPC, Big Data, and AI Convergence Towards Exascale, CRC Press, Boca Raton, FL.
OLCF (2022), Oak Ridge Leadership Computing Facility website, www.olcf.ornl.gov (accessed 23 November 2022).
Parker, L. (11 June 2020), “The American AI Initiative: The U.S. Strategy for leadership in artificial intelligence”, OECD.AI Policy Observatory blog.
White House (2015), “Executive Order – ‘Creating a National Strategic Computing Initiative’”, 29 July, Press Release, White House, Washington, DC, https://obamawhitehouse.archives.gov/the-press-office/2015/07/29/executive-order-creating-national-strategic-computing-initiative.