Molecular Science Software Suite - MS3
Winner of 1999 R&D 100 Award
from R&D Magazine
MS3 is a comprehensive, integrated set of tools that enables scientists to understand complex chemical systems at the molecular level by coupling the power of advanced computational chemistry techniques with existing and rapidly evolving high-performance, massively parallel computing systems.
Page Contents: Description - Software Improvements - Principal Applications - Future Developments - Potential Applications - Summary - Additional Information
MS3 Description
The Molecular Science Software Suite (MS3) is a unique, comprehensive, integrated suite of software that enables computational chemists to focus their advanced techniques on finding solutions to complex issues involving chemical systems. It is the first general-purpose software that provides access to high-performance, massively parallel computers for a broad range of chemists on a broad range of applications. MS3 lets chemists easily couple the power of advanced computational chemistry techniques with existing and rapidly evolving high-performance, massively parallel computing systems. A multidisciplinary team of scientists and computer experts at Pacific Northwest National Laboratory's Environmental Molecular Sciences Laboratory (EMSL) developed MS3.
Background. To address the complex environmental issues facing the nation, new scientific understanding is needed of the fundamental chemical, physical, and biological processes underlying these issues. We need to obtain a fundamental understanding of these complex phenomena at the basic, molecular level to develop new, timely, and cost-effective solutions. By using modeling and simulation techniques, scientists can now begin to perform computations with the required accuracy on the molecular systems involved in these complex issues. Computational chemistry can provide fundamental information about molecules and their behavior, so it is a key component of any modeling and simulation program used to address these issues.
High-performance, massively parallel computing systems give computational chemists the computing power they need to model and simulate ever more complex chemical systems. However, such computers are extremely difficult to use efficiently and effectively without the right computational methods. By providing access to high-performance, massively parallel computers for a broad range of applications, MS3 can be used to address environmental problems. It can also be applied to the "Grand Challenge" problems in computational chemistry identified by the chemical industry's Vision 2020 subcommittee on computational chemistry. In addition, it will provide unique molecular-level insights into our world.
About MS3. MS3 consists of three components: 1) the Extensible Computational Chemistry Environment (Ecce), 2) the Northwest Computational Chemistry Software (NWChem), and 3) Parallel Software Development Tools (ParSoft).
Ecce is the first comprehensive, integrated, problem-solving environment developed for computational chemistry. Based on an object-oriented data model developed at EMSL, Ecce is a suite of distributed client/server applications that enable scientists to easily use computational software such as NWChem to perform complex molecular modeling and analysis tasks by accessing networked, high-performance computers from their desktop workstations. Ecce combines automated metadata and database management, modern "intelligent" graphical user interfaces, automated calculation initiation and monitoring, scientific visualization, analysis tools, and access to a hierarchical mass storage system. This interactive environment allows the user ready access to computational resources, both hardware and software, on highly sophisticated parallel computing systems.
Key components of the Ecce environment are:
- Graphical user interfaces for the computational chemistry codes, including job setup and launching, job control and monitoring, and computational chemistry advisors, e.g., the Basis Set Browser
- Visualization software and data analysis for molecular properties
- Integrated management of the data from molecular computations
- "Data mining" techniques to maximize the use of the computational results.
An outline of MS3, with its distributed client/server model, is shown in Figure 1.
Figure 1. MS3: Ecce-NWChem-ParSoft Distributed Client/Server Model
NWChem is a new generation of high-performance molecular modeling software that runs on parallel computing systems ranging from clusters of workstations to the emerging teraflops class of massively parallel computers. NWChem scales with both problem size and computer size and is portable across different high-performance computing systems. It provides a broad range of capabilities for solving sophisticated mathematical models of chemical systems from first principles at both the molecular orbital and density functional theory levels. These capabilities enable theoretical chemists to predict the fundamental characteristics of chemical systems at a level of accuracy that is otherwise obtainable only from the most sophisticated experimental approaches. NWChem also supports molecular dynamics calculations, both with a variety of empirical force fields to simulate macromolecular and solution systems and with quantum mechanical force fields. The software is modular, so that even though it has more than 500,000 lines of code, fewer than 10,000 lines must be modified to run at high-performance levels on any new parallel computer architecture.
The current version of NWChem is 3.2.1, and its capabilities include
Molecular electronic structure
- Energies, analytic gradients, and numerical second derivatives by finite difference of the gradients
  - Self Consistent Field (RHF, UHF, high-spin ROHF)
  - Gaussian Density Functional Theory (DFT) with many local and non-local exchange-correlation potentials (RHF and UHF)
  - MP2 including semi-direct using frozen core and RHF or UHF reference
  - Complete active space SCF (CASSCF)
- Energies, numerical gradients and second derivatives by finite difference of the energies
  - MP2, MP4, CCSD(T) with RHF reference
  - MP2 fully-direct with RHF reference
  - MP2 using the Resolution of the Identity integral approximation (RI-MP2)
  - Selected CI with second-order perturbation correction
- Operations performed by all methods
  - Single point energy
  - Geometry optimization (minimization and transition state)
  - Molecular dynamics on the fully ab initio potential energy surface
  - Normal mode vibrational analysis in Cartesian coordinates
  - Generation of an electron density file for graphical display
  - Evaluation of static, one-electron properties
  - Electrostatic potential fit of atomic partial charges
    - Harmonic or hyperbolic RESP restraints
    - Charge group constraints
- Interface provided to
  - COLUMBUS multi-reference CI package
  - Natural bond orbital (NBO) package
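The first group of methods above obtains second derivatives numerically, by differencing analytic gradients. As a rough illustration of that idea only (this is not NWChem code; the function names and the toy quadratic potential are assumptions made for the example), a central-difference Hessian can be sketched as:

```python
from typing import Callable, List

Vector = List[float]


def hessian_from_gradients(grad: Callable[[Vector], Vector],
                           x: Vector, step: float = 1e-4) -> List[Vector]:
    """Approximate the Hessian by central differences of an analytic gradient."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Displace coordinate j in both directions and re-evaluate the gradient.
        plus = list(x)
        minus = list(x)
        plus[j] += step
        minus[j] -= step
        g_plus = grad(plus)
        g_minus = grad(minus)
        for i in range(n):
            hess[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * step)
    # Symmetrize to remove small asymmetries caused by finite-difference noise.
    for i in range(n):
        for j in range(i + 1, n):
            avg = 0.5 * (hess[i][j] + hess[j][i])
            hess[i][j] = hess[j][i] = avg
    return hess


def toy_gradient(x: Vector) -> Vector:
    """Gradient of the hypothetical potential E = x0**2 + 3*x0*x1 + 2*x1**2."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0] + 4.0 * x[1]]
```

Each column of the Hessian costs two gradient evaluations, so a system with n coordinates needs 2n gradients. Differencing energies instead, as in the second group of methods, needs many more energy evaluations for the same matrix.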
Periodic system electronic structure
- Gaussian Approach to Polymers, Surfaces and Solids (GAPSS), a DFT based method with many local and non-local exchange-correlation potentials
Classical mechanics
- Force field
  - Effective pair potentials (AMBER, CHARMM, GROMOS)
  - First order electronic polarization
  - Self consistent polarization
  - Smooth particle-mesh Ewald (PME) long range correction
  - Distance constraints using SHAKE
- Other features
  - Periodic boundary conditions (rectangular or truncated octahedron)
  - Twin range energies and forces
  - Constant pressure scaling
  - Constant temperature scaling
- Operations
  - Single configuration energies
  - Energy minimization
  - Molecular dynamics simulation
  - Free energy simulation (except with PME)
    - Multiconfiguration thermodynamic integration (MCTI)
    - Multistep thermodynamic perturbation (MSTP)
    - Single or dual topology
    - Double-wide sampling
    - Separation-shifted scaling
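For readers unfamiliar with thermodynamic integration: the free energy difference between two states is the integral, over a coupling parameter lambda running from 0 to 1, of the ensemble average of dH/dlambda. The sketch below shows the generic textbook procedure under simple assumptions (trapezoidal quadrature, samples already collected per lambda window). It is not NWChem's MCTI implementation, and the function and argument names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def thermodynamic_integration(lambdas: Sequence[float],
                              dh_dlambda_samples: Sequence[Sequence[float]]) -> float:
    """Estimate a free energy difference from per-window samples of dH/dlambda."""
    if len(lambdas) != len(dh_dlambda_samples) or len(lambdas) < 2:
        raise ValueError("need at least two lambda windows with matching samples")
    # Ensemble average of dH/dlambda within each lambda window.
    means = [fmean(samples) for samples in dh_dlambda_samples]
    # Trapezoidal rule over the coupling parameter.
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * width * (means[k] + means[k + 1])
    return total
```

In a real simulation each inner sequence would hold values of dH/dlambda sampled from a molecular dynamics run at fixed lambda, and the result carries the energy units of those samples.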
Combined quantum mechanics and classical mechanics
- Force field
  - Quantum mechanical gradients
  - Classical effective pair potentials
- Operations
  - Single configuration energies
  - Energy minimization
  - Molecular dynamics simulation
ParSoft provides the high-performance, efficient, and portable computing libraries and tools that enable NWChem to run on a wide variety of parallel computing systems with leading-edge performance and scalability. ParSoft is targeted at both common and specific research requirements. The parallel software includes the Global Array toolkit, which provides an efficient and portable "shared-memory" programming interface for distributed-memory computers; the Parallel Eigensolver (PeIGS) Library for solving linear algebra problems on parallel architectures; and Chem I/O, a parallel input/output library.
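The "shared-memory" interface on distributed-memory hardware can be pictured as one logical array whose blocks live in different processes' memories, with callers reading and writing by global index. The following single-process toy is only meant to convey that picture. It is not the Global Array API: the class name, method names, and block-distribution rule are all assumptions made for illustration.

```python
from typing import List


class ToyGlobalArray:
    """Single-process stand-in for a 1-D array block-distributed over 'nproc' owners."""

    def __init__(self, size: int, nproc: int) -> None:
        if size <= 0 or nproc <= 0:
            raise ValueError("size and nproc must be positive")
        self.size = size
        self.block = -(-size // nproc)  # ceiling division: elements per owner
        # Each inner list plays the role of one process's local memory.
        self.local: List[List[float]] = [
            [0.0] * max(0, min(self.block, size - p * self.block))
            for p in range(nproc)
        ]

    def owner(self, index: int) -> int:
        """Return which owner holds a given global index."""
        if not 0 <= index < self.size:
            raise IndexError(index)
        return index // self.block

    def put(self, lo: int, values: List[float]) -> None:
        """Write values starting at global index lo, wherever they live."""
        for offset, value in enumerate(values):
            g = lo + offset
            self.local[self.owner(g)][g % self.block] = value

    def get(self, lo: int, hi: int) -> List[float]:
        """Read the global slice [lo, hi) without the caller naming any owner."""
        return [self.local[self.owner(g)][g % self.block] for g in range(lo, hi)]
```

A caller writes `ga.put(2, [...])` and reads `ga.get(1, 7)` without knowing that the slice spans several owners. In a real distributed-memory library the same calls would trigger one-sided communication between processes rather than list indexing.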
Available ParSoft capabilities include:
- The Global Array (GA) Library provides an efficient and portable shared-memory programming interface for distributed-memory, non-uniform memory access (NUMA) computers. Both Fortran and C interfaces are supported. The GA Visualizer helps the programmer design efficient task scheduling by animating access patterns to sections of 2-dimensional arrays.
- The Disk Resident Arrays Library extends the GA NUMA programming model to data on disk storage, so that data can be efficiently transferred between disk and global memory.
- The Memory Allocator Library is a library of routines that supports dynamic memory allocation by C, Fortran or mixed-language applications.
- The Parallel EIGenSystem (PeIGS) library contains subroutines for solving standard and generalized real symmetric eigensystems on parallel computers.
- The TCGMSG Library is a toolkit for writing portable parallel programs using a message-passing model. It supports a wide variety of common Unix workstations, mini-supercomputers, and heterogeneous networks of such machines, along with many true parallel computers.
- The Programmable Array Compiler is a compiler that translates CM-Fortran into parallel Fortran 77 code with message passing.
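For reference, the two problem classes named for PeIGS in the list above are usually written as follows. This is standard textbook notation rather than anything taken from the PeIGS documentation, and the assumption that B is positive definite is the usual one for the generalized symmetric case.

```latex
% Standard real symmetric eigenproblem, with A = A^T
\begin{equation}
  A x_i = \lambda_i x_i
\end{equation}
% Generalized real symmetric eigenproblem, with A = A^T and
% B = B^T (typically assumed positive definite)
\begin{equation}
  A x_i = \lambda_i B x_i
\end{equation}
```

In electronic structure codes the generalized form appears, for example, as the Roothaan equations FC = SCε, where F is the Fock matrix and S is the basis-function overlap matrix.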
MS3 is available to users at no charge through EMSL (http://www.emsl.pnl.gov). The ParSoft tools can be downloaded from the EMSL web site, and license requests for Ecce and NWChem can be made there.
MS3 is the only comprehensive, integrated computational chemistry software suite that provides ease of use; portability across high-performance, massively parallel computing systems; and scalability with both the problem size and the computing system size. It is also unique in its ability to handle all levels of quantum chemical calculations and classical molecular dynamics simulations. Some of MS3's outstanding capabilities were recognized by selection of the paper "An Out-of-Core Implementation of the Massively Parallel Multi-Reference Configuration Interaction Program" as the Best Overall Paper at Supercomputing '98 in Orlando, Florida, November 7-13, 1998 (see http://www.supercomp.org/sc98/TechPapers/sc98_FullAbstracts/Dachsel897/).
MS3 Software Improvements
A new software development paradigm was used to develop MS3. This new paradigm is based on the realization that a key component of any successful modern software development program for scientific applications on massively parallel computers is the use of teams of computer scientists, applied mathematicians, application developers, and users to design and implement the software. The synergy of such efforts allows the development of the highest performing software with the best algorithms and the longest in-use lifetime. Such teams help minimize long-term development costs by developing software that is, to the maximum extent possible, portable and readily maintained and updated. This is especially true when tackling "Grand Challenge" computational problems where changes in computer architecture occur on a regular basis and new algorithms are constantly being developed.
The most notable features of MS3 are the integration of its three major components and its ability to allow a wide range of users to easily access high-performance, massively parallel computers to solve complex chemical problems.
- Although many of the features in MS3 are available piecemeal in other codes, it is the first truly integrated computational chemistry software system for massively parallel computers.
- It is the first truly portable and scalable computational chemistry software for parallel computers and the first modern problem-solving environment based on object-oriented databases for scientific software.
- It enables computational chemists to solve problems that are relevant to important national issues.
A model of the unique software architecture of MS3, with its modular, integrated coupling of NWChem with ParSoft and Ecce, is shown in Figure 2.
Figure 2. MS3 Infrastructure
MS3 Principal Applications
MS3 was developed to support the modeling and simulation of chemical systems relevant to U.S. Department of Energy environmental cleanup efforts, but it will also support research relevant to the other national issues described below. Originally developed to run on the 512-processor IBM SP massively parallel computing system in the EMSL Molecular Science Computing Facility, MS3 has since been ported to other high-performance computing systems, including the Cray T3D and T3E, the Intel Paragon, the KSR-1 and -2, the Silicon Graphics Origin 2000, and clusters of workstations.
MS3 is currently in use at many of the national supercomputer centers, national laboratories, and universities. In most cases these are large computing centers rather than individual users. NWChem has been distributed to over 100 institutions, and Ecce is used by researchers from over 10 institutions. Software from the ParSoft suite of tools has been distributed through the World Wide Web at http://www.emsl.pnl.gov/docs/parsoft/. It is being used by the computer industry, financial service companies, national laboratories, and many universities. There are about 20 downloads of the ParSoft software per month from the web site.
As noted above, there are many applications for MS3. Calculations done with the software include:
- Understanding behavior of proteins for design of drugs and for sensors in cells
- Understanding the interaction of gram-negative bacteria with geochemical surfaces for modeling bioremediation
- Understanding the behavior of water clusters to model aqueous solutions
- Creating cluster models of ions binding in aqueous solutions and reactions of chlorocarbons in aqueous solutions
- Predicting thermodynamics of intermediates in decomposition of perfluoroalkanes (Teflon®)
- Modeling actinide complexes for remediating contaminated DOE sites
- Designing new separation systems for radioactive alkali metal ions
- Modeling zeolite catalysts, especially for applications in the petroleum and petrochemical industry
- Creating thermodynamic models of organic molecules for combustion simulation
- Predicting the behavior of inorganic clusters to model the first step in the formation of atmospheric aerosols important in atmospheric pollution.
MS3 will help researchers address the computational "Grand Challenge" problems in computational chemistry as addressed by the Chemical Industry's Vision 2020 subcommittee on computational chemistry:
- Reliable prediction of biological activity
- Reliable prediction of environmental fate from chemical structure
- Design of efficient catalysts for chemical processes
- Design of efficient processes in chemical plants from an understanding of microscopic molecular behavior
- Design of material with a given set of target properties.
MS3 Future Developments
For Ecce, future developments will focus on developing a three-tiered architecture to support
- Enhancement of the user interface to a Java environment
- Integration with collaborative technologies such as an electronic laboratory notebook and synchronous communications
- An interface to handle thermochemistry and reaction kinetics
- A molecular dynamics interface
- A secure communication layer to various database servers
- User-defined parsing and monitoring of data properties as calculations are computed.
These modifications will provide a problem-solving environment that addresses the needs of computational scientists working at different locations while using a variety of data sources.
For NWChem, future developments will include expansion of functionality to allow
- Highly accurate calculations of energies for open shell systems
- Analytic second derivatives for geometry optimization, transition states, and frequency determination for high correlation levels
- Evaluation of complex frequency-dependent properties to augment additional experimental capabilities.
We are adding methods/algorithms to incorporate relativistic corrections for systems containing lanthanides/actinides and advanced computational chemistry techniques (e.g., molecular dynamics techniques). We will also investigate methods and algorithms needed to exploit the next generation of massively parallel computing systems.
ParSoft will be extended to support the needs of more application areas. Future capabilities will include support for multidimensional and sparse array data structures. A compression module will be added to parallel I/O to provide effective utilization of resources for out-of-core algorithms with sparse/compressed data. A new portable communication library called aggregate remote memory copy interface (ARMCI) will be included as a building block for new tools and libraries that require high-performance, one-sided asynchronous communication.
MS3 Potential Applications
There are an enormous number of potential applications for MS3 as new, faster, and more powerful high-performance computer systems are developed. Examples include
- Modeling protein-protein interactions for cell-signaling and understanding fundamental chemical processes
- Redesigning proteins for extreme environments, chemical processes, and bioremediation
- Performing relativistic chemistry calculations for predicting the chemical behavior of the actinides and other radioactive species, thereby significantly reducing the number of dangerous, costly, and difficult experiments
- Designing new catalysts for chemical processes
- Designing new drugs, including antivirals
- Designing new pesticides for agricultural practices
- Designing materials with specified properties
- Designing efficient chemical manufacturing processes based on microscopic molecular behavior
- Designing new semiconductors with feature sizes of 50 nanometers
- Designing nanomaterials and devices
- Simulating the behavior of liquids for predicting thermophysical properties
- Accurately predicting mechanisms for use in modeling combustion reactions and more energy-efficient combustion devices (boilers, engines, etc.)
- Predicting chemical reaction rates for atmospheric pollution models in the troposphere and stratosphere, including nucleation phenomena.
An important result will be reliable predictions of chemical phenomena with high accuracy and with error bars that can replace difficult and expensive experimental approaches for a broad range of molecules. We are just starting to do this today for small molecules, and continued development of the software and growth in computer power will enable us to continue revolutionizing the field of chemistry. Such computations will become even more important as we lose experimental capabilities in the measurement of many chemical phenomena.
MS3 Summary
Our nation is facing many technologically challenging issues as we try to sustain and improve our quality of life. Among the most important of these issues is our stewardship of the environment. We must clean up problems caused by past activities, and we must seek new ways of conducting our activities so we don't create new problems for future generations to deal with. The application of modern science and technology will greatly enhance our ability to solve these problems in a timely and cost-effective manner.
MS3 is a revolutionary suite of computational chemistry software that enables chemists to effectively use large-scale, high-performance, massively parallel computers to help solve complex national problems. The software is based on a new development paradigm and provides scalable performance across a wide range of computer architectures. It enables the scientific community to solve complex environmental problems in the atmosphere, in aquatic systems, and in the subterranean environment. In addition, it will be used in the search for new drugs to help us pursue longer, healthier lives; to improve our agricultural productivity; and to provide insights into how organisms work at the molecular level. Finally, it can be used to develop new products and processes that will enable both us and future generations to lead better lives.
MS3 Additional Information
Additional information on the MS3 components, including how to obtain the software, can be found at the following web pages.
