
LAM 6.5.9 Installation Guide

This page contains the installation instructions for LAM/MPI version 6.5.9. There are also some tips for writing/developing/running parallel programs, especially in parallel environments that are clusters of workstations. If you have problems configuring/compiling LAM, see the "Troubleshooting" section, below.

Here's a brief table of contents (### indicates updates for this version):

  1. ### For the impatient
  2. ### Unpacking the distribution
  3. ### Architecture-specific notes
  4. Configuration

  5. ### Building LAM
  6. ### Building LAM, ROMIO, and MPI 2 C++ examples
  7. ### Boot schema
  8. Using LAM

  9. ### Troubleshooting

  10. Clearing disk space
  11. Tuning LAM


    For the impatient

    If you don't want to read the rest of the instructions, the following should do the trick for most situations:
    	
         % gunzip -c lam-6.5.9.tar.gz | tar xf -
         % cd lam-6.5.9
         % ./configure --prefix=/path/to/install/in
         [...lots of output...]
         % make
         [...lots of output...]
         % make install
         [...lots of output...]
         % make examples   # This step is optional
         [...lots of output...]
    

    If you do not specify a prefix, LAM will first look for "lamclean" in your path. If lamclean is found, it will use the parent of the directory where lamclean is located as the prefix. Otherwise, /usr/local is used (like most GNU software).
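    The default-prefix rule just described can be expressed as a short POSIX shell function. This is a minimal sketch for illustration only, not code taken from LAM's configure script; the function name lam_default_prefix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper that mimics the default-prefix rule described above.
# It is NOT part of LAM; it only illustrates the logic.
lam_default_prefix() {
    if lamclean_path=$(command -v lamclean 2>/dev/null) && [ -n "$lamclean_path" ]; then
        # e.g. /opt/lam/bin/lamclean -> /opt/lam/bin -> /opt/lam
        dirname "$(dirname "$lamclean_path")"
    else
        # lamclean is not in the PATH: fall back to the GNU-style default
        echo /usr/local
    fi
}

# Print the prefix a bare ./configure would be expected to choose
lam_default_prefix
```

    If an older LAM installation is still first in your PATH, this is a quick way to notice that a new build would, by default, install over it.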

    Now go read the RELEASE_NOTES file; it contains all the information about the new features of this release of LAM/MPI.

    Common causes of failure:

    • No C++ compiler installed; use --without-mpi2cpp configure option
    • C++ compiler does not support required C++ features; use --without-mpi2cpp configure option
    • No Fortran compiler installed; use --without-fc configure option


    Unpacking the distribution

    The LAM distribution is packaged as a compressed tape archive, lam-6.5.9.tar.Z, lam-6.5.9.tar.gz, or lam-6.5.9.tar.bz2. It is available from the main LAM web site: http://www.lam-mpi.org.

    Uncompress the archive and extract the sources.

         % gunzip -c lam-6.5.9.tar.gz | tar xf -
    
    or
         % uncompress -c lam-6.5.9.tar.Z | tar xf -
    
    or
         % bunzip2 -c lam-6.5.9.tar.bz2 | tar xf -
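    Because the three commands differ only in the decompressor, a small shell function can pick it from the file suffix. This is a hypothetical convenience helper (lam_unpack is not part of the LAM distribution), and it assumes that gunzip, bunzip2, or uncompress is installed for the format you downloaded.

```shell
#!/bin/sh
# Hypothetical helper: choose the decompressor from the archive suffix and
# stream the result into tar, as in the three commands above.
lam_unpack() {
    case "$1" in
        *.tar.gz)  gunzip -c "$1" | tar xf - ;;
        *.tar.bz2) bunzip2 -c "$1" | tar xf - ;;
        *.tar.Z)   uncompress -c "$1" | tar xf - ;;
        *)
            echo "lam_unpack: unknown archive type: $1" >&2
            return 1
            ;;
    esac
}
```

    For example, "lam_unpack lam-6.5.9.tar.gz" creates the lam-6.5.9 directory in the current directory.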
    


    Architecture-specific notes

    LAM/MPI will build on just about any POSIX system. There are, however, a few restrictions:

    Microsoft Windows

    Microsoft Windows is not a POSIX platform. LAM/MPI currently will not build in a Windows environment.

    AIX

    It appears that GNU libtool does not presently support building shared libraries on AIX. This has been tested on AIX 4.3.3; it is not known if GNU libtool builds shared libraries on other versions of AIX.

    Additionally, in some cases GNU libtool apparently does not work correctly with the "xlc" compiler. Use "cc" instead (they are the same compiler anyway).

    Finally, there have been repeatable problems with AIX's "make" when building ROMIO. This does not appear to be ROMIO's fault -- it appears to be a bug in AIX's "make". The LAM Team suggests that you use GNU "make" (ftp://ftp.gnu.org/gnu/make/) when building on AIX platforms to avoid these problems.

    Various BSD systems

    The version of "make" that is distributed on some BSD systems (e.g., FreeBSD) requires the use of the "-i" parameter to some of LAM's make targets. For example:

    	  make -i clean
    

    HP-UX

    It appears that the default C++ compiler on HP-UX (CC) is a pre-ANSI standard C++ compiler. As such, it will not build the C++ bindings package. Use the aCC C++ compiler instead by passing --with-cxx=aCC to configure.


    Configuration

    LAM uses a GNU configure script to perform site- and architecture-specific configuration.

    Change directory to the top level LAM directory (lam-6.5.9) and run the configure script.

         % ./configure {options}
    
    or
         % sh ./configure {options}
    

    By default the configure script sets the LAM install directory to the parent of where lamclean is found (if it is in your path), or /usr/local if lamclean is not in your path. This can be overridden with the --prefix option (see below).


    ROMIO issues

    Note that the ROMIO package does not currently support many GNU-like configure switches. In particular, attempting to use any of the directory-specifying options (other than --prefix) will not work as expected with ROMIO. ROMIO installs everything under $(DESTDIR)$prefix. Hence, if you pass switches such as --libdir, --bindir, etc. to LAM's configure, all of LAM (and the C++ bindings) will install as expected, but ROMIO will still install itself under $prefix.

    The ROMIO authors have been notified of this issue.


    MPI 2 C++ issues

    C++ Exceptions

    The default is to build LAM/MPI with the C++ bindings, but without C++ exception support.

    Enabling C++ exceptions typically entails a slight degradation of run-time performance because of extra bootstrapping required for every function call (particularly with gcc/g++). As such, they are disabled by default, and the MPI::ERRORS_THROW_EXCEPTIONS error handler will only print out error messages. If full exception handling capabilities are desired, LAM must be configured with the "--with-exceptions" flag. It should be noted that some C++ (and C and Fortran) compilers need additional command line flags to properly enable exception handling.

    For example, with gcc/g++ 2.95.2 and later, gcc, g77, and g++ all require the command line flag "-fexceptions". gcc and g77 require "-fexceptions" so that they can pass C++ exceptions through C and Fortran functions properly. As such, all of LAM/MPI must be compiled with the appropriate compiler options, not just the C++ bindings. Using MPI::ERRORS_THROW_EXCEPTIONS without having compiled LAM with proper exception support will cause undefined behavior (read: core dumps and other Bad Things).

    If building with IMPI or the C++ bindings, LAM's configure script will automatically guess the necessary compiler exception support command line flags for the gcc/g++ and KCC compilers. That is, if a user selects to build the MPI 2 C++ bindings and/or the IMPI extensions, and also selects to build exception support, and g++ or KCC is selected as the C++ compiler, the appropriate exceptions flags will automatically be used.

    Users with other compilers that require command line flags for exception support should use the "--with-exflags=FLAGS" command line switch to configure.

    Note that this also applies even if you do not build the C++ bindings -- if LAM is to call C++ functions that may throw exceptions (e.g., from an MPI error handler or other callback function), you need to build LAM with the appropriate exceptions compiler flags.
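    As a sketch, enabling exception support might look like one of the following. The first form relies on configure supplying "-fexceptions" for gcc/g77/g++ 2.95 and later, as described above. In the second form, "-your-exception-flag" is a placeholder for whatever your compilers' documentation specifies; it is not a real option.

```shell
# gcc/g77/g++ (2.95 and later) or KCC: configure adds the flags itself
./configure --with-exceptions

# Other compilers: supply the exception flag(s) explicitly.
# "-your-exception-flag" is a placeholder, not a real compiler option.
./configure --with-exceptions --with-exflags="-your-exception-flag"
```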

    Mixing Vendor Compilers

    A single vendor product line should be used to compile all of the C, Fortran, and C++ code. That is, if gcc is used to compile LAM, g++ should be used to compile the C++ bindings, and gcc/g++/g77 should be used to compile any user programs. Mixing multiple vendors' compilers between different components of LAM/MPI and/or to compile user MPI programs, particularly when using the C++ MPI bindings, is almost guaranteed not to work.

    C++ compilers are not link-compatible -- compiling the C++ bindings with one C++ compiler and compiling a user program that uses the MPI C++ bindings with a different C++ compiler will almost certainly produce linker errors.

    Indeed, if exception support is enabled in the C++ bindings, it will only work if the C and/or Fortran code knows how to pass C++ exceptions through their code. This will only happen properly when the same compiler (or a single vendor's compiler product line, such as gcc, g77, and g++) is used to compile all components -- LAM/MPI, the C++ bindings, and the user program. Using multiple vendor compilers with C++ exceptions will almost certainly not work (read: core dumps and other Bad Things).

    The one possible exception to this rule (pardon the pun) is the KCC compiler. Since KCC turns C++ code to C code and then gives it to the back end "native" C compiler, KCC may work properly with the native C and Fortran compilers.


    VPATH builds

    Alternatively, LAM supports the "VPATH" building mechanism. If LAM/MPI is to be installed in multiple environments that require different options to configure, or require different compilers (such as compiling for multiple architectures/operating systems), the following form can be used to configure LAM:

         % cd /some/temp/directory
         % LAMTOP/configure {options}
    

    where LAMTOP is the directory where the LAM/MPI distribution tarball was expanded. This form will build the LAM executables and libraries under /some/temp/directory and will not produce any files in the LAMTOP tree. It allows multiple, concurrent builds of LAM/MPI from the same source tree.

    Note that you must have a VPATH-enabled "make" in order to use this form. The GNU "make" (ftp://ftp.gnu.org/gnu/make/) supports VPATH builds, for example, but the Solaris Workshop 5.0 "make" does not. Parts of LAM/MPI may compile correctly in a VPATH build without a VPATH-enabled "make", but ROMIO will not.
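    For example, a source tree shared over NFS could be built once per architecture by running a script like the one below on one machine of each type. The source location, the build directory naming, and the per-architecture prefixes are all hypothetical choices; the "make" found first in the PATH must be VPATH-enabled (e.g., GNU make, which some systems install as "gmake").

```shell
#!/bin/sh
# Hypothetical layout: one shared source tree, one build tree per OS/CPU pair.
LAMTOP=/usr/local/src/lam-6.5.9
ARCH=$(uname -s)-$(uname -m)          # e.g. "Linux-i686"
BUILD=/tmp/lam-build-$ARCH

mkdir -p "$BUILD" && cd "$BUILD" || exit 1
# Nothing is written into $LAMTOP; all objects and libraries land in $BUILD.
"$LAMTOP/configure" --prefix="/usr/local/lam-$ARCH" || exit 1
make && make install
```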


    ./configure switches

    The configure script will create several configuration files, including share/include/lam_config.h. You may wish to inspect this file for a sanity check, but ./configure usually guesses correctly.

    There are many options available from the configure script. You can use the command "./configure --help" to list them all. An explanation of each follows (shown here in alphabetical order):

    --disable-static

    Do not build static libraries. This flag is only meaningful when --enable-shared is specified; if this flag is specified without --enable-shared, it is ignored, and static libraries are created.

    --enable-echo

    Will echo all of the commands that configure executes. This is usually for debugging purposes only, and is not recommended for end users.

    --enable-shared

    Build shared libraries. Note that this option is incompatible with --with-romio (which is the default) and --with-mpi2cpp (which is also the default) because (among other reasons) ROMIO expects to find libmpi.a, not libmpi.so.

    Also note that enabling building shared libraries does not disable building the static libraries. Specifying --enable-shared without --disable-static will result in a build taking twice as long, and installing both the static and shared libraries.

    Finally, note that neither ROMIO nor the MPI 2 C++ bindings currently support shared libraries. They will always be built as static libraries.
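    Putting these notes together, a build that produces only the shared libraries has to turn off both ROMIO and the C++ bindings. A sketch of such a configure line (the --prefix value is only an example):

```shell
# Shared libraries only. ROMIO and the C++ bindings are incompatible with
# --enable-shared (see above), so both are disabled; --disable-static avoids
# building every library twice.
./configure --prefix=/usr/local --enable-shared --disable-static \
    --without-romio --without-mpi2cpp
```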

    --prefix=PREFIX

    Sets the installation location for the LAM binaries, libraries, etc., to the directory PREFIX. PREFIX must be specified as an absolute directory name.

    --with-cc=CC

    Use the C compiler CC. The C compiler can also be selected by setting the "CC" environment variable before running configure. This compiler will be used both to compile LAM, and as the default compiler for the hcc(1) and mpicc(1) wrapper compilers.

    --with-cflags=CFLAGS

    Use the C compiler flags CFLAGS. The flags passed to the C compiler can also be selected by setting the "CFLAGS" environment variable before running configure. These flags are used to compile LAM, ROMIO, and some example programs that come with LAM. If CFLAGS are not specified, ./configure will pick optimization flags to use.

    These flags are not used as default flags in any of the wrapper compilers.

    --with-cxx=CXX

    Use the C++ compiler CXX. The C++ compiler can also be selected by setting the "CXX" environment variable before running configure. This compiler will be used to compile the MPI 2 C++ bindings and IMPI support, and as the default compiler for the hcp(1) and mpiCC(1) wrapper compilers.

    --with-cxxflags=CXXFLAGS

    Use the C++ compiler flags CXXFLAGS. The flags passed to the C++ compiler can also be selected by setting the "CXXFLAGS" environment variable before running configure. These flags will be used when compiling the MPI 2 C++ bindings, IMPI support, as well as some example programs that come with LAM. If CXXFLAGS are not specified, ./configure will pick optimization flags to use.

    These flags are not used as default flags in any of the wrapper compilers.

    --with-cxxldflags=CXXLDFLAGS

    Use the C++ linker flags CXXLDFLAGS. These flags will be used when compiling the MPI 2 C++ bindings, IMPI support, as well as some example programs that come with LAM. If CXXLDFLAGS are not specified, ./configure will pick optimization flags to use.

    These flags are not used as default flags in any of the wrapper compilers.

    --with-exceptions

    Used to enable exception handling support in the C++ bindings for MPI. Exception handling support (i.e., the MPI::ERRORS_THROW_EXCEPTIONS error handler) is disabled by default. See the section "MPI 2 C++ Issues", above.

    --with-exflags=FLAGS

    Used to specify any command line arguments that are necessary for the C, C++, and Fortran compilers to enable C++ exception support. This switch is ignored unless --with-exceptions is also specified.

    This switch is unnecessary for gcc/g77/g++ version 2.95 and above -- "-fexceptions" will automatically be used (when building --with-exceptions). Additionally, this switch is unnecessary if the KCC compiler is used -- "-x" is automatically used.

    See the section entitled "MPI 2 C++ Issues", above.

    --with-fc=FC

    Use the Fortran compiler FC. Specify FC=no (or --without-fc) to disable Fortran support if you do not have a Fortran compiler or do not require such support. This compiler will be used both to compile LAM, and as the default compiler for the hf77(1) and mpif77(1) wrapper compilers.

    --with-fflags=FFLAGS

    Use the Fortran compiler flags FFLAGS. The flags passed to the Fortran compiler can also be selected by setting the "FFLAGS" environment variable before running configure. These flags will be used only when compiling some example programs that come with LAM. If FFLAGS are not specified, ./configure will pick optimization flags to use.

    These flags are not used as default flags in any of the wrapper compilers.

    --with-impi

    Use this switch to enable the IMPI extensions. The IMPI extensions are still considered experimental, and are disabled by default.

    --with-lamd-ack=SEC

    Number of seconds until an ACK is resent between LAM daemons. You probably shouldn't need to change this; the default is one second.

    --with-lamd-hb=SEC

    Number of seconds between heartbeat messages in the LAM daemon (only applicable when running in fault tolerant mode). You probably shouldn't need to change this; the default is 120 seconds.

    --with-lamd-boot=SEC

    Set the default number of seconds to wait before a process started on a remote node is considered to have failed (e.g., during lamboot). You probably shouldn't need to change this; the default is 60 seconds.

    --with-ldflags=LDFLAGS

    Use the LD linker flags LDFLAGS. If this flag is not set on the ./configure command line, the value for CFLAGS is used. These flags are used to link LAM executables and all example programs that come with LAM. If LDFLAGS (and CFLAGS) are not specified, ./configure will pick optimization flags to use.

    These flags are not used as default flags in any of the wrapper compilers.

    --without-mpi2cpp

    Build LAM without the MPI-2 C++ bindings (see chapter 10 of the MPI-2 standard); the default is to build them. The C++ bindings require some advanced features of the C++ compiler. While most modern C++ compilers now support all the required features, you may encounter problems on some platforms. Consult the mpi2c++/README file for more information.

    --without-profiling

    Build LAM/MPI without the MPI profiling layer. The default is to build this layer, since ROMIO uses it. See the --without-romio option for more details.

    --with-pthread-lock

    Use a process shared pthread mutex to lock access to the shared memory pool rather than the default SYSV semaphore. This option is only valid with the "usysv" RPI, and on systems which support process shared pthread mutexes.

    --with-purify

    Causes LAM to zero out all data structures before using them. This option is not necessary to make LAM function correctly (LAM already zeros out relevant structure members when necessary), but it is very helpful when running MPI programs through memory checking debuggers, such as purify and the Solaris Workshop bcheck program. See the "Zeroing out LAM buffers before use" section of the RELEASE_NOTES file for more information. The default is to not enable this option.

    --without-romio

    Build LAM without ROMIO support (ROMIO provides the MPI-2 I/O support, see chapter 9 of the MPI-2 standard); the default is to build with ROMIO support. ROMIO is known to only work on certain systems. Consult the romio/README file for more information. Note that building with ROMIO (the default) is incompatible with --enable-shared, because (among other reasons) ROMIO expects to find libmpi.a, not libmpi.so.

    Note also that building ROMIO implies building the profiling layer. ROMIO makes extensive use of the MPI profiling layer; that is, you cannot select --without-profiling without also specifying --without-romio.

    --with-romio-flags=FLAGS

    Pass FLAGS to ROMIO's configure script when it is invoked during the build process. This switch is used to effect specific behavior in ROMIO, such as building for a non-default file system (e.g., PVFS). Note that LAM already sends the following switches to ROMIO's configure script -- the --with-romio-flags switch should not be used to override them:

    • --prefix
    • -mpi
    • -mpiincdir
    • -cc
    • -fc
    • -debug (if -g is specified in CFLAGS)
    • -cflags
    • -fflags
    • -nof77 (if --without-fc is selected in LAM)
    • -make
    • -mpilib

    --with-rpi=RPI

    Build with the request progression interface (RPI) transport layer RPI. RPI must be one of: tcp, sysv, or usysv. If this option is not specified, the RPI transport layer defaults to tcp. Please refer to the RELEASE_NOTES file for descriptions of the RPI transport layers.

    --with-rsh=RSH

    Use RSH as the remote shell command. For example, if you want to use the secure shell ssh, specify --with-rsh="ssh -x" (note that the "-x" is necessary to prevent the ssh 1.x series of clients from sending their standard banner information to standard error, which will cause recon/lamboot/etc. to fail). This shell command will be used to launch commands on remote nodes from binaries such as lamboot, wipe, etc. The command can be one or more shell words, such as a command and multiple command line switches.

    This value can be overridden at recon/lamboot/etc. run time with the LAMRSH environment variable. See the RELEASE_NOTES file for more details.
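    For instance, to make ssh the build-time default and then switch back to plain rsh for a single lamboot, something along these lines should work in a Bourne-style shell. The boot schema file name "hostfile" is hypothetical.

```shell
# Build-time default remote shell; "-x" keeps ssh 1.x from writing its
# banner to standard error.
./configure --with-rsh="ssh -x"

# Run-time override for one command via the LAMRSH environment variable.
LAMRSH=rsh lamboot hostfile
```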

    --with-select-yield

    Force the use of select() to yield the processor.

    --with-shm-maxalloc=BYTES

    Use BYTES as the size of the maximum allocation from the shared memory pool. If no value is specified, configure will set the size according to the value of shm-poolsize (below). See "Usysv and Sysv transports", below.

    --with-shm-poolsize=BYTES

    Use BYTES as the size of the shared memory pool. If no size is specified, configure will determine a suitably large size to use. See "Usysv and Sysv transports", below.

    --with-shm-short=BYTES

    Use BYTES as the maximum size of a short message when communicating via shared memory. Default is 8 KB.

    --without-shortcircuit

    Disable the send/receive short circuiting optimization. The short circuit optimization has proven to be fairly stable, and this option is not usually necessary. It remains for hysterical raisins.

    --with-signal=SIGNAL

    Use SIGNAL as the signal used internally by LAM. The default value is "SIGUSR2". To set the signal to "SIGUSR1" for example, specify --with-signal=SIGUSR1.

    --with-tcp-short=BYTES

    Use BYTES as the maximum size of a short message when communicating over TCP. Default is 64 KB. This is relevant to all RPIs, since the shared memory RPIs are multi-protocol -- they will use TCP when communicating with MPI ranks that are not on the same node.

    --with-thread

    This option is not yet supported. Do not use it.

    --with-trillium

    Build and install the Trillium support executables, header files, and man pages. These extra Trillium executables, header files, and man pages are not necessary for normal MPI operation; they are intended for Trillium developers and certain third party products that interact with the lower layer of LAM/MPI. Building XMPI (http://www.lam-mpi.org/software/xmpi/), for example, requires that all the Trillium header files were previously installed. Hence, if you intend to compile XMPI after installing LAM/MPI, you should use this option.

    Building the extra Trillium executables and installing the Trillium header files and man pages used to be the default in prior versions of LAM/MPI. However, since few users actually used them, it has been relegated to an option.

    Example:

     % ./configure --with-rpi=usysv --with-cc=/bin/cc \
             --with-cflags=-O4 --without-fc
    

    Compile for the usysv RPI using the C compiler /bin/cc with options -O4 and disable Fortran support.


    64 bit LAM

    LAM has been verified as being 64 bit clean under Solaris 7, AIX 4.3.3, IRIX 6.5, and Alpha/Linux 2.2.x. To compile LAM with the 64 bit architecture, you will likely need to add compiler and linker flags with configure. For example, if you are using the Solaris Workshop 5.0 compilers on Solaris 7, you can use the following:

         % ./configure --with-cflags='-xarch=v9' --with-ldflags='-xarch=v9'
    

    Other compilers/architectures will have their own flags to enable 64 bit compilation; consult the documentation for your compiler. Of course, you can also add in any debugging/optimization flags in the cflags and ldflags strings as well.


    Building LAM

    Once the configuration step has completed, build LAM by doing:

         % make
    

    in the top level LAM directory. This will build the LAM binaries and libraries within the distribution source tree. Once they have compiled properly, you can install them with:

         % make install
    

    NOTE: Previous versions of LAM included "make install" in the default "make". THIS IS NO LONGER TRUE. You must execute "make install" to install the LAM executables, libraries, and header files to the location specified by the --prefix option to configure.

    Building LAM, ROMIO, and MPI 2 C++ examples

    LAM and the ROMIO and MPI-2 C++ packages all include example code that can be built with a single top-level "make examples". Note that the examples can only be built after a successful "make install", and $prefix/bin has been placed in your $path.

         % make examples
    

    This will do the following (where TOPDIR is the top-level directory of the LAM source tree):

    1. Build the LAM examples. They are located in:
              TOPDIR/examples
      
    2. If LAM was configured to build the C++ examples (i.e., if you did not configure with --without-mpi2cpp), the MPI 2 C++ examples will be built. They are located in:
              TOPDIR/mpi2c++/contrib
      
    3. If you configured LAM with ROMIO support (i.e., if you did not configure with --without-romio), the ROMIO examples will be built. See the notes about ROMIO in the RELEASE_NOTES file. They are located in:
              TOPDIR/romio/test
      

    Additionally, the following three commands can be used to build each of the packages' examples separately (provided that support for each was compiled into LAM) from TOPDIR:

         % make lam-examples
         % make romio-examples
         % make mpi2c++-examples
    


    Boot schema

    A boot schema is a description of a multicomputer on which LAM will be run. You can create boot schema files (see bhost(5) for syntax) for typical configurations of the local multicomputer(s). Place these files under etc/ in the installation directory. They will be found by LAM tools such as lamboot(1), recon(1) and wipe(1) if you do not specify a filename on the command line to use instead of the default.

    The default etc/lam-bhost.def file comes with a single line:

    	localhost
    

    As a result, if you simply run "lamboot", you will get a LAM with one node (the localhost) booted.

    You can rewrite the etc/lam-bhost.def file if you are frequently going to boot LAM to the same configuration. For example, if you frequently use 4 workstations: inky, blinky, pinky, and clyde, you can have an etc/lam-bhost.def file as follows:

    	inky
    	blinky
    	blinky
    	blinky
    	blinky
    	pinky cpu=2
    	clyde user=lamrocks
    

    Note that "blinky" is listed 4 times. This tells LAM/MPI that blinky has 4 CPUs (relevant for the "C" notation to the mpirun command; see mpirun(1)). An alternate (and equivalent) notation is used for pinky -- "cpu=2" specifies that pinky has 2 CPUs.

    You can also specify different remote usernames on the remote nodes; the username "lamrocks" is used on the machine "clyde" in the above example.
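    To see at a glance how many CPUs a boot schema declares for each host, the two counting rules just described (one CPU per line naming a host, or N CPUs for "cpu=N") can be tallied with a short awk script. This is a hypothetical helper, not a LAM tool, and it assumes that blank lines and lines starting with "#" are comments; see bhost(5) for the authoritative syntax.

```shell
#!/bin/sh
# Hypothetical helper: print "host cpus" for each host in a boot schema,
# in order of first appearance.
bhost_cpus() {
    awk '
        # skip blank lines and comment lines
        /^[ \t]*$/ || /^[ \t]*#/ { next }
        {
            n = 1                          # each line counts as one CPU...
            for (i = 2; i <= NF; i++)      # ...unless it carries cpu=N
                if ($i ~ /^cpu=/) n = substr($i, 5) + 0
            if (!($1 in cpus)) order[++count] = $1
            cpus[$1] += n
        }
        END { for (i = 1; i <= count; i++) print order[i], cpus[order[i]] }
    ' "$1"
}
```

    Run against the example file above, it should print "inky 1", "blinky 4", "pinky 2", and "clyde 1", one host per line.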


    Using LAM

    If the LAM installation directory is moved after it is built, users must set the LAMHOME environment variable to the new location. This is the only case where the LAMHOME environment variable should be set -- otherwise, it should be left unset. See "The LAMHOME and TROLLIUSHOME environment variables", below.

    On each UNIX machine, users must add the LAM executable directory to their shell's search path. LAM executables are found under $prefix/bin. These steps must be taken on each and every machine that might be part of a multicomputer running LAM. Set the variables in the shell's start-up file, not the .login file: the shell that rsh(1) starts on a remote node is not a login shell, so it never reads .login.

    Typical usage

    LAM is a daemon-based implementation of MPI. This means that a daemon process is launched on each machine that will be in the parallel environment. Once the daemons have been launched, LAM is ready to be used. A typical usage scenario is as follows:

    • Boot LAM on all the nodes
    • Run MPI programs
    • Shut down LAM

    LAM does not need to be booted in order to compile MPI programs.
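    A minimal session following these steps might look like the sketch below. The program name myprog.c and the boot schema file "hostfile" are hypothetical; see lamboot(1), mpirun(1), and wipe(1) for the full option lists.

```shell
# Compile: no LAM daemons are needed for this step.
mpicc -o myprog myprog.c

lamboot hostfile        # 1. boot LAM on the nodes listed in hostfile
mpirun C myprog         # 2. run one process per CPU ("C" notation)
wipe hostfile           # 3. shut LAM down on those nodes
```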

    LAM is a user-based MPI environment; each user who wishes to use LAM must boot their own LAM environment. LAM is not a client-server environment where a single LAM daemon can service all LAM users on a given machine. There are no future plans to make LAM client-server oriented (unless someone volunteers to write it :-).

    As a side-effect of this design, each user must have an account on each machine that they wish to use LAM on.

    The LAMHOME and TROLLIUSHOME environment variables

    Note that it is typically not necessary to set the LAMHOME and/or TROLLIUSHOME environment variables. These variables are only necessary if the $prefix of the LAM installation is moved after "make install" was run.

    As such, there are very few cases when one would need to set LAMHOME or TROLLIUSHOME. The LAM Team recommends that you leave these variables unset.

    Starting LAM

    The recon(1) tool checks if LAM can be started on the given boot schema. There are several prerequisites that enable LAM to be started on a remote machine:

    • The machine must be reachable and operational.
    • The user must have an account on the machine.
    • The user must be able to rsh(1) to the machine (typically, permissions must be set in the user's .rhosts file on the machine).
    • The user must be able to write to /tmp.
    • The LAM executables must be locatable on that machine, using the shell's search path and possibly the LAMHOME environment variable, as described above.
    • The shell's start-up script must not print anything on standard error. The user can take advantage of the fact that rsh(1) will start the shell non-interactively. The start-up script can exit early in this case, before executing many commands relevant only to interactive sessions and likely to generate output.
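    One way to meet the search-path and silent start-up prerequisites in a Bourne-style start-up file (for example ~/.bashrc when the login shell is bash) is sketched below; csh users would express the same idea in .cshrc using "set path" and a test of $?prompt. LAM_PREFIX=/usr/local is an assumption: substitute the --prefix that LAM was configured with.

```shell
# Runs in every shell, including the non-interactive one that rsh starts
# for recon/lamboot. Keep this part silent.
LAM_PREFIX=/usr/local          # assumption: your LAM --prefix
PATH="$LAM_PREFIX/bin:$PATH"
export PATH

# Interactive-only setup. Commands that may write to standard error when no
# terminal is attached (stty is a classic example) belong in this branch.
case $- in
    *i*)
        stty erase '^?'
        ;;
esac
```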

    All of these prerequisites must be met before LAM will function properly. If recon does not complete successfully, the "-d" option will give verbose descriptions of what it tried to do, and suggestions to fix the problem.

    Also keep in mind that just because recon works, lamboot itself may still fail. This usually happens when the "hboot" program (that lamboot invokes on remote nodes) fails for some reason. Again, the "-d" option to lamboot will enable extremely verbose output, and suggest solutions to common problems.

    Users should read the lam(7) manual page to get started using LAM tools and libraries.

    Additionally, the University of Notre Dame offers a "Getting Started with LAM" tutorial, which, although somewhat biased towards the LAM Team's computing environment, is a good starting point for getting familiar with LAM.

    http://www.lam-mpi.org/tutorials/lam/

    Common filesystems

    A common environment to run LAM is in a Beowulf-class or other workstation cluster. Simply stated, LAM can run on a group of workstations connected by a network. As mentioned above, there are several prerequisites, however (the user must have an account on all the machines, the user can rsh [or ssh, or whatever other remote shell transport capability is desired -- see above for how to change the underlying remote shell transport] to all the machines, etc.).

    This raises the question for LAM system administrators: where to install the LAM binaries, header files, etc.? There are two main choices:

    1. Have a common filesystem, such as NFS, between all the machines to be used. Install the LAM files such that the LAM executables can be found in the same directory on each node. This will greatly simplify users' .cshrc/.profile scripts -- the value of $PATH can be set without checking which machine the user is on. It also simplifies the system administrator's job; when the time comes to patch or otherwise upgrade LAM, only one copy needs to be modified.

      For example, consider a cluster of four machines: inky, blinky, pinky, and clyde. If the LAM binaries et al. are installed on inky's local hard drive in the directory /home/lam, the system administrator has two main choices:

      • mount inky:/home/lam on the remaining three machines, such that /home/lam on all machines is effectively "the same". That is, the following directories all contain the LAM binaries:

        • inky:/home/lam
        • blinky:/home/lam
        • pinky:/home/lam
        • clyde:/home/lam

      • mount inky:/usr/local/src/lam-6.5.9 on all four machines in some other common location, such as /home/lam (a symbolic link can be installed on inky instead of a mount point for efficiency). This strategy is typically used for environments where one tree is NFS exported, but another tree is typically used for the location of binaries. For example, the following directories all contain the LAM binaries:

        • inky:/home/lam
        • blinky:/home/lam
        • pinky:/home/lam
        • clyde:/home/lam

      Notice that these are the same four directories as in the previous example, but on inky, the directory is actually located in /usr/local/src/lam-6.5.9. There is a bit of a disadvantage in this approach; each of the remote nodes has to incur NFS (or whatever filesystem is used) delays to access the LAM directory tree. However, the ease of administration and the (relatively) low cost of using a networked file system usually greatly outweigh this cost.

    2. If you are concerned about the networked filesystem costs of accessing the LAM binaries, you can install LAM on the local hard drive of each node in your system. Again, it is highly advisable to install LAM in the same directory on each node so that users' $PATH can be set to the same value, regardless of the node that a user has logged on to.

      This approach saves some of the network latency of accessing the LAM binaries, but it is generally only used where users are very concerned about squeezing every spare cycle out of their machines.

    Using LAM with AFS

    AFS has some peculiarities, especially with file permissions when using rsh. However, most sites tend to install the Transarc rsh replacement (i.e., the one that passes tokens to the remote machine) as the default rsh, so when you "rsh" to a remote machine (with recon or lamboot), your AFS token will be passed to the remote LAM daemon automatically. If your site does not install the Transarc replacement rsh as the default, consult the documentation on "--with-rsh" (above) to see how to set the path to the rsh that LAM will use.

    Once you use the replacement rsh, you should get a token on the other side. This means that your LAM daemons are running with your AFS token, and you should be able to run any program that you wish, including those that are not system:anyuser accessible. You will even be able to write into your filespace (as you would expect).

    Keep in mind, however, that AFS tokens have limited lives, and will eventually expire. This means that your LAM daemons (and user MPI programs) will lose their AFS permissions after some specified time unless you renew your token (with the "klog" command, for example) on the originating machine before the token runs out. This can play havoc with long-running MPI programs that periodically write out file results; if you lose your AFS token in the middle of a run, and your program tries to write out to a file, it won't have permission to, which may cause Bad Things to happen.

    If you need to run long MPI jobs with LAM on AFS, it is usually advisable to ask your AFS administrator to increase your default token lifetime to a large value, such as 2 weeks.

    Using LAM with ssh

    Note that you can change the remote transport agent that LAM uses to spawn the LAM daemons. While rsh is the default, it can be changed to other agents, such as ssh.

    ssh is a popular choice because of the added security that it provides over the .rhosts security provided by rsh. And since ssh can pass AFS tokens, it provides an attractive, highly secure, yet fully AFS-authenticated method for invoking LAM.

    If you choose to use ssh, the 1.x series of ssh requires the "-x" command line flag to prevent ssh from printing its standard banner information to stderr. lamboot, recon, and related tools interpret output on stderr to mean that a remote invocation has failed; ssh's "-x" prevents this. (We do not have access to SSH 2.x clients; they may require a similar command line flag.)

    Note that using ssh (or any other agent) only changes the way that LAM is invoked. Once LAM is invoked, it sets up its own sockets for communication that are outside of ssh (and are therefore not encrypted). ssh provides stronger security only during lamboot and wipe. Once the LAM daemons are launched, all MPI meta information (such as the startup of user programs) is passed through separate channels that are independent of ssh.


    Troubleshooting

    Problems with building LAM

    It is highly recommended that you execute the following steps in order. Many people have similar problems with configuration and initial setup of LAM, and most common problems have already been answered in one way or another.

    1. Check the LAM FAQ:

      http://www.lam-mpi.org/faq/
    2. Check the mailing list archives. Use the "search" features to check old posts and see if others have asked the same question and had it answered:

      http://www.lam-mpi.org/MailArchives/lam/
    3. If you do not find a solution to your problem in the above resources, and your problem specifically has to do with building LAM, send the following information to the LAM mailing list (see the next section below about sending mail to the LAM mailing list):

      • The result of "uname -a" on your system
      • The result of "./config/config.guess" from the top-level LAM source directory.
      • Output from when you ran "./configure" to configure LAM
      • The config.log file from the top-level LAM directory
      • The share/include/lam_config.h file
      • Output from when you ran "make" to build LAM

      To capture the output of the configure and make steps, you can use the script command, or the following technique if you are using a csh-style shell:

           % ./configure {options} |& tee config.LOG
           % make install          |& tee make.LOG
      
      or if using a Bourne style shell:
           % ./configure {options} 2>&1 | tee config.LOG
           % make install 2>&1          | tee make.LOG
      

    The LAM/MPI Mailing Lists

    There are two mailing lists: one for LAM/MPI announcements, and another for questions and user discussion of LAM/MPI.

    1. Announcement list.

      This is a low-volume list that is used to announce new versions of LAM/MPI, important patches, etc. To subscribe to the LAM announcement list, visit its list information page (you can also use that page to unsubscribe or change your subscription options):

      http://www.lam-mpi.org/mailman/listinfo.cgi/lam-announce
    2. General discussion/user list.

      This list is used for general questions and discussion of LAM/MPI. Users can post questions, comments, etc. to this list. Due to problems with spam, only subscribers are allowed to post to the list. To subscribe to or unsubscribe from the list, visit the list information page:

      http://www.lam-mpi.org/mailman/listinfo.cgi/lam/

      After you have subscribed (and received a confirmation e-mail), you can send mail to the list at the following address:

      YOU MUST BE SUBSCRIBED IN ORDER TO POST TO THE LIST
      lam at lam dash mpi dot org
      YOU MUST BE SUBSCRIBED IN ORDER TO POST TO THE LIST

      NOTE: People tend to only reply to the list; if you subscribe, post, and then unsubscribe from the list, you will likely miss replies.

      Also please be aware that lam at lam dash mpi dot org is a list that goes to several hundred people around the world. It is not uncommon to move a high-volume exchange off the list and only post the final resolution of the problem/bug fix to the list. This keeps exchanges like "Did you try X?", "Yes, I tried X, and it did not work.", "Did you try Y?", etc. from cluttering up people's inboxes.

      Problems with running LAM and/or user programs

      Check the LAM FAQ and mailing list archive resources mentioned in the previous section (Problems with building LAM). If you do not find the solution to your problem there, send mail to the LAM mailing list: lam at lam dash mpi dot org.

      Some typical problems with rsh include the following:

      • Incorrect permissions on a user's home directory
      • Incorrect permissions on $HOME/.rhosts
      • No entry (or incorrect entry) in $HOME/.rhosts

      Some typical problems with a user's environment include the following:

      • User's .cshrc/.profile does not put $prefix/bin in the path
      • Inaccessible permissions on the program that you are trying to run
      • Inaccessible permissions on the /tmp directory

      Insufficient shared resources

      When using the sysv or usysv RPIs, the operating system may run out of shared memory and/or semaphores. This is typically indicated by failing to run an MPI program, or failing to run more than X copies of an MPI program on a single node.

      To fix this problem, your operating system settings need to be modified to increase the allowable shared semaphores/memory.

      For Linux, reconfiguration can only be done by building a new kernel. First modify the appropriate constants in include/asm-[arch]/shmparam.h or include/linux/shm.h. Increasing SHMMAX allows larger shared segments, and increasing _SHM_ID_BITS allows for more shared memory identifiers (this information likely applies to 2.0/2.2 Linux kernels; it may or may not have changed in more recent versions).

      For Solaris, reconfiguration can be done by modifying /etc/system and then rebooting. See the Solaris man page system(4).

      For example, to set the maximum shared memory segment size to 32 MB, put the following in /etc/system:

           set shmsys:shminfo_shmmax=0x2000000
      

      If you are using the sysv transport and are running out of semaphores, the following tunables can be set:

           set semsys:seminfo_semmap=32
           set semsys:seminfo_semmni=128
           set semsys:seminfo_semmns=1024
      

      Please consult your system documentation for help in determining the correct values for your systems.


      Clearing disk space

      After LAM has been built, all of the objects can be removed by running the make(1) utility with the "clean" target in the source directory.

           % make clean 
      

      NOTE: If you are using a really picky version of make (such as OpenBSD's make), you may need to use "make -i clean".

      If you're really desperate for more space, a bit more space can be reclaimed by running:

           % make distclean
      

      NOTE: Again, if you are using a really picky version of make (such as OpenBSD's make), you may need to use "make -i distclean".

      If further space is required, the entire source directory can be taken off-line (indeed, "make distclean" returns the LAM source tree to the same state as it was when it was unpacked from the original distribution tarball). Only the installation directory need be maintained on-line.


      Tuning LAM

      There are various constants defined in the LAM header files which relate to message transfer protocols, shared memory allocation, and so on. Some of these are configurable via the configure script; it is hoped that in time, more and more options will be configurable.

      This section is intended to describe some of these constants so that LAM users can experiment with tuning the MPI library. It also provides some description of the transport layer internals which may help LAM users better understand the behavior and performance they see from the LAM MPI library.

      Short/long protocol

      LAM MPI uses a short/long message protocol. If a message is "short", it is sent together with a header in one transfer to the destination process. If the message is "long", then a header (possibly with some data) is sent to the destination. The sending process then waits for an acknowledgment from the receiver before sending the rest of the message data. The receiving process sends the acknowledgment when a matching receive is posted.

      The crossover point from "short" to "long" messages is configurable in each transport. See the transport-specific sections below (TCP, usysv, and sysv) for further information.

      Shortcircuit send/receive

      Typically, when a message is sent or received, LAM creates a request structure, fills it with information about the message, links the request into a list of messages, and calls a progression "engine" to effect the data transfer.

      When there are no active requests and a blocking (standard mode) send or receive is done, the overhead of creating the request and linking it into the list can be bypassed (shortcircuited) and the progression "engine" called directly to effect the transfer.

      In prior versions of LAM/MPI, this option was not the default. It is now used by default, unless specifically disabled via the configure script.

      TCP transport

      The crossover point from "short" to "long" message is configurable via the constant TCPSHORTMSGLEN in share/include/lam_config.h (relative to the top of the LAM build tree). It can also be set from the configure script via the --with-tcp-short option. The default is 64KB.

      This number is relevant to all the RPIs. The shared memory RPIs are multi-protocol; they use TCP to communicate with ranks that are not on the same node.

      Usysv and sysv transports

      Descriptions of the usysv and sysv transports can be found in the "RPI transport layers" section of the RELEASE_NOTES file.

      Configuration constants for the usysv and sysv transports are found in share/include/rpi.shm.h (from the top of the LAM build directory).

      In these transports, processes on different nodes communicate via TCP sockets. The crossover point from "short" to "long" messages for these communications is configurable via the constant TCPSHORTMSGLEN. It can also be set from the configure script via the --with-tcp-short option. The default is 64KB.

      Processes located on the same node communicate via shared memory. The transport allocates one SYSV shared segment that is shared by all processes of the task that are on the node. This segment is logically divided into two areas.

      The "postbox" area contains postboxes for "short" message communication. A postbox is used for one-way communication between two processes. The space allocated per postbox is SHMSHORTMSGLEN + CACHELINESIZE bytes. SHMSHORTMSGLEN is configurable (via the configure option --with-shm-short); it is the crossover point from "short" to "long" messages in shared memory communication, and its default value is 8 KB.

      CACHELINESIZE must be the size of a cache line or a multiple thereof. The default setting is 64 bytes. You shouldn't need to change it. CACHELINESIZE bytes in the postbox are used for a cache-line sized synchronization location.

      The size of the postbox area is np * (np - 1) * (SHMSHORTMSGLEN + CACHELINESIZE) bytes.

      The rest of the shared memory area is used as a global pool from which space for long message transfers is allocated. Allocation from this pool is protected by a lock. The default lock mechanism is a SYSV semaphore, but the configure option --with-pthread-lock can be used to change this to a process-shared pthread mutex. The size of this pool is configurable via the constant LAM_MPI_SHMPOOLSIZE and by the configure option --with-shm-poolsize.

      The configure script will try to determine a size for the pool if none is explicitly specified. You should always check this to see if it is reasonable. Larger values should improve performance especially when an application passes large messages, but will also increase the system resources used by each task.

      The total size of the shared segment allocated is 2 * CACHELINESIZE + LAM_MPI_SHMPOOLSIZE + np * (np - 1) * (SHMSHORTMSGLEN + CACHELINESIZE) bytes. The 2 * CACHELINESIZE bytes are for the global pool lock.

      Use of the global pool

      When a message larger than 2 * SHMSHORTMSGLEN is sent, the transport sends SHMSHORTMSGLEN bytes with the first packet. When the acknowledgment is received, it allocates (message length - SHMSHORTMSGLEN) bytes from the global pool to transfer the rest of the message.

      To prevent a single large message transfer from monopolizing the global pool, allocations from the pool are actually restricted to a maximum of LAM_MPI_SHMMAXALLOC bytes. Even with this restriction, it is possible for the global pool to temporarily become exhausted. In this case, the transport will fall back to using the postbox area to transfer the message. Performance will be degraded, but the application will progress.

      LAM_MPI_SHMMAXALLOC is configurable via the configure option --with-shm-maxalloc or by editing rpi.shm.h.

      Synchronization

      The usysv and sysv transports differ only in the mechanism used to synchronize the transfer of messages via shared memory. The usysv transport uses spin locks with back-off, while the sysv transport uses SYSV semaphores.

      Both transports use a few SYSV semaphores for synchronizing the deallocation of shared structures or for synchronizing access to the shared pool.

      The usysv transport should be superior to the sysv transport on multiprocessors. On uniprocessors, which is better depends on the OS and the means used for processor yielding. On a Linux uniprocessor, for example, using semaphores (sysv transport) appears to be vastly superior to spin-locking.

      Usysv transport spin-locks

      The usysv transport uses spin locks with back-off. When a process backs off, it attempts to yield the processor. If the configure script found a system provided yield function such as yield() or sched_yield(), this is used. If no such function is found, then select() on NULL file descriptor sets with a timeout of 10us is used.

      The use of select() to yield can be forced by the --with-select-yield option to the configure script.

      Sysv transport semaphores

      The sysv transport allocates a semaphore set (of size 6) for each process pair communicating via shared memory. On some systems, you may need to reconfigure the system to allow for more semaphore sets if running tasks with many processes communicating via shared memory.