Department of Computing Science Umeå University

Globus Tools Tutorial

These exercises give hands-on experience with the Globus tools for accessing the Grid. They are not mandatory but strongly recommended: knowledge of the Globus tools will be beneficial when solving the second assignment.

This tutorial requires that you have a valid certificate. More information about obtaining a certificate can be found here.

If you experience problems using a command, read its help text (invoke the command with the -help flag), search the web (see the links below), or ask the teaching assistant.

Environment configuration

First, log on to svampbob-1.cs.umu.se.
Before you start working on the exercises, set up the required environment variables as follows:

For sh and bash:
. /opt/globus/4.0.1/etc/globus-user-env.sh
For csh and tcsh:
source /opt/globus/4.0.1/etc/globus-user-env.csh

This enables your shell to locate the binaries used in the tutorial.
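
The setup step can be checked with a short script. The following is a minimal sketch for sh and bash that sources the setup file only if it is readable and then reports which of the commands used in this tutorial the shell can locate. The variable names GLOBUS_ENV and missing, and the choice of which four tools to check, are made up for this illustration; the path is the one given above.

```shell
# Path taken from the tutorial text; adjust it if your installation differs.
GLOBUS_ENV=/opt/globus/4.0.1/etc/globus-user-env.sh

if [ -r "$GLOBUS_ENV" ]; then
    . "$GLOBUS_ENV"
else
    echo "warning: $GLOBUS_ENV not found; are you logged on to the right machine?" >&2
fi

# Count how many of the tutorial's commands are still not on the PATH.
missing=0
for tool in grid-proxy-init globusrun-ws globus-url-copy wsrf-query; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing of 4 tools missing"
```

If any tool is reported as missing after sourcing the file, the remaining commands in this tutorial will fail with a "command not found" error.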

Proxy creation

Proxy creation is a way to perform single sign-on of the user onto the Grid. Once the proxy is created, it can be used for authentication, allowing the user to interact with multiple resources without having to authenticate manually (retyping the password) for each interaction. A proxy can also be delegated to remote processes, allowing these processes to perform tasks on behalf of the user.

A proxy is created with the grid-proxy-init command. For security reasons, the proxy has a limited lifetime; grid-proxy-info reports information about the proxy, including the remaining lifetime. A proxy that is no longer needed can be destroyed with the grid-proxy-destroy tool. Passing the -debug flag to grid-proxy-init gives more insight into the proxy generation process.
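
The proxy lifecycle described above can be sketched as follows. This is an illustration, not a required procedure: the variable proxy_status is made up for the example, and the -timeleft option of grid-proxy-info (which should print the remaining lifetime in seconds) is not mentioned in the text above, so check grid-proxy-info -help if your installation rejects it. The script does nothing except print a message when the Globus tools are not on the PATH.

```shell
# Proxy lifecycle sketch: create, inspect, and (optionally) destroy.
proxy_status="skipped"

if command -v grid-proxy-init >/dev/null 2>&1; then
    # Prompts for the passphrase that protects your private key.
    if grid-proxy-init; then
        proxy_status="created"
        # Full report first, then only the remaining lifetime.
        grid-proxy-info
        echo "seconds left: $(grid-proxy-info -timeleft)"
        # Uncomment when you are done working with the Grid:
        # grid-proxy-destroy
    else
        proxy_status="failed"
    fi
else
    echo "grid-proxy-init not found; set up the environment first" >&2
fi

echo "proxy: $proxy_status"
```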

Job submission

The command line client for submitting jobs to Globus WS-GRAM is called globusrun-ws and can be used as follows:

globusrun-ws -submit -c /bin/touch touched_it

This command runs the program /bin/touch with touched_it as argument on the local resource.

To submit the job to another machine, the -F flag must be specified:

globusrun-ws -F 'https://svampbob-1.cs.umu.se:8443/wsrf/services/ManagedJobFactoryService' -submit -c /bin/touch touched_it

A more complex job can be described using the Resource Specification Language (RSL). One example of an RSL job description is:

<job> 						
    <executable>/bin/echo</executable>		
    <argument>Hello world</argument>	
    <stdout>my_echo.stdout</stdout>
    <stderr>my_echo.stderr</stderr>
</job> 

To submit a job described in an RSL file, here hello_world.xml, run:
globusrun-ws -submit -f hello_world.xml
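
For convenience, the two steps (saving the job description and submitting it) can be combined in one script. The sketch below writes the RSL example from above to hello_world.xml in the current directory and submits it only if globusrun-ws is available; nothing else is assumed beyond what the text already shows.

```shell
# Save the RSL job description from the example above.
cat > hello_world.xml <<'EOF'
<job>
    <executable>/bin/echo</executable>
    <argument>Hello world</argument>
    <stdout>my_echo.stdout</stdout>
    <stderr>my_echo.stderr</stderr>
</job>
EOF

# Submit the job if the client is on the PATH.
if command -v globusrun-ws >/dev/null 2>&1; then
    globusrun-ws -submit -f hello_world.xml
else
    echo "globusrun-ws not found; hello_world.xml written but not submitted" >&2
fi
```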

Typically, a job requires one or more input files, e.g., the program to run. These files can be transferred to the resource using the RSL attribute fileStageIn, as exemplified below.

  
<fileStageIn>
    <transfer>
        <sourceUrl>gsiftp://svampbob-1:2811/bin/echo</sourceUrl>
        <destinationUrl>file:///pub/anarchy/my_echo</destinationUrl>
    </transfer>
</fileStageIn>
 

When submitting a job that performs file staging, the -S flag must be passed to globusrun-ws in order to delegate the user's credentials to the ManagedJobFactoryService, so that this service can stage the input files on behalf of the user. If output files from the job should be staged somewhere, these can be specified using the fileStageOut RSL attribute.
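
To show where the staging fragment fits, the sketch below places the fileStageIn block from above inside a complete job description and submits it with -S. The file name stage_echo.xml and the choice to run the staged copy as the executable are assumptions made for this illustration; the URLs are the ones from the fragment above. Whether the staged copy can actually be executed on the resource depends on the local setup, so treat this as a starting point rather than a verified job.

```shell
# Job description that stages in a program and then runs the staged copy.
cat > stage_echo.xml <<'EOF'
<job>
    <executable>/pub/anarchy/my_echo</executable>
    <argument>Hello world</argument>
    <stdout>my_echo.stdout</stdout>
    <stderr>my_echo.stderr</stderr>
    <fileStageIn>
        <transfer>
            <sourceUrl>gsiftp://svampbob-1:2811/bin/echo</sourceUrl>
            <destinationUrl>file:///pub/anarchy/my_echo</destinationUrl>
        </transfer>
    </fileStageIn>
</job>
EOF

# -S delegates your credentials so the service can perform the transfer.
if command -v globusrun-ws >/dev/null 2>&1; then
    globusrun-ws -submit -S -f stage_echo.xml
else
    echo "globusrun-ws not found; stage_echo.xml written but not submitted" >&2
fi
```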

Normally, the globusrun-ws command does not terminate until the job has finished executing. For long-running jobs, it may be convenient to submit the job and return later to monitor its progress. The following command submits a job in batch mode (the -batch flag makes the client return as soon as the job has been submitted) and stores an identifier (endpoint reference) to the job in the file job.epr.
globusrun-ws -submit -batch -f hello_world.xml -o job.epr

This identifier can later be used to retrieve the status of the job:
globusrun-ws -status -job-epr-file job.epr

The identifier can also be used to kill the job:
globusrun-ws -kill -job-epr-file job.epr
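
The submit, status, and kill commands are typically used together. The following sketch chains them: it submits hello_world.xml, stores the endpoint reference in job.epr, and queries the job status once. It passes -batch so that the client returns right after submission; if your client rejects the flag, check globusrun-ws -help. The variable names job_file, epr_file, and workflow are made up for this example.

```shell
job_file=hello_world.xml
epr_file=job.epr
workflow="skipped"

if command -v globusrun-ws >/dev/null 2>&1 && [ -r "$job_file" ]; then
    # Submit without waiting for completion and save the endpoint reference.
    if globusrun-ws -submit -batch -f "$job_file" -o "$epr_file"; then
        workflow="submitted"
        # Ask for the current state of the job.
        globusrun-ws -status -job-epr-file "$epr_file"
        # To cancel the job instead, run:
        # globusrun-ws -kill -job-epr-file "$epr_file"
    else
        workflow="failed"
    fi
else
    echo "globusrun-ws or $job_file not available; nothing submitted" >&2
fi

echo "workflow: $workflow"
```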

Finally, WS-GRAM may use different mechanisms to execute the job. The mechanism is selected with the factory type (the -Ft flag). The two options available on our system are:

globusrun-ws -submit -Ft Fork ...
globusrun-ws -submit -Ft PBS ...

Exercises
  1. How can you submit a job to run a program that is not available on the Grid resource?
  2. What differs between the PBS and Fork factory types? (Hint: submit a job running /bin/hostname)
  3. What GRAM states are possible for a Grid job?
  4. Submit a Grid job that prints 'Hello World'. Store the job output in the file hello_grid.txt in your home directory.

Data management

The globus-url-copy command copies a file from one URL to another. Some of the supported URL formats are:
file://path - file stored in the local file system.
gsiftp://host[:port]/path - file stored on a GridFTP server.

The following example shows how the local file /tmp/myfile is copied to /tmp/myfile.copy on the Grid resource svampbob-1.cs.umu.se.

globus-url-copy file:///tmp/myfile \
gsiftp://svampbob-1.cs.umu.se:2811/tmp/myfile.copy

Note that the GridFTP server uses port 2811 instead of the usual FTP ports.
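
The URL formats are easy to get wrong, in particular the three slashes in file:///tmp/myfile (the scheme separator // followed by an absolute path starting with /). The sketch below builds both URLs of the example above from their parts and runs the copy only if the tool and the local file exist. The variable names are made up for this illustration.

```shell
host=svampbob-1.cs.umu.se
port=2811
local_file=/tmp/myfile
remote_path=/tmp/myfile.copy

# "file://" plus an absolute path gives three slashes in total.
src_url="file://$local_file"
dst_url="gsiftp://$host:$port$remote_path"
echo "copying $src_url -> $dst_url"

if command -v globus-url-copy >/dev/null 2>&1 && [ -r "$local_file" ]; then
    globus-url-copy "$src_url" "$dst_url"
else
    echo "globus-url-copy or $local_file not available; nothing copied" >&2
fi
```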

GridFTP extends FTP with performance improvements. Many GridFTP clients support third-party transfers (also known as proxy FTP). A third-party transfer allows a client to transfer a file between two remote servers without forwarding the file through the client host:

[tordsson]svampbob-1:~> globus-url-copy \
gsiftp://svampbob-2.cs.umu.se:2811/Home/staff/tordsson/my_file.txt \
gsiftp://svampbob-3.cs.umu.se:2811/tmp/another_copy.txt

Exercise
  1. Transfer the file gsiftp://svampbob-1.cs.umu.se:2811/scratch/testfile.txt to your home directory.

Information retrieval

Resources in a Globus-based Grid advertise information regarding their configuration and state using the Monitoring and Discovery System, MDS. An IndexService gathers information from various sources (other web services or executing programs) and makes this information available as resource properties. Each service container runs at least one IndexService, called the DefaultIndexService. The command below retrieves all information stored in the DefaultIndexService on localhost.

wsrf-query -s 'https://localhost:8443/wsrf/services/DefaultIndexService' -a -z none

The -a flag specifies anonymous authentication and -z none specifies that no authorization of the service should be performed.

In most cases, it is not convenient to retrieve all available information. For this reason, a subset of the information can be selected using XPath queries. As an example, the following query returns only the element(s) named 'OperatingSystem':

wsrf-query -s 'https://localhost:8443/wsrf/services/DefaultIndexService' -a -z none "//*[local-name()='OperatingSystem']"
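
The same XPath pattern works for any element name, since local-name() compares the name without its namespace prefix. The sketch below wraps the pattern in a small shell function so that only the element name has to be typed. The function name xpath_for and the variable names are made up for this illustration; the service URL and flags are the ones used above.

```shell
# Build an XPath expression that matches elements by local name,
# ignoring whatever namespace they belong to.
xpath_for() {
    printf "//*[local-name()='%s']" "$1"
}

index_url='https://localhost:8443/wsrf/services/DefaultIndexService'
query=$(xpath_for OperatingSystem)
echo "query: $query"

if command -v wsrf-query >/dev/null 2>&1; then
    wsrf-query -s "$index_url" -a -z none "$query"
else
    echo "wsrf-query not found; query not sent" >&2
fi
```

Calling xpath_for with another element name that appears in the full query output produces the corresponding expression.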

For more information about XPath, see the W3Schools XPath tutorial.
See also the Java WS Core Command Line Interface for more details about the wsrf-query command.

Exercises
  1. What local services (running in the same container as the DefaultIndexService) are registered with the DefaultIndexService?
  2. What non-local services are registered with the DefaultIndexService?
  3. For the registered computational resources, how many jobs are queued at each resource?
  4. How many CPUs does each registered computational resource have?

Useful links:

Java WS Core Command Line Interface
RSL Specification

http://www.cs.umu.se/kurser/TDBD20/VT07/lab/exercises.html
Page maintainers: Erik Elmroth, P-O Östberg, Johan Tordsson
Last modified 2007-01-29