Globus Tools Tutorial
These exercises give hands-on experience with the
Globus tools for accessing the Grid.
The exercises are optional
but strongly recommended.
Knowledge of the Globus tools will be beneficial when solving the second assignment.
This tutorial requires that you have a valid certificate.
More information about obtaining a certificate can be
found here.
If you experience problems using a command, read the help
(invoke the command with the -help flag),
search the web (see the links below), or ask the teaching assistant.
Environment configuration
First, log on to svampbob-1.cs.umu.se.
Before you start working on the exercises, set up the required environment
variables as follows:
For sh and bash:
. /opt/globus/4.0.1/etc/globus-user-env.sh
For csh and tcsh:
source /opt/globus/4.0.1/etc/globus-user-env.csh
This enables your shell to locate the binaries used in the tutorial.
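One quick way to confirm that the environment is set up is to check that the tools used in this tutorial can be found on the PATH. The sketch below does this with the standard command -v lookup. The helper name check_tools is made up for this illustration and is not part of Globus.

```shell
#!/bin/sh
# check_tools is a hypothetical helper, not part of Globus.
# It prints each tool that cannot be found and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The commands used in this tutorial.
check_tools grid-proxy-init grid-proxy-info grid-proxy-destroy \
    globusrun-ws globus-url-copy wsrf-query ||
    echo "Source globus-user-env.sh (or .csh) and try again."
```

If nothing is printed, all six commands were found.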
Proxy creation
Creating a proxy is a way to perform
single sign-on of the user onto the Grid. Once the proxy is created,
it can be used for authentication, allowing the user to interact with
multiple resources without having to authenticate manually (retyping the password)
for each interaction. A proxy can also
be delegated to remote processes, allowing these processes to perform
tasks on behalf of the user.
A proxy is created with the grid-proxy-init command.
For security reasons, the proxy has a limited lifetime. The grid-proxy-info command
reports information about the proxy, including the remaining lifetime.
A proxy that is no longer used can be destroyed with the grid-proxy-destroy
tool. Passing the -debug flag to grid-proxy-init gives
more insight into the proxy generation process.
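The proxy commands can be tried in sequence, as in the sketch below. It assumes that grid-proxy-info accepts the -timeleft option and prints the remaining lifetime in seconds. format_timeleft is a hypothetical helper that only reformats that number.

```shell
#!/bin/sh
# format_timeleft is a hypothetical helper: it converts a number of seconds
# into HH:MM:SS. A missing argument is treated as 0.
format_timeleft() {
    secs=${1:-0}
    printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Run the proxy commands only where the Globus tools are installed.
if command -v grid-proxy-init >/dev/null 2>&1; then
    # Create the proxy (prompts for the pass phrase), then show how long
    # it remains valid. Assumes -timeleft prints seconds.
    grid-proxy-init && format_timeleft "$(grid-proxy-info -timeleft)"
    # When you are finished with the Grid, remove the proxy:
    #   grid-proxy-destroy
fi
```

For example, format_timeleft 3725 prints 01:02:05.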
Job submission
The command line client for submitting jobs to Globus WS-GRAM is called globusrun-ws and
can be used as follows:
globusrun-ws -submit -c /bin/touch touched_it
This command runs the program /bin/touch with touched_it as argument
on the local resource.
To submit the job to another machine, the -F flag must be specified:
globusrun-ws -F 'https://svampbob-1.cs.umu.se:8443/wsrf/services/ManagedJobFactoryService' -submit -c /bin/touch touched_it
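Since the factory URL is long, it can be convenient to build it from the host name. The following is a minimal sketch. factory_url is a hypothetical helper, and it assumes that every host uses the same port (8443) and service path as the example above.

```shell
#!/bin/sh
# factory_url is a hypothetical helper. The port (8443) and service path are
# copied from the example above and assumed to be the same on every host.
factory_url() {
    echo "https://$1:8443/wsrf/services/ManagedJobFactoryService"
}

# Usage (requires the Globus tools and a valid proxy):
#   globusrun-ws -F "$(factory_url svampbob-1.cs.umu.se)" -submit -c /bin/touch touched_it
```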
If you want to submit a more complex job, you can describe it using the Resource Specification Language (RSL).
One example of an RSL job description is:
<job>
<executable>/bin/echo</executable>
<argument>Hello world</argument>
<stdout>my_echo.stdout</stdout>
<stderr>my_echo.stderr</stderr>
</job>
To submit a job specified as an RSL file, run:
globusrun-ws -submit -f hello_world.xml
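The two steps above (writing the job description and submitting it) can be combined in a small script. This sketch writes the RSL example to hello_world.xml with a here-document and submits it only if globusrun-ws is found. The indentation inside the XML is added for readability.

```shell
#!/bin/sh
# Write the job description shown above to hello_world.xml.
# The quoted 'EOF' keeps the shell from expanding anything inside the document.
cat > hello_world.xml <<'EOF'
<job>
  <executable>/bin/echo</executable>
  <argument>Hello world</argument>
  <stdout>my_echo.stdout</stdout>
  <stderr>my_echo.stderr</stderr>
</job>
EOF

# Submit only where the Globus tools are installed.
if command -v globusrun-ws >/dev/null 2>&1; then
    globusrun-ws -submit -f hello_world.xml
fi
```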
Typically, a job requires one or more input files, e.g., the program to run. These files can be transferred to the resource
using the RSL attribute fileStageIn, as exemplified below:
<fileStageIn>
<transfer>
<sourceUrl>gsiftp://svampbob-1:2811/bin/echo</sourceUrl>
<destinationUrl>file:///pub/anarchy/my_echo</destinationUrl>
</transfer>
</fileStageIn>
When submitting a job that performs file staging, the -S flag must be passed to
globusrun-ws in order to delegate the user's credentials to the ManagedJobFactoryService, so that this service can
stage the input files on behalf of the user. If output files from the job should be staged somewhere,
these can be specified using the fileStageOut RSL attribute.
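To show where the fileStageIn fragment fits, the sketch below places it inside a complete job description and submits the job with -S. The file name staged_echo.xml is hypothetical. The sketch assumes that fileStageIn follows the other job elements and that the staged copy can be executed from its destination path. The staged file may also need execute permission, which is not addressed here.

```shell
#!/bin/sh
# staged_echo.xml is a hypothetical file name. The URLs are the ones used in
# the fileStageIn example above. The job runs the staged copy of /bin/echo.
cat > staged_echo.xml <<'EOF'
<job>
  <executable>/pub/anarchy/my_echo</executable>
  <argument>Hello world</argument>
  <stdout>my_echo.stdout</stdout>
  <stderr>my_echo.stderr</stderr>
  <fileStageIn>
    <transfer>
      <sourceUrl>gsiftp://svampbob-1:2811/bin/echo</sourceUrl>
      <destinationUrl>file:///pub/anarchy/my_echo</destinationUrl>
    </transfer>
  </fileStageIn>
</job>
EOF

# -S delegates the credentials needed for staging.
if command -v globusrun-ws >/dev/null 2>&1; then
    globusrun-ws -submit -S -f staged_echo.xml
fi
```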
Normally, the globusrun-ws command does not terminate until the job has finished executing.
For long-running jobs, it may be convenient to submit the job and return later to monitor its progress.
The following command submits a job and stores an identifier (endpoint reference) for the job in the
file job.epr.
globusrun-ws -submit -f hello_world.xml -o job.epr
This identifier can later be used to retrieve the status of the job:
globusrun-ws -status -job-epr-file job.epr
The identifier can also be used to kill the job:
globusrun-ws -kill -job-epr-file job.epr
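Instead of checking the status by hand, the status command can be repeated until the job ends. The following is a sketch only. wait_for_job is a hypothetical helper, and it assumes that the status output contains the word Done or Failed once the job has ended, which you should verify against the real output on your system. The STATUS_CMD variable is also made up and lets the loop be tried without Globus.

```shell
#!/bin/sh
# wait_for_job is a hypothetical helper. It repeats the status query until the
# output mentions Done or Failed (an assumption about the output format).
# STATUS_CMD can be set to replace the real query, e.g. for a dry run.
wait_for_job() {
    epr_file=$1
    interval=${2:-30}    # seconds between queries
    while :; do
        state=$(${STATUS_CMD:-globusrun-ws -status -job-epr-file} "$epr_file") || return 1
        echo "$state"
        case $state in
            *Done*) return 0 ;;
            *Failed*) return 2 ;;
        esac
        sleep "$interval"
    done
}

# Usage (requires the Globus tools and a job.epr file from a submitted job):
#   wait_for_job job.epr 60
```

The return code is 0 for a finished job, 2 for a failed job, and 1 if the status query itself fails.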
Finally, WS-GRAM can use different mechanisms to execute a job. The mechanism is selected with the factory type (-Ft flag):
globusrun-ws -submit -Ft Fork ... and
globusrun-ws -submit -Ft PBS ...
are the two options available on our system.
Exercises
- How can you submit a job to run a program that is not available on the Grid resource?
- How do the PBS and Fork factory types differ? (Hint: submit a job running /bin/hostname)
- What GRAM states are possible for a Grid job?
- Submit a Grid job that prints 'Hello World'. Store the job output in
the file hello_grid.txt in your home directory.
Data management
The globus-url-copy command copies a file from one URL to another.
Some of the supported URL formats are:
file://path - file stored in the local file system.
gsiftp://host[:port]/path - file stored on a GridFTP server.
The following example shows how the local file /tmp/myfile
is copied to /tmp/myfile.copy on the Grid resource svampbob-1.cs.umu.se.
globus-url-copy file:///tmp/myfile \
gsiftp://svampbob-1.cs.umu.se:2811/tmp/myfile.copy
Note that the GridFTP server uses port 2811 instead of the usual FTP ports.
GridFTP extends FTP with performance improvements.
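The two URL formats are easy to mistype, so the sketch below builds them with two small functions. Both gsiftp_url and file_url are hypothetical helpers that only produce strings. The default port 2811 is the one used in the example above.

```shell
#!/bin/sh
# Both helpers are hypothetical and only build URL strings.

# gsiftp_url HOST PATH [PORT]: GridFTP URL, defaulting to port 2811.
gsiftp_url() {
    echo "gsiftp://$1:${3:-2811}$2"
}

# file_url PATH: file URL for a local path. Relative paths are resolved
# against the current directory, since the URL needs an absolute path.
file_url() {
    case $1 in
        /*) echo "file://$1" ;;
        *)  echo "file://$(pwd)/$1" ;;
    esac
}

# Usage (requires the Globus tools and a valid proxy):
#   globus-url-copy "$(file_url /tmp/myfile)" \
#       "$(gsiftp_url svampbob-1.cs.umu.se /tmp/myfile.copy)"
```

Note that an absolute local path yields three slashes after file:, as in file:///tmp/myfile.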
Many GridFTP clients support third-party transfers (also known as proxy FTP).
Third-party transfers allow a client to transfer a file
between two remote servers without forwarding the file through the client host:
[tordsson]svampbob-1:~> globus-url-copy \
gsiftp://svampbob-2.cs.umu.se:2811/Home/staff/tordsson/my_file.txt \
gsiftp://svampbob-3.cs.umu.se:2811/tmp/another_copy.txt
Exercise
- Transfer the file gsiftp://svampbob-1.cs.umu.se:2811/scratch/testfile.txt to
your home directory.
Information retrieval
Resources in a Globus-based Grid advertise information regarding their
configuration and state using the Monitoring and Discovery System, MDS.
An IndexService gathers information from various sources (other web services
or executing programs) and makes this information available as resource properties.
Each service container runs at least one IndexService, called the DefaultIndexService.
The following command retrieves all information stored in the DefaultIndexService on localhost.
wsrf-query -s 'https://localhost:8443/wsrf/services/DefaultIndexService' -a -z none
The -a flag specifies anonymous authentication and -z none
specifies that no authorization of the service should be performed.
In most cases, it is not convenient to retrieve all available information.
For this reason, a subset of the information can be selected using XPath queries.
As an example, the following query returns only the attribute(s) named 'OperatingSystem':
wsrf-query -s 'https://localhost:8443/wsrf/services/DefaultIndexService' -a -z none "//*[local-name()='OperatingSystem']"
For more information about XPath, see the W3Schools XPath tutorial.
See also the Java WS Core Command Line Interface
for more details about the wsrf-query command.
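The XPath query shown earlier selects elements by local name, which avoids having to deal with XML namespace prefixes. The same pattern can be reused for other element names, as in this sketch. xpath_for_element is a hypothetical helper that only builds the query string.

```shell
#!/bin/sh
# xpath_for_element is a hypothetical helper. It builds the XPath expression
# used above for any element name, matching on the local name so that the
# XML namespace prefix does not matter.
xpath_for_element() {
    printf "//*[local-name()='%s']\n" "$1"
}

# Usage (requires the Globus tools):
#   wsrf-query -s 'https://localhost:8443/wsrf/services/DefaultIndexService' \
#       -a -z none "$(xpath_for_element OperatingSystem)"
```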
Exercises
- What local services (running in the same container as the DefaultIndexService) are registered with the DefaultIndexService?
- What non-local services are registered with the DefaultIndexService?
- For the registered computational resources, how many jobs are queued at each resource?
- How many CPUs does each registered computational resource have?
Useful links:
Java WS Core Command Line Interface
RSL Specification