AAASwitch_UZH1_Campus
UZH.1 roadmap: update 27.05.2010
Provide login node with SLES10 (ID)
qsub to Schroedinger
same UIDs/users as on Schroedinger
spool/scratch partition shared with all Schroedinger compute nodes
firewall req's: open ports from UZH (possibly the world?)
2135/tcp
2811/tcp
30000--35000/tcp
MEETING on Tuesday June 8th at 14:00 (ID: Christian; GC3: Mike + Riccardo)
prepare filesystem layout (ID + GC3)
sessiondir (spool directory for jobs; also temporary space for transferring files to/from the client)
runtimedir (hosts application configuration / env. variables)
cachedir (caches repeated transfer of the same file from a client; useful e.g. for BioInf databases or HEP analysis data)
set up special queue (subset of Schroedinger nodes: 1 rack; 24 hours duration)
request hostcert (GC3)
- need to know final hostname: idesl4.uzh.ch
compile/package ARC sw on SLES10 (GC3)
Applications used for testing:
- GAMESS
User setup:
GC3 cluster uses Schroedinger users
Only Schroedinger users will be allowed into Schroedinger via ARC (at least initially), and they will be mapped to their UNIX account.
GC3 to act as single point of contact for enabling SLCS users
discuss with Luzian if GC3 can be delegated the power to enable SLCS
(eventually) enable Nagios sensors on idesl4.uzh.ch
Followup meeting 19.02.2010
- Participants: Christian (CB), Vincent (VK), Alexander (AG), Riccardo (RM), Mike (MP), Sergio (SM)
Current status of UZH.1 project
Campus grid cannot reach stated goals.
What is realistic to do:
have the gc3 cluster run the small non-HPC jobs that would otherwise land on Schroedinger
allow seamless submission of jobs to either cluster: users should not care where the job lands
Have the solution integrated with SMSCG
We could request an extension from SWITCH: no problem during the calendar year
- We will try to complete it on time, ask extension only if needed
Preconditions on Schroedinger
plan for a solution that is independent of Schroedinger
i.e., don't touch the existing SCH setup
Approach to follow
Make sure SGE on idgc3 can submit to the SGE master on SCH
users on idgc3 need to come from the SCH LDAP
then users which are local to idgc3 will not be accepted by the SCH SGE
applications should sit in similar paths on GC3 and SCH
this is a constraint coming from the Grid middleware
We have to verify whether this is indeed a constraint or whether there could be a workaround
Lustre availability
Lustre will not be available on the GC3 cluster
so users will have to explicitly request Lustre in submission scripts
or better specify "allow grid" in the submission, in case they don't need SCH
Scratch dir: where is it located?
application-specific: each application can have its own set of environment variables
like to have a generic $SCRATCH env. var. pointing to a generic scratch area
grid apps right now do not need much scratch -- local scratch will do
no local scratch on SCH, need to use Lustre
better name it $GRID_SCRATCH so policies for it can differ from the local jobs policies
Accounting
- Info on the database includes the "cell" (cluster) name, so no problem in sending all acct data to the same DB (assuming no SGE uses the default cluster name of "default")
Issue with architecture/system specific configuration
should users put "if uname == ..." everywhere in their scripts?
no, env. vars will be defined, use those: e.g., call "$APPS/gamess" instead of "/whatever/path/gamess"
have a predefined set of installed applications
one responsible person for each application, who writes a HOWTO that other sysadmins will use for installing the application on SCH and GC3
users can only use the installed applications -- no submission of arbitrary binaries.
Application deployment: provide two sets of the same binaries
one optimized for SCH
one safe and sound for Grid usage
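The environment-variable approach discussed above (calling "$APPS/gamess" rather than testing uname, and using a generic $GRID_SCRATCH) could look roughly like this in a job script. This is a minimal sketch: the helper name app_path and the placeholder default values are assumptions made for illustration, and the actual variable values would come from each cluster's site configuration.

```shell
#!/usr/bin/env bash
# Placeholder defaults so the sketch runs anywhere (assumption); on a real
# cluster the site profile would define these and the two lines would go.
APPS=${APPS:-/whatever/path}
GRID_SCRATCH=${GRID_SCRATCH:-/tmp}

# Resolve an application by name under $APPS instead of branching on uname.
# app_path is a hypothetical helper, not something provided by SGE or ARC.
app_path() {
    printf '%s/%s\n' "$APPS" "$1"
}

# Per-job working directory under the grid scratch area. JOB_ID is set by
# SGE inside a running job; "manual" is a fallback for interactive runs.
workdir="$GRID_SCRATCH/${JOB_ID:-manual}"

echo "would run: $(app_path gamess) in $workdir"
```

The same script text then works on SCH and GC3 as long as both sites define APPS and GRID_SCRATCH, which is the point of keeping architecture checks out of user scripts.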
what about the condor pool?
- postpone until next semester (summer pause)
SCH has two SGE masters (master1 & master2)
not yet connected to IP network (but planned)
but they only have to talk to the SGE master on GC3,
so we only need to open a hole in the firewall for that one IP
Preliminary plan
Hooking SGE/SCH into SGE/GC3: MP + CB
users on LDAP on GC3: MP
create "Grid" realm on SGE (for external Grid users): ID
layout for the panasas FS: ID + SM
GAMESS install: MP
Shared filesystem between the two systems ( how to mount Panasas on "idgc3"? )
only frontend node on "idgc3" has public IP
mount Panasas on FE node only and stage from there?
need to think about it; several possible solutions:
put the two clusters in the same mgmt subnet (don't like because of security implications)
re-export via NFS and have nodes on "idgc3" use public IP
stage everything (sounds like the best one)
stage should be enough for the GAMESS use cases
UNIX groups are written in stone in LDAP
each user has one and only one UNIX group, which is its home institution
use "projects" in SGE to allow submission
problem is with the Grid MW: it needs a shared FS, usually group-writable; if users don't all belong to the same group, it has to be made world-writable.
maybe make a local group to the machine
UPDATE 2010-04-08: Since we are creating gridXXX generic grid user accounts, we shall have a grid group, and every UZH user that uses ARC job submission will be a member of this group. (This allows us to keep the ARC sessiondir directory with 770 permissions.)
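As an illustration of the 770 permissions mentioned in the update, the following sketch sets them on a throwaway directory. The temporary path is an assumption made so the example can run unprivileged; the real sessiondir location and the chgrp step (which needs root and an existing grid group) are only indicated in a comment.

```shell
#!/usr/bin/env bash
# Sketch only: uses a throwaway directory instead of the real sessiondir.
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT

sessiondir="$demo_root/session"
mkdir "$sessiondir"

# Owner and group get full access, everyone else gets none.
chmod 770 "$sessiondir"

# On the real system the directory would also be handed to the grid group:
#   chgrp grid /path/to/sessiondir
ls -ld "$sessiondir"
```

With these permissions, any member of the grid group can create and remove job directories in sessiondir, while users outside the group cannot even list it.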
Technical Solutions
SGE File Staging
Apparently SGE does not support file staging the way PBS does, where a specific directive is provided ( "-W stagein=..." / "-W stageout=..." ). According to the documentation found at: http://gridengine.sunsource.net/project/gridengine/howto/filestaging/, there are two possible scenarios:
DRMAA - but in this case, the application to be submitted needs to be DRMAA-aware (not really feasible for our needs)
Trick SGE with -v (set environmental variables) and prologue/epilogue scripts
For the moment we chose the latter variant and tested it on the idgc3grid01 cluster.
The prologue/epilogue scripts will be executed on the compute node (need to make sure the path will be identical on both systems)
Scripts will be executed on behalf of the user (using user's id)
Prologue and epilogue are activated when the UZH_SGE_IN and/or UZH_SGE_OUT variables are set in the job submission
UZH_SGE_IN UZH_SGE_OUT syntax
The UZH_SGE_IN/UZH_SGE_OUT syntax mimics TORQUE syntax for file stage-in/out, see http://www.clusterresources.com/torquedocs21/6.3filestaging.shtml
Each of the UZH_SGE_IN and UZH_SGE_OUT environment variables is a comma-separated list of copy-specs. Each copy-spec is a pair of exec-node-path and frontend-node-path, separated by the character @. The exec-node-path is an absolute pathname (e.g., "/tmp/session/123456"). The frontend-node-path is composed of a hostname, a colon character :, and an absolute pathname (e.g., "idgc3grid01:/export/grid/session/1233456").
The UZH_SGE_IN and UZH_SGE_OUT are translated into a series of "rsync" invocations: each copy-spec of the form X@Y triggers the execution of "rsync -a Y X" (on stage-in) or "rsync -a X Y" (on stage-out). In particular, "rsync" conventions on recursive directory copies apply.
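The translation described above can be previewed with a small dry-run helper that prints the rsync commands instead of executing them. This is a sketch for illustration: the function name print_stage_commands is hypothetical, and the real work is done by the Perl prologue/epilogue scripts, not by this helper.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the rsync commands implied by a copy-spec list.
# print_stage_commands is a hypothetical name, not part of SGE.

# usage: print_stage_commands in|out COPYSPEC_LIST
print_stage_commands() {
    local direction=$1 list=$2
    local spec exec_path frontend_path
    local -a specs
    # Split the list on commas into individual copy-specs.
    IFS=',' read -r -a specs <<< "$list"
    for spec in "${specs[@]}"; do
        # X@Y: X is the exec-node path, Y is host:path on the frontend.
        exec_path=${spec%%@*}
        frontend_path=${spec#*@}
        if [ "$direction" = in ]; then
            echo "rsync -a $frontend_path $exec_path"
        else
            echo "rsync -a $exec_path $frontend_path"
        fi
    done
}

print_stage_commands in "/tmp/session/123456@idgc3grid01:/export/grid/session/1233456"
```

Because rsync treats a trailing slash on the source as "copy the contents of this directory", the presence or absence of a trailing slash in a copy-spec changes where the files end up on the destination.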
All pathnames must be absolute; the effect of passing a relative pathname in the exec-node-path or frontend-node-path is unspecified.
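Since relative pathnames have unspecified behaviour, it may be worth checking copy-specs before submission. The sketch below assembles a copy-spec list from pairs of paths and rejects non-absolute ones; join_copyspecs is a hypothetical helper and job.sh a placeholder script name, neither of which is part of the setup described here.

```shell
#!/usr/bin/env bash
# join_copyspecs is a hypothetical helper, not part of SGE.
# usage: join_copyspecs EXEC_PATH HOST:PATH [EXEC_PATH HOST:PATH ...]
join_copyspecs() {
    local out="" exec_path frontend path
    # Arguments must come in (exec-node-path, frontend-node-path) pairs.
    [ $(( $# % 2 )) -eq 0 ] || { echo "odd number of arguments" >&2; return 1; }
    while [ "$#" -ge 2 ]; do
        exec_path=$1
        frontend=$2
        shift 2
        # The exec-node path must start with "/".
        if [ "${exec_path#/}" = "$exec_path" ]; then
            echo "not absolute: $exec_path" >&2
            return 1
        fi
        # The frontend path must look like host:/absolute/path.
        path=${frontend#*:}
        if [ "$path" = "$frontend" ] || [ "${path#/}" = "$path" ]; then
            echo "expected host:/abs/path: $frontend" >&2
            return 1
        fi
        out="${out:+$out,}$exec_path@$frontend"
    done
    printf '%s\n' "$out"
}

UZH_SGE_IN=$(join_copyspecs /tmp/session/123456 idgc3grid01:/export/grid/session/1233456)
export UZH_SGE_IN
# Submission needs a live SGE cell, so it is only shown as a comment:
#   qsub -v UZH_SGE_IN job.sh
echo "UZH_SGE_IN=$UZH_SGE_IN"
```

The variable is exported and passed to qsub by name only because "-v" itself takes a comma-separated list of variables; a multi-entry value written inline as UZH_SGE_IN=a,b would presumably be split at the comma. Whether the by-name form preserves commas should be verified on the SGE version in use.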
How to install the prologue/epilogue scripts
# qconf -sconf | grep -e prolog -e epilog
prolog /share/apps/bin/file_trans_prolog.sh
epilog /share/apps/bin/file_trans_epilog.sh
Sample prologue/epilogue scripts
cat /share/apps/bin/file_trans_prolog.sh
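#!/usr/bin/env perl
# Prologue script: stage files in to the exec node file system.
use strict;
use warnings;
# Do nothing if UZH_SGE_IN is not defined or empty.
exit 0 if not $ENV{'UZH_SGE_IN'};
foreach my $copyspec (split(/,/, $ENV{'UZH_SGE_IN'})) {
    # Each copy-spec is EXEC-NODE-PATH@FRONTEND-HOST:PATH.
    my ($local, $remote) = split(/@/, $copyspec, 2);
    next unless defined $remote;    # skip malformed copy-specs
    # List-form system() bypasses the local shell: no quoting issues, no glob expansion.
    system('rsync', '-a', $remote, $local) == 0
        or warn "stage-in failed for '$copyspec'\n";
}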
cat /share/apps/bin/file_trans_epilog.sh
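#!/usr/bin/env perl
# Epilogue script: stage files out from the exec node file system.
use strict;
use warnings;
# Do nothing if UZH_SGE_OUT is not defined or empty.
exit 0 if not $ENV{'UZH_SGE_OUT'};
foreach my $copyspec (split(/,/, $ENV{'UZH_SGE_OUT'})) {
    # Each copy-spec is EXEC-NODE-PATH@FRONTEND-HOST:PATH.
    my ($local, $remote) = split(/@/, $copyspec, 2);
    next unless defined $remote;    # skip malformed copy-specs
    # List-form system() bypasses the local shell: no quoting issues, no glob expansion.
    system('rsync', '-a', $local, $remote) == 0
        or warn "stage-out failed for '$copyspec'\n";
}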
Architecture
[UZH.1 campus architecture 20100219.eps][3]
[3]: UZH.1_campus_architecture_20100219.eps (UZH.1 campus architecture 20100219.eps)