ColoFarm
(Colorado Farm computing system)
Farm Structure
ColoFarm is a system of Tcl/Tk, Perl, and C-shell scripts intended for
mass processing of Focus data. High throughput is achieved
by storing the input files in a large file pool and sending
them for processing to all the available machines.
Input files can be retrieved either from the
Fermilab Mass Storage System (FMSS) or from one of the Exabyte
tape drives, and are stored on the "FARMDISK" pool.
While a file is being copied from the input source, it is temporarily
stored in the PREIMPORT directory; when the copy is complete,
it is moved to the IMPORT directory.
The IMPORT directory is periodically checked for new input
files: when a file is found, it is moved to the PREBATCH
directory on the same disk before being sent to a specific computing node
for processing. Output files are temporarily stored in the
BATCH area on that machine and are copied to the OUTPUT
area on the analysis disk ("ANALDISK") when the job
completes. For each submitted job, the C-shell submission script and the
batch system's standard error output are stored in the INPUT directory.
The input file is deleted when processing is done.
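The staging scheme above can be illustrated with a minimal Python sketch (not ColoFarm's actual Tcl/Perl/C-shell code). It assumes that PREIMPORT, IMPORT and PREBATCH are lowercase subdirectories of FARMDISK, which the text does not specify, and the function names `stage_input` and `claim_next_input` are hypothetical. Because each move stays on one filesystem, `os.rename` is atomic on POSIX systems, so the IMPORT monitor never sees a half-copied file.

```python
import os
import shutil

# Assumed on-disk names for the PREIMPORT, IMPORT and PREBATCH directories,
# all taken to live under FARMDISK (the text does not give exact names).
PREIMPORT, IMPORT, PREBATCH = "preimport", "import", "prebatch"


def stage_input(source_path, farmdisk):
    """Copy a file into PREIMPORT, then move it into IMPORT once complete."""
    name = os.path.basename(source_path)
    staging = os.path.join(farmdisk, PREIMPORT, name)
    shutil.copy(source_path, staging)
    final = os.path.join(farmdisk, IMPORT, name)
    # Same filesystem, so the rename is atomic: IMPORT only ever sees whole files.
    os.rename(staging, final)
    return final


def claim_next_input(farmdisk):
    """One pass of the IMPORT monitor: move the first file (by name) to PREBATCH.

    Returns the new PREBATCH path, or None when IMPORT is empty.
    """
    import_dir = os.path.join(farmdisk, IMPORT)
    names = sorted(os.listdir(import_dir))
    if not names:
        return None
    dest = os.path.join(farmdisk, PREBATCH, names[0])
    os.rename(os.path.join(import_dir, names[0]), dest)
    return dest
```

Copying into a staging directory and then renaming is what makes the PREIMPORT/IMPORT split useful: a plain copy straight into IMPORT could be picked up by the monitor while still incomplete.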
Instructions
- Create a launching directory for your application
and copy the following files into it:
- your executable and namelist
- ~e831/farm/colofarm
- Edit the file colofarm and set the
following environment variables:
- FARM:
the Farm launching directory (containing the
log, data, and kill files for a particular
Farm application).
- FARMDISK:
the disk area where imported files are stored
before being sent to the analysis process.
- ANALDISK:
the disk area containing the analysis files
(batch scripts, histograms, log files, binary
files).
- Type colofarm to start the Tcl/Tk main window
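As a rough illustration of how the three settings map onto disk areas, here is a Python sketch. It reads FARM, FARMDISK and ANALDISK from the process environment (it does not parse the colofarm file) and creates any missing subdirectories, in line with ColoFarm's behavior of creating FARMDISK and ANALDISK when they do not exist. The subdirectory names, and the placement of INPUT on ANALDISK, are guesses based on the directories named under Farm Structure.

```python
import os

# The three settings named in the Instructions. This sketch reads them from
# the process environment rather than parsing the colofarm file.
REQUIRED = ("FARM", "FARMDISK", "ANALDISK")

# Hypothetical subdirectory names, guessed from the directories described
# under Farm Structure; the real names and locations may differ.
FARMDISK_SUBDIRS = ("preimport", "import", "prebatch")
ANALDISK_SUBDIRS = ("input", "output")


def read_settings(env=None):
    """Return the three settings, raising ValueError if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("unset settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED}


def ensure_layout(settings):
    """Create any missing FARMDISK/ANALDISK subdirectories; return those created."""
    created = []
    for root, subdirs in ((settings["FARMDISK"], FARMDISK_SUBDIRS),
                          (settings["ANALDISK"], ANALDISK_SUBDIRS)):
        for sub in subdirs:
            path = os.path.join(root, sub)
            if not os.path.isdir(path):
                os.makedirs(path)  # also creates the root itself if needed
                created.append(path)
    return created
```

Calling `ensure_layout` a second time is harmless: it finds every directory present and returns an empty list.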
Notes
- If they do not already exist, the FARMDISK and ANALDISK
directories (and their subdirectories) are created automatically.
- If you are importing from FMSS and are running the program for the first
time in the current FARM area, make sure to create a new
FMSS file list.
- If necessary, create a file called e831_mysetenv.csh in the FARM
directory; it will be sourced at the beginning of each analysis job
(for example, if you need personal environment variable definitions).
- For each job, the executable is copied from the FARM
directory to the directory where the program runs, and the copy is renamed to
(executable name)-(project name).
- The DB button in the Analysis frame lets you view the current
analysis database and insert or remove individual entries.
If you choose to remove an entry from the database, all the corresponding
files are deleted from the OUTPUT directory.
- When spooling from tape to disk, it is recommended to use the disk on
the corresponding machine. Each spooling job opens a separate
Tcl/Tk window with control buttons. Several processes can
spool simultaneously to the same disk.
- Each of the three processes (FMSS import, TAPE import, and ANALYSIS)
can be stopped individually with the corresponding button.
Stopping a process may take some time (up to 5-10 minutes),
depending on the sleep time between iterations
set for that process.
- After the first run in a given directory, the analysis settings
are saved in the file .colofarm_anal_defaults and loaded automatically
the next time colofarm is started.
- Do not destroy the xterm window that started the ColoFarm application!
Doing so puts the ColoFarm processes into a "runaway" state, and if
you then quit ColoFarm (without killing the jobs) and restart it,
the new session will not be able to connect to the previous one.
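The per-job renaming rule for the executable is simple enough to state as code. This hypothetical Python helper copies the executable out of the FARM directory into a run directory and names the copy (executable name)-(project name); the function names and argument order are illustrative only, not ColoFarm's interface.

```python
import os
import shutil


def job_executable_name(executable, project):
    """Per-job name of the executable: (executable name)-(project name)."""
    return f"{os.path.basename(executable)}-{project}"


def stage_executable(farm_dir, executable, project, run_dir):
    """Copy the executable from the FARM directory into the job's run directory."""
    src = os.path.join(farm_dir, executable)
    dest = os.path.join(run_dir, job_executable_name(executable, project))
    shutil.copy2(src, dest)  # copy2 keeps the mode bits, so the copy stays executable
    return dest
```

For example, an executable named `ana` run under project `projA` would be copied as `ana-projA`.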
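The note on stopping the FMSS import, TAPE import, and ANALYSIS processes is consistent with a cyclic design like the one sketched below in Python. This is an assumption about structure, not ColoFarm's code. The stop request is checked only at the top of each iteration, so a request made during processing or sleeping is noticed only after both finish. The worst-case delay is therefore roughly one iteration plus one sleep interval. `process_once` and `should_stop` are hypothetical callbacks, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time


def run_cyclic(process_once, should_stop, sleep_seconds, sleep=time.sleep):
    """Run process_once repeatedly, sleeping between iterations.

    should_stop is consulted only at the top of each iteration, so a stop
    request made during processing or sleeping waits for both to finish.
    Returns the number of iterations completed.
    """
    iterations = 0
    while not should_stop():
        process_once()
        iterations += 1
        sleep(sleep_seconds)
    return iterations
```

With a sleep time of several minutes, this explains why a stop can take 5-10 minutes to take effect. Shortening the sleep time shortens the wait at the cost of more frequent polling.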
Development version
- Batch Queues: you can specify either a machine queue (e.g.,
"antero_fast") or a whole group (e.g., "fast"). If you specify a whole
group and all its machines are enabled, the batch submission command
issued will be:
qsub -G batch_group 1 your_script
If one or more machines are not enabled, a specific
queue in that batch group is chosen instead, and the submission
command will be:
qsub -q batch_que your_script
- Job Delete option: to delete a job from the batch system:
- Pop up the queue status window with the "Queues" button in the
analysis frame
- Double-click on the entry you want to delete
- Press the "Kill Job" button
The job will be deleted from the batch system (this may take a few minutes);
if the corresponding input file is still in the PREBATCH area,
it will be moved back to the IMPORT directory for later submission.
- The FMSS file list can be specified by the user, but it must be
located in the FARM directory. The default name
is fmss_files.dat. Template FMSS lists can be found in
~e831/fmss (copy them to the FARM directory before use).
- Run limits can be specified by the user when importing FMSS files.
This option relies on the fact that FMSS file names always start
with the run number.
- PARASITE running allows importing from another user's IMPORT directory,
where that user is importing FMSS files. This option is intended
to let two or more users at the Colorado site share the same pool
of imported FMSS files. To run parasitically:
- The primary user (the one importing files) needs to enable
parasite running from their ColoFarm window. NOTE:
allowing parasite running may significantly slow down
the primary user's file transfer from IMPORT to PREBATCH
and reduce their job submission rate.
- The parasite user needs to specify the FARMDISK directory
of the primary user (without the "/import" suffix),
enable parasite running, and start importing.
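The queue-selection rule in the Batch Queues item can be sketched in Python. Three details are assumptions: the group-to-queue mapping, the set of enabled queues, and choosing the first enabled queue when a group is only partly available. The text says only that "a specific que in that batch group will be chosen". The command arguments follow the two qsub lines given there and are returned as a list rather than executed.

```python
def submit_command(target, groups, enabled, script):
    """Return the qsub argument list for a machine queue or a queue group.

    groups  -- hypothetical mapping from group name to its machine queues
    enabled -- set of machine queues currently enabled
    """
    if target not in groups:
        # A single machine queue (e.g. "antero_fast") was requested.
        return ["qsub", "-q", target, script]
    queues = groups[target]
    if all(q in enabled for q in queues):
        # Every machine in the group is enabled: submit to the whole group.
        return ["qsub", "-G", target, "1", script]
    available = [q for q in queues if q in enabled]
    if not available:
        raise RuntimeError(f"no enabled queue in group {target!r}")
    # Some machines are disabled: pick one enabled queue (the first, by assumption).
    return ["qsub", "-q", available[0], script]
```

Returning a list keeps the sketch testable; the list could be passed to `subprocess.run` to perform an actual submission.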
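Since FMSS file names always start with the run number, applying run limits reduces to parsing a leading integer. Here is a minimal Python sketch. It assumes the limits are inclusive and that anything after the leading digits can be ignored; the function names and example file names are hypothetical.

```python
import os
import re

# FMSS file names are said to start with the run number.
_LEADING_RUN = re.compile(r"\d+")


def run_number(filename):
    """Leading run number of a file name, or None if it has none."""
    match = _LEADING_RUN.match(os.path.basename(filename))
    return int(match.group()) if match else None


def within_run_limits(filenames, first_run, last_run):
    """Keep files whose run number lies in [first_run, last_run] (inclusive, by assumption)."""
    selected = []
    for name in filenames:
        run = run_number(name)
        if run is not None and first_run <= run <= last_run:
            selected.append(name)
    return selected
```

Names without a leading number are skipped rather than treated as errors, which matches the stated assumption that every valid FMSS file name begins with its run number.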
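The Job Delete handling of input files that are still waiting in PREBATCH amounts to a single move back to IMPORT. Here is a minimal Python sketch. It assumes lowercase prebatch/import subdirectories of FARMDISK, since the actual names are not given in the text. `requeue_after_kill` is a hypothetical helper and does not talk to the batch system.

```python
import os


def requeue_after_kill(farmdisk, input_name, prebatch="prebatch", import_dir="import"):
    """Move a killed job's input file from PREBATCH back to IMPORT.

    Returns True if the file was still waiting in PREBATCH and was moved,
    False if it had already left PREBATCH. Directory names are assumptions.
    """
    waiting = os.path.join(farmdisk, prebatch, input_name)
    if not os.path.exists(waiting):
        return False
    os.rename(waiting, os.path.join(farmdisk, import_dir, input_name))
    return True
```

Once back in IMPORT, the file is picked up again by the normal monitoring cycle and resubmitted later.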
Software Diagram
Author: Luca Cinquini (cinquini@pizero.colorado.edu)
Last update: 18 July 1997