wiki:secluster [Monday, 29 September 2008 : 10:23:53] (current)
 +====== SE Cluster ======
  
 +This is a short tutorial on how to use the [[http://secluster-02.se.wtb.tue.nl/ganglia/|secluster-02]] cluster. The cluster consists of:
 +
 +  * 1 head node (secluster-02) - this is where you log in
 +  * 20 nodes (compute-0-0 to compute-0-19)
 +  * 40 × 2800 MHz dual-core CPUs (2 CPUs per node, so 80 cores in total)
 +  * 4 GB of RAM per node
 +  * 1 shared file system
 +
 +The cluster works with queues (just like the ones we like to simulate so much :)). Jobs in the queue are distributed over the nodes as evenly as their parameters allow. This example uses 1 CPU core per job, so 80 jobs can run simultaneously.
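
The job count above follows directly from the hardware list. A minimal sketch of the arithmetic, with variable names that are illustrative only:

```python
# Hardware figures taken from the list above
nodes = 20            # compute-0-0 to compute-0-19
cpus_per_node = 2     # two CPUs per machine
cores_per_cpu = 2     # each CPU is dual-core

total_cpus = nodes * cpus_per_node
total_cores = total_cpus * cores_per_cpu
cores_per_node = cpus_per_node * cores_per_cpu

# With 1 core per job, the number of simultaneous jobs equals the core count
max_parallel_jobs = total_cores

print(total_cpus, total_cores, cores_per_node, max_parallel_jobs)
```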
 +
 +Each simulation has to be submitted to the queue separately. The easiest way to do this is to let a Python script submit the jobs. Detailed documentation about the commands involved (''qsub'', ''qdel'', etc.) can be found in the {{:sesystems:rocks-usersguide-4.3.pdf|Rocks 4.3 user guide}}, section 2.2.
 +
 +Warning: Each compute node has 2 dual-core CPUs, so four jobs can run on one machine simultaneously. Stochastic functions in chi are seeded from the moment the simulation starts, so two or more simulations started at the same instant get the same seed and can produce exactly the same result! To avoid this, a short delay has to be built into each job.
 +
 +An alternative solution would be to give each simulation a unique start seed using the (I think) ''-s'' option.
 +I don't know how to use this option, and the ''ping'' to localhost worked fine too [EvdRijt].
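
One possible way to build in the delay, sketched here under assumptions, is to give each identical job a different waiting time derived from its simulation number. The helper name ''delay_line'' and the 2 second spacing are hypothetical and not part of the original script; ''sleep'' is the standard Linux command. Whether this spacing is large enough depends on the clock resolution chi uses for its seed, which is not documented here.

```python
def delay_line(simnr, spacing_s=2):
    """Return a shell line that pauses a job before the simulation starts.

    simnr     -- simulation number (1, 2, 3, ...), as used in the job names
    spacing_s -- hypothetical spacing in seconds between consecutive jobs
    """
    return "sleep %d\n" % (simnr * spacing_s)

# Identical jobs differ only in simnr, so each one gets a different delay
# and reaches the simulation at a different moment.
for simnr in range(1, 4):
    print(delay_line(simnr), end="")
```

Such a line could take the place of the fixed delay in a generated job script. Jobs with different parameters but the same simulation number would still share a delay, which is harmless because their results differ anyway.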
 +
 +===== Python script =====
 +
 +.... (more to come soon) ...
 +
 +In this {{:wiki:clus.py|example}} a simulation called chisim is started with parameters x and y. Here x runs from 1 to 10 in steps of 1 and y from 10 to 20 in steps of 2. Since only a Linux shell script (a text file) can be submitted to the queue, a job script file must be generated for each simulation in the directory called ''jobs''. Once the job scripts and the simulation output directories have been made, the jobs are submitted to the queue.
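
To get a feel for how many jobs this produces, the parameter grid can be enumerated on its own. This is a minimal sketch: the helper ''job_file'' is hypothetical and only mirrors the ''jobs/job.<x>-<y>.<simnr>'' naming used by the script.

```python
# Parameter ranges from the text: x = 1..10 step 1, y = 10..20 step 2
xs = list(range(1, 10 + 1, 1))
ys = list(range(10, 20 + 1, 2))

# Every (x, y) pair is one simulation setting
grid = [(x, y) for x in xs for y in ys]

def job_file(x, y, simnr):
    """Hypothetical helper: job script path for one simulation."""
    return "jobs/job.%d-%d.%d" % (x, y, simnr)

n_sims = 3  # example value for the [# sims] argument
print(len(xs), len(ys), len(grid))       # 10 6 60
print(len(grid) * n_sims, "jobs in total")
print(job_file(1, 10, 1))                # jobs/job.1-10.1
```

With 60 parameter combinations, even a small number of repetitions exceeds the 80 cores, so part of the jobs will wait in the queue.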
 +
 +<code python>
 +# usage: clus.py [# sims]
 +#
 +#   Starts simulating [# sims] identical jobs
 +#   on cluster with parameters X and Y
 +#
 +#   Warning:
 +#   Use with care!
 +#
 +#   Created:  Oct   11, 2006  V1 Emiel van de Rijt
 +#   Modified:
 +#             Oct   12, 2006  V2 Comments added
 +
 +import time, os, sys
 +
 +# Submit one job script to the queue
 +def smJob(intI, sX, sY):
 +    # the name of the job
 +    myJobName = "job." + sX + "-" + sY
 +    # the command to submit the job to the queue:
 +    #   -N: name of the job
 +    #   -e: write error output to file
 +    #   -o: write screen output to file
 +    #   -q: queue name to submit to
 +    #   -l: requested resources (see manual orca)
 +    #   last argument: the actual job script file to execute
 +    myCmd = "qsub -N " + myJobName + "." + str(intI) + \
 +     " -e jobs/" + myJobName + ".err." + str(intI) + \
 +     " -o jobs/" + myJobName + ".out." + str(intI) + \
 +     " -q workq" \
 +     " -l nodes=1:ppn=1,walltime=24:00:00" \
 +     " jobs/" + myJobName + "." + str(intI)
 +    # send the command
 +    os.system(myCmd)
 +
 +# ====================================================================================================
 +
 +# Write the job scripts for one (x, y) combination
 +def mkJobs(intStart, intStop, sX, sY, outputdir):
 +  # read the path from the global variable
 +  global myPath
 +  # filename for the script file in the jobs folder
 +  myFOut = "job"
 +  # construct a name for the job
 +  myJobName = sX + "-" + sY
 +  for simnr in range(intStart, intStop):
 +    # create/open file for writing
 +    fOut = open(myPath + "jobs/" + myFOut + '.' + myJobName + '.' + str(simnr), 'w')
 +    # the simulation command:
 +    #   simulation to run with end time
 +    #   param 1 for the chi file (x)
 +    #   param 2 for the chi file (y)
 +    #   param 3 for the chi file (output file)
 +    myCmd = "./chisim -e 1e3 " + \
 +      sX + " " + \
 +      sY + " " + \
 +      outputdir + "/out." + sX + "-" + sY + "." + str(simnr) + ".txt"
 +
 +    # the job script:
 +    #   go to the work directory
 +    #   wait briefly (ping) before the simulation starts
 +    #   print on screen the cmd to run
 +    #   run the command
 +    myTxt = "cd " + myPath + "\n" + \
 +      "ping -c 2 127.0.0.1\n" + \
 +      "echo ran: " + myCmd + "\n" + \
 +      myCmd + "\n"
 +    fOut.write(myTxt)
 +    fOut.close()
 +
 +# ====================================================================================================
 +
 +# get current path
 +myPath = os.getcwd() + "/"
 +
 +# run x from 1 to 10 with steps of 1
 +xb = 1    # begin
 +xe = 10   # end
 +xs = 1    # step size
 +
 +# run y from 10 to 20 with steps of 2
 +yb = 10   # begin
 +ye = 20   # end
 +ys = 2    # step size
 +
 +# Read number of identical simulations
 +# to do from the prompt
 +intStart = 1
 +# take start + the first argument of the prompt
 +intStop = intStart + int(sys.argv[1])
 +
 +# measure process time and wall time
 +# set start time
 +t0_process_time = time.process_time()
 +t0_wall_time = time.time()
 +
 +# make directory for job script files
 +# in the current directory
 +os.makedirs("jobs", exist_ok=True)
 +
 +# do for all x
 +for x in range(xb, xe+1, xs):
 +
 +  # do for all y
 +  for y in range(yb, ye+1, ys):
 +
 +    # convert to string
 +    sX = str(x)
 +    sY = str(y)
 +
 +    # make output dir string
 +    outputdir = "sim-" + sX + "-" + sY
 +
 +    # make the output directory
 +    # for a simulation output
 +    os.makedirs(outputdir, exist_ok=True)
 +
 +    # make the linux script files
 +    mkJobs(intStart, intStop, sX, sY, outputdir)
 +
 +    # start submitting the jobs to the queue
 +    for simnr in range(intStart, intStop):
 +      smJob(simnr, sX, sY)
 +
 +
 +# measure process time and wall time
 +# calculate time
 +print(time.process_time() - t0_process_time, "seconds process time")
 +print(time.time() - t0_wall_time, "seconds wall time")
 +</code>
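
The ''qsub'' command in the script is assembled by string concatenation. As a hedged alternative, the same command can be built as an argument list, which is easier to inspect before anything is submitted. The function name ''qsub_args'' and its keyword defaults are hypothetical; the flags and values are the ones used in ''smJob''.

```python
def qsub_args(sim_nr, x, y, queue="workq", walltime="24:00:00"):
    """Build the qsub argument list used by smJob, without running it."""
    job = "job.%s-%s" % (x, y)
    return [
        "qsub",
        "-N", "%s.%d" % (job, sim_nr),           # job name
        "-e", "jobs/%s.err.%d" % (job, sim_nr),  # error output file
        "-o", "jobs/%s.out.%d" % (job, sim_nr),  # screen output file
        "-q", queue,                             # queue to submit to
        "-l", "nodes=1:ppn=1,walltime=%s" % walltime,
        "jobs/%s.%d" % (job, sim_nr),            # the job script itself
    ]

# Dry run: show the command instead of submitting it
args = qsub_args(1, 3, 12)
print(" ".join(args))
# On the cluster, the list could be passed to subprocess.run(args, check=True)
```

Printing the command first makes it possible to check job names and file paths on a machine without ''qsub'' installed.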
wiki/secluster.txt · Last modified: Monday, 29 September 2008 : 10:23:53 (external edit)