Linux Tutorial

From ALICE Documentation

Latest revision as of 14:44, 21 September 2020


Work in progress

Description

This tutorial guides you through some of the basics of using Linux through the command line. The goal is for you to become familiar with certain Linux commands. There are other pages on the ALICE wiki that go into the details of submitting a job with a batch script.

Prerequisites

  • Familiarity with a text editor (Emacs, nano, vim)
  • Basic understanding of the UNIX command line

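Learning

  • Linux Command Line Fundamentals tutorial
  • An excellent start for novice Linux users is the Software Carpentry shell course: https://swcarpentry.github.io/shell-novice/
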
Goals

  • Create subdirectories to organize information
  • Create a batch script with a text editor
  • Change the permissions of files
  • Get familiar with some common UNIX commands

Step 1 - Organize your directories

When you first log in to our clusters, you are in your home directory. For the purposes of this illustration, we will pretend you are user alice0001, but when you try out the commands you must use your own username.

$ pwd
/home/alice0001

It's a good idea to organize your work into separate directories. If you have used Windows or the Mac operating system, you may think of these as folders. Each folder may contain files and subfolders, and the subfolders may contain other files and subfolders of their own. In Linux we use the term "directory" instead of "folder."

Type the following lines and take note of the output after each one:

$ touch foo1
$ touch foo2
$ ls
$ ls -l
$ ls -lt
$ ls -ltr

The "touch" command just creates an empty file with the name you give it.

You probably already know that the ls command shows the contents of the current working directory; that is, the directory you see when you type pwd. But what is the point of the "-l", "-lt" or "-ltr"? You probably noticed the difference in the output between the plain "ls" command and the "ls -l" command.

Most UNIX commands have options you can specify that change the way the command works. The options can be specified by the "-" (minus sign) followed by a single letter. "ls -ltr" is actually specifying three options to the ls command.

l: I want to see the output in long format -- one file per line with some interesting information about each file

t: sort the display of files by when they were last modified, most-recently modified first

r: reverse the order of display (combined with -t this displays the most-recently modified file last -- it should be foo2 in this case.)

I like using "ls -ltr" because I find it convenient to see the most recently modified file at the end of the list.
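The effect of the three options can be checked in a throwaway directory. This is a minimal sketch: the names demo_dir, older_file and newer_file are made up for the illustration, and mktemp -d is used only so that nothing in your home directory is touched.

```shell
# Work in a scratch directory (hypothetical; any empty directory will do).
demo_dir=$(mktemp -d)
cd "$demo_dir"

touch older_file
sleep 1            # make sure the two modification times differ
touch newer_file

ls -l              # long format, sorted by name
ls -lt             # long format, most recently modified first
ls -ltr            # long format, most recently modified last

# With -tr the last entry is the most recently modified file.
newest=$(ls -tr | tail -n 1)
echo "most recently modified: $newest"
```

The last line of the "ls -ltr" listing and the value of newest should both name the file that was touched second.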

Now try this:

$ mkdir Tutorial
$ ls -ltr

The "mkdir" command makes a new directory with the name you give it. The new directory is a subdirectory of the current working directory. The current working directory is where your current focus is in the hierarchy of directories. The 'pwd' command shows you are in your home directory:

$ pwd
/home/alice0001

Now try this:

$ cd Tutorial
$ pwd

What is the output of 'pwd' now? "cd" is short for "change directory" -- think of it as moving you into a different place in the hierarchy of directories. Now do

$ cd ..
$ pwd

Where are you now?
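The round trip described above can be sketched as a short sequence that records where you are at each step. The scratch directory and the variable names (start_dir, inside, back) are assumptions made for this illustration only.

```shell
# Start from a scratch directory instead of your real home directory.
start_dir=$(mktemp -d)
cd "$start_dir"

mkdir Tutorial
cd Tutorial
inside=$(pwd)      # path ends in /Tutorial

cd ..              # ".." means the parent of the current directory
back=$(pwd)        # back where we started

echo "inside: $inside"
echo "back:   $back"
```

After "cd .." the working directory is the parent of Tutorial again, which is why the second path is the first one with the trailing /Tutorial removed.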

Step 2 -- Get familiar with some more UNIX commands

Try the following:

$ echo where am I?
$ echo I am in `pwd`
$ echo my home directory is $HOME
$ echo HOME
$ echo this directory contains `ls -l`

These examples show what the echo command does and how to do some interesting things with it. Putting a command inside backquotes, as in `pwd`, substitutes the output of that command. HOME is an example of an environment variable. These are strings that stand for other strings. HOME is defined when you log in to a UNIX system. $HOME means the string the variable HOME stands for. Notice that "echo HOME" does not do the substitution. Also notice that the last example shows things don't always get formatted the way you would like.
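The formatting problem in the last example comes from the shell splitting the unquoted output into words. The sketch below shows the $(...) spelling of command substitution, which behaves like backquotes here, and how double quotes keep the line breaks; the variable name literal is made up for the illustration.

```shell
# $(command) and `command` both substitute the command's output.
echo "I am in $(pwd)"
echo "I am in `pwd`"

# Unquoted: the line breaks in the output of ls -l are collapsed into spaces.
echo this directory contains $(ls -l)

# Double-quoted: the line breaks are preserved.
echo "this directory contains:
$(ls -l)"

# Single quotes switch substitution off entirely.
literal='$HOME'
echo "$literal"
```

Double quotes still allow $HOME and $(...) to be substituted, while single quotes print the text exactly as typed.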

Some more commands to try:

$ cal
$ cal > foo3
$ cat foo3
$ whoami
$ date

Using the ">" after a command puts the output of the command into a file with the name you specify. The "cat" command prints the contents of a file to the screen.
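One detail worth knowing about ">" is that it replaces the contents of an existing file, while ">>" adds to the end. A small sketch, using a scratch directory and a hypothetical file name notes.txt:

```shell
scratch=$(mktemp -d)
cd "$scratch"

echo "first"  >  notes.txt   # creates notes.txt
echo "second" >  notes.txt   # overwrites it: "first" is gone
echo "third"  >> notes.txt   # appends a second line

cat notes.txt

# Count the lines in the file (grep -c '' counts every line).
line_count=$(grep -c '' notes.txt)
echo "notes.txt has $line_count lines"
```

Because the second echo overwrote the file, only "second" and "third" remain.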

Two very important UNIX commands are the cp and mv commands. Assume you have a file called foo3 in your current directory created by the "cal > foo3" command. Suppose you want to make a copy of foo3 called foo4. You would do this with the following command:

$ cp foo3 foo4
$ ls -ltr

Now suppose you want to rename the file 'foo4' to 'foo5'. You do this with:

$ mv foo4 foo5
$ ls -ltr

'mv' is short for 'move' and it is used for renaming files. It can also be used to move a file to a different directory.

$ mkdir CalDir
$ mv foo5 CalDir
$ ls
$ ls CalDir

Notice that if you give the "ls" command the name of a directory, it shows you what is in that directory rather than the current working directory.

Now try the following:

$ ls CalDir
$ cd CalDir
$ ls
$ cd ..
$ cp foo3 CalDir
$ ls CalDir

Notice that you can use the "cp" command to copy a file to a different directory -- the copy will have the same name as the original file. What if you forget to do the mkdir first?

$ cp foo3 FooDir

Now what happens when you do the following:

$ ls FooDir
$ cd FooDir
$ cat CalDir
$ cat FooDir
$ ls -ltr

CalDir is a directory, but FooDir is a regular file. You can tell this by the "d" at the start of the string of letters for CalDir when you do the "ls -ltr". That's what happens when you try to cp or mv a file to a directory that doesn't exist -- a file gets created with the target name. You can imagine a scenario in which you run a program and want to copy the resulting files to a directory called Output but you forget to create the directory first -- this is a fairly common mistake.
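One way to guard against this mistake is to create the target directory before copying. The sketch below reproduces the problem in a scratch directory and then shows the safer sequence; result.txt is a made-up file name standing in for the output of a program.

```shell
scratch=$(mktemp -d)
cd "$scratch"
echo "some result" > result.txt

# The mistake: Output does not exist, so cp creates a regular FILE called Output.
cp result.txt Output
[ -d Output ] || echo "Output is a file, not a directory"
rm Output

# Safer: make the directory first. mkdir -p does nothing if it already exists.
mkdir -p Output
cp result.txt Output/    # the trailing slash makes the intent explicit
ls -l Output
```

The test [ -d name ] is true only when name exists and is a directory, which makes it a convenient check in scripts.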

For more information on how to tweak your .bashrc file, see the .bashrc page.

Step 3 -- Environment Variables

Before we move on to creating a batch script, you need to know more about environment variables. An environment variable is a word that stands for some other text. We have already seen an example of this with the variable HOME. Try this:

$ MY_ENV_VAR="something I would rather not type over and over"
$ echo MY_ENV_VAR
$ echo $MY_ENV_VAR
$ echo "MY_ENV_VAR stands for $MY_ENV_VAR"

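You define a variable by assigning some text to it with the equals sign, with no spaces around the equals sign. That's what the first line above does. Strictly speaking this creates a shell variable; it becomes an environment variable, visible to the programs you start from the shell, once you export it with the "export" command. When you use '$' followed by the name of your variable in a command line, UNIX makes the substitution. If you forget the '$' the substitution will not be made.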

There are some environment variables that come pre-defined when you log in. Try using 'echo' to see the values of the following variables: HOME, HOSTNAME, SHELL, TERM, PATH.
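The difference between a plain shell variable and an exported environment variable shows up when another program is started. In this sketch a second shell started with "sh -c" plays the role of that program; the variable names before and after are made up for the illustration.

```shell
unset MY_ENV_VAR   # start from a clean state
MY_ENV_VAR="something I would rather not type over and over"

# A child process does not see a variable that has not been exported...
before=$(sh -c 'echo "${MY_ENV_VAR:-unset}"')

# ...but it does once the variable is exported into the environment.
export MY_ENV_VAR
after=$(sh -c 'echo "${MY_ENV_VAR:-unset}"')

echo "before export: $before"
echo "after export:  $after"
```

The single quotes around the sh -c argument stop the current shell from substituting the variable itself, so the value really is looked up by the child shell.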

Now you are ready to use some of this UNIX knowledge to create and run a script.