Nagios

From CLONWiki
Boiarino (talk | contribs)
Latest revision as of 13:50, 28 September 2012

Nagios is the main monitoring tool for the CLON cluster.

Download the following files from the web and place them in '/usr/local/downloads':

 nagios-2.6.tar.gz
 nagios-images_0.3.tar.gz
 nagios-plugins-1.4.5.tar.gz
 nagiosmib-1.0.0.tar.gz (?????????)

Create user 'nagios' with private group 'nagios', then create the installation directory:

 mkdir /www/nagios2.6
 chown nagios:nagios /www/nagios2.6

Add the command file group and put the appropriate users in it (we assume that Apache is running as user 'apache'):

 /usr/sbin/groupadd nagcmd
 # -a appends to the existing supplementary groups; -G alone would replace them
 /usr/sbin/usermod -a -G nagcmd apache
 /usr/sbin/usermod -a -G nagcmd nagios
 # to check, see file /etc/group
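
The membership check can also be scripted rather than read by eye. The following is a minimal sketch, not part of the original procedure; the helper name in_group is hypothetical, and it only looks at the member list in the fourth field of the group file (it does not consider a user's primary group).

```shell
# Return 0 if USER is listed as a member of GROUP in the given group file.
# Usage: in_group USER GROUP [GROUPFILE]   (GROUPFILE defaults to /etc/group)
in_group() {
    user=$1
    group=$2
    file=${3:-/etc/group}
    # group file format: name:password:gid:member1,member2,...
    members=$(awk -F: -v g="$group" '$1 == g { print $4 }' "$file")
    case ",$members," in
        *",$user,"*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Report whether the two users from the step above are in nagcmd.
for u in apache nagios; do
    if in_group "$u" nagcmd; then
        echo "$u is in nagcmd"
    else
        echo "$u is NOT in nagcmd"
    fi
done
```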

Build and install:

 cp /usr/local/downloads/nagios-2.6.tar.gz /usr/local/src
 cd /usr/local/src
 gunzip nagios-2.6.tar.gz
 tar xvf nagios-2.6.tar
 cd '/usr/local/src/nagios-2.6'
 # NOTE: do NOT 'su nagios' before running configure
 ./configure --prefix=/www/nagios2.6 --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagcmd
 # ?? on RHEL6 the following was run:
 #   ./configure --prefix=/www/nagios-3.4.1 --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios --with-nagios-user=nagios --with-nagios-group=nagcmd --with-command-group=nagcmd
 make all
 make install

Install the init script /etc/init.d/nagios (must be done as 'root'):

 make install-init

Modify the /etc/init.d/nagios script as follows (possibly a bug in the script?):

 ###NagiosRunFile=${prefix}/var/nagios.lock
 NagiosRunFile=${prefix}/var/run/nagios.pid
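
If the same change has to be repeated on several machines, it could be applied with sed. This is only a sketch: the function name fix_runfile is hypothetical, and it assumes the original line appears in the script exactly as shown above, with no leading whitespace. Back up the script before trying it.

```shell
# Point NagiosRunFile at var/run/nagios.pid in the given init script.
# The old line is kept, commented out with '###', as in the manual edit.
fix_runfile() {
    script=$1
    tmp=$(mktemp) || return 1
    if sed 's|^NagiosRunFile=${prefix}/var/nagios.lock$|###&\
NagiosRunFile=${prefix}/var/run/nagios.pid|' "$script" > "$tmp"; then
        # Overwrite the contents in place so the file keeps its mode and owner.
        cat "$tmp" > "$script"
    fi
    rm -f "$tmp"
}
```

It would be called as root, for example as fix_runfile /etc/init.d/nagios.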


Install the sample configuration and the external command directory:

 # install the sample 'etc' directory
 make install-config
 # ??? this installs and configures permissions on the
 # ??? directory for holding the external command file
 make install-commandmode


Install plugins and nrpe - see corresponding sections.


Fix the Apache configuration file:

 add the contents of /usr/local/src/nagios-2.6/sample-config/httpd.conf
 to /www/apache2.2.3/conf/httpd.conf

Copy the 'etc' directory from the old Nagios installation (if any) to /www/nagios2.6. Go through the files cgi.cfg, nagios.cfg and private/* and fix the paths so that they point to /www/nagios2.6.
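
The path fix-up can be done in bulk rather than by hand. The sketch below is an illustration only: the function name repoint_cfg and the old prefix /www/nagios-old used in the usage note are hypothetical, it touches only files named *.cfg, and it assumes neither prefix contains the '|' character. Review the result before starting Nagios.

```shell
# Replace every occurrence of the OLD prefix with the NEW prefix in the
# *.cfg files under DIR (including subdirectories such as private/).
# Usage: repoint_cfg OLD NEW DIR
repoint_cfg() {
    old=$1 new=$2 dir=$3
    find "$dir" -type f -name '*.cfg' | while IFS= read -r f; do
        tmp=$(mktemp) || exit 1
        # Rewrite via a temporary file, then overwrite the original contents.
        sed "s|$old|$new|g" "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

For example, repoint_cfg /www/nagios-old /www/nagios2.6 /www/nagios2.6/etc, followed by grep -rn /www/nagios-old /www/nagios2.6/etc to list anything that was missed.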

Add several directories for output files:

 mkdir /www/nagios2.6/var/log
 mkdir /www/nagios2.6/var/run
 mkdir /www/nagios2.6/var/rw (?????)

Install icons:

 cd /usr/local/src/nagios-images-0.3/base
 cp * /www/nagios2.6/share/images/logos

To check the configuration, run the following command:

 /www/nagios2.6/bin/nagios -v /www/nagios2.6/etc/nagios.cfg
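
The check and the start can be chained so that Nagios is only started when the check passes. This is a sketch under the assumption that 'nagios -v' exits with a non-zero status when it finds errors; the function name check_and_start is hypothetical.

```shell
# Run the configuration check and start Nagios only if it passes.
# Usage: check_and_start NAGIOS_BIN CONFIG INIT_SCRIPT
check_and_start() {
    nagios_bin=$1 cfg=$2 init=$3
    if "$nagios_bin" -v "$cfg"; then
        "$init" start
    else
        echo "configuration check failed; not starting Nagios" >&2
        return 1
    fi
}
```

With the paths used on this page the call would be check_and_start /www/nagios2.6/bin/nagios /www/nagios2.6/etc/nagios.cfg /etc/init.d/nagios.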

To start or stop Nagios (restart needs to be fixed):

 /etc/init.d/nagios start
 /etc/init.d/nagios stop

Add Nagios to services:

 chkconfig --add nagios
 chkconfig --level 3 nagios off
 chkconfig --level 4 nagios off
 chkconfig --list nagios

NOTE: the following command was executed to let the browser disable host checks; this should be investigated:

 chown nagios:nagcmd /www/nagios2.6/var/log/nagios.cmd

ADDITIONAL INFO (COPIED FROM http://klickitat.ee.washington.edu/medg/software/nagios-install-notes.txt):

Some notes on installing nagios. These are a supplement to the basic nagios documentation that comes with the software:

Download and install the nagios and nagios-plugin packages. For whatever reason www.nagios.org seems to be hosed now (7/23/2004), but look on google. There is also sourceforge.nagios.net, which seems to be another nagios homepage.

Create a nagios user. Compile and install the packages as described in the documentation. Redhat has all necessary libraries already installed. I just went with the defaults in the compilation. The default is for nagios to install itself in /usr/local/nagios; everything should be owned by user nagios.

To enable the web interface, edit httpd.conf to add the Alias and ScriptAlias directives as described in the nagios documentation. This works for both apache 1.3 and apache 2.0. Restart apache. At this point you should be able to go to http://www.whatever/nagios/ and see the nagios page and access the documentation. CGIs probably won't work.

You need to set up the config files; this is the real heart of installing nagios and unfortunately is much easier to show than to describe. The first step is to copy the *.cfg-sample files that should be in /usr/local/nagios/etc to *.cfg. Then you need to edit these files to describe your setup.

Basically, hosts.cfg describes the hosts you want to monitor, and services.cfg describes the services you want to monitor on each host. checkcommands.cfg holds the check commands used by services.cfg; if you want to check a new service you probably have to add a command for it. contacts.cfg lists the people who will be contacted in case of a problem, contactgroups.cfg the groups of people, and hostgroups.cfg the groups of hosts (rrsl-machines, for example). nagios.cfg is the master config file. You can probably get by just copying and pasting the entries already in these files and tweaking them.

To add a new machine you will need to edit hosts.cfg (add the machine), hostgroups.cfg (put it in a hostgroup), and services.cfg (add the services to be checked on the machine).
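
As a rough illustration of what those edits look like, the sketch below appends a host and one ping service to the config files. Everything specific in it is an assumption rather than something taken from these notes: the function name add_host, the template names generic-host and generic-service, the check_ping command and its thresholds (these resemble typical sample configs but may differ in yours), and the example host used in the usage note. The hostgroup membership still has to be edited by hand in hostgroups.cfg.

```shell
# Append a host definition and a ping service for it to the files in CFGDIR.
# Usage: add_host CFGDIR HOST_NAME DESCRIPTION ADDRESS
add_host() {
    cfgdir=$1 name=$2 desc=$3 addr=$4
    cat >> "$cfgdir/hosts.cfg" <<EOF

define host{
        use             generic-host    ; assumed template name
        host_name       $name
        alias           $desc
        address         $addr
        }
EOF
    cat >> "$cfgdir/services.cfg" <<EOF

define service{
        use                     generic-service ; assumed template name
        host_name               $name
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }
EOF
}
```

A call such as add_host /usr/local/nagios/etc web1 "Web server" 192.0.2.10 (a made-up host) should be followed by the 'nagios -v' check before restarting.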

To add a new administrator, you will need to edit contacts.cfg (add the new person) and contactgroups.cfg (put them in a contact group or create one).

In misccommands.cfg I needed to change /usr/bin/mail to /bin/mail on redhat -- but not on slackware! Otherwise it was not able to mail messages.

At this point you can check your config using the '/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg' command, which does a check of the configuration and will warn if there are errors. Fix the errors and repeat until it is happy.

Now you need to enable authorization so that the cgi scripts will work. To do this first create the .htaccess file in the nagios/sbin directory as described in the documentation - it must be world-readable.

Next create the htpasswd.users file in the nagios/etc directory as described in the documentation - it must be world-readable! I added only one user: rrsl
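
For reference, the two files could be created along the following lines. This is a sketch, not the text from the nagios documentation: the function name write_htaccess is hypothetical and the AuthName string is arbitrary. It also assumes Apache is configured to honor .htaccess authentication directives in that directory (AllowOverride AuthConfig).

```shell
# Write a basic-auth .htaccess into DIR, restricted to the listed users.
# Usage: write_htaccess DIR HTPASSWD_FILE USER...
write_htaccess() {
    dir=$1 userfile=$2
    shift 2
    cat > "$dir/.htaccess" <<EOF
AuthName "Nagios Access"
AuthType Basic
AuthUserFile $userfile
require user $*
EOF
    chmod 644 "$dir/.htaccess"   # must be world-readable
}
```

For the single user mentioned above this would be write_htaccess /usr/local/nagios/sbin /usr/local/nagios/etc/htpasswd.users rrsl, and the password file itself can be created with htpasswd -c /usr/local/nagios/etc/htpasswd.users rrsl.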

At this point you should be able to start nagios using:

 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg

After nagios is started you should be able to go to the web page and use the cgis to display info. The final step is enabling the external command cgis, which let you change the behavior of nagios from the web. To do this you have to follow the steps in the documentation to enable external commands. This involves enabling external commands in the config file and specifying an external commands file.

The permissions on the file seem to be a source of problems. To enable external commands you first have to create a group containing the nagios user and the user httpd runs as (apache for us). Then you create the directory /usr/local/nagios/var/rw with permissions:

 drwxrwsr-x 2 nagios nagioscmd 4096 Aug 3 13:55 rw

That's rwx permissions for the user and rws permissions for the group:

 chown nagios:nagioscmd rw
 chmod gu+rwx rw
 chmod g+s rw

Then you have to restart both apache and nagios, or it doesn't work!
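
The directory setup can be wrapped in one step that is safe to re-run. This is a sketch with a hypothetical function name, setup_rw_dir; the ownership change is only attempted when a group is passed, since it needs root and an existing group.

```shell
# Create the external command directory DIR with mode drwxrwsr-x.
# If GROUP is given, also set the owner to nagios:GROUP (needs root).
# Usage: setup_rw_dir DIR [GROUP]
setup_rw_dir() {
    dir=$1 group=$2
    mkdir -p "$dir" || return 1
    if [ -n "$group" ]; then
        chown "nagios:$group" "$dir" || return 1
    fi
    # 2775 = setgid bit + rwx for user and group + r-x for others
    chmod 2775 "$dir"
}
```

As root this would be called as setup_rw_dir /usr/local/nagios/var/rw nagioscmd, and ls -ld on the directory should then show the drwxrwsr-x mode.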

The documentation describes some other gotchas..

To run checks for services on remote machines, you need to set up ssh to log in without a password. To do this run

 $ ssh-keygen -t rsa

to create a public/private key pair. Copy the public key into the ~/.ssh/authorized_keys file of the user nagios on the remote host. This file must only be rw by nagios or ssh will not work. The directory .ssh must also be only rwx by nagios. Then you should be able to ssh to the remote host as nagios without giving a password.
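
The permission requirements are the part that is easiest to get wrong, so here is a small sketch of the remote-side step. The function name install_pubkey is hypothetical; it is meant to be run as the nagios user on the remote host, after the public key file has been copied there.

```shell
# Append PUBKEY_FILE to HOME_DIR/.ssh/authorized_keys with the strict
# permissions described above (directory rwx and file rw for the owner only).
# Usage: install_pubkey PUBKEY_FILE HOME_DIR
install_pubkey() {
    pubkey=$1 home=$2
    mkdir -p "$home/.ssh" || return 1
    chmod 700 "$home/.ssh"
    cat "$pubkey" >> "$home/.ssh/authorized_keys" || return 1
    chmod 600 "$home/.ssh/authorized_keys"
}
```

For example, install_pubkey /tmp/id_rsa.pub "$HOME", where /tmp/id_rsa.pub is wherever the copied key happens to be.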

Copy the plugins you want to run on the remote host to the remote host, and then set up checks in checkcommands.cfg, and services.cfg. See the check-host-radar command for an example. The plugins can be any kind of program.
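
What makes a program usable as a plugin is its exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) together with a one-line status message on standard output. The sketch below shows that contract with a hypothetical function, check_threshold, that compares an integer against two thresholds; it is not the check-host-radar command mentioned above.

```shell
# Compare VALUE against WARN and CRIT thresholds, print one status line,
# and return the matching plugin exit code.
# Usage: check_threshold VALUE WARN CRIT   (non-negative integers)
check_threshold() {
    value=$1 warn=$2 crit=$3
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*)
                echo "UNKNOWN - expected non-negative integers"
                return 3 ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

A standalone plugin script would end by calling the function with its own arguments and exiting with the function's return status.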

Many plugins were timing out after ten seconds when checking on heimdal and umtanum. Although there was supposedly a command-line option to change this, in practice there was not. Therefore, I changed the source to set the timeout to 30 seconds and recompiled the plugins. This appears to work. (7/26/04)

nagios comes with the file /etc/rc.d/init.d/nagios, which is a script for starting nagios from rc.local or from the command line as root (or by sudo). This seems to be the best way of starting or stopping the program.

 sudo /etc/rc.d/init.d/nagios start
 sudo /etc/rc.d/init.d/nagios stop
 etc.

nagios has the ability to acknowledge a host condition, so if a host is down, you can "acknowledge" it through the web interface, and it will stop sending email unless the host changes state. This is useful. You can also disable notification for a service, which is handy.


Adding a user to the web interface:

1) edit cgi.cfg to add the user to the actions that you want them to do
2) edit sbin/.htaccess to add the user to the list of ok users, eg:
   require user rrsl radar
   for users rrsl and radar being able to access the web interface
   Keep in mind this file must be world readable...
3) issue htpasswd /usr/local/nagios/etc/htpasswd.users <new user> as root, to
   create the new user and password
4) stop nagios
5) restart the web server
6) start nagios

It should work; you should be able to log in and do stuff.