Nrpe

From CLONWiki
Revision as of 09:57, 7 September 2007 by Boiarino (talk | contribs)

We need two programs: 'nrpe', which runs as a daemon or as an inet service, and 'check_nrpe', which is called by Nagios. In general, 'check_nrpe' is needed on clonweb and 'nrpe' on all other machines.

Clonweb only (where Nagios is running): build 'check_nrpe' and copy it to the Nagios area:

 cd /usr/local/src/nrpe-2.6
 ./configure
 make all
 cp src/check_nrpe /www/nagios2.6/libexec
 chown nagios.nagios /www/nagios2.6/libexec/check_nrpe

Generic installation (all machines, including clonweb if it is not done yet):

 cd /usr/local/src
 cp ../downloads/nrpe-2.6.tar.gz .
 gunzip nrpe-2.6.tar.gz
 tar xvf nrpe-2.6.tar
 rm nrpe-2.6.tar
 cd /usr/local/src/nrpe-2.6
 ./configure --prefix=/apps/nrpe2.6 --enable-command-args

Compiling:

 make all

On clonweb (where Nagios is running):

 cp src/check_nrpe /www/nagios2.6/libexec

On any other machine that is supposed to be monitored remotely by clonweb:

 mkdir /apps/nrpe2.6
 mkdir /apps/nrpe2.6/libexec
 mkdir /apps/nrpe2.6/etc
 mkdir /apps/nrpe2.6/bin
 cp sample-config/nrpe.cfg /apps/nrpe2.6/etc/
 cp src/nrpe /apps/nrpe2.6/bin
 cp src/check_nrpe /apps/nrpe2.6/libexec/
 cp init-script /etc/init.d/nrpe

Edit /etc/init.d/nrpe (for example with emacs) so that it points to the new location:

 # config: /apps/nrpe2.6/etc/nrpe.cfg
 NrpeBin=/apps/nrpe2.6/bin/nrpe
 NrpeCfg=/apps/nrpe2.6/etc/nrpe.cfg
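
The same init-script changes can be made without an editor. The following is only a sketch: it assumes the init script already contains lines starting with 'NrpeBin=' and 'NrpeCfg=', and the helper name 'set_init_var' is hypothetical, not part of NRPE.

```shell
# Hypothetical helper: print FILE with the line "VAR=..." replaced by
# "VAR=VALUE". The result goes to stdout; FILE itself is not modified.
set_init_var() {
    file=$1 var=$2 value=$3
    # '|' is the sed delimiter because the values are paths containing '/'.
    sed "s|^${var}=.*|${var}=${value}|" "$file"
}
```

For example, 'set_init_var init-script NrpeBin /apps/nrpe2.6/bin/nrpe > /tmp/nrpe.init' writes a modified copy that can be inspected before it is installed as /etc/init.d/nrpe.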

Edit /apps/nrpe2.6/etc/nrpe.cfg and set:

 dont_blame_nrpe=1
 command[check_disk_test]=/apps/nrpe2.6/libexec/check_disk -w 20 -c 10 -p /
 command[check_disk]=/apps/nrpe2.6/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
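
The 'check_disk' entry uses the placeholders $ARG1$, $ARG2$ and $ARG3$, which 'nrpe' fills with the values passed by 'check_nrpe' after '-a' (this works only with '--enable-command-args' and 'dont_blame_nrpe=1'). The sketch below illustrates the substitution. It is not NRPE's actual code, and the function name 'expand_nrpe_args' is hypothetical.

```shell
# Illustration only: replace $ARG1$, $ARG2$, ... in a command template
# with the positional arguments that follow it.
expand_nrpe_args() {
    cmd=$1
    shift
    i=1
    for arg in "$@"; do
        token="\$ARG${i}\$"
        out=
        rest=$cmd
        # Copy text up to each occurrence of the token, then the argument.
        while case $rest in *"$token"*) true ;; *) false ;; esac; do
            out=$out${rest%%"$token"*}$arg
            rest=${rest#*"$token"}
        done
        cmd=$out$rest
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
}
```

Calling it with the 'check_disk' template and the arguments '20 20 /' prints the command line that would be executed on the monitored machine with those values.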

On clonweb, copy the plugins we want to execute remotely (remote machines will not see /www, only /apps !!!):

 cp /www/nagios2.6/libexec/check_disk /apps/nrpe2.6/libexec/


To test 'check_nrpe', run the following commands from another machine where 'check_nrpe' is installed. Requests shall be sent to the machine where 'nrpe' is running (in our examples it is clon10):

On clonweb:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk_test

The command should return something like:

 DISK OK - free space: / 1363 MB (16% inode=74%);| /=7054MB;8483;8493;0;8503

The same check with arguments supplied by the caller:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
 

On any machine other than clonweb:

 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test
 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
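
When running these tests by hand, it helps to look at the exit code as well as the text. Nagios plugins conventionally exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and the output line holds the status text before the '|' and performance data after it. The helpers below are a small sketch with hypothetical names.

```shell
# Map a plugin exit code to its conventional Nagios state name.
# In this sketch anything other than 0, 1 or 2 is reported as UNKNOWN.
nagios_state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Print the human-readable part of a plugin output line (before the first '|').
plugin_status_text() {
    printf '%s\n' "$1" | cut -d'|' -f1
}

# Print the performance data (after the first '|'); prints nothing if absent.
plugin_perfdata() {
    printf '%s\n' "$1" | cut -s -d'|' -f2-
}
```

A manual test could then be written as 'out=$(/apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test); nagios_state_name $?'.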

Testing 'nrpe':

NOTE: user 'nagios' and group 'nagios' must exist to run 'nrpe' daemon !!!

To create group 'nagios' on Solaris:

 groupadd -g 9997 nagios

Modify the 'nagios' line in the /etc/group file as follows:

 nagios::9997:nagios

To create user 'nagios' on Solaris:

 useradd -u 6246 -g nagios -d /home/nagios -c "Nagios" -s /bin/tcsh nagios
    (add flag '-m' if you want to force home directory creation).

If 'useradd' complains, check the passwd file with 'pwconv' (there should be no blank lines at the end, etc.). The group can also be added on Solaris by putting the line 'nagios::9997:nagios' directly into the '/etc/group' file (the id may be different, of course).

On Linux use '/usr/bin/system-config-users' utility.
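
Before starting 'nrpe' it is worth confirming that both entries really exist. The following sketch checks a file in /etc/group or /etc/passwd format for an entry with a given name. The helper name 'has_entry' is hypothetical, and on Solaris 'nawk' may be needed instead of 'awk' because of the '-v' option.

```shell
# Succeeds if FILE (colon-separated, like /etc/group or /etc/passwd)
# contains an entry whose first field is exactly NAME.
has_entry() {
    awk -F: -v name="$2" '$1 == name { found = 1 } END { exit !found }' "$1"
}
```

For example: 'has_entry /etc/group nagios && has_entry /etc/passwd nagios || echo "nagios user or group is missing"'.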

To start/stop/restart 'nrpe' daemon on Linux (as 'root'):

 /etc/init.d/nrpe start
 /etc/init.d/nrpe stop
 /etc/init.d/nrpe restart

To start 'nrpe' daemon on Solaris (as 'root'):

 /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 ps -ef | grep nrpe
 nagios  3051     1   0 13:09:24 ?           0:00  ./nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 more /var/run/nrpe.pid
 3051

On both systems it runs as user 'nagios', according to its config file.

Normally we run 'nrpe' not as a daemon but as part of the 'inet' service. It was configured with the following steps:

Add the following line to /etc/services:

 nrpe            5666/tcp        # NRPE
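
Since this step is repeated on every monitored machine, it can be made safe to run more than once. The sketch below appends the entry only when no line with the same service name and port is present. 'ensure_service_line' is a hypothetical helper, and it should be tried on a copy of the file first.

```shell
# Append "NAME PORT" to FILE unless a line with the same first two
# fields already exists. Running it a second time changes nothing.
ensure_service_line() {
    file=$1 name=$2 port=$3
    if awk -v n="$name" -v p="$port" \
        '$1 == n && $2 == p { f = 1 } END { exit !f }' "$file"; then
        return 0
    fi
    printf '%s\t\t%s\n' "$name" "$port" >> "$file"
}
```

Usage (as 'root'): 'ensure_service_line /etc/services nrpe 5666/tcp'.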

Linux (xinetd): create the file '/etc/xinetd.d/nrpe' with the following contents:

 # default: on
 # description: NRPE
 service nrpe
 {
         flags           = REUSE
         socket_type     = stream        
         wait            = no
         user            = nagios
         group           = nagios
         server          = /apps/nrpe2.6/bin/nrpe
         server_args     = -c /apps/nrpe2.6/etc/nrpe.cfg --inetd
         log_on_failure  += USERID
         disable         = no
 ###        only_from       = 129.57.167.42
 }

Solaris (inetd): add the following line to /etc/inetd.conf:

 nrpe    stream  tcp     nowait  nagios /apps/nrpe2.6/bin/nrpe /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg --inetd

Linux: restart the xinetd service:

 /etc/init.d/xinetd restart

Solaris:

 inetconv -i /etc/inet/inetd.conf
 svcadm restart /network/inetd

Run the tests described above from another machine. Check for possible errors:

  tail -100 /var/log/messages | grep nrpe

Solaris:

 svcs | grep nrpe
 online          0:41:40 svc:/network/nrpe/tcp:default


Useful commands:

 netstat -lp
 more /var/log/messages | grep nrpe
     Jan  7 22:28:29 clonpc2 xinetd[986]: execv( /apps/nrpe2.6/bin/ ) failed: 
     Permission denied (errno = 13)
 clon10:src> inetadm -l svc:/network/nrpe/tcp:default
   SCOPE    NAME=VALUE
        name="nrpe"
        endpoint_type="stream"
        proto="tcp"
        isrpc=FALSE
        wait=FALSE
        exec="/usr/sbin/nrpe"
        user="nagios"
   default  bind_addr=""
   default  bind_fail_max=-1
   default  bind_fail_interval=-1
   default  max_con_rate=-1
   default  max_copies=-1
   default  con_rate_offline=-1
   default  failrate_cnt=40
   default  failrate_interval=60
   default  inherit_env=TRUE
   default  tcp_trace=FALSE
   default  tcp_wrappers=FALSE
   clon10:src>

IMPORTANT: the procedure described above did not work on Solaris 10, so 'nrpe' was started not from 'inetd' but as a separate service, using the following procedure:

Create the 'nrpe' manifest file /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml (if copying from this page into the file, remove the extra first column from every line if there is one; otherwise the message "svccfg: couldn't parse document" will appear on the 'svccfg import' command):

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM
"/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='nagios-nrpe'>
    <service name='application/management/nagios/nrpe' version='1' type='service'>
        <create_default_instance enabled='false' />
        <single_instance />
        <dependency name='multi-user' grouping='require_all' restart_on='none' type='service'>
            <service_fmri value='svc:/milestone/multi-user' />
        </dependency>
        <method_context>
            <method_credential user='nagios' group='nagios'/>
            <method_environment>
                <envvar name='BASEDIR' value='/apps/nrpe2.6'/>
                <envvar name='LD_LIBRARY_PATH' value='/lib:/usr/local/lib:/usr/sfw/lib'/>
            </method_environment>
        </method_context>
        <exec_method type='method' name='start' exec='$BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe.cfg -d'
            timeout_seconds='60'/>
        <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60'/>
        <property_group name='general' type='framework'>
            <propval name='enabled' type='boolean' value='false'/>
            <propval name='action_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
            <propval name='value_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
        </property_group>
        <property_group name='startd' type='framework'>
            <propval name='ignore_error' type='astring' value='core,signal' />
        </property_group>
        <stability value='Unstable' />
    </service>
</service_bundle>
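
The parse error after a copy and paste is consistent with the XML rule that the '<?xml ...?>' declaration must be the very first thing in the file. The following pre-import check is a sketch that assumes this is the cause; 'manifest_starts_clean' is a hypothetical helper name.

```shell
# Succeeds if the first line of FILE begins with the XML declaration in
# column 1, i.e. no leading space or blank line was pasted before it.
manifest_starts_clean() {
    head -n 1 "$1" | grep -q '^<?xml'
}
```

Usage: 'manifest_starts_clean nagios-nrpe.xml || echo "remove leading whitespace first"'. If the installed 'svccfg' supports it, 'svccfg validate' on the same file gives a fuller check.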

Add the following line to /etc/user_attr:

 nagios::::type=role;auths=solaris.smf.manage.nagios-nrpe,solaris.smf.manage.nagios;profiles=Basic Solaris User

Add the following line to /etc/security/auth_attr:

 solaris.smf.manage.nagios-nrpe:::Manage Nagios NRPE Service States::

(The last two steps allow the nagios user to start and stop the service.)

Import the service configuration and enable the service:

 svccfg import /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
 svcadm enable application/management/nagios/nrpe

Check that it is running:

 svcs | grep nrpe

If the status is not 'online', type 'svcs -x' and look at the log file it names.
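
The state check can also be scripted. The following sketch reads 'svcs' output on standard input and prints the state column of the first service whose FMRI contains a given string. The helper name 'svc_state' is hypothetical; the layout assumed is the one shown in the 'svcs' output earlier on this page (state, start time, FMRI).

```shell
# Read `svcs` output on stdin and print the state (first column) of the
# first line whose last column (the FMRI) contains PATTERN.
svc_state() {
    awk -v pat="$1" 'index($NF, pat) { print $1; exit }'
}
```

Usage: 'svcs -a | svc_state nagios/nrpe'. If the result is anything other than 'online', continue with 'svcs -x'.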

NOTE: to run 'nrpe' as 'clasrun' (this is done on 'clon06', for example, so that 'check_quota' can be executed for user 'clasrun'), the following changes shall be applied to the procedure:

 /etc/user_attr:
    'clasrun::::' instead of 'nagios::::'
 /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml:
    method_credential user='clasrun' group='onliners'
    $BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe_clasrun.cfg -d
 /apps/nrpe2.6/etc/nrpe_clasrun.cfg:
    nrpe_user=clasrun
    nrpe_group=onliners