Nrpe: Difference between revisions

From CLONWiki
Boiarino (talk | contribs)
Latest revision as of 20:35, 2 October 2010

We need two programs: 'nrpe' to be run as daemon or inet service, and 'check_nrpe' to be called by nagios. In general we need 'check_nrpe' on clonweb and 'nrpe' on all other machines.

Clonweb only (where Nagios is running): produce 'check_nrpe' and copy it to 'nagios' area:

 cd /usr/local/src/nrpe-2.6
 ./configure
 make all
 cp src/check_nrpe /www/nagios2.6/libexec
 chown nagios:nagios /www/nagios2.6/libexec/check_nrpe

Generic installation (all machines, including clonweb if it is not done yet):

 cd /usr/local/src
 cp ../downloads/nrpe-2.6.tar.gz .
 gunzip nrpe-2.6.tar.gz
 tar xvf nrpe-2.6.tar
 rm nrpe-2.6.tar
 cd /usr/local/src/nrpe-2.6
 ./configure --prefix=/apps/nrpe2.6 --enable-command-args

Compiling:

 make all

On clonweb (where Nagios is running):

 cp src/check_nrpe /www/nagios2.6/libexec

On any other machine that is supposed to be monitored remotely by clonweb:

 mkdir /apps/nrpe2.6
 mkdir /apps/nrpe2.6/libexec
 mkdir /apps/nrpe2.6/etc
 mkdir /apps/nrpe2.6/bin
 cp sample-config/nrpe.cfg /apps/nrpe2.6/etc/
 cp src/nrpe /apps/nrpe2.6/bin
 cp src/check_nrpe /apps/nrpe2.6/libexec/
 cp init-script /etc/init.d/nrpe
 emacs /etc/init.d/nrpe:
 # config: /apps/nrpe2.6/etc/nrpe.cfg
 NrpeBin=/apps/nrpe2.6/bin/nrpe
 NrpeCfg=/apps/nrpe2.6/etc/nrpe.cfg
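
The mkdir/cp steps above can be wrapped in one small function so that they are safe to repeat on every machine. This is only a sketch: the function name 'install_nrpe_layout' is made up for illustration, and it assumes the source tree was already built with 'make all'.

```shell
# install_nrpe_layout SRC_DIR PREFIX   (hypothetical helper, not part of NRPE)
# Creates PREFIX/{libexec,etc,bin} and copies the build products into place.
install_nrpe_layout() {
    src=$1
    prefix=$2
    # 'mkdir -p' also creates PREFIX itself and does not fail if it exists
    mkdir -p "$prefix/libexec" "$prefix/etc" "$prefix/bin" || return 1
    cp "$src/sample-config/nrpe.cfg" "$prefix/etc/" || return 1
    cp "$src/src/nrpe" "$prefix/bin/" || return 1
    cp "$src/src/check_nrpe" "$prefix/libexec/" || return 1
}

# Example (as root):
#   install_nrpe_layout /usr/local/src/nrpe-2.6 /apps/nrpe2.6
```

Note that a repeated run overwrites /apps/nrpe2.6/etc/nrpe.cfg with the sample file, so keep a copy of an edited config first.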

emacs /apps/nrpe2.6/etc/nrpe.cfg:

 dont_blame_nrpe=1
 command[check_disk_test]=/apps/nrpe2.6/libexec/check_disk -w 20 -c 10 -p /
 command[check_disk]=/apps/nrpe2.6/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
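
Each 'command[name]=...' line defines a name that 'check_nrpe -c name' can request; the $ARG1$, $ARG2$, ... placeholders are filled from the values given after '-a', which is why 'dont_blame_nrpe=1' and '--enable-command-args' are needed. To see which names a given config offers, something like the following could be used (the helper name 'list_nrpe_commands' is an assumption, it is not an NRPE tool):

```shell
# list_nrpe_commands CFG   (hypothetical helper)
# Prints the name of every active command[...] entry in an nrpe.cfg file.
# Lines starting with '#' are skipped because they do not begin with 'command['.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)].*/\1/p' "$1"
}

# Example:
#   list_nrpe_commands /apps/nrpe2.6/etc/nrpe.cfg
```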

On clonweb: copy the plugins we want to execute remotely (remote machines will not see /www, only /apps !!!):

 cp /www/nagios2.6/libexec/check_disk /apps/nrpe2.6/libexec/


To test 'check_nrpe', run the following commands from another machine where 'check_nrpe' is installed. Requests should be sent to the machine where 'nrpe' is running (in our examples it is clon10):

on clonweb:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk_test
     must return something like this:
     DISK OK - free space: / 1363 MB (16% inode=74%);| /=7054MB;8483;8493;0;8503
 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
 

on non-clonweb:

 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test
 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
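
Besides the text line, 'check_nrpe' passes on the exit status of the remote plugin, following the usual Nagios convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal wrapper that prints the status next to the output might look like this; the function names and the CHECK_NRPE variable are assumptions made for the sketch:

```shell
# Path to check_nrpe; override it on clonweb (/www/nagios2.6/libexec/check_nrpe).
CHECK_NRPE=${CHECK_NRPE:-/apps/nrpe2.6/libexec/check_nrpe}

# nagios_status_name CODE: translate a plugin exit status into its name.
nagios_status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# run_check HOST COMMAND [ARGS...]   (hypothetical helper)
# Runs the check, prints the plugin output and the decoded exit status.
run_check() {
    host=$1
    cmd=$2
    shift 2
    if [ $# -gt 0 ]; then
        "$CHECK_NRPE" -H "$host" -c "$cmd" -a "$@"
    else
        "$CHECK_NRPE" -H "$host" -c "$cmd"
    fi
    rc=$?
    echo "exit status: $rc ($(nagios_status_name "$rc"))"
    return "$rc"
}

# Example:
#   run_check clon10 check_disk_test
#   run_check clon10 check_disk 20 20 /
```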

Testing 'nrpe':

NOTE: user 'nagios' and group 'nagios' must exist to run 'nrpe' daemon !!!

To create group 'nagios' on Solaris:

 groupadd -g 9997 nagios

Modify the 'nagios..' line in the /etc/group file as follows:

 nagios::9997:nagios

To create user 'nagios' on Solaris:

 useradd -u 6246 -g nagios -d /home/nagios -c "Nagios" -s /bin/tcsh nagios
    (add flag '-m' if you want to force home directory creation).
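
Before running groupadd/useradd it may be worth checking whether the entries already exist. The sketch below looks only at the local files; the helper name 'has_account_entry' is hypothetical, and on machines that take accounts from a name service 'getent passwd nagios' would be the better check.

```shell
# has_account_entry NAME FILE   (hypothetical helper)
# Succeeds if FILE (passwd or group format, ':' separated) has an entry
# whose first field is exactly NAME.
has_account_entry() {
    awk -F: -v name="$1" '$1 == name { found = 1 } END { exit !found }' "$2"
}

# Example (as root on Solaris, ids as used on this page):
#   has_account_entry nagios /etc/group  || groupadd -g 9997 nagios
#   has_account_entry nagios /etc/passwd || \
#       useradd -u 6246 -g nagios -d /home/nagios -c "Nagios" -s /bin/tcsh nagios
```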

If useradd complains, check the passwd file with 'pwconv' (there should be no blank lines at the end, etc.). To add the group on Solaris by hand, add the following line to the '/etc/group' file: 'nagios::9997:nagios' (the id may be different, of course).

On Linux use the '/usr/bin/system-config-users' utility. If it complains about passwd and shadow inconsistency, run /usr/sbin/pwconv; it will update /etc/shadow using information from /etc/passwd.

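NOTE: to add existing user 'xxx' to the group 'yyy' do the following ('-G' sets the complete list of supplementary groups; on Linux, 'usermod -a -G yyy xxx' appends to the list instead):

 usermod -G yyy xxx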

To start/stop/restart 'nrpe' daemon on Linux (as 'root'):

 /etc/init.d/nrpe start
 /etc/init.d/nrpe stop
 /etc/init.d/nrpe restart

To start 'nrpe' daemon on Solaris (as 'root'):

 /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 ps -ef | grep nrpe
 nagios  3051     1   0 13:09:24 ?           0:00  ./nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 more /var/run/nrpe.pid
 3051
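
The same check (pid file exists and the process is alive) can be scripted. This is a sketch with a made-up function name; it has to be run as root, because 'kill -0' also fails when the caller is not allowed to signal the 'nagios' process.

```shell
# nrpe_pid_alive PIDFILE   (hypothetical helper)
# Succeeds if PIDFILE holds a numeric pid and that process can be signalled.
nrpe_pid_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    # signal 0 only tests for existence, nothing is delivered
    kill -0 "$pid" 2>/dev/null
}

# Example:
#   nrpe_pid_alive /var/run/nrpe.pid && echo "nrpe is running"
```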

On both systems it runs as user 'nagios', according to its config file.



SETTING ON ANY CLON MACHINE WHICH MUST BE MONITORED BY NAGIOS

Normally we run 'nrpe' not as a daemon but as part of the 'inet' service. It was configured with the following steps:

Add the following line to /etc/services:

 nrpe            5666/tcp        # NRPE
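
When this has to be done on many machines, the line should be added only if the service is not listed yet. A possible sketch (the function name 'ensure_service_entry' is an assumption):

```shell
# ensure_service_entry FILE NAME PORT/PROTO   (hypothetical helper)
# Appends "NAME<TAB>PORT/PROTO" to FILE unless a line whose first field
# is NAME is already there.
ensure_service_entry() {
    file=$1
    name=$2
    port=$3
    if awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"; then
        return 0    # already present, nothing to do
    fi
    printf '%s\t%s\n' "$name" "$port" >> "$file"
}

# Example (as root):
#   ensure_service_entry /etc/services nrpe 5666/tcp
```

The check looks at the service name only, so an existing 'nrpe' entry with a different port is left alone and has to be fixed by hand.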

Linux (xinetd): create file '/etc/xinetd.d/nrpe' with the following contents:

 # default: on
 # description: NRPE
 service nrpe
 {
         flags           = REUSE
         socket_type     = stream        
         wait            = no
         user            = nagios
         group           = nagios
         server          = /apps/nrpe2.6/bin/nrpe
         server_args     = -c /apps/nrpe2.6/etc/nrpe.cfg --inetd
         log_on_failure  += USERID
         disable         = no
 ###        only_from       = 129.57.167.42
 }

Solaris (inetd): add the following line to /etc/inetd.conf:

 nrpe    stream  tcp     nowait  nagios /apps/nrpe2.6/bin/nrpe /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg --inetd

Linux: restart the xinetd service:

 /etc/init.d/xinetd restart

Solaris:

 inetconv -i /etc/inet/inetd.conf
 svcadm restart /network/inetd

Run the tests described above from another machine. Check for possible errors:

  tail -100 /var/log/messages | grep nrpe

Solaris:

 svcs | grep nrpe
 online          0:41:40 svc:/network/nrpe/tcp:default


Useful commands:

 netstat -lp
 more /var/log/messages | grep nrpe
     Jan  7 22:28:29 clonpc2 xinetd[986]: execv( /apps/nrpe2.6/bin/ ) failed: 
     Permission denied (errno = 13)
 clon10:src> inetadm -l svc:/network/nrpe/tcp:default
   SCOPE    NAME=VALUE
        name="nrpe"
        endpoint_type="stream"
        proto="tcp"
        isrpc=FALSE
        wait=FALSE
        exec="/usr/sbin/nrpe"
        user="nagios"
   default  bind_addr=""
   default  bind_fail_max=-1
   default  bind_fail_interval=-1
   default  max_con_rate=-1
   default  max_copies=-1
   default  con_rate_offline=-1
   default  failrate_cnt=40
   default  failrate_interval=60
   default  inherit_env=TRUE
   default  tcp_trace=FALSE
   default  tcp_wrappers=FALSE
   clon10:src>
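
In the xinetd log sample above, execv fails on '/apps/nrpe2.6/bin/', which looks like a directory rather than the 'nrpe' binary (presumably a mistake in the 'server' line, although the log does not say so). A quick check of the 'server' value could be sketched as follows; both function names are made up for this example:

```shell
# xinetd_server FILE   (hypothetical helper)
# Prints the value of the 'server' attribute ('server_args' does not match).
xinetd_server() {
    awk '$1 == "server" && $2 == "=" { print $3; exit }' "$1"
}

# check_xinetd_server FILE   (hypothetical helper)
# Reports whether the configured server path can be executed.
check_xinetd_server() {
    path=$(xinetd_server "$1")
    if [ -z "$path" ]; then
        echo "no server line in $1"
        return 1
    fi
    if [ -d "$path" ]; then
        echo "$path is a directory, not a program"
        return 1
    fi
    if [ ! -x "$path" ]; then
        echo "$path is missing or not executable"
        return 1
    fi
    echo "$path looks runnable"
}

# Example:
#   check_xinetd_server /etc/xinetd.d/nrpe
```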

IMPORTANT: the procedure described above did not work on Solaris 10, so 'nrpe' was started not from 'inetd' but as a separate service, using the following procedure:

Create the 'nrpe' manifest file /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml (if copying from here into the file, make sure the first line starts at the very first position, otherwise the message 'svccfg: couldn't parse document' will appear on the 'svccfg import' command):

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM
"/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='nagios-nrpe'>
    <service name='application/management/nagios/nrpe' version='1' type='service'>
        <create_default_instance enabled='false' />
        <single_instance />
        <dependency name='multi-user' grouping='require_all' restart_on='none' type='service'>
            <service_fmri value='svc:/milestone/multi-user' />
        </dependency>
        <method_context>
            <method_credential user='nagios' group='nagios'/>
            <method_environment>
                <envvar name='BASEDIR' value='/apps/nrpe2.6'/>
                <envvar name='LD_LIBRARY_PATH' value='/lib:/usr/local/lib:/usr/sfw/lib'/>
            </method_environment>
        </method_context>
        <exec_method type='method' name='start' exec='$BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe.cfg -d'
            timeout_seconds='60'/>
        <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60'/>
        <property_group name='general' type='framework'>
            <propval name='enabled' type='boolean' value='false'/>
            <propval name='action_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
            <propval name='value_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
        </property_group>
        <property_group name='startd' type='framework'>
            <propval name='ignore_error' type='astring' value='core,signal' />
        </property_group>
        <stability value='Unstable' />
    </service>
</service_bundle>
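
Since leading whitespace before the XML declaration is what breaks 'svccfg import', a small pre-check of the pasted file may save a round trip. This is a sketch only; the function name is hypothetical and it tests nothing except the start of the first line.

```shell
# starts_with_xml_decl FILE   (hypothetical helper)
# Succeeds only if the very first characters of FILE are '<?xml'.
starts_with_xml_decl() {
    first_line=
    # read fails on a last line without newline but still fills the variable
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case "$first_line" in
        '<?xml'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   f=/var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
#   starts_with_xml_decl "$f" || echo "fix the first line of $f"
```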

Add the following line to /etc/user_attr:

 nagios::::type=role;auths=solaris.smf.manage.nagios-nrpe,solaris.smf.manage.nagios;profile=Basic Solaris User

Add the following line to /etc/security/auth_attr:

 solaris.smf.manage.nagios-nrpe:::Manage Nagios NRPE Service States::

(The last two steps allow the nagios user to start and stop services.)

Import service configuration and enable service:

 svccfg import /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
 svcadm enable application/management/nagios/nrpe

Check if it is running:

 svcs | grep nrpe

If the status is not 'online', type 'svcs -x' and look at the log file it names.
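
The service may need a moment to come up after 'svcadm enable', so a scripted check could poll the state a few times. The sketch below assumes 'svcs -H -o state' to print just the state column; 'get_state' and 'wait_online' are names invented here, and 'get_state' is kept separate so it can be replaced on machines without SMF.

```shell
# get_state FMRI: print the current SMF state of the service.
get_state() {
    svcs -H -o state "$1" 2>/dev/null
}

# wait_online FMRI [TRIES]   (hypothetical helper)
# Polls once per second until the service is 'online'; gives up early
# if it went to 'maintenance'.
wait_online() {
    fmri=$1
    tries=${2:-10}
    state=
    while [ "$tries" -gt 0 ]; do
        state=$(get_state "$fmri") || state=
        case "$state" in
            online)
                return 0 ;;
            maintenance)
                echo "$fmri is in maintenance, see 'svcs -x'"
                return 1 ;;
        esac
        tries=$((tries - 1))
        sleep 1
    done
    echo "$fmri is still '$state' after waiting"
    return 1
}

# Example:
#   svcadm enable application/management/nagios/nrpe
#   wait_online application/management/nagios/nrpe 15
```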

NOTE: the following error message shows up:

 clon10:/root> svcadm enable application/management/nagios/nrpe
 clon10:/root> Nov 22 19:35:08 clon10 nrpe[21826]: Cannot write to pidfile '/var/run/nrpe.pid' - check your privileges.

but 'nrpe' seems to be running fine. A comment in 'nrpe.cfg' says about the pid file: 'The file is only written if the NRPE daemon is started by the root user and is running in standalone mode.' Here the service is started as 'nagios' (see 'method_credential' in the manifest), not as root, which would explain the message.

NOTE: if you want to run 'nrpe' as 'clasrun' (it is done on 'clon06', for example, to be able to execute 'check_quota' for user 'clasrun'), the following corrections should be applied to the procedure:

 /etc/user_attr:
    'clasrun::::' instead of 'nagios::::type=role;' (if 'type=role;' remains, clasrun could not login)
 /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml:
    method_credential user='clasrun' group='onliners'
    $BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe_clasrun.cfg -d
 /apps/nrpe2.6/etc/nrpe_clasrun.cfg:
    nrpe_user=clasrun
    nrpe_group=onliners
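
The 'nrpe_clasrun.cfg' file can be derived from the existing 'nrpe.cfg' instead of being edited by hand. A minimal sketch, assuming the config already has 'nrpe_user=' and 'nrpe_group=' lines (as the sample config does) and using a made-up function name:

```shell
# make_user_cfg SRC DST USER GROUP   (hypothetical helper)
# Copies SRC to DST with the nrpe_user and nrpe_group lines rewritten.
make_user_cfg() {
    sed -e "s/^nrpe_user=.*/nrpe_user=$3/" \
        -e "s/^nrpe_group=.*/nrpe_group=$4/" "$1" > "$2"
}

# Example:
#   make_user_cfg /apps/nrpe2.6/etc/nrpe.cfg \
#                 /apps/nrpe2.6/etc/nrpe_clasrun.cfg clasrun onliners
```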

NOTE: not sure if we need the line in '/etc/user_attr' for clasrun at all.

NOTE: 'openssl', required by 'nrpe', was installed into the /usr/local/ssl area, so that directory must be mounted.