Nrpe: Difference between revisions

From CLONWiki
Boiarino
Latest revision as of 20:35, 2 October 2010

Two programs are needed: 'nrpe', which runs as a daemon or as an inetd service, and 'check_nrpe', which is called by Nagios. In general, 'check_nrpe' is needed on clonweb and 'nrpe' on all other machines.

Clonweb only (where Nagios runs): build 'check_nrpe' and copy it to the Nagios area:

 cd /usr/local/src/nrpe-2.6
 ./configure
 make all
 cp src/check_nrpe /www/nagios2.6/libexec
 chown nagios:nagios /www/nagios2.6/libexec/check_nrpe

Generic installation (all machines, including clonweb if it is not done yet):

 cd /usr/local/src
 cp ../downloads/nrpe-2.6.tar.gz .
 gunzip nrpe-2.6.tar.gz
 tar xvf nrpe-2.6.tar
 rm nrpe-2.6.tar
 cd /usr/local/src/nrpe-2.6
 ./configure --prefix=/apps/nrpe2.6 --enable-command-args

Compiling:

 make all

On clonweb (where Nagios is running):

 cp src/check_nrpe /www/nagios2.6/libexec

On any other machine that is to be monitored remotely by clonweb:

 mkdir /apps/nrpe2.6
 mkdir /apps/nrpe2.6/libexec
 mkdir /apps/nrpe2.6/etc
 mkdir /apps/nrpe2.6/bin
 cp sample-config/nrpe.cfg /apps/nrpe2.6/etc/
 cp src/nrpe /apps/nrpe2.6/bin
 cp src/check_nrpe /apps/nrpe2.6/libexec/
 cp init-script /etc/init.d/nrpe
 emacs /etc/init.d/nrpe:
 # config: /apps/nrpe2.6/etc/nrpe.cfg
 NrpeBin=/apps/nrpe2.6/bin/nrpe
 NrpeCfg=/apps/nrpe2.6/etc/nrpe.cfg
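The init-script edits above can also be applied without an editor. A minimal sketch, assuming the generated init-script contains lines beginning with `# config:`, `NrpeBin=` and `NrpeCfg=` as shown; the function name `patch_init_script` is hypothetical:

```shell
# Sketch: apply the /etc/init.d/nrpe edits non-interactively.
# Assumes the generated init-script has lines starting with "# config:",
# "NrpeBin=" and "NrpeCfg=" (as shown above); all other lines are kept.
patch_init_script() {
    script=$1    # e.g. /etc/init.d/nrpe
    prefix=$2    # e.g. /apps/nrpe2.6
    tmp="$script.tmp.$$"
    # Solaris sed has no -i option, so write the result to a temporary file.
    sed -e "s|^# config:.*|# config: $prefix/etc/nrpe.cfg|" \
        -e "s|^NrpeBin=.*|NrpeBin=$prefix/bin/nrpe|" \
        -e "s|^NrpeCfg=.*|NrpeCfg=$prefix/etc/nrpe.cfg|" \
        "$script" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat (not mv) so the script keeps its owner and execute bit.
    cat "$tmp" > "$script" && rm -f "$tmp"
}
```

For example: `patch_init_script /etc/init.d/nrpe /apps/nrpe2.6`.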

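Edit /apps/nrpe2.6/etc/nrpe.cfg (emacs or any editor) to contain the lines below. 'dont_blame_nrpe=1' allows clients to pass arguments such as $ARG1$; this also requires the '--enable-command-args' configure option used above.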

 dont_blame_nrpe=1
 command[check_disk_test]=/apps/nrpe2.6/libexec/check_disk -w 20 -c 10 -p /
 command[check_disk]=/apps/nrpe2.6/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
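Commands that use `$ARG1$` and similar macros only accept arguments when `dont_blame_nrpe=1` is set. A small sketch (both helper names are hypothetical) that lists the command names defined in an `nrpe.cfg` and flags a config that uses `$ARGn$` without that setting:

```shell
# Print the command names defined in an nrpe.cfg, one per line.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Fail if some command uses $ARGn$ macros while dont_blame_nrpe=1 is not set,
# since nrpe then rejects requests that carry arguments.
args_consistent() {
    cfg=$1
    if grep -q '^command\[.*\$ARG[0-9]*\$' "$cfg"; then
        grep -q '^dont_blame_nrpe=1' "$cfg"
    else
        return 0
    fi
}
```

For example: `list_nrpe_commands /apps/nrpe2.6/etc/nrpe.cfg` and `args_consistent /apps/nrpe2.6/etc/nrpe.cfg || echo "set dont_blame_nrpe=1"`.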

On clonweb, copy the plugins to be executed remotely into the NRPE area (remote machines do not see /www, only /apps!):

 cp /www/nagios2.6/libexec/check_disk /apps/nrpe2.6/libexec/
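When several plugins are needed remotely, the copy can be done in a loop. A sketch with a hypothetical `copy_plugins` helper; it reports missing plugins instead of stopping at the first one:

```shell
# Copy the named plugins from SRC to DST, reporting any that are missing.
# Returns non-zero if at least one plugin could not be copied.
copy_plugins() {
    src=$1; dst=$2; shift 2
    status=0
    for p in "$@"; do
        if [ -f "$src/$p" ]; then
            cp -p "$src/$p" "$dst/" || status=1
        else
            echo "missing plugin: $src/$p" >&2
            status=1
        fi
    done
    return $status
}
```

For example: `copy_plugins /www/nagios2.6/libexec /apps/nrpe2.6/libexec check_disk`.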


To test, run the following commands on a machine where 'check_nrpe' is installed. Requests must be sent to a machine where 'nrpe' is running (clon10 in these examples):

On clonweb:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk_test

which should return something like:

 DISK OK - free space: / 1363 MB (16% inode=74%);| /=7054MB;8483;8493;0;8503

The same check with the arguments passed by check_nrpe:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /

On other machines:

 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test
 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
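`check_nrpe` exits with the status of the remote plugin, which Nagios interprets as 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN. A hypothetical wrapper that prints the decoded state after running one of the test commands above:

```shell
# Map a Nagios plugin exit code to its state name:
# 0=OK, 1=WARNING, 2=CRITICAL; anything else is reported as UNKNOWN.
nagios_state() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a check command, show its output, then the decoded state.
# The command's exit status is returned unchanged.
run_check() {
    "$@"
    rc=$?
    echo "state: $(nagios_state "$rc") (exit $rc)"
    return "$rc"
}
```

For example: `run_check /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test`.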

Testing 'nrpe':

NOTE: user 'nagios' and group 'nagios' must exist to run the 'nrpe' daemon !!!

To create group 'nagios' on Solaris:

 groupadd -g 9997 nagios

Then edit the 'nagios' line in /etc/group as follows:

 nagios::9997:nagios

To create user 'nagios' on Solaris:

 useradd -u 6246 -g nagios -d /home/nagios -c "Nagios" -s /bin/tcsh nagios
    (add the '-m' flag to force creation of the home directory).

If useradd complains, check the passwd file with 'pwconv' (for example, there must be no blank lines at the end). A group can also be added on Solaris by adding the line 'nagios::9997:nagios' to '/etc/group' (the group ID may of course differ).

On Linux, use the '/usr/bin/system-config-users' utility. If it complains that passwd and shadow are inconsistent, run /usr/sbin/pwconv, which updates /etc/shadow from /etc/passwd.
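Before starting `nrpe`, the account can be checked with `getent`, which exists on both Linux and Solaris and consults the same name services the daemon will use. A sketch (the helper name `check_account` is hypothetical):

```shell
# Succeed only if both USER and GROUP exist; report whichever is missing.
check_account() {
    user=$1; group=$2; ok=0
    getent group "$group" >/dev/null || { echo "group '$group' missing" >&2; ok=1; }
    getent passwd "$user" >/dev/null || { echo "user '$user' missing" >&2; ok=1; }
    return $ok
}
```

For example: `check_account nagios nagios`.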

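NOTE: to add an existing user 'xxx' to group 'yyy', run:

 usermod -G yyy xxx

Note that '-G' sets the user's complete list of supplementary groups, so any groups 'xxx' already belongs to must be listed as well (comma-separated); on Linux, 'usermod -a -G yyy xxx' appends the group instead.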
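To confirm the membership afterwards, a sketch using `id -Gn` (the helper name is hypothetical; on Solaris it assumes `/usr/xpg4/bin/id`, since `/usr/bin/id` there does not accept `-G`):

```shell
# Succeed if USER is a member (primary or supplementary) of GROUP.
in_group() {
    id -Gn "$1" | tr ' ' '\n' | grep -qxF "$2"
}
```

For example: `in_group xxx yyy && echo "xxx is in yyy"`.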

To start/stop/restart 'nrpe' daemon on Linux (as 'root'):

 /etc/init.d/nrpe start
 /etc/init.d/nrpe stop
 /etc/init.d/nrpe restart

To start 'nrpe' daemon on Solaris (as 'root'):

 /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 ps -ef | grep nrpe
 nagios  3051     1   0 13:09:24 ?           0:00  ./nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 more /var/run/nrpe.pid
 3051
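The check above can be scripted: read the PID from `/var/run/nrpe.pid` and test it with `kill -0`, which sends no signal and only reports whether the process exists. A sketch with a hypothetical helper name, meant to be run as root:

```shell
# Succeed if the process whose PID is stored in PIDFILE is still running.
# kill -0 sends no signal; it only checks that the PID exists. Run as root:
# for an ordinary user it fails on other users' processes (permission error).
pid_alive() {
    pidfile=${1:-/var/run/nrpe.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}
```

For example: `pid_alive && echo "nrpe is running"`.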

On both systems it runs as user 'nagios', according to its config file.



SETUP ON ANY CLON MACHINE THAT MUST BE MONITORED BY NAGIOS

Normally 'nrpe' is run not as a daemon but as an inetd service. It was configured as follows:

Add the following line to /etc/services:

 nrpe            5666/tcp        # NRPE
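To keep re-runs of this step harmless, the entry can be added only when it is missing. A sketch with a hypothetical helper name; pass a different file to try it out:

```shell
# Append the nrpe 5666/tcp entry to a services file unless it is already there.
add_nrpe_service() {
    services=${1:-/etc/services}
    if awk '$1 == "nrpe" && $2 == "5666/tcp" { found = 1 }
            END { if (found) exit 0; exit 1 }' "$services"; then
        echo "nrpe entry already present in $services"
    else
        printf 'nrpe            5666/tcp        # NRPE\n' >> "$services"
    fi
}
```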

Linux (xinetd): create file '/etc/xinetd.d/nrpe' with the following contents:

 # default: on
 # description: NRPE
 service nrpe
 {
         flags           = REUSE
         socket_type     = stream        
         wait            = no
         user            = nagios
         group           = nagios
         server          = /apps/nrpe2.6/bin/nrpe
         server_args     = -c /apps/nrpe2.6/etc/nrpe.cfg --inetd
         log_on_failure  += USERID
         disable         = no
 ###        only_from       = 129.57.167.42
 }

Solaris (inetd): add the following line to /etc/inetd.conf:

 nrpe    stream  tcp     nowait  nagios /apps/nrpe2.6/bin/nrpe /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg --inetd

Linux: restart the xinetd service:

 /etc/init.d/xinetd restart

Solaris:

 inetconv -i /etc/inet/inetd.conf
 svcadm restart /network/inetd

Run the tests described above from another machine, and check for errors:

  tail -100 /var/log/messages | grep nrpe

Solaris:

 svcs | grep nrpe
 online          0:41:40 svc:/network/nrpe/tcp:default
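For scripts, the state can be read directly with `svcs -H -o state`, which prints only the state column without a header. A hypothetical helper:

```shell
# Print the SMF state of a service; succeed only if it is "online".
smf_online() {
    state=$(svcs -H -o state "$1" 2>/dev/null)
    echo "${state:-unknown}"
    [ "$state" = online ]
}
```

For example: `smf_online svc:/network/nrpe/tcp:default`.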


Useful commands:

 netstat -lp
 more /var/log/messages | grep nrpe
     Jan  7 22:28:29 clonpc2 xinetd[986]: execv( /apps/nrpe2.6/bin/ ) failed: 
     Permission denied (errno = 13)
 clon10:src> inetadm -l svc:/network/nrpe/tcp:default
   SCOPE    NAME=VALUE
        name="nrpe"
        endpoint_type="stream"
        proto="tcp"
        isrpc=FALSE
        wait=FALSE
        exec="/usr/sbin/nrpe"
        user="nagios"
   default  bind_addr=""
   default  bind_fail_max=-1
   default  bind_fail_interval=-1
   default  max_con_rate=-1
   default  max_copies=-1
   default  con_rate_offline=-1
   default  failrate_cnt=40
   default  failrate_interval=60
   default  inherit_env=TRUE
   default  tcp_trace=FALSE
   default  tcp_wrappers=FALSE
   clon10:src>
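The `execv( /apps/nrpe2.6/bin/ ) failed: Permission denied (errno = 13)` message above is consistent with the `server` attribute naming a directory: execv returns EACCES for a directory or for a file without execute permission. A sketch that checks the `server` line of the xinetd file, assuming the `name = value` spacing used in the file above (the helper name is hypothetical):

```shell
# Extract the "server" path from an xinetd service file and check that it
# names an executable regular file.
check_xinetd_server() {
    conf=${1:-/etc/xinetd.d/nrpe}
    server=$(awk '$1 == "server" && $2 == "=" { print $3 }' "$conf")
    if [ -z "$server" ]; then
        echo "no server line in $conf" >&2
        return 1
    fi
    if [ -f "$server" ] && [ -x "$server" ]; then
        echo "ok: $server"
    else
        echo "not an executable file: $server" >&2
        return 1
    fi
}
```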

IMPORTANT: the procedure above did not work on Solaris 10, so there 'nrpe' was started not from inetd but as a separate SMF service, using the following procedure:

Create the 'nrpe' manifest file /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml with the contents below. When copying, make sure the first line starts in the very first column; otherwise 'svccfg import' fails with the message 'svccfg: couldn't parse document':

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM
"/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='nagios-nrpe'>
    <service name='application/management/nagios/nrpe' version='1' type='service'>
        <create_default_instance enabled='false' />
        <single_instance />
        <dependency name='multi-user' grouping='require_all' restart_on='none' type='service'>
            <service_fmri value='svc:/milestone/multi-user' />
        </dependency>
        <method_context>
            <method_credential user='nagios' group='nagios'/>
            <method_environment>
                <envvar name='BASEDIR' value='/apps/nrpe2.6'/>
                <envvar name='LD_LIBRARY_PATH' value='/lib:/usr/local/lib:/usr/sfw/lib'/>
            </method_environment>
        </method_context>
        <exec_method type='method' name='start' exec='$BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe.cfg -d'
            timeout_seconds='60'/>
        <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60'/>
        <property_group name='general' type='framework'>
            <propval name='enabled' type='boolean' value='false'/>
            <propval name='action_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
            <propval name='value_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
        </property_group>
        <property_group name='startd' type='framework'>
            <propval name='ignore_error' type='astring' value='core,signal' />
        </property_group>
        <stability value='Unstable' />
    </service>
</service_bundle>
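Since anything before `<?xml` on the first line makes `svccfg import` fail, the file can be checked before importing. A sketch with a hypothetical helper name:

```shell
# Succeed if the manifest's first line begins exactly with "<?xml"
# (no leading blank line, space or tab).
manifest_header_ok() {
    first=$(head -n 1 "$1")
    case $first in
        '<?xml'*) return 0 ;;
        *) echo "first line does not start with <?xml: '$first'" >&2
           return 1 ;;
    esac
}
```

For example: `manifest_header_ok /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml`. On Solaris, `svccfg validate <file>` additionally checks the manifest against its DTD.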

Add the following line to /etc/user_attr:

 nagios::::type=role;auths=solaris.smf.manage.nagios-nrpe,solaris.smf.manage.nagios;profile=Basic Solaris User

Add the following line to /etc/security/auth_attr:

 solaris.smf.manage.nagios-nrpe:::Manage Nagios NRPE Service States::

(these last two steps allow the nagios user to start and stop the service).
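Both file edits can be made repeatable by appending a line only when an identical line is not already present. A sketch with a hypothetical helper name; on Solaris it assumes `/usr/xpg4/bin/grep`, which supports `-F` and `-x`:

```shell
# Append LINE to FILE only if an identical line is not already there,
# so the procedure can be re-run safely.
append_once() {
    file=$1; line=$2
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example: `append_once /etc/security/auth_attr 'solaris.smf.manage.nagios-nrpe:::Manage Nagios NRPE Service States::'`, and likewise for the /etc/user_attr line.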

Import the service configuration and enable the service:

 svccfg import /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
 svcadm enable application/management/nagios/nrpe

Check if it is running:

 svcs | grep nrpe

If the status is not 'online', run 'svcs -x' and look at the log file it names.

NOTE: the following error message appears:

 clon10:/root> svcadm enable application/management/nagios/nrpe
 clon10:/root> Nov 22 19:35:08 clon10 nrpe[21826]: Cannot write to pidfile '/var/run/nrpe.pid' - check your privileges.

but 'nrpe' appears to run fine. The comment about the pid file in 'nrpe.cfg' says: 'The file is only written if the NRPE daemon is started by the root user and is running in standalone mode.' Since SMF starts it as 'nagios' rather than root, the message is expected.

NOTE: to run 'nrpe' as 'clasrun' (this is done on 'clon06', for example, to be able to execute 'check_quota' for user 'clasrun'), apply the following changes to the procedure:

 /etc/user_attr:
    'clasrun::::' instead of 'nagios::::type=role;' (if 'type=role;' remains, clasrun cannot log in)
 /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml:
    method_credential user='clasrun' group='onliners'
    $BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe_clasrun.cfg -d
 /apps/nrpe2.6/etc/nrpe_clasrun.cfg:
    nrpe_user=clasrun
    nrpe_group=onliners
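The clasrun config can be derived from the main one by rewriting just the `nrpe_user` and `nrpe_group` lines. A sketch, assuming `nrpe.cfg` already contains those two settings; the helper name is hypothetical:

```shell
# Derive a per-user nrpe config from the base one, changing only the
# nrpe_user and nrpe_group settings; every other line is copied as-is.
make_user_cfg() {
    base=$1; out=$2; user=$3; group=$4
    sed -e "s/^nrpe_user=.*/nrpe_user=$user/" \
        -e "s/^nrpe_group=.*/nrpe_group=$group/" "$base" > "$out"
}
```

For example: `make_user_cfg /apps/nrpe2.6/etc/nrpe.cfg /apps/nrpe2.6/etc/nrpe_clasrun.cfg clasrun onliners`.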

NOTE: it is not clear whether the /etc/user_attr line for clasrun is needed at all.

NOTE: 'openssl', which 'nrpe' requires, was installed in the /usr/local/ssl area, so that directory must be mounted.