Nrpe

From CLONWiki
Latest revision as of 20:35, 2 October 2010

Two programs are needed: 'nrpe', which runs as a daemon or as an inetd service, and 'check_nrpe', which is called by Nagios. In general 'check_nrpe' is needed on clonweb and 'nrpe' on all other machines.

Clonweb only (where Nagios is running): build 'check_nrpe' and copy it to the 'nagios' area:

 cd /usr/local/src/nrpe-2.6
 ./configure
 make all
 cp src/check_nrpe /www/nagios2.6/libexec
 chown nagios.nagios /www/nagios2.6/libexec/check_nrpe

Generic installation (all machines, including clonweb if it is not done yet):

 cd /usr/local/src
 cp ../downloads/nrpe-2.6.tar.gz .
 gunzip nrpe-2.6.tar.gz
 tar xvf nrpe-2.6.tar
 rm nrpe-2.6.tar
 cd /usr/local/src/nrpe-2.6
 ./configure --prefix=/apps/nrpe2.6 --enable-command-args

Compiling:

 make all

On clonweb (where Nagios is running):

 cp src/check_nrpe /www/nagios2.6/libexec

On any other machine that is supposed to be remotely monitored by clonweb:

 mkdir /apps/nrpe2.6
 mkdir /apps/nrpe2.6/libexec
 mkdir /apps/nrpe2.6/etc
 mkdir /apps/nrpe2.6/bin
 cp sample-config/nrpe.cfg /apps/nrpe2.6/etc/
 cp src/nrpe /apps/nrpe2.6/bin
 cp src/check_nrpe /apps/nrpe2.6/libexec/
 cp init-script /etc/init.d/nrpe

Edit /etc/init.d/nrpe and set:

 # config: /apps/nrpe2.6/etc/nrpe.cfg
 NrpeBin=/apps/nrpe2.6/bin/nrpe
 NrpeCfg=/apps/nrpe2.6/etc/nrpe.cfg

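Edit /apps/nrpe2.6/etc/nrpe.cfg and set the following ('dont_blame_nrpe=1' allows 'check_nrpe' to pass arguments, which the $ARG1$-style macros need):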

 dont_blame_nrpe=1
 command[check_disk_test]=/apps/nrpe2.6/libexec/check_disk -w 20 -c 10 -p /
 command[check_disk]=/apps/nrpe2.6/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
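
The $ARG1$, $ARG2$, ... macros in the 'check_disk' line are filled from the values given to 'check_nrpe -a'. The sketch below only imitates that substitution so the resulting command line can be previewed; 'expand_nrpe_args' is a hypothetical helper written for this page, not part of NRPE, and it does not reproduce NRPE's own argument checking.

```shell
#!/bin/sh
# Illustrative only: imitate how $ARG1$, $ARG2$, ... in an nrpe.cfg
# command line are replaced by the values passed with 'check_nrpe -a'.
# expand_nrpe_args is a hypothetical helper, not part of NRPE.
expand_nrpe_args() {
    template=$1
    shift
    i=1
    for value in "$@"; do
        # Replace every literal occurrence of $ARGi$ with the i-th value.
        template=$(printf '%s\n' "$template" |
            awk -v key="\$ARG${i}\$" -v val="$value" '
            {
                out = ""
                while ((pos = index($0, key)) > 0) {
                    out = out substr($0, 1, pos - 1) val
                    $0 = substr($0, pos + length(key))
                }
                print out $0
            }')
        i=$((i + 1))
    done
    printf '%s\n' "$template"
}

# Preview the command that 'check_nrpe ... -c check_disk -a 20 20 /' asks for.
expand_nrpe_args \
    '/apps/nrpe2.6/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
    20 20 /
```

With the arguments 20 20 / this prints the 'check_disk' command with '-w 20 -c 20 -p /'.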

On clonweb: copy the plugins we want to execute remotely (remote machines do not see /www, only /apps !!!):

 cp /www/nagios2.6/libexec/check_disk /apps/nrpe2.6/libexec/


To test 'check_nrpe', run the following commands from another machine where 'check_nrpe' is installed. Requests shall be sent to the machine where 'nrpe' is running (in our examples it is clon10):

On clonweb:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk_test

must return something like:

 DISK OK - free space: / 1363 MB (16% inode=74%);| /=7054MB;8483;8493;0;8503

The same check with arguments:

 /www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
 

On machines other than clonweb:

 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test
 /apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
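
The line returned by 'check_nrpe' follows the usual Nagios plugin layout: human-readable status text, then '|', then performance data of the form label=value[unit];warn;crit;min;max. A minimal sketch for splitting the two parts when checking results by hand; both function names are hypothetical and chosen for this example only.

```shell
#!/bin/sh
# Split one line of Nagios plugin output into status text and perfdata.
# plugin_status and plugin_perfdata are hypothetical helper names.
plugin_status() {
    # Everything before the first '|'.
    printf '%s\n' "$1" | sed 's/|.*$//'
}

plugin_perfdata() {
    # Everything after the first '|', with leading blanks removed.
    case $1 in
        *'|'*) printf '%s\n' "${1#*|}" | sed 's/^[ ]*//' ;;
        *)     printf '\n' ;;
    esac
}

line='DISK OK - free space: / 1363 MB (16% inode=74%);| /=7054MB;8483;8493;0;8503'
plugin_status "$line"
plugin_perfdata "$line"
```

For the sample line the perfdata part is '/=7054MB;8483;8493;0;8503', i.e. the value for '/' followed by the warning, critical, minimum and maximum fields.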

Testing 'nrpe':

NOTE: user 'nagios' and group 'nagios' must exist to run 'nrpe' daemon !!!

To create group 'nagios' on Solaris:

 groupadd -g 9997 nagios

Modify the 'nagios' line in the /etc/group file as follows:

 nagios::9997:nagios

To create user 'nagios' on Solaris:

 useradd -u 6246 -g nagios -d /home/nagios -c "Nagios" -s /bin/tcsh nagios
    (add the '-m' flag to force home directory creation).

If it complains, check the passwd file with 'pwconv' (there should be no blank lines at the end, etc.). To add the group on Solaris, add the following line to the '/etc/group' file: 'nagios::9997:nagios' (the id may of course be different).

On Linux use the '/usr/bin/system-config-users' utility. If it complains about an inconsistency between passwd and shadow, run /usr/sbin/pwconv; it updates /etc/shadow using information from /etc/passwd.

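NOTE: to add existing user 'xxx' to the group 'yyy' do the following (note that '-G' replaces the user's whole supplementary group list; on Linux add '-a' to append instead):

 usermod -G yyy xxx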
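
To confirm that the group entry really lists the user, the member field (the fourth field) of the group file can be checked. A minimal sketch, assuming the standard /etc/group layout; 'group_has_member' is a hypothetical helper, and it only looks at the member list, not at the primary group recorded in the passwd file.

```shell
#!/bin/sh
# group_has_member GROUP_FILE GROUP USER
# Succeeds if USER appears in the member field (4th field) of GROUP's
# entry in a file that uses the /etc/group format. Hypothetical helper.
group_has_member() {
    awk -F: -v grp="$2" -v usr="$3" '
        $1 == grp {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == usr) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example: group_has_member /etc/group nagios nagios && echo "nagios is listed"
```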

To start/stop/restart 'nrpe' daemon on Linux (as 'root'):

 /etc/init.d/nrpe start
 /etc/init.d/nrpe stop
 /etc/init.d/nrpe restart

To start 'nrpe' daemon on Solaris (as 'root'):

 /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 ps -ef | grep nrpe
 nagios  3051     1   0 13:09:24 ?           0:00  ./nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
 more /var/run/nrpe.pid
 3051
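
The 'ps' and pid file checks above can be combined into one test. A minimal sketch; 'pidfile_alive' is a hypothetical helper. It relies on 'kill -0', which sends no signal and only tests whether the process exists, so it should be run as 'root' (or as the owner of the process), otherwise a running daemon may be reported as missing.

```shell
#!/bin/sh
# pidfile_alive FILE
# Succeeds if FILE holds the PID of a process that currently exists.
# Hypothetical helper; 'kill -0' tests for existence and sends no signal.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Example: pidfile_alive /var/run/nrpe.pid && echo "nrpe is running"
```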

On both systems it runs as user 'nagios', according to its config file.



SETTING ON ANY CLON MACHINE WHICH MUST BE MONITORED BY NAGIOS

Normally we run 'nrpe' not as a daemon but as an 'inet' service. It was configured with the following steps:

Add the following line to /etc/services:

 nrpe            5666/tcp        # NRPE
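
Before restarting the inet service it can help to confirm that the entry is really present. A minimal sketch, assuming the usual 'name port/protocol' layout of the services file; 'has_service_entry' is a hypothetical helper.

```shell
#!/bin/sh
# has_service_entry FILE NAME PORT/PROTO
# Succeeds if FILE (services format) has an uncommented entry for NAME
# with the given port/protocol. Hypothetical helper.
has_service_entry() {
    awk -v name="$2" -v pp="$3" '
        { sub(/#.*/, "") }                    # drop comments
        $1 == name && $2 == pp { found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example: has_service_entry /etc/services nrpe 5666/tcp || echo "entry missing"
```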

Linux (xinetd): create the file /etc/xinetd.d/nrpe with the following contents:

 # default: on
 # description: NRPE
 service nrpe
 {
         flags           = REUSE
         socket_type     = stream        
         wait            = no
         user            = nagios
         group           = nagios
         server          = /apps/nrpe2.6/bin/nrpe
         server_args     = -c /apps/nrpe2.6/etc/nrpe.cfg --inetd
         log_on_failure  += USERID
         disable         = no
 ###        only_from       = 129.57.167.42
 }

Solaris (inetd): add the following line to /etc/inetd.conf:

 nrpe    stream  tcp     nowait  nagios /apps/nrpe2.6/bin/nrpe /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg --inetd

Linux: restart the xinetd service:

 /etc/init.d/xinetd restart

Solaris:

 inetconv -i /etc/inet/inetd.conf
 svcadm restart /network/inetd

Run the tests described above from another machine. Check for possible errors:

  tail -100 /var/log/messages | grep nrpe

Solaris:

 svcs | grep nrpe
 online          0:41:40 svc:/network/nrpe/tcp:default


Useful commands:

 netstat -lp
 more /var/log/messages | grep nrpe
     Jan  7 22:28:29 clonpc2 xinetd[986]: execv( /apps/nrpe2.6/bin/ ) failed: 
     Permission denied (errno = 13)
 clon10:src> inetadm -l svc:/network/nrpe/tcp:default
   SCOPE    NAME=VALUE
        name="nrpe"
        endpoint_type="stream"
        proto="tcp"
        isrpc=FALSE
        wait=FALSE
        exec="/usr/sbin/nrpe"
        user="nagios"
   default  bind_addr=""
   default  bind_fail_max=-1
   default  bind_fail_interval=-1
   default  max_con_rate=-1
   default  max_copies=-1
   default  con_rate_offline=-1
   default  failrate_cnt=40
   default  failrate_interval=60
   default  inherit_env=TRUE
   default  tcp_trace=FALSE
   default  tcp_wrappers=FALSE
   clon10:src>

IMPORTANT: the procedure described above did not work on Solaris 10, so 'nrpe' was started not from 'inetd' but as a separate service, using the following procedure:

Create the 'nrpe' manifest file /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml (if copying from here into the file, make sure the first line starts at the very first position, otherwise the message "svccfg: couldn't parse document" will appear on the 'svccfg import' command):

<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM
"/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='nagios-nrpe'>
    <service name='application/management/nagios/nrpe' version='1' type='service'>
        <create_default_instance enabled='false' />
        <single_instance />
        <dependency name='multi-user' grouping='require_all' restart_on='none' type='service'>
            <service_fmri value='svc:/milestone/multi-user' />
        </dependency>
        <method_context>
            <method_credential user='nagios' group='nagios'/>
            <method_environment>
                <envvar name='BASEDIR' value='/apps/nrpe2.6'/>
                <envvar name='LD_LIBRARY_PATH' value='/lib:/usr/local/lib:/usr/sfw/lib'/>
            </method_environment>
        </method_context>
        <exec_method type='method' name='start' exec='$BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe.cfg -d'
            timeout_seconds='60'/>
        <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60'/>
        <property_group name='general' type='framework'>
            <propval name='enabled' type='boolean' value='false'/>
            <propval name='action_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
            <propval name='value_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
        </property_group>
        <property_group name='startd' type='framework'>
            <propval name='ignore_error' type='astring' value='core,signal' />
        </property_group>
        <stability value='Unstable' />
    </service>
</service_bundle>
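
The warning about the first line of the manifest can be checked before running 'svccfg import'. A minimal sketch; 'manifest_starts_cleanly' is a hypothetical helper that only verifies that the file begins with '<?xml' in the very first column, it does not validate the rest of the document.

```shell
#!/bin/sh
# manifest_starts_cleanly FILE
# Succeeds if the first line of FILE begins with '<?xml' at column one,
# i.e. there is no leading blank line or indentation. Hypothetical helper.
manifest_starts_cleanly() {
    first=$(head -n 1 "$1")
    case $first in
        '<?xml'*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example:
#   manifest_starts_cleanly /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml \
#       || echo "fix the first line before importing"
```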

Add the following line to /etc/user_attr:

 nagios::::type=role;auths=solaris.smf.manage.nagios-nrpe,solaris.smf.manage.nagios;profile=Basic Solaris User

Add the following line to /etc/security/auth_attr:

 solaris.smf.manage.nagios-nrpe:::Manage Nagios NRPE Service States::

(The last two actions allow the nagios user to start and stop services.)

Import service configuration and enable service:

 svccfg import /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
 svcadm enable application/management/nagios/nrpe

Check if it is running:

 svcs | grep nrpe

If the status is not 'online', type 'svcs -x' and look at the log file it names.
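
The status check can be scripted by reading the first column of the 'svcs' output (the columns are STATE, STIME and FMRI). A minimal sketch; 'svcs_state' is a hypothetical helper that reads 'svcs' output on standard input and prints the state of the first service whose FMRI contains the given text.

```shell
#!/bin/sh
# svcs_state PATTERN
# Reads 'svcs' output on stdin and prints the STATE column of the first
# service whose FMRI (3rd column) contains PATTERN. Hypothetical helper.
svcs_state() {
    awk -v pat="$1" 'index($3, pat) > 0 { print $1; exit }'
}

# Example on Solaris:
#   [ "$(svcs | svcs_state nagios/nrpe)" = online ] || svcs -x
```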

NOTE: the following error message shows up:

 clon10:/root> svcadm enable application/management/nagios/nrpe
 clon10:/root> Nov 22 19:35:08 clon10 nrpe[21826]: Cannot write to pidfile '/var/run/nrpe.pid' - check your privileges.

but 'nrpe' seems to run fine. A comment in 'nrpe.cfg' says about the pid file: 'The file is only written if the NRPE daemon is started by the root user and is running in standalone mode.'

NOTE: to run 'nrpe' as 'clasrun' (this is done on 'clon06', for example, to be able to execute 'check_quota' for user 'clasrun'), the following corrections shall be applied to the procedure:

 /etc/user_attr:
    'clasrun::::' instead of 'nagios::::type=role;' (if 'type=role;' remains, clasrun cannot log in)
 /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml:
    method_credential user='clasrun' group='onliners'
    $BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe_clasrun.cfg -d
 /apps/nrpe2.6/etc/nrpe_clasrun.cfg:
    nrpe_user=clasrun
    nrpe_group=onliners

NOTE: not sure if we need the line in /etc/user_attr for clasrun at all.

NOTE: 'openssl', which is required by 'nrpe', was installed into the /usr/local/ssl area, so that directory must be mounted.