Nrpe: Difference between revisions
Revision as of 19:43, 22 November 2007
We need two programs: 'nrpe', which runs as a daemon or as an inetd service, and 'check_nrpe', which is called by Nagios. In general we need 'check_nrpe' on clonweb and 'nrpe' on all other machines.
Clonweb only (where Nagios is running): produce 'check_nrpe' and copy it to 'nagios' area:
cd /usr/local/src/nrpe-2.6
./configure
make all
cp src/check_nrpe /www/nagios2.6/libexec
chown nagios.nagios /www/nagios2.6/libexec/check_nrpe
Generic installation (all machines, including clonweb if it is not done yet):
cd /usr/local/src
cp ../downloads/nrpe-2.6.tar.gz .
gunzip nrpe-2.6.tar.gz
tar xvf nrpe-2.6.tar
rm nrpe-2.6.tar
cd /usr/local/src/nrpe-2.6
./configure --prefix=/apps/nrpe2.6 --enable-command-args
Compiling:
make all
On clonweb (where Nagios is running):
cp src/check_nrpe /www/nagios2.6/libexec
On any other machine that is supposed to be remotely monitored by clonweb:
mkdir /apps/nrpe2.6
mkdir /apps/nrpe2.6/libexec
mkdir /apps/nrpe2.6/etc
mkdir /apps/nrpe2.6/bin
cp sample-config/nrpe.cfg /apps/nrpe2.6/etc/
cp src/nrpe /apps/nrpe2.6/bin
cp src/check_nrpe /apps/nrpe2.6/libexec/
cp init-script /etc/init.d/nrpe
emacs /etc/init.d/nrpe:
# config: /apps/nrpe2.6/etc/nrpe.cfg
NrpeBin=/apps/nrpe2.6/bin/nrpe
NrpeCfg=/apps/nrpe2.6/etc/nrpe.cfg
emacs /apps/nrpe2.6/etc/nrpe.cfg:
dont_blame_nrpe=1
command[check_disk_test]=/apps/nrpe2.6/libexec/check_disk -w 20 -c 10 -p /
command[check_disk]=/apps/nrpe2.6/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
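The second command definition takes its thresholds from the values passed after '-a' on the 'check_nrpe' command line. The sketch below only imitates that substitution in shell to show which value ends up where; 'expand_nrpe_args' is a hypothetical helper and not part of NRPE, and it assumes the values contain no '|', '&' or backslash characters.

```shell
# Illustrative only: imitate how $ARG1$, $ARG2$, ... in a command[...]
# definition are filled from the values given after 'check_nrpe -a'.
# expand_nrpe_args is a hypothetical helper, not an NRPE tool.
expand_nrpe_args() {
    template=$1
    shift
    i=1
    for value in "$@"; do
        # Replace every literal $ARGi$ with the i-th value.
        template=$(printf '%s\n' "$template" | sed "s|\\\$ARG${i}\\\$|${value}|g")
        i=$((i + 1))
    done
    printf '%s\n' "$template"
}

# Example:
#   expand_nrpe_args 'check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' 20 10 /
```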
On clonweb: copy the plugins we want to execute remotely (remote machines will not see /www, only /apps !!!):
cp /www/nagios2.6/libexec/check_disk /apps/nrpe2.6/libexec/
To test 'check_nrpe', run the following commands from another machine where 'check_nrpe' is installed.
Requests should be sent to the machine where 'nrpe' is running (in our examples it is clon10):
on clonweb:
/www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk_test
must return something like this:
DISK OK - free space: / 1363 MB (16% inode=74%);| /=7054MB;8483;8493;0;8503
/www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
on non-clonweb:
/apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk_test
/apps/nrpe2.6/libexec/check_nrpe -H clon10 -c check_disk -a 20 20 /
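When running these tests by hand it helps to look at the exit status as well as the text, since Nagios plugins report their state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch follows; the helper names are made up for this page, and any other exit code is simply treated as UNKNOWN here.

```shell
# Map a plugin exit status to the Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Text before the first '|' is the human-readable status.
plugin_status_text() {
    printf '%s\n' "${1%%"|"*}"
}

# Text after the first '|' is performance data (empty if there is none).
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*"|"}" ;;
        *) printf '\n' ;;
    esac
}

# Example (path and host as used in the tests above):
#   out=$(/www/nagios2.6/libexec/check_nrpe -H clon10 -c check_disk_test)
#   rc=$?
#   echo "state: $(nagios_state_name "$rc")"
#   echo "text:  $(plugin_status_text "$out")"
```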
Testing 'nrpe':
NOTE: user 'nagios' and group 'nagios' must exist to run 'nrpe' daemon !!!
To create group 'nagios' on Solaris:
groupadd -g 9997 nagios
Modify the 'nagios..' line in the /etc/group file as follows:
nagios::9997:nagios
To create user 'nagios' on Solaris:
useradd -u 6246 -g nagios -d /home/nagios -c "Nagios" -s /bin/tcsh nagios
(add the '-m' flag to force home directory creation).
If it complains, check the passwd file with 'pwconv' (there should be no blank lines at the end, etc.). To add the group on Solaris, add the following line to the '/etc/group' file: 'nagios::9997:nagios' (the id may be different, of course).
On Linux use the '/usr/bin/system-config-users' utility. If it complains about passwd and shadow inconsistency, run /usr/sbin/pwconv; it will update /etc/shadow using information from /etc/passwd.
NOTE: to add existing user 'xxx' to the group 'yyy', do the following (note that '-G' sets the complete supplementary group list; on Linux add '-a' to append to it instead):
usermod -G yyy xxx
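After editing the group file by hand it is easy to verify the result. The sketch below reads a group file in the format shown above and reports the gid and membership; 'group_gid' and 'group_has_member' are hypothetical helper names, and on Solaris the script assumes an awk that supports '-v' (for example nawk or /usr/xpg4/bin/awk).

```shell
# Print the numeric gid of group $1 from a group file
# ($2, defaults to /etc/group).
group_gid() {
    awk -F: -v g="$1" '$1 == g { print $3; exit }' "${2:-/etc/group}"
}

# Succeed if user $2 is listed as a member of group $1
# (members are the comma-separated 4th field).
group_has_member() {
    awk -F: -v g="$1" -v m="$2" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == m) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "${3:-/etc/group}"
}

# Example:
#   group_gid nagios
#   group_has_member nagios nagios && echo "nagios is in group nagios"
```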
To start/stop/restart 'nrpe' daemon on Linux (as 'root'):
/etc/init.d/nrpe start
/etc/init.d/nrpe stop
/etc/init.d/nrpe restart
To start 'nrpe' daemon on Solaris (as 'root'):
/apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
ps -ef | grep nrpe
nagios 3051 1 0 13:09:24 ? 0:00 ./nrpe -c /apps/nrpe2.6/etc/nrpe.cfg -d
more /var/run/nrpe.pid
3051
On both systems it runs under user 'nagios', according to its config file.
Normally we run 'nrpe' not as a daemon but as an 'inetd' service. It was configured by the following steps:
Add the following line to /etc/services:
nrpe 5666/tcp # NRPE
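A quick way to confirm that the entry is in place is to look the service name up again. This is a small sketch; 'service_port' is a hypothetical helper name, and on Solaris it assumes an awk that supports '-v' (nawk or /usr/xpg4/bin/awk).

```shell
# Print the port registered for service $1 in a services file
# ($2, defaults to /etc/services). Commented-out entries do not match
# because their first field starts with '#'.
service_port() {
    awk -v name="$1" '
        $1 == name { split($2, parts, "/"); print parts[1]; exit }
    ' "${2:-/etc/services}"
}

# Example:
#   service_port nrpe
```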
Linux (xinetd): create the file '/etc/xinetd.d/nrpe' with the following contents:
# default: on
# description: NRPE
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        wait            = no
        user            = nagios
        group           = nagios
        server          = /apps/nrpe2.6/bin/nrpe
        server_args     = -c /apps/nrpe2.6/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
###     only_from       = 129.57.167.42
}
Solaris (inetd): add the following line to /etc/inetd.conf:
nrpe stream tcp nowait nagios /apps/nrpe2.6/bin/nrpe /apps/nrpe2.6/bin/nrpe -c /apps/nrpe2.6/etc/nrpe.cfg --inetd
Linux: restart the xinetd service:
/etc/init.d/xinetd restart
Solaris:
inetconv -i /etc/inet/inetd.conf
svcadm restart /network/inetd
Run the tests described above from another machine. Check for possible errors:
tail -100 /var/log/messages | grep nrpe
Solaris:
svcs | grep nrpe
online 0:41:40 svc:/network/nrpe/tcp:default
Useful commands:
netstat -lp
more /var/log/messages | grep nrpe
Jan 7 22:28:29 clonpc2 xinetd[986]: execv( /apps/nrpe2.6/bin/ ) failed: Permission denied (errno = 13)
clon10:src> inetadm -l svc:/network/nrpe/tcp:default
SCOPE    NAME=VALUE
         name="nrpe"
         endpoint_type="stream"
         proto="tcp"
         isrpc=FALSE
         wait=FALSE
         exec="/usr/sbin/nrpe"
         user="nagios"
default  bind_addr=""
default  bind_fail_max=-1
default  bind_fail_interval=-1
default  max_con_rate=-1
default  max_copies=-1
default  con_rate_offline=-1
default  failrate_cnt=40
default  failrate_interval=60
default  inherit_env=TRUE
default  tcp_trace=FALSE
default  tcp_wrappers=FALSE
clon10:src>
IMPORTANT: the procedure described above did not work on Solaris 10, so 'nrpe' was started not from 'inetd' but as a separate service, using the following procedure:
Create the 'nrpe' manifest file /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml (if copying from here into the file, make sure the first line starts at the very first position, otherwise the message 'svccfg: couldn't parse document' will appear on the 'svccfg import' command):
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='nagios-nrpe'>
  <service name='application/management/nagios/nrpe' version='1' type='service'>
    <create_default_instance enabled='false' />
    <single_instance />
    <dependency name='multi-user' grouping='require_all' restart_on='none' type='service'>
      <service_fmri value='svc:/milestone/multi-user' />
    </dependency>
    <method_context>
      <method_credential user='nagios' group='nagios'/>
      <method_environment>
        <envvar name='BASEDIR' value='/apps/nrpe2.6'/>
        <envvar name='LD_LIBRARY_PATH' value='/lib:/usr/local/lib:/usr/sfw/lib'/>
      </method_environment>
    </method_context>
    <exec_method type='method' name='start' exec='$BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe.cfg -d' timeout_seconds='60'/>
    <exec_method type='method' name='stop' exec=':kill' timeout_seconds='60'/>
    <property_group name='general' type='framework'>
      <propval name='enabled' type='boolean' value='false'/>
      <propval name='action_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
      <propval name='value_authorization' type='astring' value='solaris.smf.manage.nagios-nrpe'/>
    </property_group>
    <property_group name='startd' type='framework'>
      <propval name='ignore_error' type='astring' value='core,signal' />
    </property_group>
    <stability value='Unstable' />
  </service>
</service_bundle>
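Since a stray space or blank line in front of the XML declaration is enough to make the import fail, the file can be checked before it is imported. A minimal sketch, with 'manifest_starts_clean' as a hypothetical helper name:

```shell
# Succeed only if the first line of file $1 begins with '<?xml' in the
# very first column (no leading blanks, no empty first line).
manifest_starts_clean() {
    # IFS= keeps leading whitespace so it can be detected.
    IFS= read -r first_line < "$1" || :
    case "$first_line" in
        '<?xml'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   f=/var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
#   manifest_starts_clean "$f" || echo "fix the first line of $f"
```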
Add the following line to /etc/user_attr:
nagios::::type=role;auths=solaris.smf.manage.nagios-nrpe,solaris.smf.manage.nagios;profile=Basic Solaris User
Add the following line to /etc/security/auth_attr:
solaris.smf.manage.nagios-nrpe:::Manage Nagios NRPE Service States::
(the last two actions allow the nagios user to start and stop services).
Import service configuration and enable service:
svccfg import /var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml
svcadm enable application/management/nagios/nrpe
Check if it is running:
svcs | grep nrpe
If status is not 'online', type 'svcs -x' and look at specified log file.
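The same check can be scripted by filtering the 'svcs' listing (columns STATE, STIME, FMRI) for nrpe services that are not 'online'. This is only a sketch; 'nrpe_services_not_online' is a hypothetical helper name, and it reads the listing from standard input so it can be tried on saved output as well.

```shell
# Read 'svcs' output on stdin and report every nrpe service whose
# state (first column) is anything other than 'online'.
nrpe_services_not_online() {
    awk '$3 ~ /nrpe/ && $1 != "online" { print $3 " is " $1 }'
}

# Example (svcs -a also lists disabled services):
#   svcs -a | nrpe_services_not_online
```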
NOTE: the following error message shows up:
clon10:/root> svcadm enable application/management/nagios/nrpe
clon10:/root> Nov 22 19:35:08 clon10 nrpe[21826]: Cannot write to pidfile '/var/run/nrpe.pid' - check your privileges.
but 'nrpe' seems to be running fine. The comment in 'nrpe.cfg' about the pid file says: 'The file is only written if the NRPE daemon is started by the root user and is running in standalone mode.'
NOTE: to run 'nrpe' as 'clasrun' (this is done on 'clon06', for example, to be able to execute 'check_quota' for user 'clasrun'), the following corrections should be applied to the procedure:
/etc/user_attr:
  'clasrun::::' instead of 'nagios::::'
/var/svc/manifest/application/management/nagios/nrpe/nagios-nrpe.xml:
  method_credential user='clasrun' group='onliners'
  $BASEDIR/bin/nrpe -c $BASEDIR/etc/nrpe_clasrun.cfg -d
/apps/nrpe2.6/etc/nrpe_clasrun.cfg:
  nrpe_user=clasrun
  nrpe_group=onliners
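One way to produce the alternative config file is to derive it from the existing 'nrpe.cfg' rather than editing a copy by hand. The sketch below assumes the 'nrpe_user=' and 'nrpe_group=' lines start in the first column with no spaces around '=' (as in the sample config); 'retarget_nrpe_cfg' is a hypothetical helper name.

```shell
# Copy an nrpe config from stdin to stdout, setting nrpe_user to $1
# and nrpe_group to $2. All other lines pass through unchanged.
retarget_nrpe_cfg() {
    sed -e "s/^nrpe_user=.*/nrpe_user=$1/" \
        -e "s/^nrpe_group=.*/nrpe_group=$2/"
}

# Example:
#   retarget_nrpe_cfg clasrun onliners \
#       < /apps/nrpe2.6/etc/nrpe.cfg > /apps/nrpe2.6/etc/nrpe_clasrun.cfg
```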