heart(3erl) Erlang Module Definition heart(3erl)
NAME
heart - Heartbeat monitoring of an Erlang runtime system.
DESCRIPTION
This module contains the interface to the heart process. heart sends
periodic heartbeats to an external port program, which is also named
heart. The purpose of the heart port program is to check that the
Erlang runtime system it is supervising is still running. If the port
program has not received any heartbeats within HEART_BEAT_TIMEOUT
seconds (defaults to 60 seconds), the system can be rebooted.
An Erlang runtime system to be monitored by a heart program is to be
started with command-line flag -heart (see also erl(1)). The heart
process is then started automatically:
% erl -heart ...
If the system is to be rebooted because of missing heartbeats, or a
terminated Erlang runtime system, environment variable HEART_COMMAND
must be set before the system is started. If this variable is not set,
a warning text is printed but the system does not reboot.
To reboot on Windows, HEART_COMMAND can be set to heart -shutdown
(included in the Erlang delivery) or to any other suitable program that
can activate a reboot.
The environment variable HEART_BEAT_TIMEOUT can be used to configure
the heart time-outs; it can be set in the operating system shell before
Erlang is started or be specified at the command line:
% erl -heart -env HEART_BEAT_TIMEOUT 30 ...
The value (in seconds) must be in the range 10 < X <= 65535.
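As a minimal sketch of the range rule above, the following helper checks a candidate value before it is passed with -env. The module name heart_timeout and the function valid/1 are hypothetical and not part of OTP; note that the lower bound is exclusive, so 10 itself is rejected.

```erlang
-module(heart_timeout).
-export([valid/1]).

%% Hypothetical helper, not part of OTP.
%% Returns true if Seconds is an integer in the documented
%% range 10 < X =< 65535, and false otherwise.
-spec valid(term()) -> boolean().
valid(Seconds) when is_integer(Seconds) ->
    Seconds > 10 andalso Seconds =< 65535;
valid(_) ->
    false.
```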
When running on OSs lacking support for monotonic time, heart is
susceptible to system clock adjustments of more than HEART_BEAT_TIMEOUT
seconds. When this happens, heart times out and tries to reboot the
system. This can occur, for example, if the system clock is adjusted
automatically by use of the Network Time Protocol (NTP).
If a crash occurs, an erl_crash.dump is not written unless environment
variable ERL_CRASH_DUMP_SECONDS is set:
% erl -heart -env ERL_CRASH_DUMP_SECONDS 10 ...
If a regular core dump is wanted, let heart know by setting the kill
signal to abort using environment variable HEART_KILL_SIGNAL=SIGABRT.
If unset, or not set to SIGABRT, the default behavior is a kill signal
using SIGKILL:
% erl -heart -env HEART_KILL_SIGNAL SIGABRT ...
If heart should not kill the Erlang runtime system, this can be
indicated using the environment variable HEART_NO_KILL=TRUE. This can be
useful if the command executed by heart takes care of this, for example
as part of a specific cleanup sequence. If unset, or not set to TRUE,
the default behavior is to kill as described above:
% erl -heart -env HEART_NO_KILL TRUE ...
Furthermore, the value of ERL_CRASH_DUMP_SECONDS affects heart as
follows:
ERL_CRASH_DUMP_SECONDS=0:
Suppresses the writing of a crash dump file entirely, thus
rebooting the runtime system immediately. This is the same as not
setting the environment variable.
ERL_CRASH_DUMP_SECONDS=-1:
A negative value delays the reboot of the runtime system until the
crash dump file is completely written.
ERL_CRASH_DUMP_SECONDS=S:
heart waits for S seconds to let the crash dump file be written.
After S seconds, heart reboots the runtime system, whether the
crash dump file is written or not.
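The three cases above can be summarized as a small decision function. This is only an illustrative sketch of the documented behavior, not code from heart itself; the module name crash_dump_policy and the returned atoms are hypothetical.

```erlang
-module(crash_dump_policy).
-export([describe/1]).

%% Hypothetical summary of the documented ERL_CRASH_DUMP_SECONDS cases.
%% 'undefined' stands for the variable not being set at all.
-spec describe(undefined | integer()) ->
          no_dump | wait_until_written | {wait_seconds, pos_integer()}.
describe(undefined) ->
    no_dump;                       % unset: same as 0
describe(0) ->
    no_dump;                       % no crash dump, reboot immediately
describe(S) when is_integer(S), S < 0 ->
    wait_until_written;            % reboot only after the dump is complete
describe(S) when is_integer(S), S > 0 ->
    {wait_seconds, S}.             % reboot after S seconds regardless
```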
In the following descriptions, all functions fail with reason badarg if
heart is not started.
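Code that may run on nodes started without -heart can guard against this badarg failure. The wrapper below is a minimal sketch; the module name heart_guard and the {error, heart_not_started} return value are hypothetical choices, not part of OTP.

```erlang
-module(heart_guard).
-export([get_cmd/0]).

%% Hypothetical wrapper, not part of OTP.
%% Calls heart:get_cmd/0 and converts the documented badarg failure
%% (heart not started) into an error tuple instead of an exception.
-spec get_cmd() -> {ok, string()} | {error, heart_not_started}.
get_cmd() ->
    try
        heart:get_cmd()
    catch
        error:badarg ->
            {error, heart_not_started}
    end.
```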
DATA TYPES
heart_option() = check_schedulers
EXPORTS
set_cmd(Cmd) -> ok | {error, {bad_cmd, Cmd}}
Types:
Cmd = string()
Sets a temporary reboot command, to be used instead of the command
specified by environment variable HEART_COMMAND when the system is
rebooted. The new Erlang runtime system, if it in turn misbehaves,
uses environment variable HEART_COMMAND to reboot.
Limitations: Command string Cmd is sent to the heart program as
an ISO Latin-1 or UTF-8 encoded binary, depending on the filename
encoding mode of the emulator (see file:native_name_encoding/0).
The size of the encoded binary must be less than 2047 bytes.
clear_cmd() -> ok
Clears the temporary reboot command. If the system terminates, the
normal HEART_COMMAND is used to reboot.
get_cmd() -> {ok, Cmd}
Types:
Cmd = string()
Gets the temporary reboot command. If the command is cleared,
the empty string is returned.
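A possible way to combine set_cmd/1 and clear_cmd/0 is to install a temporary reboot command only for the duration of some risky operation. The sketch below assumes the node was started with -heart (otherwise the heart calls fail with badarg); the module name heart_cmd_example and the function with_temporary_cmd/2 are hypothetical.

```erlang
-module(heart_cmd_example).
-export([with_temporary_cmd/2]).

%% Hypothetical helper, not part of OTP.
%% Installs Cmd as the temporary reboot command, runs Fun, and clears
%% the temporary command again even if Fun raises an exception.
%% Returns the result of Fun, or the error tuple from heart:set_cmd/1.
-spec with_temporary_cmd(string(), fun(() -> Result)) ->
          Result | {error, {bad_cmd, string()}}.
with_temporary_cmd(Cmd, Fun) when is_list(Cmd), is_function(Fun, 0) ->
    case heart:set_cmd(Cmd) of
        ok ->
            try
                Fun()
            after
                heart:clear_cmd()
            end;
        {error, {bad_cmd, _}} = Error ->
            Error
    end.
```

While Fun runs, heart:get_cmd() should return {ok, Cmd}; once the helper returns, the normal HEART_COMMAND applies again.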
set_callback(Module, Function) ->
ok | {error, {bad_callback, {Module, Function}}}
Types:
Module = Function = atom()
This validation callback will be executed before any heartbeat
is sent to the port program. For the validation to succeed it
needs to return with the value ok.
An exception within the callback will be treated as a validation
failure.
The callback will be removed if the system reboots.
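As an illustration, a validation callback could report ok only while some essential process is alive. This is a sketch under stated assumptions: the module name heart_check, the registered name my_app_sup, and the error value are hypothetical, and the callback is assumed to be invoked with no arguments.

```erlang
-module(heart_check).
-export([install/0, validate/0]).

%% Hypothetical validation callback, assumed to be called with arity 0.
%% Returns ok only while a process registered as my_app_sup (an assumed
%% name) exists; any other return value counts as a validation failure.
-spec validate() -> ok | {error, my_app_sup_not_running}.
validate() ->
    case is_pid(whereis(my_app_sup)) of
        true  -> ok;
        false -> {error, my_app_sup_not_running}
    end.

%% Registers validate/0 as the heart validation callback.
%% Fails with badarg if heart is not started.
install() ->
    heart:set_callback(?MODULE, validate).
```

Keeping the callback short and free of blocking calls seems prudent, because it runs before every heartbeat.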
clear_callback() -> ok
Removes the validation callback that is called before heartbeats
are sent.
get_callback() -> {ok, {Module, Function}} | none
Types:
Module = Function = atom()
Gets the validation callback. If the callback is cleared, none
is returned.
set_options(Options) -> ok | {error, {bad_options, Options}}
Types:
Options = [heart_option()]
Valid options for set_options/1 are:
check_schedulers:
If enabled, a signal will be sent to each scheduler to check
its responsiveness. The system check occurs before any
heartbeat is sent to the port program. If any scheduler is
not responsive enough, the heart program will not receive its
heartbeat and thus eventually terminate the node.
Returns with the value ok if the options are valid.
get_options() -> {ok, Options} | none
Types:
Options = [atom()]
Returns {ok, Options} where Options is a list of current options
enabled for heart. If no options are set, none will be returned.
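The two option functions can be used together, for example to enable the scheduler check and read back the active options. The sketch below assumes the node was started with -heart; the module name heart_opts_example and the function enable_scheduler_check/0 are hypothetical.

```erlang
-module(heart_opts_example).
-export([enable_scheduler_check/0]).

%% Hypothetical helper, not part of OTP.
%% Enables the check_schedulers option and returns what
%% heart:get_options/0 reports afterwards, or the error tuple from
%% heart:set_options/1. Fails with badarg if heart is not started.
-spec enable_scheduler_check() ->
          {ok, [atom()]} | none | {error, {bad_options, term()}}.
enable_scheduler_check() ->
    case heart:set_options([check_schedulers]) of
        ok ->
            heart:get_options();
        {error, {bad_options, _}} = Error ->
            Error
    end.
```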
Ericsson AB kernel 7.0 heart(3erl)