Monitoring MSSQL server environments can be a tedious and demanding task, especially when you are dealing with various versions across multiple servers. It is seldom possible to dedicate resources to manually monitor each instance; however, MSSQL does include out-of-the-box functionality to assist, called MSSQL Alerts, which is available even in older versions such as MSSQL Server 2000. MSSQL Alerts can go a long way towards letting you know when a failure is imminent or has just occurred. That is just the start though: various other event-driven functionality can also be achieved through the different responses available in MSSQL Alerts. Alerts can be manipulated to run when and how you would like them to, and the data that they provide can be collected and used for reporting.
MSSQL Alerts are implemented via the MSSQL Agent, hence the Agent has to be running in order for your alerts to fire. There is a vast number of alerts available, and they fall under three main categories which may drill down even further.
- SQL Server events – MSSQL has a set of message/error codes, each of which is assigned a severity level, so multiple error numbers may share the same severity. These alerts can fire for a specific error number, or for a certain severity level, which can encompass multiple error codes. For a list of message/error codes, check out the sys.messages catalog view.
- Performance conditions – Performance condition alerts assist in keeping a MSSQL server running smoothly and fire when the threshold of a certain performance counter has been breached. So, for example, if the percentage of log space used by a database exceeds 80%, an alert can be set to fire and notify you (performance condition alerts cover SQL Server's own performance counters).
- Windows Management Instrumentation (WMI) events – These alerts fire with the use of the WMI provider for a specific MSSQL server event. WMI allows for local and remote monitoring of servers and exposes many MSSQL server events, and WMI queries can be used to define alerts. An example of what you can alert on with WMI alerts would be auditing failed logins.
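As a rough T-SQL sketch, an alert of each type can also be created with the msdb.dbo.sp_add_alert system stored procedure. The alert names below are just placeholders, and the performance counter path may differ depending on your instance name and version:
-- SQL Server event alert: fires for any error of severity 17
EXEC msdb.dbo.sp_add_alert
    @name     = N'Severity 17 errors',   -- placeholder name
    @severity = 17,
    @enabled  = 1;

-- Performance condition alert: fires when tempdb's log is more than 90% full
-- (the counter object may need an instance prefix such as MSSQL$<instance>)
EXEC msdb.dbo.sp_add_alert
    @name                  = N'Tempdb log nearly full',   -- placeholder name
    @performance_condition = N'SQLServer:Databases|Percent Log Used|tempdb|>|90',
    @enabled               = 1;

-- WMI event alert: fires on failed logins (default instance namespace shown)
EXEC msdb.dbo.sp_add_alert
    @name          = N'Failed login audit',   -- placeholder name
    @wmi_namespace = N'\\.\root\Microsoft\SqlServer\ServerEvents\MSSQLSERVER',
    @wmi_query     = N'SELECT * FROM AUDIT_LOGIN_FAILED',
    @enabled       = 1;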
SQL Alerts can be located under the SQL Server Agent node in SSMS. Simply expand your SQL Agent and right-click the Alerts folder to create a new alert. It is possible to restrict your alert to a specific database and to further restrict it to only fire if the message contains a certain phrase.
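These restrictions map to the @database_name and @event_description_keyword parameters of sp_add_alert. A minimal sketch, assuming a hypothetical database called YourDatabase and error 9002 (transaction log full) as the message of interest:
-- Fire only when error 9002 occurs in YourDatabase and the message text contains 'log'
EXEC msdb.dbo.sp_add_alert
    @name                      = N'YourDatabase log full',   -- placeholder name
    @message_id                = 9002,
    @database_name             = N'YourDatabase',
    @event_description_keyword = N'log',
    @enabled                   = 1;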
Whilst setting up your alert you have the option of setting the action to be taken when the alert is triggered. This can be set to notify an operator via email, pager or net send. Make sure Database Mail is set up and the mail profile is enabled (ticked) in your SQL Server Agent properties in order to receive mails. There is also an option to run a job; this opens up a world of possibilities and can be very useful, since a job can often resolve a problem without any human interaction at all. The jobs that run can also be used to capture additional information that helps discover why the problem occurred. Notifications and job execution may both be ticked on the same alert.
Lastly, comments can also be included in your alert. These comments will be included in the delivery method that you chose.
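As a rough T-SQL sketch of wiring up these responses (the operator, e-mail address, alert name and job name below are all placeholders), an operator can be created, attached to an alert, and a job assigned as an additional response:
-- Create an operator to notify (name and address are placeholders)
EXEC msdb.dbo.sp_add_operator
    @name          = N'DBA Team',
    @email_address = N'dba-team@example.com';

-- E-mail that operator whenever the alert fires (1 = e-mail, 2 = pager, 4 = net send)
EXEC msdb.dbo.sp_add_notification
    @alert_name          = N'Severity 17 errors',
    @operator_name       = N'DBA Team',
    @notification_method = 1;

-- Optionally also run an existing job in response to the alert
EXEC msdb.dbo.sp_update_alert
    @name     = N'Severity 17 errors',
    @job_name = N'Capture diagnostic info';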
Manipulation
Alerts can be enabled and disabled via SQL queries. This can be very useful if you would like an alert to fire only during a certain period of time or only when a certain business rule is met. For example, if you wanted your replication latency alert to fire only during business hours, or only while a job that inserts large amounts of data into the system is running, you could use a job to enable the alert on a specific schedule, or add steps to the beginning and end of the job that you would like to monitor which call the sp_update_alert system stored procedure, e.g.
EXEC msdb.dbo.sp_update_alert
    @name    = N'Replication Warning: Transactional replication latency (Threshold: latency)',
    @enabled = 1  -- <bit to enable (1) or disable (0)
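For the job-based approach, a minimal sketch would be to add a first step that enables the alert before the data load and a final step that disables it again afterwards (the alert name is the same one used above):
-- First job step: enable the alert for the duration of the load
EXEC msdb.dbo.sp_update_alert
    @name    = N'Replication Warning: Transactional replication latency (Threshold: latency)',
    @enabled = 1;

-- ... the existing data-load steps run here ...

-- Final job step: disable the alert again once the load has finished
EXEC msdb.dbo.sp_update_alert
    @name    = N'Replication Warning: Transactional replication latency (Threshold: latency)',
    @enabled = 0;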
It is also possible to intentionally raise an error that will trigger an alert by making use of the RAISERROR statement. This can be useful if you would like to trigger an alert from within scripts or jobs. Alerts can thus be further manipulated to include business rules and scheduling at an error level, with dynamic comments. Here is an example of a custom replication latency alert script which I wrote to get around a problem of the replication latency error not being raised in certain versions of MSSQL. Once the error is raised, the alert picks it up and acts accordingly.
-- Latency threshold in hh:mm:ss format
declare @Latency varchar(8)
set @Latency = '00:00:10'

-- Convert the threshold to seconds
declare @LatencySec int
set @LatencySec = (datepart(hour, @Latency) * 3600)
                + (datepart(minute, @Latency) * 60)
                +  datepart(second, @Latency)

-- Post a tracer token to the publication and wait for the threshold period
declare @TracerID int
exec sp_posttracertoken @publication = 'TestPublication',
                        @tracer_token_id = @TracerID OUTPUT;

WAITFOR DELAY @Latency

-- Capture the tracer token history
declare @TokenTable table(
    distributor_latency int,
    subscriber          varchar(100),
    subscriber_db       varchar(100),
    subscriber_latency  int,
    overall_latency     int)

insert into @TokenTable
exec sp_helptracertokenhistory 'TestPublication', @TracerID

-- If the token has not arrived yet (NULL) or took longer than the threshold, raise the error
if (select isnull(overall_latency, @LatencySec) from @TokenTable) >= @LatencySec
begin
    -- Further rules can be added before the error is raised
    raiserror (14161, 10, 1, '0', @Latency, 'Custom Comment?')
end

select *, @TracerID from @TokenTable

-- Clean up the tracer token history
exec sp_deletetracertokenhistory 'TestPublication', @TracerID
exec sp_helptracertokens 'TestPublication'
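For the script above to have any effect, an alert has to exist for message 14161. A minimal sketch follows, assuming a placeholder alert name and the placeholder 'DBA Team' operator from earlier; note that Agent alerts only fire for errors that reach the Windows application log, so if the alert never fires you may need to flag the message to always be logged:
-- Alert on the replication latency message raised by the script above
EXEC msdb.dbo.sp_add_alert
    @name       = N'Custom replication latency alert',   -- placeholder name
    @message_id = 14161,
    @enabled    = 1;

-- Notify an existing operator by e-mail (operator name is a placeholder)
EXEC msdb.dbo.sp_add_notification
    @alert_name          = N'Custom replication latency alert',
    @operator_name       = N'DBA Team',
    @notification_method = 1;

-- If required, mark the message so it is always written to the log
EXEC sp_altermessage 14161, 'WITH_LOG', 'true';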
Reporting
A great side effect of all this data being relayed to the operator is that it can be used not only for problem solving but also as a base to report from and discover trends in your database servers' health. This can be achieved in different ways; here are a few.
- Errors are recorded in the MSSQL error log, which can be read using the command "xp_readerrorlog 0,1,'Error Number'"; however, lower-severity errors have to be enabled to be written to the error log. sp_altermessage can be used to alter a message's attributes so that it is always written to the error log. Results from each day can then be written to a table and reported from (see the sketch after the polling example below).
- Alerts can call a job, or a step can be added to the job that the alert already calls, to write an entry to a history table which is then reported from.
- Alert information is stored in [msdb].[dbo].[sysalerts], which can also be used for reporting; however, the table only contains a single row per alert. In order to build up historical data we have to poll this table periodically, e.g.:
Table to store polled data.
CREATE TABLE [dbo].[AlertHistory](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [AlertID] [int] NULL,
    [name] [sysname] NULL,
    [event_source] [nvarchar](100) NULL,
    [event_category_id] [int] NULL,
    [event_id] [int] NULL,
    [message_id] [int] NULL,
    [severity] [int] NULL,
    [enabled] [tinyint] NULL,
    [delay_between_responses] [int] NULL,
    [last_occurrence_date] [int] NULL,
    [last_occurrence_time] [int] NULL,
    [last_response_date] [int] NULL,
    [last_response_time] [int] NULL,
    [notification_message] [nvarchar](512) NULL,
    [include_event_description] [tinyint] NULL,
    [database_name] [nvarchar](512) NULL,
    [event_description_keyword] [nvarchar](100) NULL,
    [occurrence_count] [int] NULL,
    [count_reset_date] [int] NULL,
    [count_reset_time] [int] NULL,
    [job_id] [uniqueidentifier] NULL,
    [has_notification] [int] NULL,
    [flags] [int] NULL,
    [performance_condition] [nvarchar](512) NULL,
    [category_id] [int] NULL,
 CONSTRAINT [PK_AlertHistory] PRIMARY KEY CLUSTERED
(
    [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE NONCLUSTERED INDEX [IX_AlertIDLastODLastOT] ON [dbo].[AlertHistory]
(
    [AlertID] ASC,
    [last_occurrence_date] ASC,
    [last_occurrence_time] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Query to be placed in a job and run periodically, depending on the data precision required:
insert into AlertHistory
select * from sysalerts sa (nolock)
where not exists (
    select *
    from AlertHistory ah (nolock)
    where ah.AlertID = sa.id
      and ah.last_occurrence_date = sa.last_occurrence_date
      and ah.last_occurrence_time = sa.last_occurrence_time
)
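For the first approach in the list above (reading the error log directly), a minimal sketch might look like the following; the ErrorLogHistory table and the use of error number 14161 as the search string are purely illustrative assumptions:
-- Hypothetical staging table for error log entries
CREATE TABLE dbo.ErrorLogHistory(
    LogDate     datetime,
    ProcessInfo nvarchar(100),
    LogText     nvarchar(max));
GO
-- Read the current SQL Server error log (0 = current log, 1 = SQL error log),
-- filtered on an error number of interest; 14161 is used here only as an example
INSERT INTO dbo.ErrorLogHistory (LogDate, ProcessInfo, LogText)
EXEC master.dbo.xp_readerrorlog 0, 1, N'14161';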
Once you have collected an adequate amount of data you can plug it into SSRS, Excel or whatever reporting mechanism you may fancy and create reports to analyse the data.
SQL Server alerts are a great way to let you know of problems when they occur so that you can get onto fixing them before they turn into a disaster, and with the added benefits of jobs as responses, alert manipulation and reporting/trend analysis, one can definitely stay one step ahead…
Remember “Intellectuals solve problems, geniuses prevent them” –
Albert Einstein