Azkaban Monitoring
Overview
Azkaban is an open-source workflow engine for the Hadoop ecosystem. It is a batch job scheduler that lets developers control job execution in Java projects and, especially, Hadoop projects.
Key Components
- Relational Database (MySQL): Azkaban uses MySQL to store much of its state. Both the AzkabanWebServer and the AzkabanExecutorServer access the DB.
- AzkabanWebServer: The AzkabanWebServer is the main manager of Azkaban. It handles project management, authentication, scheduling, and monitoring of executions. It also serves as the web user interface.
- AzkabanExecutorServer: The AzkabanExecutorServer handles the actual execution of workflows and jobs. Earlier versions of Azkaban combined the AzkabanWebServer and AzkabanExecutorServer features in a single server; the executor has since been separated into its own server.
Features
- Compatible with any version of Hadoop
- Easy-to-use web UI
- Simple web and HTTP workflow uploads
- Project workspaces
- Scheduling of workflows
- Modular and pluggable
- Authentication and Authorization
- Tracking of user actions
- Email alerts on failure and success
- SLA alerting and auto-killing
- Retrying of failed jobs
Monitoring Capabilities
Azkaban Executor Job Stats
Metric | Metric Description |
---|---|
Azkaban Running Jobs | Number of running jobs. |
Azkaban Executed Jobs/Sec | Number of executed jobs per second. |
Azkaban Failed Jobs/Sec | Number of failed jobs per second. |
Azkaban Succeeded Jobs/Sec | Number of succeeded jobs per second. |
Azkaban Container Stats
Metric | Metric Description |
---|---|
Azkaban Average Connection’s Duration (Sec) | Average duration of open connections in seconds. |
Azkaban Maximum Connection’s Duration (Sec) | Maximum duration of open connections in seconds. |
Azkaban Minimum Connection’s Duration (Sec) | Minimum duration of connections in seconds. |
Azkaban Total Connection’s Duration (Sec) | Total duration of connections in seconds. |
Azkaban Average Requests/Connection | Average number of requests per connection. |
Azkaban Maximum Requests/Connection | Maximum number of requests per connection. |
Azkaban Minimum Requests/Connection | Minimum number of requests per connection. |
Azkaban Accepted Connections/Sec | Number of connections accepted per second by the server. |
Azkaban Open Connections | Number of connections currently open. |
Azkaban Maximum Open Connections | Maximum number of open connections. |
Azkaban Minimum Open Connections | Minimum number of open connections. |
Azkaban Threads | Number of threads. |
Azkaban Idle Threads | Number of idle threads. |
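The two thread metrics above can be combined into a single utilization figure. The following is a minimal sketch, assuming that Azkaban Threads reports the total thread count including idle threads (the table does not state this explicitly). The function name `busy_thread_ratio` is hypothetical and not part of Azkaban.

```python
def busy_thread_ratio(threads: int, idle_threads: int) -> float:
    """Return the fraction of threads that are not idle.

    Assumption: `threads` is the total thread count and already
    includes the idle ones. Returns 0.0 when no threads are reported.
    """
    if threads <= 0:
        return 0.0
    # Guard against inconsistent samples where idle exceeds total.
    busy = max(threads - idle_threads, 0)
    return busy / threads
```

A ratio that stays close to 1.0 suggests the container has little spare thread capacity for new connections.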
Azkaban Flow Stats
Metric | Metric Description |
---|---|
Azkaban Flow Elapsed Time (Sec) | Total time taken by this flow to execute, in seconds. |
Azkaban Flow Status | Status of the flow: 1 = KILLED, 2 = FAILED, 3 = RUNNING, 4 = SUCCEEDED. |
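When metric values are processed outside the monitoring UI, the numeric flow status codes listed above can be translated back into readable names. This is a minimal sketch, assuming the value arrives as an integer. The names `FlowStatus` and `decode_flow_status` are hypothetical helpers, not part of Azkaban.

```python
from enum import IntEnum


class FlowStatus(IntEnum):
    """Status codes as listed in the Azkaban Flow Status description."""
    KILLED = 1
    FAILED = 2
    RUNNING = 3
    SUCCEEDED = 4


def decode_flow_status(value: int) -> str:
    """Return the status name for a metric value, or 'UNKNOWN' otherwise."""
    try:
        return FlowStatus(value).name
    except ValueError:
        # Any code outside 1..4 is not documented in the table.
        return "UNKNOWN"
```

For example, `decode_flow_status(2)` yields `FAILED`, which is easier to match in an alert rule than the raw number.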
Azkaban Sub Flow Stats
Metric | Metric Description |
---|---|
Azkaban Sub Flow Elapsed Time (Sec) | Total time taken by this flow to execute in seconds |
Azkaban Sub Flow Status | Status of flow. Status is 1 = KILLED, 2 = FAILED, 3 = RUNNING and 4 = SUCCEEDED |
Azkaban Sub Flow Map Output Records | Number of map output records in this sub flow |
Azkaban Flow Runner Manager Stats
Metric | Metric Description |
---|---|
Azkaban Queued Flows | Number of queued flows. |
Azkaban Maximum Queued Flows | Maximum number of queued flows. |
Azkaban Running Flows | Number of running flows. |
Azkaban Maximum Running Flows | Maximum number of running flows. |
Azkaban Total Executed Flows/Sec | Total number of executed flows per second. |
Azkaban Executor Job Callback Stats
Metric | Metric Description |
---|---|
Azkaban Job Callbacks/Sec | Number of job callbacks per second. |
Azkaban Successful Job Callbacks/Sec | Number of successful job callbacks per second. |
Azkaban Failed Job Callbacks/Sec | Number of failed job callbacks per second. |
Azkaban Active Job Callbacks | Number of active job callbacks. |
Azkaban Web Server Executor Manager Stats
Metric | Metric Description |
---|---|
Azkaban Last Successful Executor Info Refresh (Sec) | Timestamp of the last successful executor info refresh, in seconds. |
Azkaban Thread Active | Status of the executor thread. Status is 1 = True, 0 = False. |
Azkaban Running Flows | Number of running flows. |
Azkaban Last Thread Check Time (Sec) | Time of the last thread check, in seconds. |
Azkaban Queue Processor Active | Status of the queue processor. Status is 1 = True, 0 = False. |
Azkaban Web Trigger Manager Stats
Metric | Metric Description |
---|---|
Azkaban Last Runner Thread Check Time (Sec) | Time of the last runner thread check, in seconds. |
Azkaban Runner Thread Active | Status of the runner thread. Status is 1 = True, 0 = False. |
Azkaban Scanner Idle Time (Sec) | Idle time of the scanner, in seconds. |
Azkaban Triggers | Number of triggers. |
Azkaban Coordinator Stats
Metric | Metric Description |
---|---|
75thPercentile Service Response Time (ms) | 75th percentile of the service response time, in milliseconds. |
95thPercentile Service Response Time (ms) | 95th percentile of the service response time, in milliseconds. |
98thPercentile Service Response Time (ms) | 98th percentile of the service response time, in milliseconds. |
99thPercentile Service Response Time (ms) | 99th percentile of the service response time, in milliseconds. |
999thPercentile Service Response Time (ms) | 99.9th percentile of the service response time, in milliseconds. |
Mean Service Response Time (ms) | Mean service response time, in milliseconds. |
50thPercentile Service Response Time (ms) | 50th percentile (median) of the service response time, in milliseconds. |
Minimum Response Time (ms) | Minimum server response time, in milliseconds. |
Maximum Response Time (ms) | Maximum server response time, in milliseconds. |
Request/Sec | Number of requests per second. |
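To illustrate what the percentile metrics above express, the sketch below computes a percentile over a list of response times using the nearest-rank method. This is an assumption chosen for clarity; the source does not say how Azkaban estimates these values, and its metrics library may use a different technique such as sampling. The function name `nearest_rank_percentile` is hypothetical.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(samples, pct):
    """Return the nearest-rank percentile of `samples`.

    `pct` is a percentage in (0, 100], for example 75 or 99.9.
    The result is always one of the observed sample values.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids floating-point rounding in the rank.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one slow outlier.
response_times_ms = [12, 15, 11, 30, 18, 14, 250, 16, 13, 17]
median_ms = nearest_rank_percentile(response_times_ms, 50)
p99_ms = nearest_rank_percentile(response_times_ms, 99)
```

With this sample data the median stays low while the 99th percentile picks up the single slow request, which is why the high percentiles are usually more useful than the mean for spotting latency outliers.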