Microsoft AMA Troubleshooter script

I recently had an issue with a new Linux syslog server that was onboarded with Azure Arc and had the AMA (Azure Monitor Agent) service enabled by a data collection rule in Sentinel.

I could see the Sentinel DCR (data collection rule) had been pushed out but the AMA agent wasn’t forwarding logs back up to Sentinel.

I suspected traffic was getting blocked but I wasn’t sure how to validate it.

This script will extract the Sentinel workspace ID and perform a network connection test that simulates the connection from AMA to the data collection endpoint, or ODS (Operational Data Store).

If the script fails, it means you need to talk to your firewall admin to open a connection to *.ods.opinsights.azure.com.

If you’re good at reading curl output, you don’t need the script – just curl to
https://<workspaceid>.ods.opinsights.azure.com
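If you want the manual version, here’s a minimal sketch of that check – the workspace ID below is a placeholder, so substitute your own:

```shell
# Placeholder - replace with the GUID from your Log Analytics workspace.
WORKSPACE_ID="00000000-0000-0000-0000-000000000000"
ODS_URL="https://${WORKSPACE_ID}.ods.opinsights.azure.com"

# Any HTTP response at all (even a 403/404) proves the TLS connection got through;
# a timeout or connection refused points at the firewall.
if curl -s --connect-timeout 10 "$ODS_URL" >/dev/null 2>&1; then
    echo "Connection OK"
else
    echo "Connection FAILED - ask your firewall admin about *.ods.opinsights.azure.com"
fi
```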

The script also checks that the AMA service is running and that you’re not out of disk space – two other common issues.

Have fun!

#!/bin/bash

# AMA Agent Validation Script
# Checks common issues with Azure Monitor Agent on Linux

set -e

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

echo -e "${BLUE}=== Azure Monitor Agent Validation Script ===${NC}"
echo "Started at: $(date)"
echo

# Function to check endpoint connectivity
check_endpoint() {
    local url=$1
    local description=$2
    echo -n "Testing $description... "
    
    if curl -s --connect-timeout 10 --max-time 30 "$url" >/dev/null 2>&1; then
        echo -e "${GREEN}OK${NC}"
        return 0
    else
        echo -e "${RED}FAILED${NC}"
        return 1
    fi
}

# Function to check SSL handshake specifically
check_ssl_handshake() {
    local host=$1
    local description=$2
    echo -n "Testing SSL handshake for $description... "
    
    if timeout 10 openssl s_client -connect "$host:443" -servername "$host" </dev/null >/dev/null 2>&1; then
        echo -e "${GREEN}OK${NC}"
        return 0
    else
        echo -e "${RED}FAILED${NC}"
        return 1
    fi
}

# 1. Check AMA service status
echo -e "${BLUE}1. AMA Service Status${NC}"
if systemctl is-active --quiet azuremonitoragent; then
    echo -e "Service status: ${GREEN}RUNNING${NC}"
    echo "Service uptime: $(systemctl show azuremonitoragent --property=ActiveEnterTimestamp --value)"
else
    echo -e "Service status: ${RED}NOT RUNNING${NC}"
    echo "Try: systemctl status azuremonitoragent"
fi
echo

# 2. Check disk space
echo -e "${BLUE}2. Disk Space Check${NC}"
AMA_PATH="/var/opt/microsoft/azuremonitoragent"
if [ -d "$AMA_PATH" ]; then
    DISK_USAGE=$(df -h "$AMA_PATH" | awk 'NR==2 {print $5}' | sed 's/%//')
    if [ "$DISK_USAGE" -gt 90 ]; then
        echo -e "Disk usage: ${RED}${DISK_USAGE}% (CRITICAL)${NC}"
        echo "Filesystem: $(df -h "$AMA_PATH" | awk 'NR==2 {print $1}')"
        du -sh "$AMA_PATH/events"/* 2>/dev/null | sort -hr | head -5
    elif [ "$DISK_USAGE" -gt 80 ]; then
        echo -e "Disk usage: ${YELLOW}${DISK_USAGE}% (WARNING)${NC}"
    else
        echo -e "Disk usage: ${GREEN}${DISK_USAGE}% (OK)${NC}"
    fi
else
    echo -e "${RED}AMA directory not found${NC}"
fi
echo

# 3. Extract endpoints from config
echo -e "${BLUE}3. Extracting Configured Endpoints${NC}"
CONFIG_DIR="/etc/opt/microsoft/azuremonitoragent/config-cache"
WORKSPACE_ID=""
ENDPOINTS=()

if [ -d "$CONFIG_DIR" ]; then
    # Extract workspace ID and endpoints
    WORKSPACE_ID=$(grep -r "ods.opinsights.azure.com" "$CONFIG_DIR" 2>/dev/null | head -1 | grep -o '[a-f0-9-]\{36\}\.ods\.opinsights\.azure\.com' | cut -d'.' -f1 || echo "")
    
    if [ -n "$WORKSPACE_ID" ]; then
        echo "Workspace ID: $WORKSPACE_ID"
        ENDPOINTS+=("https://${WORKSPACE_ID}.ods.opinsights.azure.com")
    fi
    
    # Add standard endpoints
    ENDPOINTS+=(
        "https://global.handler.control.monitor.azure.com"
        "https://centralus.monitoring.azure.com"
        "https://management.azure.com"
        "https://login.microsoftonline.com"
        "https://ods.opinsights.azure.com"
    )
else
    echo -e "${RED}Config directory not found${NC}"
    # Use default endpoints
    ENDPOINTS=(
        "https://global.handler.control.monitor.azure.com"
        "https://centralus.monitoring.azure.com"
        "https://management.azure.com"
        "https://login.microsoftonline.com"
        "https://ods.opinsights.azure.com"
    )
fi
echo

# 4. Test endpoint connectivity
echo -e "${BLUE}4. Network Connectivity Tests${NC}"
failed_endpoints=0

for endpoint in "${ENDPOINTS[@]}"; do
    if ! check_endpoint "$endpoint" "$endpoint"; then
        # Note: ((failed_endpoints++)) would abort here under set -e when the count is 0
        failed_endpoints=$((failed_endpoints + 1))
    fi
done
echo

# 5. Test SSL handshakes for critical endpoints
echo -e "${BLUE}5. SSL Handshake Tests${NC}"
ssl_failed=0

if [ -n "$WORKSPACE_ID" ]; then
    if ! check_ssl_handshake "${WORKSPACE_ID}.ods.opinsights.azure.com" "Workspace ODS"; then
        ssl_failed=$((ssl_failed + 1))
    fi
fi

if ! check_ssl_handshake "global.handler.control.monitor.azure.com" "Control Plane"; then
    ssl_failed=$((ssl_failed + 1))
fi
echo

# 6. Check for recent AMA errors
echo -e "${BLUE}6. Recent AMA Errors (last 1 hour)${NC}"
if command -v journalctl >/dev/null; then
    # grep -c already prints "0" on no match (while exiting 1), so "|| echo 0" would
    # produce two lines and break the numeric test below
    error_count=$(journalctl -u azuremonitoragent --since "1 hour ago" | grep -ic "error\|failed\|ssl handshake" || true)
    if [ "$error_count" -gt 0 ]; then
        echo -e "Recent errors: ${RED}$error_count${NC}"
        echo "Recent SSL handshake failures:"
        journalctl -u azuremonitoragent --since "1 hour ago" | grep -i "ssl handshake" | tail -3
        echo "Recent disk space errors:"
        journalctl -u azuremonitoragent --since "1 hour ago" | grep -i "no space left" | tail -3
    else
        echo -e "Recent errors: ${GREEN}0${NC}"
    fi
else
    echo "journalctl not available"
fi
echo

# 7. Check listening ports
echo -e "${BLUE}7. AMA Listening Ports${NC}"
if ss -tlnp | grep -q ":28330"; then
    echo -e "Port 28330 (syslog): ${GREEN}LISTENING${NC}"
else
    echo -e "Port 28330 (syslog): ${RED}NOT LISTENING${NC}"
fi
echo

# 8. System time check (critical for SSL)
echo -e "${BLUE}8. System Time Check${NC}"
current_time=$(date +%s)
ntp_time=$(curl -s --max-time 10 "http://worldtimeapi.org/api/timezone/UTC" 2>/dev/null | grep -o '"unixtime":[0-9]*' | cut -d':' -f2)
[ -n "$ntp_time" ] || ntp_time=$current_time  # fall back to local time if the API is unreachable
time_diff=$((current_time - ntp_time))
time_diff=${time_diff#-}  # absolute value

if [ "$time_diff" -gt 300 ]; then
    echo -e "Time sync: ${RED}OUT OF SYNC (${time_diff}s difference)${NC}"
    echo "Current: $(date)"
    echo "Consider: ntpdate or chrony sync"
else
    echo -e "Time sync: ${GREEN}OK${NC}"
fi
echo

# Summary
echo -e "${BLUE}=== SUMMARY ===${NC}"
if [ "$failed_endpoints" -eq 0 ] && [ "$ssl_failed" -eq 0 ]; then
    echo -e "Overall status: ${GREEN}HEALTHY${NC}"
    echo "All endpoints accessible and SSL working correctly"
elif [ "$ssl_failed" -gt 0 ]; then
    echo -e "Overall status: ${RED}SSL ISSUES${NC}"
    echo "SSL handshake failures detected - check firewall/proxy settings"
    echo "Contact network team to whitelist Azure Monitor endpoints"
elif [ "$failed_endpoints" -gt 0 ]; then
    echo -e "Overall status: ${YELLOW}CONNECTIVITY ISSUES${NC}"
    echo "Some endpoints unreachable - check network connectivity"
else
    echo -e "Overall status: ${YELLOW}CHECK REQUIRED${NC}"
fi

echo
echo "Log locations:"
echo "  - AMA logs: journalctl -u azuremonitoragent"
echo "  - Config: /etc/opt/microsoft/azuremonitoragent/config-cache/"
echo "  - Events: /var/opt/microsoft/azuremonitoragent/events/"
echo
echo "Common fixes:"
echo "  - Disk space: Clean /var/opt/microsoft/azuremonitoragent/events/"
echo "  - SSL issues: Whitelist *.ods.opinsights.azure.com in firewall"
echo "  - Service: systemctl restart azuremonitoragent"

Adventures In Cybersecurity – New Front Page

I used openai to help me build a new front page for my cyber defense tutorials.

If anyone needs help learning any topics in cyber defense just ask!

https://spiderlabs.github.io/zpminternational/

https://www.linkedin.com/in/davidbroggytrustwave/

https://simple-security.ca/

https://mvp.microsoft.com/en-us/PublicProfile/5004963?fullName=David%20%20Broggy

#mvp #mvpbuzz

Adventures in Cybersecurity: The Defender Series. Now Live!

I’ve started a new series of posts on cyber defense architecture, implementation and workflows.

It will also include getting-started labs on over 30 cyber defense topics!

Check it out here and find out about the backstory of ZPM International and their adversary APT42a!

https://www.trustwave.com/en-us/resources/blogs/spiderlabs-blog/welcome-to-adventures-in-cybersecurity-the-defender-series/

Adventures in Cybersecurity: The Defender Series. Parts 1 to 14

I’m a bit behind on my updates, so if you haven’t seen, Trustwave has posted the first 14 posts from my ‘Defender Series’:

Cloud Architecture Frameworks and Benchmarks

Cost Management Tips for Cyber Admins

Cybersecurity Documentation Essentials

Evaluating Your Security Posture: Security Assessment Basics

Zero Trust Essentials

CSPM, CIEM, CWPP Oh My!

The Secret Cipher: Modern Data Loss Prevention Solutions

The Invisible Battleground: Essentials of EASM

EDR – The Multi-Tool of Security Defenses

Protecting Zion: InfoSec Encryption Concepts and Tips

Guardians of the Gateway: Identity and Access Management Best Practices

How to Create the Asset Inventory You Probably Don’t Have

Important Security Defenses to Help Your CISO Sleep at Night

Cyber Exterminators: Monitoring the Shop Floor with OT Security

Enjoy!

Highlights of Microsoft Build 2023!

Although much of Microsoft Build is centred around helping developers, there’s plenty for you and me as well, like avatars in Teams! Well, read on…

Microsoft MVPs get early access to the full list of topics from Microsoft Build so we can review all of the topics and then share back to the community with some fresh perspectives right after the public release date has passed.

So here’s my choice of interesting topics, hot off the press, in case you decide to check them out at https://build.microsoft.com/en-US/home or https://news.microsoft.com/source

For a full list of Microsoft Build topics go here.

Create A Powerful Threat Hunter for Microsoft 365 Defender

If you’re a user of Microsoft Sentinel, you’re likely familiar with its Threat Hunting feature, which lets you run hundreds of KQL queries in a matter of seconds.

Unfortunately, Threat Hunting in the M365 Defender portal doesn’t have this feature, so you’re stuck running your hunting queries one at a time.

So I’ve created a proof-of-concept script that provides some threat hunting automation by taking the 400+ threat hunting queries in the Microsoft Sentinel GitHub repository and feeding them into the M365 Defender ThreatHunting API.

Requirements
  • a unix shell from which you can run python and git commands
  • python3, git command, and the python modules you see at the top of the scripts attached below
  • admin access to Azure to set up an app registration
  • operational use of the M365 Defender portal (https://security.microsoft.com)
Caveats

There are some known performance limitations to using Defender advanced threat hunting, so although the script may seem to be working, there could be timeouts happening in the background if Defender decides you’re using too many resources. This script doesn’t have any built-in error checking, so re-running the script or validating the queries within the Defender portal may be required.

The Setup Procedure:
  1. Create an app registration in Azure. Configuring an app registration is out of the scope of this article, but there are plenty of examples on how to do this. What’s important are the permissions you’ll need to allow:
    • ThreatHunting.Read.All
    • APIConnectors.ReadWrite.All
  2. From the app registration created in step #1, copy the tenantID, clientID and the secret key, and paste them into the script I’ve provided below.
  3. Create a directory named ‘queries’. This will be used to store the .yaml files from Github. These files contain the hunting queries.
  4. In the same directory, download the GitHub repository using this command: “git clone https://github.com/Azure/Azure-Sentinel.git”
  5. Now it’s time to create and run the first of 2 scripts. This first script should require no changes. Its purpose is to scan the Azure-Sentinel subdirectories (that you downloaded in step #4) for .yaml files and copy them all to the ‘queries’ directory.
    • Here’s the script, name it something like get_yaml.py and then run it like: “python3 get_yaml.py”
# python script name: "get_yaml.py"
import os
import shutil

# Set the directory paths
source_dir = 'Azure-Sentinel/Hunting Queries/Microsoft 365 Defender/'
target_dir = 'queries'

# Recursively search the source directory for YAML files
for root, dirs, files in os.walk(source_dir):
    for file in files:
        if file.endswith('.yaml'):
            # Create the target directory if it doesn't already exist
            os.makedirs(target_dir, exist_ok=True)

            # Copy the file to the target directory
            source_file = os.path.join(root, file)
            target_file = os.path.join(target_dir, file)
            shutil.copy2(source_file, target_file)

            # Print a message to indicate that the file has been copied
            print(f'Copied {source_file} to {target_file}')
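After running get_yaml.py, a quick sanity check that the queries folder was populated (the expected count assumes the repo layout hasn’t changed):

```shell
# Should print a count in the hundreds if the copy worked; 0 means the source path is wrong.
ls queries/*.yaml 2>/dev/null | wc -l
```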

6. Edit this next script (below) and insert the tenantID, clientID and secret from the Azure app registration, as mentioned above in step #1. Name the script something like threathunt.py and run it with the python (or python3) command: “python3 threathunt.py”.

  • Note that the script below expects the .yaml files to be in a folder named “queries” (from when you ran the first script above).
  • After running the script below, look in the queries folder for any new files with a .json extension.
  • If the query ran successfully and generated results, a .json file will be created with the same filename as the matching .yaml file (eg. test.yaml > test.json)
import requests
import os
import yaml
import json
from azure.identity import ClientSecretCredential

# Replace the values below with your Azure AD tenant ID, client ID, and client secret
client_id = "YOUR APP REGISTRATION CLIENT ID"
client_secret = "YOUR APP REGISTRATION SECRET KEY"
tenant_id = "YOUR TENANT ID"
# Note: the authority is just the AAD login endpoint; the Graph scope is passed to get_token() below
authority = "https://login.microsoftonline.com"

# Create a credential object using the client ID and client secret
credential = ClientSecretCredential(
    authority=authority,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)

# Use the credential object to obtain an access token
access_token = credential.get_token("https://graph.microsoft.com/.default").token

# Print the access token
print(access_token)

# Define the headers for the API call
headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json",
    "Accept": "application/json"
}

# Define the API endpoint
url = "https://graph.microsoft.com/v1.0/security/runHuntingQuery"

# Define the path to the queries directory
directory_path = "queries"

# Loop through all YAML files in the queries directory
for file_name in os.listdir(directory_path):
    if file_name.endswith(".yaml") or file_name.endswith(".yml"):
        # Read the YAML file
        with open(os.path.join(directory_path, file_name)) as file:
            query_data = yaml.load(file, Loader=yaml.FullLoader)
            # Extract the query field from the YAML data
            query = query_data.get("query")
            if query:
                print(query)
                # Define the request body
                body = {
                    "query": query
                }
                # Send the API request
                response = requests.post(url, headers=headers, json=body)
                # Parse the response as JSON
                json_response = response.json()

                if json_response.get('results') and len(json_response['results']) > 0:
                    # Write the JSON response to an output file named after the query
                    output_file_name = os.path.splitext(file_name)[0] + ".json"
                    with open(os.path.join(directory_path, output_file_name), "w") as output_file:
                        output_file.write(json.dumps(json_response, indent=4))

Once you have the results from your threat hunt you could:

  • Log into the M365 Defender portal and re-run the hunting queries that generated data. All of these YAML queries from GitHub should be in the Threat Hunting > Queries bookmarks. If not, you can manually copy/paste the query from inside the YAML file directly into the query editor and play around with it.
  • Import the data into Excel or Power BI and play around with the results.
  • Create another python script that does something with the resulting .json files like aggregate the fields and look for commonalities/anomalies.
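As a sketch of that last idea, here’s a quick way to see which queries actually returned rows – it assumes the .json files are sitting in the queries folder created earlier, and reuses python3 (already required above) for the JSON parsing:

```shell
# Count the "results" entries in each query's .json output file.
for f in queries/*.json; do
    [ -e "$f" ] || continue   # skip cleanly if no .json files exist yet
    rows=$(python3 -c "import json,sys; print(len(json.load(open(sys.argv[1])).get('results', [])))" "$f")
    echo "$f: $rows result rows"
done
```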

Happy Hunting!

Microsoft Cloud Licensing and Cost Summary

Here’s a simple high level guide to navigating Microsoft licensing from a security perspective.

This guide won’t go into the details of ‘why’ you need these licenses, and it won’t discuss the operational costs of implementing these security solutions.

Your main reference for Microsoft enterprise licensing is here!
(don’t worry if you’re not in the US, it will ask you to switch)

On the left hand side of this page is a pdf you should download and really get to know:

Budgeting for security in any organization can be a challenge. Let’s assume you’re taking the leap with Microsoft but you want to work it into your budget.

Consider E5 licenses for a subset of users and E3 for the rest.

This will allow you to optimize the use of the security related features for your critical infrastructure and then grow out to the larger cost of protecting everything.

P1 vs P2

Next look at the P1 vs P2 features. If you have that E5 license then you’re mostly set with the P2 features since they’re included with E5.
If you have E3 then consider adding all of the P2 features until it makes more sense cost-wise to switch to E5. The order in which you add the P2 features will depend on your security priorities.
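As a rough way to think about that crossover point, here’s an illustrative break-even sketch. The per-user prices are hypothetical placeholders, not current list prices – pull the real figures from the licensing PDF:

```shell
# Hypothetical $/user/month figures, for illustration only.
E3=36           # E3 base license
E5=57           # E5 base license
ADDONS=25       # total for the P2 add-ons you'd buy on top of E3

# If the add-on stack costs more than the E3-to-E5 price gap, E5 is the cheaper path.
GAP=$((E5 - E3))
if [ "$ADDONS" -gt "$GAP" ]; then
    echo "E5 is cheaper: add-ons (\$$ADDONS) exceed the E3-to-E5 gap (\$$GAP)"
else
    echo "Stay on E3 + add-ons for now"
fi
```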

Don’t shrug off the importance of many of these P2 features. Here are some links to look at for more information:

Additional cost considerations:
  • DDoS protection
  • WAF
  • SIEM – Microsoft Sentinel
  • EASM – External Attack Surface Management

See the link for the Pricing Calculator below to dig into the cost of these additional services.

References:

M365 Licensing (includes everything related to E3, E5, P1, P2, etc.)
Defender for Cloud Pricing
Pricing Calculator – select the ‘Security’ side menu and go from there

Performing a Security Audit on Logic Apps

As DevOps teams move toward no-code apps in the cloud, there’s a growing need for security reviews and controls to prevent risky configurations.

This is nothing new, but the need for better security reviews is becoming clear as more people try to rush to get their apps done in the easiest way possible.

Here’s a simple approach to identifying security risks in your logic apps:

  1. Create an architecture diagram of your logic app. This can be a simplified version that just shows the high level logic.
  2. Break down the logic app into its components:
  • The individual logic app components – you likely won’t find too many security problems here.
  • All the parameters – don’t hardcode passwords into parameters!
  • Connectors – often the culprit of weak security in logic apps. Really understand what these connectors are communicating with. Don’t allow public access. Limit the roles/permissions.
  • App registrations – another culprit of weak security. If app registrations are needed for your logic apps, be sure permissions are set to their most restrictive settings. Avoid Read.All and ReadWrite.All permissions.
  • Managed identities – if possible, use managed identities instead of user accounts for your connectors. Many logic apps don’t yet support managed identities, so those apps will require additional monitoring and possibly frequent password/secret changes.

3. Use Resource Locks to prevent changes. If someone tries to turn off resource logs, be sure it’s logged and alerted on.
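A resource lock can be applied in the portal, via CLI, or baked into your deployment template. As a sketch, an ARM template fragment for a read-only lock on a logic app might look like this – the names are illustrative and the API version should be checked against the current Microsoft.Authorization/locks reference:

```json
{
  "type": "Microsoft.Authorization/locks",
  "apiVersion": "2016-09-01",
  "name": "logicapp-readonly-lock",
  "scope": "[resourceId('Microsoft.Logic/workflows', 'my-logic-app')]",
  "properties": {
    "level": "ReadOnly",
    "notes": "Audited logic app - changes require removing this lock, which is logged"
  }
}
```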

4. Restrict user/admin access to your logic apps. Some apps can have really powerful permissions/access, so you don’t want users to ever have the ability to change logic apps unless they’ve been given specific short-term permissions to do so.

5. LOG EVERYTHING – wherever possible, enable logging within logic apps and connectors. Store logs in a Log Analytics Workspace. Use Azure Monitor alerts or Microsoft Sentinel to monitor/report/alert on all activities.

6. Perform ‘attack simulations’. Run your logic apps through test conditions that will trigger your alerts. Validate that your alerts work as expected.

7. Build a ‘logic app security audit’ spreadsheet. Use this as a template for repeated audits for future logic app security testing. Use the above ideas as the initial framework for your spreadsheet.

I’m a Microsoft Recognized Community Hero!

I’m very excited to have been recognized by Microsoft as an Azure Community Hero!

Having worked with (not employed by..) Microsoft for several years as a Security Solutions Advisor/Developer, in 2021/22 I began taking on more of a volunteer role, finding ways to give back to the community however I could.

After several months of contributing on the Microsoft Q&A sites I was very surprised to receive this badge(r) of recognition which they title ‘Microsoft Azure Community Hero’.

So I hope you don’t mind me sharing in my happiness for this honor.

Isn’t it cute?

https://jumpnet.enjinx.io/eth/asset/68c0000000000065/183?source=EnjinWallet-1.15.1