There is nothing perfect about auto-gpt, but like chatgpt it’s another tool that, used creatively, can achieve amazing things I wouldn’t even have considered attempting two months ago.
If you want to read about my odd path of discovery in building this script, see the short story below; otherwise, just enjoy the script.
Ramon Gomez on LinkedIn had the idea of using auto-gpt to find threat actors in the news as they relate to the United States energy sector.
His attempts at using auto-gpt failed, but I gave it a try anyway.
Sure enough, it failed for me too, but I carefully read the output from auto-gpt and could see what it was trying to do:
- download the enterprise-attack.json file from Mitre – this is a full ‘database’ of all things Mitre ATT&CK, and it includes information about threat actors and some of the industries they’re associated with.
- create and run a python script that reads enterprise-attack.json and extracts the threat actors associated with the US energy sector – this script had syntax errors, so it was never going to run, but it tried… (a rough sketch of what these first two steps might look like follows this list)
- find a list of reliable web sites that cover cyber news – this worked, so I had a list of possible sites, but they weren’t perfect…
- create another python script that scraped the news sites for information associated with the threat actors – again, it tried and failed.
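As an aside, here’s a minimal sketch of what those first two steps could look like. The URL below assumes the ATT&CK STIX bundle published in MITRE’s public mitre/cti GitHub repository, and since ATT&CK has no structured ‘industry’ field, this just searches each group’s description for the word ‘energy’ – a rough approximation at best:

import requests

# Assumed location of the ATT&CK STIX bundle (MITRE's public mitre/cti repo)
ATTACK_URL = ('https://raw.githubusercontent.com/mitre/cti/'
              'master/enterprise-attack/enterprise-attack.json')

bundle = requests.get(ATTACK_URL, timeout=60).json()

# Threat groups appear as 'intrusion-set' objects in the bundle; there is
# no structured industry field, so search each description for 'energy'
energy_actors = [
    obj['name']
    for obj in bundle.get('objects', [])
    if obj.get('type') == 'intrusion-set'
    and 'energy' in obj.get('description', '').lower()
]

print(energy_actors)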
Although auto-gpt tried and failed, it had an excellent approach to the problem.
And using ‘regular’ chatgpt I was able to ask the same sorts of questions and get much better answers.
Finally, as a result, chatgpt (and I) came up with the script you see below.
Note that this script has flaws (some of the URLs aren’t useful, but some are), yet it does in fact work! Enjoy. A small sketch for cleaning up those URLs follows the script.
import requests
from bs4 import BeautifulSoup
# Define a dictionary of threat actor names and their aliases
threat_actors = {
    'APT1': ['Comment Crew'],
    'Lazarus': ['Lazarus'],
    'APT29': ['Cozy Bear'],
    'APT32': ['OceanLotus Group']
}
# Define the URLs of the news resources
urls = [
    'https://www.fireeye.com/blog/threat-research.html',
    'https://www.kaspersky.com/blog/tag/apt',
    'https://www.ncsc.gov.uk/news/reports',
    'https://thehackernews.com/search/label/apt',
    'https://www.recordedfuture.com/apt-group-threat-intelligence/',
    'https://www.anomali.com/blog/threat-research'
]
# Loop through the URLs and extract web page URLs that mention a threat actor
webpage_urls = []
for url in urls:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, 'html.parser')
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue  # skip <a> tags without an href attribute
        for actor in threat_actors:
            if actor in link.text or any(alias in link.text for alias in threat_actors[actor]):
                webpage_urls.append(href)
# Print the extracted webpage URLs
for url in webpage_urls:
    print(url)
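As mentioned above, some of the collected URLs aren’t useful. Here’s a small sketch of one way to clean them up – the normalize helper is my own invention, not part of the original script. It resolves relative links against the page they came from, drops fragment-only and javascript: pseudo-links, and deduplicates while preserving order:

from urllib.parse import urljoin, urlparse

def normalize(base_url, hrefs):
    # Resolve relative links, drop fragment/javascript pseudo-links,
    # and deduplicate while preserving order
    seen = set()
    result = []
    for href in hrefs:
        if not href or href.startswith(('#', 'javascript:')):
            continue
        absolute = urljoin(base_url, href)
        if urlparse(absolute).scheme in ('http', 'https') and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

# Example: the relative link is resolved, the fragment and duplicate dropped
print(normalize('https://example.com/blog',
                ['/post/apt29', '#top', '/post/apt29']))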
