Cloud security vendor Skyhawk has unveiled a new benchmark for evaluating how well generative AI large language models (LLMs) identify and score cybersecurity threats in cloud logs and telemetry. The free resource analyzes the performance of ChatGPT, Google Bard, Anthropic Claude, and Llama 2-based open-source LLMs to see how accurately they predict the maliciousness of an attack sequence, according to the firm.
Generative AI chatbots and LLMs can be a double-edged sword from a risk perspective, but used properly they can strengthen an organization’s cybersecurity in key ways. Among these is their ability to identify and dissect security threats faster and in higher volumes than human security analysts can.
LLM cyberthreat predictions rated in three ways
“The importance of swiftly and effectively detecting cloud security threats cannot be overstated. We firmly believe that harnessing generative AI can greatly benefit security teams in that regard; however, not all LLMs are created equal,” said Amir Shachar, director of AI and research at Skyhawk.
Skyhawk’s benchmark tests LLM output on attack sequences extracted and constructed by the company’s machine-learning models, comparing and scoring each against a sample of hundreds of human-labeled sequences using three metrics: precision, recall, and F1 score, Skyhawk said in a press release. The closer each score is to one, the more accurate the LLM’s predictions.
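To illustrate how the three metrics behave, here is a minimal sketch of scoring an LLM’s malicious/benign verdicts against human labels. The labeled sequences and verdicts below are invented for illustration; Skyhawk has not published its tagged flows or scoring code.

```python
def precision_recall_f1(human_labels, llm_predictions):
    """Score binary malicious/benign verdicts against human-labeled sequences.

    Precision: of the sequences the LLM flagged malicious, how many really were.
    Recall: of the truly malicious sequences, how many the LLM caught.
    F1: the harmonic mean of precision and recall.
    """
    tp = sum(1 for h, p in zip(human_labels, llm_predictions) if h and p)
    fp = sum(1 for h, p in zip(human_labels, llm_predictions) if not h and p)
    fn = sum(1 for h, p in zip(human_labels, llm_predictions) if h and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical data: True = human analysts tagged the sequence as malicious.
human = [True, True, True, False, False, True]
llm   = [True, True, False, True, False, True]  # the model's verdicts
p, r, f = precision_recall_f1(human, llm)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → precision=0.75 recall=0.75 f1=0.75
```

In this toy run the LLM misses one real attack and raises one false alarm, so all three scores land at 0.75; a perfect detector would score one on all three.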
“We can’t disclose the specifics of the tagged flows used in the scoring process because we have to protect our customers and our secret sauce,” Shachar tells CSO. “Overall, though, our conclusion is that LLMs can be very powerful and effective in threat detection, if you use them wisely.”