New York, New York--(Newsfile Corp. - May 15, 2025) - The International Conference on Machine Learning (ICML) has officially accepted "ELITE: Enhanced Language-Image Toxicity Evaluation for Safety", a collaborative paper from AIM Intelligence, Seoul National University, Yonsei University, KIST, Kyung Hee University, and Sookmyung Women's University.
The paper proposes ELITE, a high-quality benchmark designed to evaluate the safety of Vision-Language Models (VLMs) with greater precision. At its core is the ELITE evaluator, a rubric-based method that incorporates a toxicity score to measure harmfulness in multimodal contexts-especially where VLMs produce specific, convincing responses that may appear harmless but convey dangerous intent.
"We're incredibly proud that ELITE is being recognized at ICML," said Sangyoon Yu, co-author and CEO of AIM Intelligence. "This framework is designed not just for research, but to meet the demands of real-world deployment."
Going Beyond Refusal Checks
Most safety benchmarks rely on simple refusal detection-whether a model rejects an unsafe prompt. ELITE takes it further by introducing a rubric-based evaluator that assigns a 0-25 score for every response. It assesses four dimensions:
- Refusal
- Specificity
- Convincingness
- Toxicity (0-5 scale)
This scoring system builds on the StrongREJECT framework (NeurIPS 2024) but adds a toxicity axis to better catch implicit harm, especially in safe-safe pairs-cases where both image and prompt appear safe, but the model response is not.
Designed for Real-World Attacks
To test models more thoroughly, the ELITE benchmark includes:
- 4,587 image-text pairs across 11 safety domains (e.g. hate, defamation, privacy, sexual content)
- 1,054 adversarial examples, created using four techniques:
- Blueprints
- Flowcharts
- Fake News
- Role Play
These examples reflect the kinds of prompts that can cause real-world damage-even when they don't look harmful on the surface.
Performance That Exposes the Gaps
ELITE was tested against 18 leading models, including GPT-4o, Gemini-2.0, and Pixtral-12B. The results speak for themselves:
Metric | ELITE Benchmark | Prior Benchmarks |
Attack Success Rate (E-ASR) | 2-3x more effective in detecting failures | Often underreported |
AUROC vs. Human Judgment | 0.77 (ELITE) vs. 0.46 (StrongREJECT) | Weaker evaluator alignment |
Pixtral-12B Failure Rate | >79% | Highest across all models |
GPT-4o Failure Rate | 15.67% | Still vulnerable |
Even the best models showed significant blind spots when tested with ELITE.
From Research to Product: AIM Supervisor
AIM Supervisor - AIM Red Dashboard 1
AIM Supervisor is AIM Intelligence's enterprise AI safety platform designed to support both text-based and multimodal models, including Vision-Language Models (VLMs). It enables continuous evaluation and risk control through real-time scoring, adversarial testing, and policy-based output filtering.
The platform integrates with OpenAI-compatible and HuggingFace-based models via REST API, deployable through a container or one-line API wrapper.
- Inference latency (GPU): under 700ms
- Full evaluation cycle: under 1.5 second
Key components include:
- AIM Red - an automated adversarial engine that generates jailbreak prompts across high-risk taxonomies
- AIM Guard - a real-time output evaluator applying rubric-based filters
- AI Safety Dashboard - a unified console for monitoring, scoring, and policy tuning
AIM Supervisor - AIM Red Dashboard 2
Together, these tools help organizations detect unsafe behavior, enforce policies, and maintain governance-all without slowing development.
Product access: https://suho.aim-intelligence.com/en
Global Adoption and Policy Recognition
AIM Intelligence's safety technologies are gaining traction across both industry and policy communities.
In partnership with LG CNS, AIM conducted red teaming and guardrail implementation for a customer-facing AI assistant at Woori Bank. Within days, ELITE surfaced privacy and financial safety violations, leading to targeted architecture updates.
AIM also collaborated with KT, Korea's largest telecom provider, to evaluate internal AI systems. The assessment revealed system-level vulnerabilities and informed new safety protocols for deployment.
AIM's work is being recognized internationally:
- Meta's Llama Impact Innovation Award - First Korean recipient
- Anthropic Bug Bounty Program - Red teaming frontier models
- TTA Standardization Partner - Helping define national safety guidelines for finance, healthcare, robotics, and public-sector AI
"With ELITE and our broader safety stack, we're giving builders and regulators the confidence they need to deploy AI responsibly," said Yu.
AIM Supervisor - AIM Guard Graph Policy
AI Safety Market: Growing Fast, Under-Regulated
As AI systems become embedded in finance, healthcare, defense, and public infrastructure, trust and accountability are no longer optional.
According to Markets and Markets, the global AI safety market is projected to grow from $1.1 billion in 2024 to $5.4 billion by 2030, with a 30.2% CAGR. With regulation on the rise, organizations are seeking solutions that are both robust and scalable. ELITE and AIM Supervisor meet that demand.
AIM Intelligence Joint Research Team: From left: Ha-eon Park [Seoul National University], Yoo-jin Choi [Sookmyung Women's University], Won-jun Lee [KIST (Yonsei University)], Do-hyun Lee [Seoul National University], Sang-yoon Yoo [Seoul National University]
Contact: Bomi Son
Email: team@aim-intelligence.com
Website: https://www.aim-intelligence.com
Source: Honest Medita
To view the source version of this press release, please visit https://www.newsfilecorp.com/release/252268