research

AI agents are a powerful tool for synthesizing data to accelerate research, summarize information, and help teams make decisions faster. But combining internal...

research agents

Open

High signal Matched: research, agent, agents

AWS Machine Learning Blog · cloud · 2026-06-02

Transforming rare cancer research with Amazon Quick: Integrating biomedical databases for breakthrough discoveries

Score 9

In this post, we walk through how to use Amazon Quick Research to integrate biomedical data sources for rare cancer research. The walkthrough uses pediatric sarcoma as the research domain and draws on publicly available datasets from PubMe...

research

Open

High signal Matched: research

vLLM Project · open-source · 2026-06-01

vLLM on the DGX Spark: Architecture, Configuration, and Local Evaluation

Score 17

A technical deep dive on running vLLM on NVIDIA DGX Spark and GB10 systems, covering sm_121 architecture, unified memory behavior, NVFP4 model serving, Nemotron-3-Super configuration, Docker deployment, Prometheus metrics, and local evalua...

inference serving model-release research evals

Open

High signal Matched: serving, model, evaluation

Nota AI · korea · 2026-05-29

Full-Stack Optimization for Low-Light Video on Jetson Orin NX: From 400 ms to 28 ms

Score 23

Jaehoon Lee, Technical Content Manager, Nota AI. When enterprises adopt AI, the most common bottleneck is not model development. It is the deployment stage: getting a finished model to run reliably on the actual target device. T...

inference serving benchmark hardware model-release research quantization evals

Open

High signal Matched: inference, throughput, benchmark, performance, latency, cost, gpu, model, evaluation, quantization, int8, benchmarks, leaderboard

Google Research · big-tech · 2026-05-29

A New Era of Discovery: Google Research at I/O 2026

Score 9

General Science

research

Open

High signal Matched: research

AWS Machine Learning Blog · cloud · 2026-05-29

Evaluating Deep Agents using LangSmith on AWS

Score 9

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep...

research cloud evals agents

Open

High signal Matched: evaluation, bedrock, evals, evaluating, agent, agents

AWS Machine Learning Blog · cloud · 2026-05-29

Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore

Score 13

Datasets in AgentCore is in public preview. Agent evaluation is most powerful when you combine fast-moving online signals with stable offline baselines. To understand whether your agent is truly improving over time, you need a fixed benchm...

benchmark research cloud evals agents

Open

High signal Matched: benchmark, evaluation, bedrock, agent

vLLM Project · open-source · 2026-05-26

EAGLE 3.1: Advancing Speculative Decoding Through Collaboration Between the EAGLE Team, vLLM, and TorchSpec

Score 22

The EAGLE series — including EAGLE 1, EAGLE 2, and EAGLE 3 — has become one of the most widely adopted and practically deployed families of speculative decoding algorithms across both research and...

inference speculative-decoding research

Open

High signal Matched: decoding, speculative decoding, eagle, research

NVIDIA Technical Blog · hardware · 2026-05-20

Add a Specialized Deep Research Skill to Agent Harnesses

Score 12

Agent harnesses like Claude Code, Codex, and LangChain Deep Agents are excellent orchestrators. They manage sessions, chain tools, execute code, and respond to...

research agents

Open

High signal Matched: research, agent, agents

Google Research · big-tech · 2026-05-20

Empirical Research Assistance (ERA): From Nature publication to catalyzing Computational Discovery

Score 8

General Science

research

Open

High signal Matched: research

NVIDIA Technical Blog · hardware · 2026-05-19

Mastering Agentic Techniques: AI Agent Evaluation

Score 16

Evaluating an AI model and evaluating an AI agent are related—but they answer fundamentally different questions. A model benchmark tests the capability of a...

benchmark model-release research evals agents

Open

High signal Matched: benchmark, model, evaluation, evaluating, agent, agentic

Microsoft Research · big-tech · 2026-05-16

Further Notes on Our Recent Research on AI Delegation and Long-Horizon Reliability

Score 10

Our recent paper, “LLMs Corrupt Your Documents When You Delegate”, has generated discussion about the reliability of AI systems in delegated workflows. We appreciate the interest in this work and want to clarify several important points ab...

research evals

Open

High signal Matched: paper, research, evaluation

Together AI · inference-infra · 2026-05-15

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Score 24

Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.

inference serving benchmark model-release research api

Open

High signal Matched: inference, endpoint, cost, launch, research

Microsoft Research · big-tech · 2026-05-14

mimalloc: A new, high-performance, scalable memory allocator for the modern era

Score 8

mimalloc is an open-source, modern, scalable memory allocator that is a drop-in replacement for malloc and free. It is relatively small (~12K lines), with clear internal data structures, and is easy to build and integrate into other projec...

benchmark research open-source

Open

High signal Matched: performance, research, open-source

Microsoft Research · big-tech · 2026-05-14

GridSFM: A new, small foundation model for the electric grid

Score 12

Introducing GridSFM, a small foundation model that can predict AC optimal power flow in milliseconds, boosting efficiency and unlocking cost savings. Learn how GridSFM gives grid operators direct visibility into congestion, stability, and...

benchmark model-release research

Open

High signal Matched: cost, introducing, model, research

Microsoft Research · big-tech · 2026-05-12

Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

Score 8

MatterSim is expanding what AI can do for materials science, from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone.

model-release research

Open

High signal Matched: model, research

Nota AI · korea · 2026-05-11

[NetsPresso® x AI Agents] Easier to Use, Even More Powerful

Score 52

Jaehoon Lee, Technical Content Manager, Nota AI. NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment. When a user...

inference serving kernel speculative-decoding moe benchmark hardware model-release research quantization evals agents api

Open

High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api

BAIR · research · 2026-05-08

Adaptive Parallel Reasoning: The Next Paradigm in Efficient Inference Scaling

Score 28

inference serving kv-cache speculative-decoding benchmark model-release research training fine-tuning evals long-context agents frontier-model

Open

High signal Matched: inference, decoding, prefill, generation, serve, throughput, kv cache, verification, performance, latency, cost, model, paper, research, evaluation, training, pretraining, sft, benchmarks, long context, context window, agentic, reasoning model

Together AI · inference-infra · 2026-05-04

Foundational research powering efficient inference at scale

Score 16

As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.

inference research

Open

High signal Matched: inference, research

Google Research · big-tech · 2026-04-30

Four ways Google Research scientists have been using Empirical Research Assistance

Score 8

Data Mining & Modeling

research

Open

High signal Matched: research

Nota AI · korea · 2026-04-29

[NVIDIA Nemotron Hackathon] Grand Prize Among 20 Teams: Behind Two Sleepless Days

Score 32

Hancheol Park, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, NetsPresso Tech, Nota AI; Geonho Lee, Edge AI Engineer Intern, NetsPresso Tech, Nota AI; Jaehoon Lee, Technical Content Manager,...

inference moe benchmark model-release research korea training fine-tuning quantization evals agents

Open

High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic

Together AI · inference-infra · 2026-04-29

DeepSeek-V4 Pro now available on Together AI

Score 10

DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence, and research synthesis.

research long-context agents

Open

High signal Matched: research, long-context, agents

NVIDIA Technical Blog · hardware · 2026-04-24

Federated Learning Without the Refactoring Overhead Using NVIDIA FLARE

Score 10

Federated learning (FL) is no longer a research curiosity—it’s a practical response to a hard constraint: the most valuable data is often the least movable....

research

Open

High signal Matched: research

Nota AI · korea · 2026-04-22

[Deep Dive: NetsPresso®] From Quantization to Graph Optimization: A Step-by-Step Model Deployment Pipeline

Score 54

Jaehoon Lee, Technical Content Manager, Nota AI. Series Notice: NetsPresso® Technical Blog, Part 2. In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...

inference kernel cuda benchmark hardware model-release research korea training quantization evals api open-source

Open

High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source

BAIR · research · 2026-04-20

Gradient-based Planning for World Models at Longer Horizons

Score 16

benchmark model-release research training evals

Open

High signal Matched: performance, model, paper, arxiv, evaluation, training

Modal · inference-infra · 2026-04-14

Autoscaling Autoresearch: Give your agents elastic GPUs on Modal

Score 10

Autoresearch automates AI research. Modal automates AI infrastructure.

research agents

Open

High signal Matched: research, agents

SkyPilot · open-source · 2026-04-09

Research-Driven Agents: What Happens When Your Agent Reads Before It Codes

Score 16

Coding agents working from code alone generate shallow hypotheses. Adding a research phase — arxiv papers, competing forks, other backends — produced 5 kernel fusions that made llama.cpp CPU inference 15% faster.

inference kernel research agents

Open

High signal Matched: inference, kernel, arxiv, research, agent, agents

Nota AI · korea · 2026-04-08

[Overview: NetsPresso®] A Platform That Handles Everything from Model Optimization to Target Deployment

Score 36

Jaehoon Lee, Technical Content Manager, Nota AI. AI Model Optimization: Why Models Won't Run on Hardware. The Chip Is Ready, but the Model Won't Deploy. If you have ever tried deploying an AI model onto your own chip, the following...

inference distributed kv-cache speculative-decoding benchmark hardware model-release research quantization evals

Open

High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate

Together AI · inference-infra · 2026-04-03

AI for Systems: Using LLMs to Optimize Database Query Execution

Score 10

New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.

research

Open

High signal Matched: research

Nota AI · korea · 2026-03-31

The Real Reason TurboQuant Shook the Market: AI Optimization Has Gone Mainstream

Score 46

Jaehoon Lee, Technical Content Manager, Nota AI. In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...

inference serving kv-cache benchmark hardware model-release research training fine-tuning quantization agents frontier-model

Open

High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model

Nota AI · korea · 2026-03-23

[GTC 2026 Recap] The Trillion-Dollar Inference Race Begins: How Nota AI Fills the Gap

Score 42

Jaehoon Lee, Technical Content Manager, Nota AI. GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...

inference serving kernel cuda kv-cache benchmark hardware model-release research cloud training long-context agents open-source

Open

High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source

Nota AI · korea · 2026-03-20

GenAI Everywhere: The Future of Edge AI Optimization with the New NetsPresso®

Score 26

NP Product Team, Nota AI. The role of Edge AI is rapidly expanding. Offline voice assistants now carry on conversations in our daily lives, vehicles infer routes in real time, and smartphones generate images without a network c...

inference kv-cache moe benchmark model-release research korea quantization

Open

High signal Matched: inference, kv cache, moe, benchmark, performance, latency, cost, model, research, seoul, quantization

Google Research · big-tech · 2026-03-18

Google Research at The Check Up: from healthcare innovation to real-world care settings

Score 8

Health & Bioscience

research

Open

High signal Matched: research

Google Research · big-tech · 2026-03-17

Testing LLMs on superconductivity research questions

Score 8

Education Innovation

research

Open

High signal Matched: research

Together AI · inference-infra · 2026-03-16

Together AI at NVIDIA GTC 2026: Explore our latest innovations across research and products

Score 14

Together AI arrives at NVIDIA GTC 2026 with new launches in inference, agents, voice AI, and open models — plus technical sessions from its research and engineering leaders.

inference research agents

Open

High signal Matched: inference, research, agents

Nota AI · korea · 2026-03-13

NotaMoEQuantization: An MoE-Specific Quantization Method for Solar-Open-100B

Score 62

Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Tairen Piao, AI Research Engineer, Nota AI; Tae-Ho Kim, CTO & Co-Founder, Nota AI. ✔️ Resource: The official quantized model of Solar-Open-100B, which passed the first round of Sout...

inference serving moe benchmark hardware model-release research korea training quantization evals long-context open-source

Open

High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context

BAIR · research · 2026-03-13

Identifying Interactions at Scale for LLMs

Score 18

Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process mo...

inference serving benchmark model-release research training evals long-context rag

Open

High signal Matched: inference, serving, decoding, performance, cost, model, research, training, evaluate, mmlu, long-context, rag

SqueezeBits · korea · 2026-03-11

Reliable & Scalable Synthetic Data for Physical AI (Part 2): Making Cosmos 3.1 x Faster for Production

Score 12

Explore why Physical AI deployment needs synthetic data at scale with Squeezebits' research and discover how to overcome inference bottlenecks to accelerate Roboost Agent.

inference research agents

Open

High signal Matched: inference, research, agent

Modular · inference-infra · 2026-03-06

Modverse #53: Community Builds, Research Milestones, and a Growing Ecosystem

Score 10

research

Open

High signal Matched: research

Together AI · inference-infra · 2026-03-05

Key research and product announcements at the AI Native Conf

Score 18

At AI Native Conf, Together AI announced breakthroughs across kernels, RL, and inference optimization — including FlashAttention-4, ThunderAgent, and together.compile. Research that ships to production. That's the AI Native Cloud.

inference kernel research cloud

Open

High signal Matched: inference, flashattention, research, cloud

vLLM Project · open-source · 2026-03-04

vLLM Triton Attention Backend Deep Dive

Score 14

This article is adapted from a Red Hat hosted vLLM Office Hours session with Burkhard Ringlein from IBM Research, featuring a deep technical walkthrough of the vLLM Triton attention backend....

kernel triton research

Open

High signal Matched: triton, research

Together AI · inference-infra · 2026-03-02

Introducing Together AI’s new look

Score 14

We've refreshed our visual identity — designed with Pentagram to express how Together AI connects open-source innovation, systems research, and builders to unlock new possibilities.

model-release research open-source

Open

High signal Matched: introducing, research, open-source

AI2 · research · 2026-02-27

How do researchers actually use AI-powered science tools? Lessons from 250,000+ queries

Score 8

The Asta Interaction Dataset (AID) contains real researcher queries revealing how scientists actually use AI-powered research tools, and where their habits diverge from what tool builders expect.

research

Open

High signal Matched: research

Nota AI · korea · 2026-02-26

ERGO: Efficient High-Resolution Visual Understanding for Vision-Language Models

Score 24

inference speculative-decoding benchmark model-release research training evals

Open

High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks

Modal · inference-infra · 2026-02-25

Accelerating AI research that accelerates AI research

Score 10

Learn why researchers at Scaling Intelligence, Hazy Research, and other top labs are choosing Modal.

research

Open

High signal Matched: research

Together AI · inference-infra · 2026-02-23

How speech models fail where it matters the most and what to do about it

Score 10

State-of-the-art speech models like Whisper and Deepgram score near-human on benchmarks — then fail 39% of the time on street names. New research from Together AI exposes the gap and a fix.

research evals

Open

High signal Matched: research, benchmarks

Together AI · inference-infra · 2026-02-06

What do LLMs think when you don't tell them what to think about?

Score 10

What do language models generate when you don't tell them what to generate? New research reveals that LLM families have distinct 'knowledge priors'—GPT models default to code and math, Llama favors narratives, DeepSeek generates religious...

research

Open

High signal Matched: research

Hugging Face · open-source · 2026-01-27

Alyah ⭐️: Toward Robust Evaluation of Emirati Dialect Capabilities in Arabic LLMs

Score 10

No feed summary available yet.

research evals

Open

High signal Matched: evaluation

Together AI · inference-infra · 2026-01-26

DSGym: A holistic framework for evaluating and training data science agents

Score 18

Introducing DSGym, a holistic evaluation and training framework for LLM-based data science agents. Features 90+ bioinformatics tasks, 92 Kaggle competitions, and synthetic trajectory generation. Our 4B model achieves state-of-the-art perform...

inference benchmark model-release research training evals agents open-source

Open

High signal Matched: generation, performance, introducing, model, evaluation, training, evaluating, agents, open-source

BAIR · research · 2026-01-10

Information-Driven Design of Imaging Systems

Score 12

An encoder (optical system) maps objects to noiseless images, which noise corrupts into measurements. Our information estimator uses only these noisy measurements and a noise model to quantify how well measurements distinguish objects. Man...

benchmark model-release research training evals

Open

High signal Matched: performance, model, paper, evaluation, training, evaluate

Nota AI · korea · 2025-12-19

NVIDIA Blackwell; The Impact of NVFP4 For LLM Inference

Score 74

Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...

inference serving kernel cuda distributed benchmark hardware model-release research training quantization evals rag

Open

High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval

Google Research · big-tech · 2025-12-19

Google Research 2025: Bolder breakthroughs, bigger impact

Score 8

Year in Review

research

Open

High signal Matched: research

Hugging Face · open-source · 2025-12-17

The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator

Score 10

No feed summary available yet.

research evals

Open

High signal Matched: evaluation

Together AI · inference-infra · 2025-12-17

Research POV: Yes, AGI Can Happen – A Computational Perspective

Score 14

Dan Fu, our VP of Kernels, has published a new post challenging the idea that AI is hitting a hardware wall. He argues that we are vastly underutilizing current chips and that better software-hardware co-design will unlock the next order o...

benchmark research

Open

High signal Matched: performance, research

Google Research · big-tech · 2025-12-16

Gemini-backed Paper Assistant Tool provides automated feedback for theoretical computer scientists at STOC 2026

Score 8

Algorithms & Theory

research

Open

High signal Matched: paper

Hugging Face · open-source · 2025-11-25

Building Deep Research: How we Achieved State of the Art

Score 10

No feed summary available yet.

research frontier-model

Open

High signal Matched: research, state of the art

AIBrix · open-source · 2025-11-10

AIBrix v0.5.0 Release: Batch API, KVCache v1 Connector, and Enhanced P/D orchestration

Score 22

Today, we’re excited to announce AIBrix v0.5.0, a release that pushes AIBrix closer to a batteries-included control plane for modern LLM workloads. This release introduces an OpenAI-compatible Batch API for hi...

inference benchmark model-release research evals api

Open

High signal Matched: prefill, latency, release, evaluation, api, openai-compatible

Google Research · big-tech · 2025-10-31

Accelerating the magic cycle of research breakthroughs and real-world applications

Score 8

Climate & Sustainability

research

Open

High signal Matched: research

Hugging Face · open-source · 2025-10-01

Introducing RTEB: A New Standard for Retrieval Evaluation

Score 14

No feed summary available yet.

model-release research evals rag

Open

High signal Matched: introducing, evaluation, retrieval

Google Research · big-tech · 2025-10-01

AI as a research partner: Advancing theoretical computer science with AlphaEvolve

Score 8

Algorithms & Theory

research

Open

High signal Matched: research

Google Research · big-tech · 2025-09-25

Towards better health conversations: Research insights on a “wayfinding” AI agent based on Gemini

Score 8

Generative AI

research agents

Open

High signal Matched: research, agent

Google Research · big-tech · 2025-09-10

Accelerating scientific discovery with AI-powered Empirical Research Assistance

Score 8

General Science

research

Open

High signal Matched: research

BAIR · research · 2025-09-01

What exactly does word2vec learn?

Score 14

What exactly does word2vec learn, and how? Answering this question amounts to understanding representation learning in a minimal yet interesting language modeling task. Despite the fact that word2vec is a well-known precursor to modern lan...

benchmark model-release research training

Open

High signal Matched: benchmark, performance, model, weights, paper, training

SqueezeBits · korea · 2025-08-20

[Efficient AI Study] AI Model Compression Community Study and Meetup

Score 12

Efficient AI Study & Meetup recap: SqueezeBits' community study on AI model compression, featuring paper reviews, participant interviews, and networking from the offline meetup.

model-release research

Open

High signal Matched: model, paper

Hugging Face · open-source · 2025-08-18

MCP for Research: How to Connect AI to Research Tools

Score 10

No feed summary available yet.

research agents

Open

High signal Matched: research, mcp

Google Research · big-tech · 2025-08-07

Highly accurate genome polishing with DeepPolisher: Enhancing the foundation of genomic research

Score 8

General Science

research

Open

High signal Matched: research

Nota AI · korea · 2025-07-10

Video Self-Distillation for Single-Image Encoders: Learning Temporal Priors from Unlabeled Video

Score 20

Marcel Simon, Ph.D., ML Researcher, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI; Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH. Summary: Proposes a simple next-frame prediction task using unlabeled video to enhance sing...

inference benchmark model-release research training fine-tuning evals

Open

High signal Matched: inference, performance, model, paper, research, training, fine-tuning, benchmarks

Hugging Face · open-source · 2025-07-04

Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models

Score 10

No feed summary available yet.

research training evals

Open

High signal Matched: evaluation, training

BAIR · research · 2025-07-01

Whole-Body Conditioned Egocentric Video Prediction

Score 10

inference benchmark model-release research training evals agents

Open

High signal Matched: inference, generation, performance, model, paper, arxiv, evaluation, training, evaluate, agent, agents

Hugging Face · open-source · 2025-06-06

ScreenSuite - The most comprehensive evaluation suite for GUI Agents!

Score 10

No feed summary available yet.

research evals agents

Open

High signal Matched: evaluation, agents

Nota AI · korea · 2025-05-08

SplitQuant: Layer Splitting for Low-Bit Neural Network Quantization for Edge AI Devices

Score 20

Jaewoo Song, Software Engineer, Nota AI. Summary: This study proposes an AI model preprocessing method that improves quantization accuracy on edge AI devices that do not support advanced quantization methods due to their limitat...

benchmark model-release research quantization

Open

High signal Matched: performance, model, weights, research, quantization, int8, int4

Nota AI · korea · 2025-05-07

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Score 28

inference kv-cache benchmark model-release research training evals open-source

Open

High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source

Modal · inference-infra · 2025-04-18

How sync. uses Modal to lipsync 100 hours of video a day

Score 8

sync. is a research lab training foundational models to understand and manipulate humans in video. After outgrowing Google Colab, they partnered with Modal for efficient deployment, allowing rapid iteration and scaling to process over 100...

research training

Open

High signal Matched: research, training

BAIR · research · 2025-04-11

Defending against Prompt Injection with Structured Queries (StruQ) and Preference Optimization (SecAlign)

Score 10

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated ap...

benchmark model-release research training fine-tuning evals rag api frontier-model

Open

High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota

BAIR · research · 2025-04-08

Repurposing Protein Folding Models for Generation with Latent Diffusion

Score 20

PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment...

inference benchmark model-release research training rag

Open

High signal Matched: inference, generation, cost, model, weights, research, training, retrieval

Nota AI · korea · 2025-04-08

UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices

Score 24

Seul-Ki Yeom, Ph.D., Research Lead, Nota AI GmbH; Tae-Ho Kim, CTO & Co-Founder, Nota AI. Summary: Delivers real-time AI performance on edge devices such as smartphones, IoT devices, and embedded systems. Introduces a novel "Reus...

inference kernel benchmark model-release research evals

Open

High signal Matched: inference, kernel, benchmark, performance, cost, introducing, model, paper, research, benchmarks

SqueezeBits · korea · 2025-02-27

Fits on Chips: Saving LLM Costs Became Easier Than Ever

Score 10

This article introduces Fits on Chips, an LLMOps toolkit for performance evaluation.

benchmark research evals

Open

High signal Matched: performance, evaluation

SkyPilot · open-source · 2025-02-26

Using DeepSeek R1 for RAG: Do's and Don'ts

Score 10

DeepSeek R1 showed strong reasoning capability when it was first released. In this blog post, we detail what we learned from using DeepSeek R1 to build a Retrieval-Augmented Generation (RAG) system tailored for legal documents. We choose l...

inference research rag

Open

High signal Matched: generation, research, rag, retrieval-augmented generation, retrieval

Nota AI · korea · 2025-02-25

A Study on Detecting LLM-Generated Multilingual Content

Score 18

Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, Nota AI; Jaeyeon Kim, AI Research Engineer, Nota AI. Summary: In this study, we propose a method for determining whether given multilingual...

inference benchmark model-release research training fine-tuning

Open

High signal Matched: generation, performance, model, paper, research, training, fine-tuning

SqueezeBits · korea · 2025-02-17

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Score 14

A brief review of the research paper from our team, published at ICML 2024.

speculative-decoding research

Open

High signal Matched: verification, paper, research

Nota AI · korea · 2025-02-10

Where do LLMs Encode the Knowledge to Assess the Ambiguity?

Score 16

Hancheol Park, Ph.D., AI Research Engineer, Nota AI; Geonmin Kim, Ph.D., AI Research Engineer, Nota AI. Summary: In this study, we present a method for detecting ambiguous samples in natural language understanding (NLU) tasks using...

benchmark research training evals

Open

High signal Matched: performance, paper, research, evaluation, training, evaluate

SqueezeBits · korea · 2025-01-06

[Intel Gaudi] #3. Performance Evaluation with SynapseAI v1.19

Score 18

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

benchmark hardware research evals

Open

High signal Matched: performance, accelerator, evaluation, evaluate

Hugging Face · open-source · 2024-12-10

LeMaterial: an open source initiative to accelerate materials discovery and research

Score 10

No feed summary available yet.

research open-source

Open

High signal Matched: research, open source

Hugging Face · open-source · 2024-12-04

Rethinking LLM Evaluation with 3C3H: AraGen Benchmark and Leaderboard

Score 14

No feed summary available yet.

benchmark research evals

Open

High signal Matched: benchmark, evaluation, leaderboard

SqueezeBits · korea · 2024-12-03

[Intel Gaudi] #2. Graph Compiler and Overall Performance Evaluation

Score 18

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

benchmark hardware research evals

Open

High signal Matched: performance, accelerator, evaluation, evaluate

Hugging Face · open-source · 2024-11-04

Argilla 2.4: Easily Build Fine-Tuning and Evaluation Datasets on the Hub — No Code Required

Score 10

No feed summary available yet.

research fine-tuning evals

Open

High signal Matched: evaluation, fine-tuning

SqueezeBits · korea · 2024-10-01

[vLLM vs TensorRT-LLM] #1. An Overall Evaluation

Score 22

This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks for serving LLMs, evaluating their performance based on key metrics like throughput, TTFT, and TPOT to offer insights for practitioners in optimizing LLM depl...

inference serving benchmark research evals

Open

High signal Matched: serving, throughput, performance, ttft, tpot, evaluation, evaluating

Modal · inference-infra · 2024-08-05

Beat GPT-4o at Python by searching with 100 dumb LLaMAs

Score 8

Scale up smaller open models with search and evaluation to match frontier capabilities.

research evals

Open

High signal Matched: evaluation

Nota AI · korea · 2024-08-02

Deploying an Efficient Vision-Language Model on Mobile Devices

Score 38

Jaeyeon Kim, Research Engineer, Nota AI; Geonmin Kim, Research Engineer, Nota AI; Hancheol Park, Team Lead of NetsPresso Application, Nota AI. Introduction: Recent large language models (LLMs) have demonstrated unprecedented performance...

inference benchmark model-release research cloud training fine-tuning evals open-source

Open

High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source

Hugging Face · open-source · 2024-07-25

LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning?

Score 10

No feed summary available yet.

research fine-tuning evals

Open

High signal Matched: evaluation, fine-tuning

Nota AI · korea · 2024-06-13

Cluster Self-Refinement for Enhanced Online Multi-Camera People Tracking

Score 8

Jeongho Kim, Research Engineer, Nota AI. Summary: Online multi-camera system for efficient individual tracking. Accurate ID management with Cluster Self-Refinement (CSR). Improved performance with enhanced pose estimation. Intro...

benchmark model-release research evals

Open

High signal Matched: performance, model, paper, research, evaluation, leaderboard

Hugging Face · open-source · 2024-05-24

CyberSecEval 2 - A Comprehensive Evaluation Framework for Cybersecurity Risks and Capabilities of Large Language Models

Score 10

No feed summary available yet.

research evals

Open

High signal Matched: evaluation

Hugging Face · open-source · 2024-04-16

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Score 14

No feed summary available yet.

model-release research evals

Open

High signal Matched: introducing, evaluation, leaderboard

Hugging Face · open-source · 2024-04-04

Hugging Face partners with Wiz Research to Improve AI Security

Score 10

No feed summary available yet.

research

Open

High signal Matched: research

Hugging Face · open-source · 2024-02-20

Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem

Score 18

No feed summary available yet.

model-release research korea evals

Open

High signal Matched: introducing, evaluation, korean, leaderboard

SkyPilot · open-source · 2023-05-02

Analyzing the Whole Mouse Brain Atlas on the Cloud With SkyPilot [User Post]

Score 12

Experience report from Salk Institute on how biologists use SkyPilot to conduct research on the cloud.

research cloud

Open

High signal Matched: research, cloud

Hugging Face · open-source · 2022-11-17

Hugging Face Machine Learning Demos on arXiv

Score 10

No feed summary available yet.

research

Open

High signal Matched: arxiv

Hugging Face · open-source · 2022-08-01

Comments on U.S. National AI Research Resource Interim Report

Score 10

No feed summary available yet.

research

Open

High signal Matched: research

Hugging Face · open-source · 2022-06-28

Announcing Evaluation on the Hub

Score 10

No feed summary available yet.

research evals

Open

High signal Matched: evaluation

Hugging Face · open-source · 2022-05-19

Putting ethical principles at the core of the research lifecycle

Score 10

No feed summary available yet.

research

Open

High signal Matched: research

Hugging Face · open-source · 2022-03-22

Announcing the 🤗 AI Research Residency Program

Score 10

No feed summary available yet.

research

Open

High signal Matched: research

Microsoft Research · big-tech · 2026-05-29

Data Formulator 0.7: AI-powered data analytics for enterprise data

Score 5

Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams can easily bring enterprise data into an AI-ready workspace where users can explore, analyze, and visualize data with AI agents to turn raw data into...

research agents

Open

Watchlist Matched: research, agents

Microsoft Research · big-tech · 2026-05-28

Extending Human Intelligence Through AI

Score 5

Understanding AI as an extension of human intelligence—not a replacement for it—offers a more grounded path for building trustworthy AI systems. The post Extending Human Intelligence Through AI appeared first on Microsoft Research.

research

Open

Watchlist Matched: research

Microsoft Research · big-tech · 2026-05-22

MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

Score 6

MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks. The p...

benchmark research agents

Open

Watchlist Matched: performance, research, agentic

Microsoft Research · big-tech · 2026-05-21

Vega: Zero-knowledge proofs for digital identity in the age of AI

Score 6

Vega turns a full credential into a single proof, sharing only what is needed and nothing more, with performance that works in real apps. The post Vega: Zero-knowledge proofs for digital identity in the age of AI appeared first on Microsof...

benchmark research

Open

Watchlist Matched: performance, research

Lambda · cloud · 2026-05-20

Lambda partners with Hudson River Trading to power quantitative research and development

Score 6

HRT turns to Lambda as on-premise infrastructure reaches its ceiling

research

Open

Watchlist Matched: research

Microsoft Research · big-tech · 2026-05-12

SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

Score 4

Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. The post SocialReasoni...

research agents

Open

Watchlist Matched: research, agents

Microsoft Research · big-tech · 2026-05-09

Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

Score 6

Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern...

model-release research

Open

Watchlist Matched: release, research

AI2 · research · 2026-05-07

Open by design: Ai2 brings fully open AI infrastructure online with NSF OMAI

Score 6

Ai2 is bringing NSF OMAI compute online to power a fully open AI research ecosystem, turning national infrastructure investment into reusable models, data, methods, and tools that can accelerate scientific discovery.

research

Open

Watchlist Matched: research

BAIR · research · 2025-11-01

RL without TD learning

Score 4

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning (which has scalabilit...

benchmark model-release research training

Open

Watchlist Matched: benchmark, performance, model, paper, training

BAIR · research · 2025-03-25

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

Score 6

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Our goal is to tackle "stop-and...

serving kernel benchmark model-release research training agents

Open

Watchlist Matched: throughput, kernel, performance, model, paper, training, agent, agents