Lambda · cloud · 2026-06-01
Score 15
When we design large GPU clusters, the network is no longer a background system. It's part of the compute envelope. At the 800G and NVIDIA GB300 NVL72 scale, the back-end fabric accounts for 86% of networking power in a three-layer cluster...
High signal Matched: generation, token generation, throughput, infiniband, gpu, model, retrieval, agentic
LMCache · open-source · 2026-04-04
Score 34
Modern LLM serving workloads are defined by strict latency requirements, high concurrency, and rapidly growing context lengths. Applications such as multi-turn chat, AI agents, and retrieval-augmented generation continuously build on prior...
High signal Matched: inference, serving, decoding, generation, throughput, lmcache, moe, performance, latency, ttft, retrieval-augmented generation, retrieval, agents
BAIR · research · 2026-03-13
Score 18
Understanding the behavior of complex machine learning systems, particularly Large Language Models (LLMs), is a critical challenge in modern artificial intelligence. Interpretability research aims to make the decision-making process mo...
High signal Matched: inference, serving, decoding, performance, cost, model, research, training, evaluate, mmlu, long-context, rag
vLLM Project · open-source · 2026-03-10
Score 18
Since v0.1 Iris, vLLM Semantic Router has made a large jump. In one release cycle, the project rebuilt its model stack, expanded routing into safety, semantic caching, memory, retrieval, and...
High signal Matched: router, release, model, retrieval
AIBrix · open-source · 2026-03-03
Score 28
🚀 AIBrix v0.6.0 Release. Today we’re excited to announce AIBrix v0.6.0, a release that expands how you deploy and route inference traffic. Key highlights include: Envoy Sidecar Support – Run Envoy alongside the gateway-plugin without...
High signal Matched: inference, prefill, release, model, lora, rerank, api, openai-compatible
Rebellions · hardware · 2025-12-29
Score 10
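Summary. Challenge: The customs administration processes a vast volume of import and export declarations every year, and the task of accurately classifying each item under the appropriate HS code (Harmonized System Code)... The post "LLM/RAG-Based AI Chatbot for Recommending Goods Classification Codes for the Mongolian Customs Administration" appeared first on Rebellions.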
High signal Matched: rebellions, rag
Nota AI · korea · 2025-12-19
Score 74
Seungmin Yang, EdgeFM Lead, Nota AI. Summary: With the introduction of NVFP4, a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture, LLM inference achieves markedly improved efficiency. Blackwell’s NVFP4...
High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval
SkyPilot · open-source · 2025-12-02
Score 10
Scale document OCR batch inference for RAG on multiple clouds and Kubernetes clusters using SkyPilot Pool.
High signal Matched: inference, rag
Hugging Face · open-source · 2025-10-01
Score 14
No feed summary available yet.
High signal Matched: introducing, evaluation, retrieval
BAIR · research · 2025-04-11
Score 10
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection is listed by OWASP as the #1 threat to LLM-integrated ap...
High signal Matched: cost, model, evaluation, training, dpo, fine-tuning, retrieval, api, sota
BAIR · research · 2025-04-08
Score 20
PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure, by learning the latent space of protein folding models. The awarding of the 2024 Nobel Prize to AlphaFold2 marks an important moment...
High signal Matched: inference, generation, cost, model, weights, research, training, retrieval
SkyPilot · open-source · 2025-02-26
Score 10
DeepSeek R1 showed great reasoning capability when it was first released. In this blog post, we detail our learnings in using DeepSeek R1 to build a Retrieval-Augmented Generation (RAG) system, tailored for legal documents. We choose l...
High signal Matched: generation, research, rag, retrieval-augmented generation, retrieval
Hugging Face · open-source · 2024-05-09
Score 10
No feed summary available yet.
High signal Matched: cost, rag
Replicate · inference-infra · 2023-10-17
Score 10
In this post we'll explore the basics of retrieval augmented generation by creating an example app that uses bge-large-en for embeddings, ChromaDB for vector store, and mistral-7b-instruct for language model generation.
High signal Matched: generation, model, retrieval augmented generation, retrieval
Hugging Face · open-source · 2021-02-10
Score 10
No feed summary available yet.
High signal Matched: generation, retrieval augmented generation, retrieval
Cohere · model-lab · 2026-06-03
Score 3
No feed summary available yet.
Watchlist Matched: retrieval
Hugging Face · open-source · 2026-05-15
Score 1
No feed summary available yet.
Watchlist Matched: retrieval
NVIDIA Technical Blog · hardware · 2026-03-24
Score 3
Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,...
Watchlist Matched: rag, retrieval, agents, agentic
Google Research · big-tech · 2025-10-08
Score 0
Machine Intelligence
Watchlist Matched: retrieval
Modular · inference-infra · 2025-01-23
Score 1
Use MAX with Open WebUI for RAG and Web Search
Watchlist Matched: rag
Hugging Face · open-source · 2025-01-10
Score 1
No feed summary available yet.
Watchlist Matched: retrieval
Hugging Face · open-source · 2024-10-28
Score 1
No feed summary available yet.
Watchlist Matched: rag
Hugging Face · open-source · 2024-03-22
Score 1
No feed summary available yet.
Watchlist Matched: quantization, retrieval