Edge AI optimization blog covering model compression, quantization, graph optimization, NetsPresso deployment, on-device GenAI, and efficient inference.
Nota AI · korea · 2026-05-29
Score 23
Jaehoon Lee Technical Content Manager, Nota AI When enterprises adopt AI, the most common bottleneck is not model development. It is the deployment stage: getting a finished model to run reliably on the actual target device.T...
High signal Matched: inference, throughput, benchmark, performance, latency, cost, gpu, model, evaluation, quantization, int8, benchmarks, leaderboard
Nota AI · korea · 2026-05-11
Score 52
Jaehoon Lee Technical Content Manager, Nota AI NetsPresso® now embraces AI agents. An easy-to-use interface sits on top of the validated pipeline that handles everything from model compression to device deployment.When a user...
High signal Matched: inference, endpoint, kernel, verification, moe, benchmark, latency, cost, gpu, release, model, evaluation, quantization, quantized, int4, evaluate, benchmarks, swe-bench, mmlu, agent, agents, api
Nota AI · korea · 2026-04-29
Score 32
Hancheol Park, Ph. D.AI Research Engineer, NetsPresso Tech, Nota AI Geonmin Kim, Ph. D.AI Research Engineer, NetsPresso Tech, Nota AI Geonho LeeEdge AI Engineer Intern, NetsPresso Tech, Nota AI Jaehoon Lee Technical Content Manager,...
High signal Matched: generation, moe, performance, model, weights, paper, research, evaluation, korea, korean, seoul, naver, training, fine-tuning, quantization, agent, agents, agentic
Nota AI · korea · 2026-04-22
Score 54
Jaehoon Lee Technical Content Manager, Nota AI Series Notice: NetsPresso® Technical Blog, Part 2In Part 1, we walked through a scenario of deploying Llama 3.2 1B on an edge device to illustrate the NetsPresso® workflow. The f...
High signal Matched: inference, kernel, cuda, matmul, benchmark, performance, latency, cost, npu, model, weights, paper, research, evaluation, furiosa, training, quantization, int8, int4, awq, gptq, sdk, open-source
Nota AI · korea · 2026-04-08
Score 36
Jaehoon Lee Technical Content Manager, Nota AI AI Model Optimization: Why Models Won't Run on HardwareThe Chip Is Ready, but the Model Won't DeployIf you have ever tried deploying an AI model onto your own chip, the following...
High signal Matched: inference, multi-gpu, kv cache, verification, performance, latency, gpu, model, research, evaluation, quantization, quantized, awq, gptq, evaluate
Nota AI · korea · 2026-03-31
Score 46
Jaehoon Lee Technical Content Manager, Nota AI In March, a single official announcement from Google Research rocked trillions of won in the market capitalization of U.S. infrastructure and semiconductor stocks. The catalyst:...
High signal Matched: inference, serving, generation, throughput, kv cache, benchmark, performance, cost, b200, blackwell, introducing, model, fp8, research, training, fine-tuning, quantization, quantized, agent, agentic, frontier model
Nota AI · korea · 2026-03-23
Score 42
Jaehoon Lee Technical Content Manager, Nota AI GTC has evolved far beyond a technology conference, drawing attention from global economies and financial markets alike. This year, CEO Jensen Huang took the stage in his tradema...
High signal Matched: inference, prefill, generation, throughput, cuda, kv cache, performance, latency, cost, gpu, npu, launch, model, research, cloud, training, long-context, context window, agent, agents, agentic, open-source
Nota AI · korea · 2026-03-20
Score 26
NP Product Team, Nota AI The role of Edge AI is rapidly expanding.Offline voice assistants now carry on conversations in our daily lives, vehicles infer routes in real time, and smartphones generate images without a network c...
High signal Matched: inference, kv cache, moe, benchmark, performance, latency, cost, model, research, seoul, quantization
Nota AI · korea · 2026-03-13
Score 62
Hancheol Park, Ph. D. AI Research Engineer, Nota AI Tairen PiaoAI Research Engineer, Nota AI Tae-Ho KimCTO & Co-Founder, Nota AI ✔️ Resource : The official quantized model of Solar-Open-100B, which passed the first round of Sout...
High signal Matched: inference, serving, prefill, generation, throughput, moe, router, benchmark, performance, latency, ttft, tpot, blackwell, release, model, weights, open model, research, evaluation, korea, korean, upstage, training, post-training, quantization, quantized, int4, evaluate, benchmarks, mmlu, long-context
Nota AI · korea · 2026-02-26
Score 24
Jewon Lee | Wooksu Shin | Seungmin Yang | Ki-Ung Song | Donguk Lim | Jaeyeon Kim | Tae-Ho Kim | Bo-Kyeong KimEdgeFM Team, Nota AI ✔️ Resources for more information: GitHub, ArXiv, Project Page, Demo.✔️ Accepted at ICLR 2026. &...
High signal Matched: inference, generation, verification, benchmark, performance, latency, cost, model, arxiv, evaluation, training, post-training, benchmarks
Nota AI · korea · 2025-12-19
Score 74
Seungmin YangEdgeFM Lead, Nota AI On this page ▾ SummaryWith the introduction of NVFP4—a new 4-bit floating point data type in NVIDIA’s Blackwell GPU architecture—LLM inference achieves markedly improved efficiency.Blackwell’s NVFP4...
High signal Matched: inference, serving, decoding, prefill, generation, token generation, throughput, kernel, gemm, cutlass, distributed, benchmark, performance, latency, ttft, tpot, tokens/sec, cost, gpu, blackwell, launch, model, weights, fp8, research, training, post-training, quantization, quantized, awq, benchmarks, mmlu, retrieval
Nota AI · korea · 2025-07-10
Score 20
Marcel Simon, Ph. D.ML Researcher, Nota AI GmbH Tae-Ho KimCTO & Co-Founder, Nota AI Seul-Ki Yeom, Ph. D.Research Lead, Nota AI GmbH SummaryProposes a simple next-frame prediction task using unlabeled video to enhance sing...
High signal Matched: inference, performance, model, paper, research, training, fine-tuning, benchmarks
Nota AI · korea · 2025-05-08
Score 20
Jaewoo SongSoftware Engineer, Nota AI SummaryThis study proposes an AI model preprocessing method for improved quantization accuracies on edge AI devices which do not support advanced quantization methods due to their limitat...
High signal Matched: performance, model, weights, research, quantization, int8, int4
Nota AI · korea · 2025-05-07
Score 28
Jewon Lee | Ki-Ung Song | Seungmin Yang | Donguk Lim | Jaeyeon Kim | Wooksu Shin | Bo-Kyeong Kim | Tae-Ho KimEdgeFM Team, Nota AI Yong Jae Lee, Ph. D.Associate Professor, UW-Madison SummaryOur method, Trimmed-Llama, reduces t...
High signal Matched: inference, generation, kv cache, benchmark, performance, latency, model, weights, research, training, benchmarks, open-source
Nota AI · korea · 2025-04-08
Score 24
Seul-Ki Yeom, Ph. D. Research Lead, Nota AI GmbH Tae-Ho KimCTO & Co-Founder, Nota AI SummaryDelivers real-time AI performance on edge devices such as smartphones, IoT devices, and embedded systems.Introduces a novel "Reus...
High signal Matched: inference, kernel, benchmark, performance, cost, introducing, model, paper, research, benchmarks
Nota AI · korea · 2025-02-25
Score 18
Hancheol Park, Ph. D.AI Research Engineer, Nota AI Geonmin Kim, Ph. D.AI Research Engineer, Nota AI Jaeyeon KimAI Research Engineer, Nota AI SummaryIn this study, we propose a method for determining whether given multilingual...
High signal Matched: generation, performance, model, paper, research, training, fine-tuning
Nota AI · korea · 2025-02-10
Score 16
Hancheol Park, Ph. D.AI Research Engineer, Nota AI Geonmin Kim, Ph. D.AI Research Engineer, Nota AI SummaryIn this study, we present a method for detecting ambiguous samples in natural language understanding (NLU) tasks using...
High signal Matched: performance, paper, research, evaluation, training, evaluate
Nota AI · korea · 2024-08-02
Score 38
Jaeyeon KimResearch Engineer, Nota AI Geonmin KimResearch Engineer, Nota AI Hancheol ParkTeam Lead of NetsPresso Application, Nota AI IntroductionRecent large language models (LLMs) have demonstrated unprecedented performance...
High signal Matched: decoding, benchmark, performance, latency, tokens/sec, model, arxiv, research, technical report, evaluation, cloud, training, lora, benchmarks, leaderboard, open-source
Nota AI · korea · 2024-06-13
Score 8
Jeongho KimResearch Engineer, Nota AI SummaryOnline multi-camera system for efficient individual trackingAccurate ID management with Cluster Self-Refinement (CSR)Improved performance with enhanced pose estimation Intro...
High signal Matched: performance, model, paper, research, evaluation, leaderboard