MLSys Radar

SqueezeBits

Korean AI optimization company publishing deep technical posts on model compression, quantization, vLLM, SGLang, TensorRT-LLM, edge inference, and accelerator evaluation.

Country: South Korea
Category: korea
Blog: https://blog.squeezebits.com/
Feed: https://blog.squeezebits.com/rss
Feed discovery status: known

SqueezeBits · korea · 2026-04-14

Recap: 2nd vLLM Korea Meetup 2026

Score 12

Check out highlights from the 2nd vLLM Korea Meetup: open-source use cases and real-world production examples that showcase vLLM's technical maturity!

korea open-source

High signal Matched: korea, open-source

SqueezeBits · korea · 2025-12-24

Introducing rebellions ATOM™-MAX

Score 24

Introducing ATOM™-Max, rebellions’ next-generation NPU designed for high-performance AI inference. Learn how its runtime, profiling tools, and PyTorch-native integrations enable developers to run and serve models efficiently without sacrif...

inference serving benchmark hardware model-release korea

High signal Matched: inference, generation, serve, performance, npu, introducing, rebellions

SqueezeBits · korea · 2025-09-16

Guided Decoding Performance on vLLM and SGLang

Score 16

The guide to LLM guided decoding! This deep-dive benchmark compares XGrammar and LLGuidance on vLLM and SGLang to help you find the optimal setup for generating structured output based on your use case.

inference benchmark

High signal Matched: decoding, benchmark, performance

SqueezeBits · korea · 2025-07-21

GraLoRA: Boosting Fine-Tuning Accuracy Without Extra Cost

Score 20

LoRA excels at efficient fine-tuning but suffers at higher ranks due to gradient entanglement. We introduce GraLoRA, which addresses these issues through finer-grained, block-wise updates, significantly enhancing performance and expressivi...

benchmark fine-tuning

High signal Matched: performance, cost, fine-tuning, lora

SqueezeBits · korea · 2025-03-26

TensorRT-LLM Goes Open Source!

Score 12

With TensorRT-LLM now open source, we can finally take a deep dive into the secret sauce behind its impressive performance.

benchmark open-source

High signal Matched: performance, open source

SqueezeBits · korea · 2024-11-21

[Intel Gaudi] #1. Introduction

Score 12

In this blog series, we thoroughly evaluate Intel's AI accelerator, the Gaudi series, focusing on its performance, features, and usability.

benchmark hardware evals

High signal Matched: performance, accelerator, evaluate

SqueezeBits · korea · 2024-10-01

[vLLM vs TensorRT-LLM] #1. An Overall Evaluation

Score 22

This article provides a comparative analysis of vLLM and TensorRT-LLM frameworks for serving LLMs, evaluating their performance based on key metrics like throughput, TTFT, and TPOT to offer insights for practitioners in optimizing LLM depl...

inference serving benchmark research evals

High signal Matched: serving, throughput, performance, ttft, tpot, evaluation, evaluating

SqueezeBits · korea · 2026-05-28

2026 Efficient AI Offline Meetup

Score 2

Wrap up 8 weeks of online study and see how SqueezeBits works to sustain and grow the AI compression community!

Watchlist Matched: none

SqueezeBits · korea · 2026-03-27

Our Experience Running a Booth at GTC 2026

Score 0

Sharing insights from GTC 2026, the largest AI industry conference for developers! If you’ve ever wondered what it’s like for an AI startup to run a booth at such a massive event, you won’t want to miss this!

Watchlist Matched: none

SqueezeBits · korea · 2025-03-10

When Should I Use Fits on Chips?

Score 1

This article describes when to use the Fits on Chips toolkit, with specific use cases.

Watchlist Matched: none

SqueezeBits · korea · 2025-02-06

The Rise and Fall of ONNX (feat. PyTorch 2.0)

Score 1

This article explores the rise and fall of ONNX, from its early success as a unifying standard for AI frameworks to its gradual shift into a niche tool in the era of PyTorch 2.0.

Watchlist Matched: none