Technical Intelligence Brief — QUALITY_GATE_PARTIAL

139 candidates
64 GitHub
75 social/dev web
30+ citable signals
72h fresh window

1. Executive Technical Signal

Benchmark shift: Terminal-Bench/SWE-style evals are moving from public leaderboards to internal harnesses; evidence from 3 independent sources (S01, S02, S03) → Action: build a 50-task NEXA eval set.
OSS coding-agent runtimes are fragmenting: opencode has reached 165,780 stars but carries 6,129 issues (S04) → strong adoption, high operational risk → Action: trial it inside a sandbox.
Sandbox/security is becoming a mandatory gate: microsandbox at 6,317 stars (S05) plus FlowLink MCP control of destructive commands (S09) → Action: a SYNCA policy gate for agent commands.
Cost governance is emerging: a token-budget discussion drew 27 pts/32 comments (S10) → AI coding ROI needs telemetry, not just seat licenses → Action: measure cost per PR.
The context layer is still open: Repowise/ccpocket/gptme total 5,887 stars (S06, S08, S13) → FARE has an opening in codebase intelligence.
Multi-agent workflows remain non-standard: a visual composer for Claude Code multi-agent workflows surfaced on HN (S11) → Action: trial only under a runbook; do not platformize yet.
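The SYNCA policy gate for agent commands can be sketched as a fail-closed filter over shell commands. This is a minimal Python illustration under stated assumptions, not SYNCA's actual design: the rule sets (`DENY_PROGRAMS`, `WRITE_PROGRAMS`), the `evaluate` function, and the `Decision` type are hypothetical names chosen for this example, and the parser only recognizes space-separated operators such as `&&` and `|`. `blocked_destructive_rate` computes the "blocked destructive command rate" validation metric over a labeled list of known-destructive commands.

```python
import shlex
from dataclasses import dataclass

# Hypothetical rule set grouped by the risk categories named in the brief.
DENY_PROGRAMS = {
    "delete": {"rm", "rmdir", "shred"},
    "network": {"curl", "wget", "nc", "ssh", "scp"},
}
# Programs that write files; output-redirection tokens are also treated as writes.
WRITE_PROGRAMS = {"tee", "dd"}
# Shell operators that start a new command (only recognized when space-separated).
SEPARATORS = {"|", "||", "&&", ";"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def _segments(tokens):
    """Split a token list into individual commands at shell separators."""
    current = []
    for tok in tokens:
        if tok in SEPARATORS:
            if current:
                yield current
            current = []
        else:
            current.append(tok)
    if current:
        yield current


def evaluate(command: str) -> Decision:
    """Allow or deny one agent-issued shell command; fail closed on parse errors."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return Decision(False, "unparseable")
    for segment in _segments(tokens):
        program = segment[0].rsplit("/", 1)[-1]  # '/bin/rm' -> 'rm'
        for category, programs in DENY_PROGRAMS.items():
            if program in programs:
                return Decision(False, category)
        if program in WRITE_PROGRAMS or any(t.startswith(">") for t in segment):
            return Decision(False, "write")
    return Decision(True, "allowed")


def blocked_destructive_rate(destructive_commands) -> float:
    """Share of a labeled set of destructive commands that the gate denies."""
    commands = list(destructive_commands)
    if not commands:
        return 0.0
    return sum(not evaluate(c).allowed for c in commands) / len(commands)
```

For example, `evaluate("ls -la && curl http://example.com")` is denied with reason `network`, while `evaluate("git status")` is allowed. Because matching is on program names only, a real gate would also need to handle wrappers such as `sh -c` and redirections written without spaces (`echo a>b`).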

2. Trend Clusters

[Hot] Harness/eval: 3 benchmark signals, confidence 76%.

[Hot] Sandbox/governance: 2 direct signals, confidence 72%.

[Emerging] Codebase context: 4 repo/HN signals, confidence 68%.

[Watch] Multi-agent UI: 1 fresh repo signal, confidence 55%.

[Noise] Vibe-code anecdotes: low reproducibility.

3. Must-Read Sources

Type	Source	Priority	Why read / Key takeaway / Follow-up
Benchmark	DeepSWE	P0	Contamination-free, long-horizon eval → use as the template for NEXA task hygiene.
Repo	opencode	P0	165,780 stars; validate CLI/runtime UX and issue risk.
Repo	microsandbox	P0	Sandbox primitive for untrusted agent execution.
HN/GitHub	Dirac	P1	393 pts/148 comments; inspect its Terminal-Bench methodology.
Governance	FlowLink MCP proxy	P1	Policy-control pattern for destructive MCP commands.
Cost	Uber token cost	P1	Shifts the conversation from adoption to unit economics.

4. Fabbi Impact Map

Trend	Evidence	Impact	Move	Owner	Urgency
Harness/eval	S01/S02/S03	NEXA quality moat	Build 50-task eval	AI Eng Lead	0-2w
Context engineering	S06/S08/S13	FARE codebase map	Index 3 pilot repos	Solution Architect	0-2w
Governance	S05/S09	SYNCA risk control	Policy-as-code gate	Security Lead	0-2w
Enterprise ops	S10/S04	AIOS telemetry	Cost/PR dashboard	Platform PO	1-2m
Japan/Vietnam/Global	139 candidates	Presales narrative	Offer eval-first SDLC package	Presales Lead	1-2m

5. Action Plan

DO THIS WEEK

NEXA eval harness, 50 tasks; ROI/time saving 18-25%; risk 2/5; owner AI Eng Lead; TTV 7 days; validate pass@1 and rollback rate.
SYNCA command policy for rm/write/network; ROI 10-15%; risk 2/5; owner Security Lead; TTV 5 days; validate blocked-destructive-command rate.
FARE codebase context pilot on 3 repos; ROI 12-20%; risk 3/5; owner Solution Architect; TTV 10 days; validate retrieval precision@10.
AIOS cost telemetry (cost/PR, tokens/task); ROI 8-12%; risk 1/5; owner Platform PO; TTV 5 days; validate weekly spend variance.
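The validation metrics for the eval harness and the context pilot can be computed in a few lines of Python. This sketch assumes a hypothetical `TaskResult` record per eval task. `pass_at_k` uses the standard unbiased estimator 1 - C(n-c, k)/C(n, k) from the HumanEval/Codex evaluation, which reduces to c/n for k = 1. "Rollback rate" is assumed here to mean the share of merged agent changes that were later reverted, and precision@10 divides the relevant hits in the top 10 results by 10.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    samples: int               # attempts the agent made on the task (n)
    correct: int               # attempts that passed the task's tests (c)
    merged: bool = False       # the agent's change was merged
    rolled_back: bool = False  # the merged change was later reverted


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one task: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int = 1) -> float:
    """Average pass@k over all eval tasks."""
    results = list(results)
    if not results:
        return 0.0
    return sum(pass_at_k(r.samples, r.correct, k) for r in results) / len(results)


def rollback_rate(results) -> float:
    """Share of merged agent changes that were later rolled back."""
    merged = [r for r in results if r.merged]
    if not merged:
        return 0.0
    return sum(r.rolled_back for r in merged) / len(merged)


def precision_at_k(retrieved, relevant, k: int = 10) -> float:
    """Relevant hits among the top-k retrieved items, divided by k."""
    return sum(1 for item in retrieved[:k] if item in relevant) / k
```

With one attempt per task, pass@1 is simply the fraction of the 50 tasks the agent solved; sampling several attempts per task only makes the estimate less noisy.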

WATCH NEXT 2-4 WEEKS

Dirac/ForgeCode leaderboard stability; opencode issue burn-down; MCP proxy patterns.

IGNORE / LOW SIGNAL

Fundraising-only news, vibe-code anecdotes without metrics, consumer chatbot news.

6. CTO Evaluation Matrix

Signal	Thesis	Counter-argument	Decision (confidence)	Next validation
Benchmarks	Eval-first beats demo-first	Public benchmark contamination	trial 76%	50 internal tasks
opencode	OSS runtime demand proven	6,129 issues	watch/trial 65%	2-week POC
Sandbox	Enterprise blocker solved by policy	Integration overhead	adopt 72%	MCP denylist test
Cost	Token spend becomes CFO topic	External story, limited details	adopt 70%	cost/PR baseline
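A cost/PR baseline and the weekly spend variance can be derived from per-session usage events. In this sketch the `UsageEvent` fields are hypothetical (no specific telemetry schema is implied), and "variance" is read in the budgeting sense: each week's spend expressed as a fractional deviation from a baseline, not the statistical variance.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    week: str        # reporting week label, e.g. "W01" (hypothetical format)
    pr_id: str       # pull request the agent session contributed to
    task_id: str     # work item the session served
    tokens: int
    cost_usd: float


def cost_per_pr(events) -> float:
    """Total agent spend divided by the number of distinct PRs it touched."""
    events = list(events)
    prs = {e.pr_id for e in events}
    return sum(e.cost_usd for e in events) / len(prs) if prs else 0.0


def tokens_per_task(events) -> float:
    """Average tokens consumed per distinct task."""
    events = list(events)
    tasks = {e.task_id for e in events}
    return sum(e.tokens for e in events) / len(tasks) if tasks else 0.0


def weekly_spend(events) -> dict:
    """Total spend per week, ordered by week label."""
    totals = defaultdict(float)
    for e in events:
        totals[e.week] += e.cost_usd
    return dict(sorted(totals.items()))


def spend_variance(weekly: dict, baseline: float) -> dict:
    """Each week's spend as a fractional deviation from the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return {week: (spend - baseline) / baseline for week, spend in weekly.items()}
```

The baseline could be, for instance, the mean weekly spend over the first weeks of the pilot; that choice is an assumption, not something the brief specifies.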

7. Detailed Source Appendix

ID	Platform	Source	Metric	Notes
S01	HN	DeepSWE: contamination-free benchmark	33 pts/9 comments	Agentic SDLC / harness
S02	HN	Dirac topped Terminal-Bench on Gemini-3 flash preview	393 pts/148 comments	Agentic SDLC / harness
S03	HN	ForgeCode: open-source coding agent in Terminal-Bench 2.0	4 pts/0 comments	Agentic SDLC / harness
S04	GitHub	anomalyco/opencode	165,780 stars/19,696 forks/6,129 issues	Agentic SDLC / harness
S05	GitHub	superradcompany/microsandbox	6,317 stars/307 forks/50 issues	Agentic SDLC / harness
S06	GitHub	gptme/gptme	4,309 stars/390 forks/14 issues	Agentic SDLC / harness
S07	GitHub	stablyai/orca	3,474 stars/228 forks/201 issues	Agentic SDLC / harness
S08	GitHub	K9i-0/ccpocket	789 stars/63 forks/9 issues	Agentic SDLC / harness
S09	HN	FlowLink MCP proxy blocking destructive commands	1 pt/0 comments	Agentic SDLC / harness
S10	HN	Uber AI budget/token-cost discussion	27 pts/32 comments	Agentic SDLC / harness
S11	HN	Visual composer for Claude Code multi-agent workflows	2 pts/0 comments	Agentic SDLC / harness
S12	HN	Functional programming accelerates agentic feature development	59 pts/31 comments	Agentic SDLC / harness
S13	HN	Repowise codebase intelligence for AI coding agents	1 pt/0 comments	Agentic SDLC / harness
S14	HN	Tracecore deterministic coding-agent benchmark	1 pt/0 comments	Agentic SDLC / harness
S15	GitHub	vercel-labs/zerolang	4,566 stars/291 forks/116 issues	Agentic SDLC / harness

8. Data Quality / Scan Health Appendix

QUALITY_GATE_PARTIAL: 139 candidates scanned; dev_web/HN 30, GitHub 64, Reddit 25, YouTube 20, X 0, Facebook public 0, papers_product 0. arXiv still returned HTTP 429 (rate limited) after bounded retries, and the unauthenticated public fallback for X/Facebook produced no usable links. Confidence is reduced, but 30+ cited or summarized technical signals remain available.
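Bounded retries against an HTTP 429 (Too Many Requests) response usually follow an exponential-backoff pattern. The sketch below is an assumption about how such a scanner might retry, not the scanner's actual code: `fetch` is a hypothetical zero-argument callable returning `(status_code, body)`, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time


def fetch_with_bounded_retries(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 status or the attempt budget runs out.

    Delays grow exponentially: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:  # skip the pointless sleep after the last attempt
            sleep(base_delay * 2 ** attempt)
    # Still rate-limited after the budget; the caller decides how to degrade.
    return status, body
```

Production code would usually also honor a `Retry-After` header when the server sends one, and add jitter so parallel workers do not retry in lockstep.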