Notes from the Field: When Scheduled Tasks Run 19 Times and Produce Nothing
Notes from the Field: When Scheduled Tasks Run 19 Times and Produce Nothing
This is an abstract of a Chinese-language Synapse case study. While reviewing n8n workflow logs, I noticed pmo-wbs-trigger had fired 19 times over six days, all marked Success — but its downstream message queue had received exactly zero records. No errors, no alerts, no exception traces, just a tidy column of green checks. The full Chinese article walks through how to detect this kind of silent failure from logs, the actual root cause (a renamed upstream field that turned every filter into undefined), and the three-layer Harness guard rails we added afterward. It is aimed at engineers running n8n / Zapier / Airflow style pipelines and at anyone designing observability for AI agents.
Key Takeaways
- “Did it run?” is the wrong question. Ask “Did it produce?”: many frameworks (n8n included) classify “node executed but emitted no data” as Success. Output Signal — trigger count vs. produced messages — must be a first-class observability metric.
- Empty inputs deserve suspicion by default: the bug hid for 19 cycles because no node objected to receiving zero records. An assertion that fires after N consecutive empty inputs would have surfaced it on day one.
- Implicit field contracts must be made explicit: the downstream filter still expected
statusafter upstream renamed ittask_status. Schema declarations turn rename-bombs into compile-time errors instead of silent runtime no-ops. - Three-layer guard rails ship together: an Output Signal probe at the pipeline edge, an empty-input assertion at every filter/aggregator, and a config-drift detector on every upstream save.
Why This Matters
Silent failure is more dangerous than loud failure because dashboards stay green and on-call stays asleep. Days or weeks pass before someone asks “why hasn’t the WBS report updated?” — and by then the damage is cumulative. The Synapse Harness layer treats silent failure as a design-philosophy problem (“default trust + missing verification”), not a bug to be patched once. The Chinese article frames the same issue as a generalizable rule set: prove production, distrust empty, contract everything. These rules now run across all our scheduled agents.
This is an abstract. Read the full article in Chinese → AI 定时任务的静默失败