
Site Survivability Planning

Remote systems should be designed around the days when everything is inconvenient: weak coverage, bad weather, no immediate site access, and power behaving worse than the design spreadsheet assumed. Survivability planning is where architecture earns trust.

A site that works only when the link is stable, the battery is healthy, the cabinet stays dry, and a technician is nearby is not truly a remote telemetry site. It is a connected field installation with optimistic assumptions.

A survivable remote telemetry site preserves the local operating story when the network is unavailable, protects its power and cabinet environment, exposes meaningful failure states, and can be recovered by the intended field team without expert improvisation. Survivability is not one feature. It is the combined behavior of power, communications, I/O, buffering, alarm rules, physical installation, and service process.

| Dimension | What to define | Common failure if skipped |
| --- | --- | --- |
| Communications loss | What the site stores, alarms, retries, and reports after reconnect | Operators see current state but lose event sequence |
| Power instability | Battery runtime, brownout behavior, restart order, fuse isolation | The site disappears or restarts into an unknown state |
| Cabinet environment | Condensation, heat, surge paths, grounding, corrosion, water entry | Intermittent failures are blamed on the router or carrier |
| Local signal integrity | Sensor faults, stuck inputs, noisy analogs, failed counters | Bad data looks like real process behavior |
| Field service | Replacement procedure, labeling, remote access, spare parts | Simple failures require specialist support |
| Alarm discipline | Priority, deadband, persistence, latching, stale-data rules | Operators stop trusting the site |
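
The alarm discipline terms in the table (deadband, persistence, latching) can be easier to agree on once they are written as behavior. The following is a minimal sketch, not a reference implementation: the class name `HighAlarm`, its fields, and the rule that a latch clears only after the live condition has cleared are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HighAlarm:
    """Hypothetical high alarm with deadband, persistence, and latching."""

    setpoint: float       # alarm condition starts at or above this value
    deadband: float       # value must drop below setpoint - deadband to clear
    persistence_s: float  # condition must hold this long before the alarm trips
    active: bool = False  # live alarm state
    latched: bool = False # remains True until acknowledged
    _pending_since: Optional[float] = None

    def update(self, value: float, now_s: float) -> bool:
        """Feed one sample; return the live alarm state."""
        if not self.active:
            if value >= self.setpoint:
                # Start (or continue) timing the persistence window.
                if self._pending_since is None:
                    self._pending_since = now_s
                if now_s - self._pending_since >= self.persistence_s:
                    self.active = True
                    self.latched = True
            else:
                # A short spike that ends early resets the timer.
                self._pending_since = None
        elif value < self.setpoint - self.deadband:
            # Clearing requires leaving the deadband, which avoids chatter.
            self.active = False
            self._pending_since = None
        return self.active

    def acknowledge(self) -> None:
        """Assumed rule: the latch clears only once the condition is gone."""
        if not self.active:
            self.latched = False
```

With a setpoint of 80, a deadband of 5, and 10 seconds of persistence, a reading of 85 held for 10 seconds trips the alarm, a reading of 78 does not clear it, and a reading of 74 does. The latch still records that the alarm occurred until someone acknowledges it.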

The questions to answer before hardware selection

  1. How long can the site be unreachable before the loss of visibility becomes unacceptable to operations?
  2. Which events must be preserved locally if the network is down?
  3. Which alarms must leave immediately and which can wait?
  4. What should the upstream system show when data is stale?
  5. What happens after battery depletion, brownout, or restart?
  6. Who can safely replace or reset the field device?
  7. What information must be visible from the cabinet door or remote management tool?

These questions usually matter more than a router speed rating.
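
Question 4 tends to get a clearer answer once the stale-data rule is written down explicitly. The sketch below is one possible rule, under stated assumptions: the `Freshness` states, the `classify_freshness` function, the 1.5-interval grace period, and the three-missed-reports threshold are all hypothetical and should be replaced with values the operations team agrees on.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Freshness(Enum):
    LIVE = "live"    # report arrived within the expected window
    LATE = "late"    # overdue, but not yet treated as untrustworthy
    STALE = "stale"  # upstream should flag the value as last-known only


def classify_freshness(
    last_report: datetime,
    now: datetime,
    interval: timedelta,
    stale_after_missed: int = 3,  # assumed threshold, not a standard
) -> Freshness:
    """Classify a reading by age relative to the site's reporting interval."""
    age = now - last_report
    if age <= interval * 1.5:  # assumed grace period for jitter
        return Freshness.LIVE
    if age <= interval * stale_after_missed:
        return Freshness.LATE
    return Freshness.STALE


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for minutes in (10, 30, 60):
        state = classify_freshness(t0, t0 + timedelta(minutes=minutes),
                                   interval=timedelta(minutes=15))
        print(minutes, state.value)
```

The point of a rule like this is that the upstream display shows the age of a value rather than presenting the last received reading as if it were current.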

Local fallback may include:

  • latching critical alarms;
  • preserving event sequence;
  • controlling a local output safely;
  • storing interval data until reconnect;
  • reducing data volume under low power;
  • keeping last-known state with timestamp;
  • or entering a safe local mode when upstream control disappears.
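
Two of the items above, preserving event sequence and storing data until reconnect, can be sketched as a bounded store-and-forward buffer. This is an in-memory illustration only: a real field device would normally persist the buffer to non-volatile storage. The names `Event` and `StoreAndForward` and the drop-oldest overflow policy are assumptions, not a recommendation for every site.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque


@dataclass(frozen=True)
class Event:
    seq: int            # monotonically increasing, assigned at the site
    timestamp_s: float  # time the event occurred, not time it was sent
    tag: str
    value: float


class StoreAndForward:
    """Hypothetical bounded buffer that replays events in original order."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Event] = deque()
        self._capacity = capacity
        self._next_seq = 0
        self.dropped = 0  # exposed so the loss is reportable, not silent

    def record(self, timestamp_s: float, tag: str, value: float) -> Event:
        if len(self._buffer) >= self._capacity:
            # Assumed policy: drop the oldest event and count the loss.
            self._buffer.popleft()
            self.dropped += 1
        event = Event(self._next_seq, timestamp_s, tag, value)
        self._next_seq += 1
        self._buffer.append(event)
        return event

    def flush(self, send: Callable[[Event], bool]) -> int:
        """Send buffered events oldest first; stop at the first failure."""
        sent = 0
        while self._buffer:
            if not send(self._buffer[0]):
                break  # keep the event so ordering survives a retry
            self._buffer.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._buffer)
```

Because sequence numbers are assigned at the site and gaps are visible upstream, a receiver can tell the difference between "nothing happened" and "events were lost during the outage".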

The team should define fallback by consequence. A tank level site, a generator fuel site, a lift station, and a cathodic-protection rectifier do not need the same fallback behavior.

Before calling a remote site ready, simulate:

  • network loss and reconnect;
  • power interruption and restart;
  • bad sensor value;
  • alarm transition during outage;
  • cabinet door service event;
  • antenna or signal degradation;
  • and upstream stale-data display.

The test should prove what operators and technicians will see, not only that packets eventually arrive.
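
One of these simulations, an alarm transition during an outage, can be scripted against a stand-in for the link before any hardware is on the bench. The sketch below is a toy model under stated assumptions: `FakeLink`, `SimulatedSite`, and `run_outage_scenario` are hypothetical names, and the site logic is reduced to a queue that drains in order.

```python
from typing import List, Tuple

Message = Tuple[float, str, str]  # (timestamp_s, tag, state)


class FakeLink:
    """Stand-in for the uplink; the test toggles `up` to simulate an outage."""

    def __init__(self) -> None:
        self.up = True
        self.delivered: List[Message] = []

    def send(self, message: Message) -> bool:
        if not self.up:
            return False
        self.delivered.append(message)
        return True


class SimulatedSite:
    """Toy site that queues state changes and drains them in order."""

    def __init__(self, link: FakeLink) -> None:
        self.link = link
        self.pending: List[Message] = []

    def report(self, timestamp_s: float, tag: str, state: str) -> None:
        self.pending.append((timestamp_s, tag, state))
        self.drain()

    def drain(self) -> None:
        # Stop at the first failed send so the sequence is preserved.
        while self.pending and self.link.send(self.pending[0]):
            self.pending.pop(0)


def run_outage_scenario() -> List[Message]:
    """Alarm raises and clears while the link is down, then the link returns."""
    link = FakeLink()
    site = SimulatedSite(link)
    site.report(0.0, "high_level", "normal")
    link.up = False
    site.report(60.0, "high_level", "alarm")
    site.report(120.0, "high_level", "normal")
    link.up = True
    site.drain()
    return link.delivered
```

The check worth making is on what the upstream side ends up with: all three transitions, in order, with their original timestamps. A site that reported only the final "normal" state would pass a connectivity test while hiding the alarm that came and went during the outage.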