Site Survivability Planning
Remote systems should be designed around the days when everything is inconvenient: weak coverage, bad weather, no immediate site access, and power behaving worse than the design spreadsheet assumed. Survivability planning is where architecture earns trust.
A site that works only when the link is stable, the battery is healthy, the cabinet stays dry, and a technician is nearby is not truly a remote telemetry site. It is a connected field installation with optimistic assumptions.
Quick answer
A survivable remote telemetry site preserves the local operating story when the network is unavailable, protects its power and cabinet environment, exposes meaningful failure states, and can be recovered by the intended field team without expert improvisation. Survivability is not one feature. It is the combined behavior of power, communications, I/O, buffering, alarm rules, physical installation, and service process.
Survivability dimensions
| Dimension | What to define | Common failure if skipped |
|---|---|---|
| Communications loss | What the site stores, alarms, retries, and reports after reconnect | Operators see current state but lose event sequence |
| Power instability | Battery runtime, brownout behavior, restart order, fuse isolation | The site disappears or restarts into an unknown state |
| Cabinet environment | Condensation, heat, surge paths, grounding, corrosion, water entry | Intermittent failures are blamed on the router or carrier |
| Local signal integrity | Sensor faults, stuck inputs, noisy analogs, failed counters | Bad data looks like real process behavior |
| Field service | Replacement procedure, labeling, remote access, spare parts | Simple failures require specialist support |
| Alarm discipline | Priority, deadband, persistence, latching, stale-data rules | Operators stop trusting the site |
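The alarm-discipline row is easier to review as concrete behavior than as a list of terms. A minimal Python sketch, assuming a single high-limit analog alarm evaluated once per sample; the `HighAlarm` class, its parameter names, and measuring persistence in seconds are hypothetical choices, not taken from any particular RTU or SCADA product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HighAlarm:
    """Hypothetical high-limit alarm with deadband, persistence, and latching."""

    setpoint: float        # alarm condition: value above this limit
    deadband: float        # value must drop below setpoint - deadband to clear
    persistence_s: float   # condition must hold this many seconds before tripping
    latching: bool = True  # latched alarms stay active until acknowledged

    active: bool = field(default=False, init=False)
    _pending_since: Optional[float] = field(default=None, init=False)

    def update(self, value: float, now_s: float) -> bool:
        """Evaluate one sample taken at now_s (seconds) and return the alarm state."""
        if value > self.setpoint:
            # Start or continue timing the persistence window.
            if self._pending_since is None:
                self._pending_since = now_s
            if now_s - self._pending_since >= self.persistence_s:
                self.active = True
        else:
            # Any sample at or below the setpoint restarts the persistence window.
            self._pending_since = None
            # A non-latching alarm clears on its own once the value leaves the deadband.
            if not self.latching and value < self.setpoint - self.deadband:
                self.active = False
        return self.active

    def acknowledge(self, value: float) -> None:
        """Operator reset: clears the alarm only if the condition has actually gone."""
        if value < self.setpoint - self.deadband:
            self.active = False
```

With `setpoint=90.0`, `deadband=5.0`, and `persistence_s=30.0`, a reading of 95.0 trips the alarm only after it has held for 30 seconds, and a latched alarm at 87.0 cannot be acknowledged away because the value is still inside the deadband.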
The questions to answer before hardware selection
- How long can the site be unreachable before the loss of visibility becomes unacceptable to operations?
- Which events must be preserved locally if the network is down?
- Which alarms must leave immediately and which can wait?
- What should the upstream system show when data is stale?
- What happens after battery depletion, brownout, or restart?
- Who can safely replace or reset the field device?
- What information must be visible from the cabinet door or remote management tool?
These questions usually matter more than a router speed rating.
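The stale-data question can be answered with an explicit quality label on every displayed value. A minimal Python sketch, assuming the field device timestamps each measurement and the upstream system applies two thresholds: a stale threshold based on missed reporting intervals, and a communication-lost threshold equal to the longest outage operations has agreed to tolerate. `StalePolicy`, `Quality`, `DisplayValue`, and `classify` are hypothetical names, and the thresholds are placeholders for the site's own answers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    STALE = "stale"          # late, but within the outage window operations accepts
    COMM_LOST = "comm_lost"  # older than the longest outage operations accepts


@dataclass(frozen=True)
class StalePolicy:
    """Hypothetical thresholds; real values come from the planning questions."""

    expected_interval: timedelta   # normal reporting interval for this site
    stale_after_intervals: int     # missed reports tolerated before flagging stale
    max_unreachable: timedelta     # longest acceptable loss of visibility


@dataclass(frozen=True)
class DisplayValue:
    value: float
    measured_at: datetime   # field-device timestamp, not server arrival time
    quality: Quality


def classify(value: float, measured_at: datetime, now: datetime,
             policy: StalePolicy) -> DisplayValue:
    """Attach a quality label so old data is never shown as if it were live."""
    age = now - measured_at
    if age > policy.max_unreachable:
        quality = Quality.COMM_LOST
    elif age > policy.expected_interval * policy.stale_after_intervals:
        quality = Quality.STALE
    else:
        quality = Quality.GOOD
    return DisplayValue(value, measured_at, quality)
```

With a 5-minute interval, three tolerated missed reports, and a 2-hour limit, a reading 10 minutes old shows as `GOOD`, 20 minutes old as `STALE`, and 3 hours old as `COMM_LOST`, while the last-known value and its timestamp stay visible in every case. Classifying by measurement time rather than arrival time keeps backfilled data from passing for live data after a reconnect.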
Local fallback is not one thing
Local fallback may include:
- latching critical alarms;
- preserving event sequence;
- controlling a local output safely;
- storing interval data until reconnect;
- reducing data volume under low power;
- keeping last-known state with timestamp;
- or entering a safe local mode when upstream control disappears.
The team should define fallback by consequence. A tank level site, a generator fuel site, a lift station, and a cathodic-protection rectifier do not need the same fallback behavior.
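Preserving event sequence and storing data until reconnect are commonly handled with a store-and-forward buffer. A minimal Python sketch, assuming site-assigned sequence numbers, an upstream that acknowledges by sequence number, and an in-memory queue (a real device would need to persist it to survive the power-loss case). `EventBuffer`, `Event`, and the eviction policy are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    seq: int          # monotonic sequence number assigned at the site
    timestamp: float  # device time when the event occurred (e.g. epoch seconds)
    tag: str
    value: float
    critical: bool


class EventBuffer:
    """Hypothetical store-and-forward buffer that keeps event order through an outage.

    In-memory only; a field device would need to persist this to survive power loss.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._events: List[Event] = []
        self._next_seq = 1
        self.dropped = 0  # sent upstream so operators know the history has a gap

    def record(self, timestamp: float, tag: str, value: float,
               critical: bool = False) -> bool:
        """Store an event locally; return False if it had to be discarded."""
        event = Event(self._next_seq, timestamp, tag, value, critical)
        self._next_seq += 1  # advances even for discarded events, exposing the gap
        if len(self._events) >= self.capacity and not self._make_room(event):
            self.dropped += 1
            return False
        self._events.append(event)
        return True

    def _make_room(self, incoming: Event) -> bool:
        """Evict one stored event; return False if the incoming event should go instead."""
        for i, stored in enumerate(self._events):
            if not stored.critical:
                del self._events[i]  # oldest non-critical event goes first
                self.dropped += 1
                return True
        if incoming.critical:
            # Buffer holds only critical events: lose the oldest to keep the newest.
            del self._events[0]
            self.dropped += 1
            return True
        return False

    def pending(self) -> List[Event]:
        """Events to replay after reconnect, oldest first (append order is seq order)."""
        return list(self._events)

    def acknowledge(self, up_to_seq: int) -> None:
        """Discard events only after the upstream confirms it has stored them."""
        self._events = [e for e in self._events if e.seq > up_to_seq]
```

The eviction order is one consequence-based choice, not a rule; a lift station and a tank level site might reasonably rank events differently. Either way, the gap in sequence numbers and the `dropped` counter tell the upstream system that the replayed history is incomplete.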
Minimum acceptance test
Before calling a remote site ready, simulate:
- network loss and reconnect;
- power interruption and restart;
- bad sensor value;
- alarm transition during outage;
- cabinet door service event;
- antenna or signal degradation;
- and upstream stale-data display.
The test should prove what operators and technicians will see, not only that packets eventually arrive.
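One way to keep the acceptance test honest is to record, for each simulated condition, the operator and technician views that were expected and the ones actually observed. A minimal Python sketch; the `Scenario` structure, the `evaluate` function, and the example expectations are hypothetical, and exact string comparison stands in for whatever sign-off a real commissioning checklist uses. Packet delivery is recorded but never counts as a pass on its own.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Scenario:
    """One simulated condition from the acceptance list (hypothetical structure)."""

    name: str
    expected_operator_view: str      # what the control room should show
    expected_technician_view: str    # what the cabinet or remote tool should show
    observed_operator_view: Optional[str] = None
    observed_technician_view: Optional[str] = None
    packets_delivered: bool = False  # recorded, but never sufficient to pass


def evaluate(scenarios: List[Scenario]) -> Dict[str, str]:
    """Map each scenario name to 'pass' or the reason it failed."""
    report: Dict[str, str] = {}
    for s in scenarios:
        if s.observed_operator_view is None or s.observed_technician_view is None:
            report[s.name] = "not observed"
        elif (s.observed_operator_view != s.expected_operator_view
              or s.observed_technician_view != s.expected_technician_view):
            # Data arriving upstream does not rescue a wrong or misleading display.
            report[s.name] = ("data arrived, wrong display"
                              if s.packets_delivered else "wrong display")
        else:
            report[s.name] = "pass"
    return report
```

For example, an "alarm transition during outage" scenario in which the data reached the server but the operator screen showed only the current state is reported as `data arrived, wrong display`, which is the event-sequence failure from the survivability table.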