H2O Datatech — Sustaining the environment through data-driven insights

From flood forecasting to flood-smart urban planning

Flooding is no longer just an engineering or emergency response issue. For urban planners, it is becoming a land-use, infrastructure, and development control challenge — and machine learning is emerging as a powerful screening layer to guide flood-smart decisions.

A two-stage machine learning framework

A recent study by H2O Datatech presents a two-stage machine learning framework that can help planners better understand where flood risks are likely to emerge, why certain areas are more vulnerable, and how urban development patterns may influence future flood exposure.

The framework was applied to the Jambatan Sulaiman watershed, covering Kuala Lumpur and parts of Selangor. Instead of looking only at river levels, the model combines climate, land use, soil, and topographic data to produce high-resolution flood inundation insights at a 30-metre grid scale.

Flood event frequency mapped across the Jambatan Sulaiman watershed at a 30-metre grid scale.

How the model works

The first stage predicts streamflow using a Random Forest model. This helps estimate how the river system responds to rainfall, land cover, and catchment conditions.

The second stage uses an XGBoost model to classify whether specific locations are likely to flood. This allows flood risk to be translated into spatial maps that are more useful for planning decisions.
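
The two stages described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the study's implementation: the feature names, the synthetic relationships, and the hyperparameters are all assumptions, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestRegressor

rng = np.random.default_rng(0)

# Stage 1: regress streamflow on catchment-level drivers (synthetic data).
n_days = 400
rainfall = rng.gamma(shape=2.0, scale=10.0, size=n_days)   # mm/day, assumed
built_up_frac = rng.uniform(0.2, 0.8, size=n_days)         # share of catchment, assumed
streamflow = 0.5 * rainfall * (1 + built_up_frac) + rng.normal(0, 2, n_days)

X_flow = np.column_stack([rainfall, built_up_frac])
flow_model = RandomForestRegressor(n_estimators=100, random_state=0)
flow_model.fit(X_flow, streamflow)

# Stage 2: classify individual grid cells as flooded or not, using the
# stage-1 streamflow prediction alongside per-cell terrain features.
n_cells = 500
elevation = rng.uniform(0, 100, n_cells)       # metres, assumed
river_dist = rng.uniform(0, 2000, n_cells)     # metres, assumed
event_day = rng.integers(0, n_days, n_cells)   # day each cell sample refers to
pred_flow = flow_model.predict(X_flow[event_day])

# Synthetic labels: high flow, low elevation and short river distance favour flooding.
score = pred_flow / 20 - elevation / 100 - river_dist / 2000
flooded = (score + rng.normal(0, 0.3, n_cells) > 0).astype(int)

X_cell = np.column_stack([pred_flow, elevation, river_dist])
flood_model = GradientBoostingClassifier(random_state=0)
flood_model.fit(X_cell, flooded)

# Per-cell flood probability, which can be written back to the grid as a map.
flood_prob = flood_model.predict_proba(X_cell)[:, 1]
```

If XGBoost is installed, `xgboost.XGBClassifier` exposes the same `fit` and `predict_proba` methods and could replace the stand-in classifier. A real workflow would also hold out data for validation rather than predicting on the training cells.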

Flood risk is not driven by rainfall alone

For urban planners, the most important insight is that flood risk is not driven by rainfall alone. The study shows that land cover, built-up area, soil type, elevation, slope, topographic wetness, and proximity to rivers all play major roles in determining where floods occur.

This reinforces the need to treat flood risk as part of spatial planning, zoning, infrastructure design, and development approval.
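
Of the drivers listed above, topographic wetness is the least self-explanatory. A common definition is the topographic wetness index, TWI = ln(a / tan β), where a is the upslope area draining through a cell per unit contour width and β is the local slope. The sketch below applies that standard formula to two hypothetical cells; the input values and the small slope floor are assumptions, and the article does not say how the study computed its wetness layer.

```python
import math


def topographic_wetness_index(specific_catchment_area_m, slope_deg):
    """Return TWI = ln(a / tan(beta)) for one grid cell.

    specific_catchment_area_m: upslope area per unit contour width, in metres.
    slope_deg: local slope in degrees.
    """
    # Floor the slope so perfectly flat cells do not divide by zero (assumed floor).
    slope_rad = math.radians(max(slope_deg, 0.1))
    return math.log(specific_catchment_area_m / math.tan(slope_rad))


# Hypothetical cells: a flat valley floor with a large upslope area,
# and a steep hillside with a small one.
valley_floor = topographic_wetness_index(specific_catchment_area_m=5000.0, slope_deg=1.0)
hillside = topographic_wetness_index(specific_catchment_area_m=50.0, slope_deg=15.0)
```

The valley-floor cell scores much higher than the hillside cell, which matches the intuition that water accumulates in flat areas with large contributing catchments.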

Gini-index feature importance — soil type, elevation, topographic wetness and proximity to rivers rank among the strongest drivers of flood risk.
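
For readers unfamiliar with the measure in the caption above: Gini-based importance in tree models credits a feature with the reduction in Gini impurity achieved each time a tree splits on it, weighted by how many samples reach the split and accumulated over the ensemble. The sketch below computes that reduction for a single split on a toy sample; the cells and the 10-metre threshold are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = flooded, 0 = not flooded)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p**2 - (1.0 - p) ** 2


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy sample of (elevation in metres, flooded flag); values are invented.
cells = [(3, 1), (5, 1), (8, 1), (12, 0), (20, 1), (35, 0), (50, 0), (80, 0)]

parent = [flag for _, flag in cells]
left = [flag for elev, flag in cells if elev <= 10]   # low-lying cells
right = [flag for elev, flag in cells if elev > 10]   # higher cells

gain = impurity_decrease(parent, left, right)  # 0.5 - 0.2 = 0.3 for this sample
```

A feature that repeatedly produces large decreases like this one ranks high in the importance chart.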

The case for blue-green infrastructure

The model also highlights the value of preserving permeable and forested areas. Built-up and impervious surfaces increase runoff, while natural land cover can help slow down and absorb stormwater.

This provides a stronger evidence base for blue-green infrastructure, riparian buffers, detention areas, urban wetlands, and stricter controls on development in low-lying flood-prone zones.
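
The effect of imperviousness on runoff can be illustrated with a back-of-envelope calculation. The sketch below uses the textbook rational method, Q = C·i·A / 360 (Q in m³/s, rainfall intensity i in mm/h, area A in hectares), which is a separate screening formula and not part of the study's model. The runoff coefficients, areas, and storm intensity are illustrative assumptions only.

```python
def composite_runoff_coeff(land_cover):
    """Area-weighted runoff coefficient from {name: (area_ha, coefficient)}."""
    total_area = sum(area for area, _ in land_cover.values())
    return sum(area * coeff for area, coeff in land_cover.values()) / total_area


def peak_runoff_m3s(runoff_coeff, intensity_mm_per_h, area_ha):
    """Rational method: 1 mm/h falling on 1 ha equals 1/360 m^3/s."""
    return runoff_coeff * intensity_mm_per_h * area_ha / 360.0


# Hypothetical 100 ha catchment before and after 30 ha of forest is built over.
# Coefficients (0.15 forest, 0.85 built-up) are assumed, illustrative values.
before = {"forest": (60.0, 0.15), "built_up": (40.0, 0.85)}
after = {"forest": (30.0, 0.15), "built_up": (70.0, 0.85)}

c_before = composite_runoff_coeff(before)   # 0.43
c_after = composite_runoff_coeff(after)     # 0.64

# Same 60 mm/h design storm (assumed) applied to both scenarios.
q_before = peak_runoff_m3s(c_before, intensity_mm_per_h=60.0, area_ha=100.0)
q_after = peak_runoff_m3s(c_after, intensity_mm_per_h=60.0, area_ha=100.0)
```

Under these assumed values, peak runoff rises by roughly half for the same storm, which is the kind of shift that detention areas and retained permeable cover are meant to offset.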

From frequency maps to better questions

Another key contribution is the generation of flood frequency maps. These maps can help local authorities identify recurring hotspots, prioritise drainage upgrades, guide road and infrastructure investments, and assess whether proposed developments may increase downstream flood risks.
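
Conceptually, a frequency map of this kind counts how often each grid cell is flagged as flooded across a set of events. The sketch below shows that counting step on three tiny hypothetical 3×3 grids; how the study defines events and aggregates its predictions is not described here, so treat this as one plausible reading.

```python
# Each grid is one event's binary flood prediction (1 = flooded) on the same cell layout.
# The three events are invented for illustration.
event_grids = [
    [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]],
    [[0, 1, 1],
     [0, 1, 1],
     [0, 0, 0]],
    [[0, 0, 1],
     [0, 0, 1],
     [0, 0, 1]],
]


def flood_frequency(grids):
    """Count, for every cell, the number of events in which it flooded."""
    rows, cols = len(grids[0]), len(grids[0][0])
    return [[sum(g[r][c] for g in grids) for c in range(cols)] for r in range(rows)]


freq = flood_frequency(event_grids)

# Recurring hotspots: cells flooded in every event.
hotspots = [
    (r, c)
    for r, row in enumerate(freq)
    for c, count in enumerate(row)
    if count == len(event_grids)
]
```

Here the two cells at the top of the right-hand column flood in all three events, so they would surface first when prioritising drainage upgrades.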

In practice, this type of machine learning framework can support more proactive planning. It can help planners ask better questions:

  • Is this site suitable for development?
  • Will new impervious surfaces worsen runoff?
  • Which neighbourhoods require flood mitigation first?
  • Where should nature-based solutions be protected or restored?

A screening layer, not a replacement for judgment

The study positions machine learning as a complement to planning judgment and hydraulic modelling, not a replacement for either. It provides a faster, data-driven screening layer that can strengthen urban planning, climate adaptation, and development control.

For rapidly urbanising areas such as Kuala Lumpur and Selangor, this approach offers a practical pathway toward flood-smart planning, in which land-use decisions are guided not only by growth potential but also by long-term resilience.

Key takeaways

  • Flood risk is a land-use and development challenge, not just an engineering one.
  • A Random Forest + XGBoost framework turns climate, land-use, soil and terrain data into flood inundation insights at a 30-metre grid scale.
  • Land cover, elevation, slope, wetness and river proximity — not rainfall alone — drive where floods occur.
  • Machine learning is a fast screening layer that strengthens flood-smart planning and development control.
