Database

Created:
22nd of February 2026
12:33:18 AM
Modified:
22nd of February 2026
09:02:32 PM

Why Data Fails on Indian Construction Projects

Learning Objectives

After completing this chapter, you will be able to:

  1. Identify the six stages of the construction data lifecycle and state where spreadsheet workflows fail in the MORR project.
  2. Differentiate raw data, information, and actionable knowledge using site-office examples from Othakadai.
  3. Define data modelling as the deliberate design of entities, attributes, relationships, and rules before any SQL is written.
  4. Trace quantity, payroll, and audit discrepancies to missing validation and concurrent flat-file writes.
  5. State the Golden Thread baseline for Chapter 1: morr_db exists and contains no user tables.

Prerequisites

No formal prerequisite chapter applies here. This chapter establishes the baseline problem statement and vocabulary for all chapters that follow.


A Monday Morning at Othakadai

Let us examine a familiar scene. It is Monday, 08:10, at the Othakadai site office. The quantity surveying team is preparing the monthly RA bill for submission to the NHAI Project Implementation Unit in Madurai. Three Excel workbooks are open on one desk. Each workbook claims to be the latest measurement summary for the Vaigai bridge substructure and the adjacent embankment reach between Ch. 14+000 and Ch. 16+000.

The totals do not agree.

The discrepancy traces back to a single DGPS field observation entered two weeks earlier. The antenna height was typed as 1.382 m instead of 1.832 m. The value passed through every downstream workbook because no validation rule rejected it at entry time. Cross-sections shifted, the computed earthwork volume rose by roughly 1,200 cubic metres, and the draft RA bill carried the inflated quantity.

Four staff members now spend three days repairing records that should have been stopped in under one second at entry time. This is data rework. It does not break concrete, but it does delay billing, payment, and trust between teams.

๐Ÿ—

AECO Insight: In site practice, most quantity disputes are not arithmetic mistakes. They are modelling failures: a system accepted a value without checking whether the value matched field reality and project rules.

⚠

Common Pitfall: A team labels one file as final and assumes the naming convention solves governance. Filenames such as survey_final_v2_use_this.xlsx do not provide validation, lineage, or access control.
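
The antenna-height failure is exactly what an entry-time rule prevents. The sketch below is a preview only — Chapter 1 creates no tables, and Chapter 4 introduces the real schema. The table name, column names, and the list of approved pole heights are illustrative assumptions, not project standards.

```sql
-- Hypothetical preview: do not run against morr_db in this chapter.
-- If the site's approved survey poles are fixed at 1.500 m and 1.832 m,
-- a domain rule of this shape rejects 1.382 m at entry time.
CREATE TABLE dgps_observation (
    obs_id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    chainage_m    numeric(9,3) NOT NULL,
    antenna_ht_m  numeric(5,3) NOT NULL,
    recorded_on   date NOT NULL,
    CONSTRAINT chk_antenna_ht
        CHECK (antenna_ht_m IN (1.500, 1.832))  -- approved pole heights only (assumed)
);
```

With this constraint in place, the bad value fails at INSERT time in milliseconds, instead of surfacing weeks later as a 1,200-cubic-metre quantity dispute.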

The Six-Stage Data Lifecycle

Data on an Indian highway project moves through six stages. If one stage is weak, every stage after it inherits the defect. Let us examine each stage in order, using the same MORR example:

  1. Collection: Instruments and field registers capture raw observations. Error risk appears when readings are retyped by hand.
  2. Curation: Observations receive context such as chainage, structure ID, MoRTH clause, shift, and crew.
  3. Validation: Rules test whether values are acceptable. Without this gate, impossible values enter storage.
  4. Storage: Accepted records are written to persistent storage with version history and ownership rules.
  5. Retrieval: Teams query one accepted source of truth for billing, quality, planning, and audit review.
  6. Update: Corrections and revisions are applied with traceability, not silent overwrite.

If validation is absent, retrieval becomes argument. If storage has no write controls, update becomes overwrite. If update has no audit trail, payment review becomes forensic work.

📖

Definition: A single source of truth is a controlled dataset where each business fact has one canonical record path, one ownership path, and one revision history path.

flowchart TD
    A[Collection] --> B[Curation]
    B --> C[Validation]
    C --> D[Storage]
    D --> E[Retrieval]
    E --> F[Update]
    C -. missing rule .-> G[Bad value accepted]
    G --> H[Rework loop]
    H --> F

Concurrency Failure: Payroll Overwrite

Now consider wages. One Friday evening, two engineers updated overtime entries for three subcontractors in the attendance workbook. Engineer A saved first. Engineer B saved five seconds later from a stale local copy. The second save replaced Engineer A's changes in full. Overtime rows disappeared. Deductions were then computed from incomplete hours.

No malice. No negligence. The storage medium had no transaction control and no row-level concurrency guard. Last save won, and payroll inherited bad state.

💡

Did You Know: PostgreSQL uses Multi-Version Concurrency Control (MVCC) so readers and writers can work concurrently without blind overwrite of committed rows.

⚠

Common Pitfall: Teams often say, "Only one person will edit this sheet." In project reality, deadlines force parallel work. Design for concurrency from day one, or pay for rework later.
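
One way PostgreSQL eliminates the blind overwrite is explicit row locking with SELECT ... FOR UPDATE. The sketch below assumes a hypothetical attendance table with worker_id, work_date, and hours_ot columns; it illustrates the mechanism only and is not part of the morr_db schema.

```sql
-- Hypothetical sketch: two sessions editing the same overtime row.

-- Session A (Engineer A)
BEGIN;
SELECT hours_ot
FROM attendance
WHERE worker_id = 4012 AND work_date = '2025-11-14'
FOR UPDATE;              -- locks the row; Session B now waits instead of overwriting
UPDATE attendance
SET hours_ot = 3.0
WHERE worker_id = 4012 AND work_date = '2025-11-14';
COMMIT;

-- Session B (Engineer B) blocks at its own FOR UPDATE until A commits,
-- then reads A's committed value and applies its change on top of it.
```

Contrast this with the flat file: there is no stale local copy, because the second writer is forced to see the first writer's committed work before editing.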

Data, Information, and Knowledge

These terms are often used interchangeably in site discussions. We will keep them separate because design choices depend on the distinction.

  • Data: Raw observation without applied project meaning. Example: E=812345.12, N=1098765.43, RL=142.15.
  • Information: Data with context. Example: the coordinate belongs to the sub-grade check at Ch. 14+200 on 12 Nov 2025.
  • Knowledge: Information evaluated against a rule. Example: measured RL is 45 mm below design profile, so GSB laying cannot start.

Why insist on this distinction? Because modelling decisions differ at each level. Data requires type and unit. Information requires entity and relationship. Knowledge requires rule and action.

✔

Good Practice: Capture context at source entry time. Chainage, date, shift, and responsible role should not be added later from memory.
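
The three levels can be expressed in a single query. The table names subgrade_check and design_profile are hypothetical, and the 40 mm tolerance is assumed for illustration: the measured RL is data, the join to chainage and date supplies context (information), and the CASE rule produces the hold decision (knowledge).

```sql
-- Hypothetical sketch: from raw reading to actionable decision in one pass.
SELECT s.chainage_m,
       s.measured_rl - d.design_rl AS deviation_m,     -- data + context
       CASE WHEN s.measured_rl - d.design_rl < -0.040  -- assumed tolerance: 40 mm
            THEN 'HOLD: GSB laying cannot start'
            ELSE 'PROCEED'
       END AS decision                                  -- knowledge: rule applied
FROM   subgrade_check s
JOIN   design_profile d USING (chainage_m)
WHERE  s.check_date = DATE '2025-11-12';
```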

What Data Modelling Means in This Resource

Data modelling is not a formatting task. It is engineering design expressed in database form.

📖

Definition: Data modelling is the deliberate design of entities, attributes, relationships, and constraints so the database stores only acceptable project facts.

Before writing SQL, we ask five questions in sequence:

  1. Which real entity is this record about?
  2. Which attributes are mandatory, optional, and derived?
  3. Which parent entity must exist before this row can exist?
  4. Which value ranges are acceptable under physical and contractual rules?
  5. Who is allowed to create, edit, approve, and view this row?

That sequence is the design method used in this programme. It keeps us from building a schema that looks correct but fails under site pressure.
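
Each of the five questions maps onto a concrete SQL feature. The preview below is a hypothetical sketch, not the Chapter 4 schema: the measurement table, its columns, and the qs_reviewer role are assumptions chosen to show the mapping.

```sql
-- Hypothetical sketch: the five modelling questions as SQL features.
CREATE TABLE measurement (
    meas_id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,   -- Q1: one real entity per row
    boq_item_id bigint NOT NULL REFERENCES boq_item (boq_item_id), -- Q3: parent must exist first
    qty         numeric(12,3) NOT NULL,                            -- Q2: mandatory attribute
    remarks     text,                                              -- Q2: optional attribute
    CONSTRAINT chk_qty CHECK (qty > 0)                             -- Q4: acceptable value range
);
GRANT SELECT ON measurement TO qs_reviewer;                        -- Q5: who may view
```

Notice that four of the five answers become declarative constraints the database enforces on every write, not conventions a team must remember under deadline pressure.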

⚖

Indian Regulatory Note: MORR data structures must support MoRTH measurement practice, wage records required under the BOCW framework and OSHWC Code operations, and audit traceability expected during NHAI and IE review cycles.

🔒

Security Note: Worker identifiers, payroll fields, and attendance records contain personal data. Chapter design must plan access control from the start, not after deployment.
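
Planning access control from the start means roles and grants appear alongside the first table, not after deployment. The sketch below is hypothetical: the role names and the wage_record table are assumptions used only to show the shape of such a plan.

```sql
-- Hypothetical sketch: role-based access to personal wage data.
CREATE ROLE payroll_clerk LOGIN;
CREATE ROLE site_engineer LOGIN;

-- Payroll staff may read and correct wage rows; site engineers may not see them.
GRANT  SELECT, UPDATE ON wage_record TO payroll_clerk;
REVOKE ALL ON wage_record FROM site_engineer;
```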

Golden Thread Baseline: morr_db Starts Empty

In this chapter, we do not create tables yet. We establish a testable baseline: the database exists, and it contains no user-defined tables or views. This matters because every later chapter builds from this known point.

-- Baseline check for Chapter 1
SELECT current_database();

SELECT COUNT(*) AS user_tables
FROM pg_catalog.pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

SELECT COUNT(*) AS user_views
FROM pg_catalog.pg_views
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');

Sample Output:

 current_database
------------------
 morr_db
(1 row)

 user_tables
-------------
 0
(1 row)

 user_views
------------
 0
(1 row)

This output is the Chapter 1 Golden Thread checkpoint. No user table exists yet. Chapter 4 will add boq_item. Until then, we keep design work in analysis form and avoid accidental schema drift.


Self-Review Questions

Question: In the antenna-height scenario, which lifecycle stage should have blocked the bad value, and what type of rule should enforce that block?

Answer: Validation should block it. A domain rule should reject values outside acceptable antenna-height bounds for approved field equipment.

Question: Why is the payroll overwrite event a storage-medium problem rather than an individual operator problem?

Answer: Two users performed valid work at the same time. The file system provided no transaction isolation and no conflict handling, so one save replaced another committed change set.

Question: Which comes first in this resource: schema syntax or modelling questions?

Answer: Modelling questions come first. SQL encodes decisions; it does not create sound decisions on its own.

Question: Why do we record that morr_db has zero user tables in this chapter?

Answer: It sets a precise baseline for change control. Every later object introduction can be traced chapter by chapter without ambiguity.


Chapter Summary

  • Data lifecycle: Collection, curation, validation, storage, retrieval, and update; defects propagate when early stages are weak.
  • Concurrency risk: Shared flat files allow blind overwrite under parallel edits, which leads to wage, quantity, and audit errors.
  • Data vs information vs knowledge: Raw observations become useful only after context and rules are applied.
  • Data modelling: Entity, attribute, relationship, and rule design performed before writing SQL.
  • Golden Thread checkpoint: Chapter 1 ends with morr_db present and no user-defined tables or views.