Data Collection & Warehousing Grade 12

Modern databases are fed by many automated data collection methods. Large organisations store historical data in warehouses for analysis and decision-making.

Methods of Data Collection

Data collection is the process of gathering raw facts and figures from many different sources so that they can be stored, processed and turned into useful information. In the past, a person sat with a clipboard and wrote everything down by hand. Today most data is collected automatically: every time you swipe a card, drive under an e-toll gantry or click on a website, data about you is captured without anyone writing anything down.

WHY THIS MATTERS

A database is only as good as the data inside it. The more accurate, complete and up-to-date the collected data is, the better the decisions an organisation can make. Automated collection means more data, collected faster, with fewer human mistakes.

Example: When you scan your Pick n Pay Smart Shopper card at the till, the shop instantly records what you bought, when, where and for how much — that single swipe feeds a database that the shop later uses to send you targeted specials.

Method | Description | Example
Forms (manual) | User types data directly into a form | School registration, job application
Web forms | Online interactive pages with GUI components (dropdowns, checkboxes) | Online survey, account sign-up
RFID tags | Wireless chips store ID data, read without contact | Library books, inventory, e-toll, access cards
Digital sensors | Gather environmental data automatically | Temperature sensors, motion detectors
Cookies | Small files stored by websites to record browsing behaviour and preferences | Login sessions, shopping carts, ad targeting
Transaction tracking | Records each purchase: date, time, location, product, amount | Credit card transactions, loyalty cards
Mobile apps | Collect app usage, location, device data | Uber location tracking, fitness app steps

Static vs Dynamic Location Data

Type | Description | Example
Static | Fixed, does not change once recorded | Address of a shop, GPS coordinates of a landmark
Dynamic | Changes as the object moves, updated continuously | Uber driver location, delivery truck tracking
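
The difference between static and dynamic location data can be sketched in a few lines of Python. This is only an illustration: the class names, the update method and the coordinates are invented for the example and do not come from any real tracking system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StaticLocation:
    """A fixed position: recorded once and never changed afterwards."""
    name: str
    latitude: float
    longitude: float


@dataclass
class DynamicLocation:
    """A moving object: each new GPS reading replaces the current position."""
    name: str
    latitude: float
    longitude: float
    history: list = field(default_factory=list)

    def update(self, latitude: float, longitude: float) -> None:
        # Remember the previous reading, then overwrite it with the new one.
        self.history.append((self.latitude, self.longitude))
        self.latitude = latitude
        self.longitude = longitude


# Illustrative coordinates only (assumed values, not real addresses).
shop = StaticLocation("Shop", -33.9249, 18.4241)
truck = DynamicLocation("Delivery truck", -33.9249, 18.4241)

truck.update(-33.9300, 18.4300)
truck.update(-33.9350, 18.4400)

print(shop)
print(truck.latitude, truck.longitude, len(truck.history))
```

The shop record is frozen, so any attempt to change it raises an error, while the truck record is expected to change with every new reading.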

Data Warehousing

A data warehouse is a large central storage system that consolidates historical data from many separate source databases in one place, so that it can be analysed for trends and used to support decision-making.

Think of it like this: a normal database is like the till at one Pick n Pay branch, which records today's sales as they happen. A data warehouse is like head office in Cape Town, where the sales from every branch in the country, going back many years, are all poured into one giant store. Head office does not use this store to ring up a sale; it uses it to see the big picture: which products sell best in winter, which branches are growing and where to open the next store.

KEY CHARACTERISTICS

Central – one single store that brings many sources together.
Historical – it keeps years of old data, not just today's transactions.
Read-mostly – data is loaded in and then read for reports; it is not constantly changed.
Subject-oriented – organised around topics like "sales" or "customers" for easy analysis.

Note: Because a warehouse holds so much data from so many sources, the data must first be cleaned and put into a consistent format before it is loaded in — otherwise the same customer might appear three different ways and the analysis would be wrong.
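
A minimal sketch of this cleaning step, assuming each source database sends customer records as simple name and phone pairs. The function name, the field names and the sample customer are made up for illustration; real cleaning rules depend on the organisation's data.

```python
def clean_customer(record: dict) -> dict:
    """Return a copy of the record in one consistent format."""
    # Collapse extra spaces and use the same capitalisation everywhere.
    name = " ".join(record["name"].split()).title()
    # Keep only the digits of the phone number.
    phone = "".join(ch for ch in record["phone"] if ch.isdigit())
    return {"name": name, "phone": phone}


# The same (invented) customer as three different sources might record her.
raw_records = [
    {"name": "thandi  nkosi", "phone": "082 555 0101"},
    {"name": "THANDI NKOSI", "phone": "082-555-0101"},
    {"name": " Thandi Nkosi ", "phone": "0825550101"},
]

cleaned = [clean_customer(r) for r in raw_records]

# After cleaning, the three versions collapse into one customer.
unique_customers = {(c["name"], c["phone"]) for c in cleaned}
print(unique_customers)
```

Without the cleaning step, the warehouse would count three customers instead of one.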

Feature | Database | Data Warehouse
Purpose | Day-to-day transactions | Historical analysis and reporting
Data | Current transactions only | Historical data from many sources
Operations | Read + write (INSERT, UPDATE, DELETE) | Mostly read (SELECT, reports)
Used by | Operational staff | Analysts, management, BI tools
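
The "Operations" row of the table can be illustrated with Python's built-in sqlite3 module. This is a simplified sketch: a real warehouse is a separate system from the operational database, and the table name, columns and amounts here are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (branch TEXT, year INTEGER, amount REAL)")

# Operational style: individual transactions are written and corrected
# as they happen.
conn.execute("INSERT INTO sales VALUES ('Durban', 2023, 120.0)")
conn.execute("INSERT INTO sales VALUES ('Durban', 2024, 150.0)")
conn.execute("INSERT INTO sales VALUES ('Cape Town', 2024, 200.0)")
conn.execute(
    "UPDATE sales SET amount = 125.0 WHERE branch = 'Durban' AND year = 2023"
)

# Warehouse style: the stored history is only read and summarised.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()
print(totals)
conn.close()
```

The first group of statements changes individual rows, while the final query changes nothing and only summarises several years of data for a report.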

Data Mining

Data mining is the process of analysing very large datasets to discover hidden patterns, trends and useful relationships that a human would struggle to spot by eye. The name is a good clue: just as a gold miner digs through tonnes of rock to find a few grams of gold, data mining digs through millions of records to find the valuable nuggets of knowledge buried inside.

Why Do Companies Mine Data?

Companies collect enormous amounts of data, but raw data on its own is of little use: it is just numbers and words. Data mining turns that raw data into insight they can act on. For example, Pick n Pay's Smart Shopper data might reveal that customers who buy nappies on a Friday evening also tend to buy snacks, so the shop places those products near each other to increase sales. Nobody told the computer to look for that link; the pattern was mined out of the data.

Process

  1. Extract relevant data — use SQL to pull only the required subset
  2. Look for patterns — analyse for trends, correlations, anomalies
  3. Discover knowledge — turn patterns into actionable insights
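
The three steps above can be sketched in Python on a tiny, invented set of shopping baskets. Counting which pairs of products appear together is only one simple way to look for patterns, and the basket contents are assumptions made for the example. Real data mining tools work on far larger datasets with more advanced techniques.

```python
from collections import Counter
from itertools import combinations

# Step 1: extract. In practice an SQL query would pull this subset
# from the warehouse; here the baskets are typed in by hand.
baskets = [
    {"nappies", "snacks", "milk"},
    {"nappies", "snacks"},
    {"bread", "milk"},
    {"nappies", "snacks", "bread"},
]

# Step 2: look for patterns by counting how often each pair of
# products is bought together.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Step 3: discover knowledge by reporting the most frequent pair.
top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```

In this sample, nappies and snacks appear together in three of the four baskets, which is the kind of link a shop could act on.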

Applications

Caring for Data

Caring for data means protecting the accuracy, security and availability of an organisation's data throughout its life. Data is one of the most valuable assets a company owns — often worth more than its buildings or equipment — because decisions, money and reputations all depend on it. Poor data leads to poor decisions, and lost data can shut a business down completely.

IMPORTANT RULE

"Garbage in, garbage out." If incorrect data goes into the system, only incorrect information can come out — no matter how clever your analysis is. This is why validation and verification at the point of capture are so important.

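Validation at the point of capture could look something like the sketch below. The field names, the grade range and the sample learners are assumptions chosen for illustration. The sketch covers validation only (checking that the data is sensible), not verification (checking that it was captured correctly, for example by entering it twice).

```python
from datetime import date


def validate_learner(record: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    errors = []

    # Presence check: the name may not be empty or only spaces.
    if not record.get("name", "").strip():
        errors.append("name is required")

    # Range check: assumes a high school with grades 8 to 12.
    grade = record.get("grade")
    if not isinstance(grade, int) or not 8 <= grade <= 12:
        errors.append("grade must be a whole number from 8 to 12")

    # Format check: the date must exist and be written as YYYY-MM-DD.
    try:
        date.fromisoformat(record.get("birth_date", ""))
    except ValueError:
        errors.append("birth_date must be a real date in YYYY-MM-DD format")

    return errors


good = {"name": "Sipho Dlamini", "grade": 12, "birth_date": "2007-03-14"}
bad = {"name": "  ", "grade": 15, "birth_date": "2007-02-30"}

print(validate_learner(good))
print(validate_learner(bad))
```

The second record is rejected before it reaches the database, so the garbage never gets in.
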
Below are the main ways an organisation looks after its data. Notice how each one protects against a different danger — wrong data, lost data, or stolen data.