What is a Relational Database?

KS3 Computer Science

11-14 Years Old

48 modules covering EVERY Computer Science topic needed for KS3 level.

GCSE Computer Science

14-16 Years Old

45 modules covering EVERY Computer Science topic needed for GCSE level.

A-Level Computer Science

16-18 Years Old

66 modules covering EVERY Computer Science topic needed for A-Level.

GCSE Relational Databases (14-16 years)

  • An editable PowerPoint lesson presentation
  • Editable revision handouts
  • A glossary which covers the key terminologies of the module
  • Topic mindmaps for visualising the key concepts
  • Printable flashcards to help students engage active recall and confidence-based repetition
  • A quiz with accompanying answer key to test knowledge and understanding of the module

A-Level Relational Databases (16-18 years)

  • An editable PowerPoint lesson presentation
  • Editable revision handouts
  • A glossary which covers the key terminologies of the module
  • Topic mindmaps for visualising the key concepts
  • Printable flashcards to help students engage active recall and confidence-based repetition
  • A quiz with accompanying answer key to test knowledge and understanding of the module

Candidates should be able to:

  • understand the relationship between entities and tables
  • understand the components of a relational database, such as tables, forms, queries, reports and modules
  • understand the use of logical operators in framing database queries
  • explain the use of key fields to connect tables and avoid data redundancy
  • describe methods of validating data as it is input.

What are the limitation of flat-file databases?

In a flat-file database, all the data is held in a single table and there are several problems associated with this. Consider the following scenario:

  • A doctor wishes to use a database to keep track of his patients’ test results. It needs to record:
    • patient name
    • patient age
    • test type
    • test results.
  • A flat-file database table set up to store the data might look like the one below (the key field has been ignored).
Relational Database: Duplicated data - highlights show typing errors made during data entry
Duplicated data – highlights show typing errors made during data entry

The problems that can then occur:

Data redundancy: The limitations of trying to represent this information in a single file quickly become apparent. The first row represents the Pulse reading for patient “Kit Kline”. The second row represents a different test but requires us to duplicate the information (forename, surname, age) about the patient. This continues for each test result. This makes the size of the database unnecessarily large and the process of data entry cumbersome and repetitive. The process of unnecessarily duplicating data is called data redundancy.

Data inconsistency: Note the data entry errors highlighted in the above table. Such data inconsistency occurs when large amounts of duplicate data have to be entered into a database. By duplicating data across records in flat files we introduce a risk of data inconsistency.

SUMMARY: The problems with flat-file database systems are that they can lead to:

  • Data redundancy – the needless duplication of data
  • Data inconsistency – errors in the values or format of data that should be identical

What is a relational database?

The key difference between a relational database and a flat-file database is that in a relational database the data is grouped into entities and stored in multiple linked tables.

What is an entity and how are they related to database tables?

An entity is a “real-world thing” about which data can be held in a database. Examples of entities include:

  • Customers
  • Products
  • Pupils
  • Suppliers
  • Loans
  • Videos/DVD’s
  • Flights
  • Employees
  • Treatments
  • Contracts
  • Library books
  • Cars
  • Orders
  • Zoo animals
  • Rentals

In a relational database, each entity corresponds to a separate table in the database.

What is an attribute and how are they related to database tables?

Attributes are the facts, aspects, properties, or details about an entity. Examples of attributes include:

  • Entity – Library books. Attributes – ISBN number, author, category etc.
  • Entity – Flights. Attributes – Flight No., aircraft type, departure/arrival date/time, destination etc.
  • Entity – employee. Attributes – Name, gender, address, DOB, qualifications, job title etc.

In a database, each attribute corresponds to a separate field in the database.

In general, entities can have attributes but attributes have no smaller parts. An attribute belongs to a single entity.

What is a key field and how are they used to link database tables?

The key field (sometimes called a primary key) is used to store an attribute that makes that particular entity entry in the database unique. For example, an NHS number, a passport number, a vehicle registration, a booking reference, a flight number etc.

The tables in a relational database are linked by the key field in one table being added as a foreign key in another table to form a relationship between the entities.

Consider a library database. Entities such as book, author and library members would all have separate but linked tables.

A diagram of a simplified relational database showing the tables linked by the key fields
A diagram of a simplified relational database showing the tables linked by the key fields

The book table would only have details about the book itself while details of the author would be stored in a separate table. A further table would hold details of the library members who could borrow books.

Another further table would also be needed to store the details of the actual loans, such as the lending data and the return date.

This loan table would not need to store details of the book, the author or the borrower, just the links to such data in the other tables.

The simplified diagram shows how the key fields in each table are used to link each entity.


What are the advantages of relational databases?

  • No data redundancy – in a well-designed relational database there should be no duplicated data (other than the key field).
  • No data inconsistency – as data is not duplicated, there is no risk of the same data item being stored differently in another record.
  • Flexibility – A relational database can be queried with greater flexibility than a flat-file system. Relationships mean that data can be combined in a variety of ways to produce the views that different areas of an organisation require.

What are the components of a relational database?

The components of a relational database are:

  • Tables with Records and Fields
  • Relationships between Entities (tables) using Primary Keys and Foreign keys
  • Forms
  • Query facilities
  • Reporting facilities
  • Modules

What is a database table?

In a relational database, each table represents the attributes of one entity. Database tables are made up of record and fields.

The components of a database table
The components of a database table

What is a database record?

Each record in a relational database table corresponds to an instance of an entity. In the example table of ‘Students’ above there are 5 records (the top row containing the field names is not a record). Each record corresponds to an instance of a Student entity, i.e. each row represents an individual student. Note that although there are two students called Philip Barker with the same date of birth, they have different Student IDs and are therefore considered to be different students.

What is a database field?

An attribute is a piece of information or a characteristic of an entity. Attributes of entities are represented in database tables by fields (columns). A field stores one item of data for a record.

In the table above, each student is represented in the relational database by a record and the student attributes are stored in the following fields:

  • Student ID
  • Forename
  • Surname
  • Date of Birth

For example, the student with Student ID 6777 is called Cynthia Ferguson and she was born on 3rd January 1996.

Fields have the following characteristics:

  • Each field in a table has a unique name.
  • Each field stores a single item of data – For example, a field called Date of Birth would store no more than one date of birth value.
  • Each field has a particular data type – for example, text, Boolean, integer, date/time, etc.
  • Each field can have its own validation rules – these ensure that data recorded meets certain rules.

What is a key field?

In a relational database, each entity must have one or more fields that uniquely identify that entity. Sometimes an existing attribute can be used because it is unique but most of the time some sort of numeric ID is created. This unique identifier is called the key field or primary key. In the Student database table above the Student ID is the Key Field, needed because there is no other unique item of data about each student. Sometimes one or more fields can be combined to make a primary key, for example, a cinema reservation could be uniquely defined by a particular seat number and a particular screening.

How is the key field used to connect tables and avoid data redundancy?

To link tables, the key field in one table must appear as a foreign key field in the other table. This is best explained with an example, in this case, a simplified relational database for a doctor’s surgery.

Two tables in a relational database showing the link between them
Two tables in a relational database showing the link between them

There are two entities, ‘Patient’ and ‘Test Results’ so two tables are created.

  • ‘Forename’, ‘Surname’ etc. are examples of attributes in the ‘Patient’ entity.
  • ‘Test Performed’ and ‘Test Result’ are examples of attributes in the ‘Test Result’ entity.

As neither patients or treatments have any unique attributes, each entity is given a numeric ID as the key field. All the data associated with the patient is now held in one table and all the data associated with test results is held in another.

In order to create a relationship between the two tables, a link is created by adding the key field in the Patient Table as a foreign key field in the Test Result Table. This means that when a new test result is added to the test result table, the only data about the patient that needs to be added is the Patient ID.

The example below shows how the two tables might look with some real data.

Data stored in two tables in a relational database, showing the linked field
Data stored in two tables in a relational database, showing the linked field

As can be seen, the Patient ID key field makes a link between the two tables and, apart from the foreign key field, there is no duplication of data so no data redundancy. If more details of the patient who was tested are needed they can easily be accessed because of the relationship between the two tables of data.

Problems with data inconsistency and data integrity are also avoided because there is only one set of data.

What types of relationships can there be in a relational database?

There are three different types of relationships between entities:

  • One-to-one – e.g. Husband and wife. A husband can have one wife and a wife can have one husband
  • One-to-many – e.g. Political party and politician. A political party has many politicians, but a politician belongs to only one political party
  • Many-to-many – e.g. Actor and film. An actor stars in many films and a film can have many actors

Each of the above relationships can be shown in an Entity Relationship diagram (E-R diagram) such as the ones below:

Diagram showing the three different types of entity relationships
Diagram showing the three different types of entity relationships

The diagram below shows an E-R diagram drawn to represent the entities of a relational database planned to store details of student and the resources that they have borrowed from their school library (books, videos, music, etc).

This diagram shows that there is a one-to-many relationship between students and their library loans.
This diagram shows that there is a one-to-many relationship between students and their library loans.

The Student entity is at the one end of the relationship while the Library Loan entity is at the many end (the end of the link that looks like “crow’s feet”). The relationship between instances of Student entities and instances of Library Loan entities can be stated as:

Database info box

When converted into database tables, each entity needs its own table containing its own fields (attributes). All of the student attributes (name, date of birth, etc) will be held in a ‘Students’ table while a ‘Library Loans’ table will hold all the data associated with students’ library loans. See the tables below:

Database table showing primary keys
Database table showing primary keys

The Students table has a field called Student ID as its key field, while the Library Loans table has a field called Loan ID as its key field. Because the key field is a field that has to have a unique value for each record in the table, the ‘Student ID’ value is used to uniquely identify any student. Similarly, each Library Loan record is uniquely identified by the ‘Loan ID’ field of the Library Loans table.

Note that database tables are often designed so that when a new record is created, the key field entry is generated automatically as an incremental number (e.g. in Microsoft Access you can assign a data type of “AutoNumber” to a field). Sometimes however there is a more natural choice of unique data that can be used instead, e.g. “Car Registration Number”, “National Insurance Number” or “Product Serial Number”.

In order to create a relationship between two tables, the primary key of one of the tables must be added as a field to the other table where it will become the foreign key.

In the case of the library database, the shared field would be the ‘Student ID’ field, i.e. the primary key of the Students table. It is selected because, as a general rule, in one-to-many relationships between entities, the primary key of the entity that is on the one end of the relationship is used as the foreign key field of the entity that is on the many end of the relationship (see the E-R diagram below).

After the Student ID field is added to the Library Loans table, the tables would now look like the one below.

Database relationships showing primary and foreign keys
Database relationships showing primary and foreign keys

By appearing in both tables, the ‘Student ID’ field has created a relationship between the two tables. It acts as the primary key field in the Students table and as a foreign key field in the Library Loans table.

Unlike the ‘Loan ID’ field, the ‘Student ID’ field can occur in more than one record of the Library Loans table because a student may have borrowed more than one resource from the library.

The diagram below shows example data in the Students and Library Loans tables. Notice how the values of the primary key in the Students table match the value of the foreign key in the Library Loans table. This creates a link between the two tables and permits questions such as “What loans has student 6783 borrowed?” to be answered.

Data entries in a relational database
Data entries in a relational database

Notice how every library loan record is associated with a student, however not every student has a library loan. When a user inputs details of a new library loan, their entry will be rejected if they do not also associate the loan with an existing student (i.e. a valid Student ID). This is known as ensuring referential integrity and is one of the methods of validating data on entry.

SUMMARY:

  • The use of foreign keys enables tables to be linked to form relationships. A foreign key is a field in a table that is also a key field in another table.
  • The relationships remove data redundancy, i.e. the need for data to be duplicated (as happens in the case in flat-file databases).
  • To create a relationship between two tables, we use the key field from one of the tables and place it as a foreign key in the other table.
  • A key field is a field that is guaranteed to have a unique value for each record in the table.
  • Foreign key relationships are used to ensure the referential integrity of data in a database.

What are forms?

A form is used for data INPUT and for displaying and editing selected data, rather than having to use a database table directly. When edited, the changes update the database table.

Forms offer many advantages over tables as a user interface:

  • Data can be laid out in a meaningful way with clear labels rather than just columns of data and field names.
  • Font colour, type and size can be customised to improve clarity.
  • Graphics such as boxes and lines can be used to group and separate information.
  • Images can be included on forms, both as fields and as graphics such as logos.
  • Forms can be tailored to accept only certain information or only information in the correct format.
  • Drop-down list boxes can speed up data input as well as reducing errors.
  • Forms can include control buttons to navigate to other records or forms, close forms, open help screens or print out information.
A typical database form
A typical database form

In the GCSE Computing exam, you might well be asked to design a layout for a form. Make sure you take note of the following:

  • always include a title that clearly describes the purpose of the form;
  • arrange the fields in a logical order, grouping similar fields if necessary;
  • make the field labels as clear as possible (they do not have to exactly match the field names);
  • position each field label carefully, either above or to the left of the field;
  • make sure the field label and the field itself are easily distinguished (showing the field as a box with the field name inside is the best option);
  • if the form uses data from multiple tables, make sure you include all the fields relevant to the form;
  • optional – add symbols to show drop down lists and navigation buttons (but label these clearly if you include them);
  • if an image field is required, think about the orientation (portrait or landscape) and don’t forget to add a label;
  • make sure you fill the box you are given to create your design in and spread things out.
A typical examination question and design sketch answeral
A typical examination question and design sketch answer

What is a query?

The interrogation of a database is called querying the database and a question used to interrogate the data is called a query.

There are many reasons why users may wish to query data, including:

  • To identify a group of records that share a certain attribute – e.g. a list of students with nut allergies, products from a particular supplier, etc.
  • To calculate totals based on the information held in records – e.g. calculating the total value of the assets held by a company.
  • To update the details of a specific record or group of records.
  • To view data in particular combinations etc.

The above list shows that queries are a means of producing information from data. This information is used by the decision-makers in organisations to plan strategies and tactics. Databases usually allow users to create and reuse their own queries.

A query design specifies which records the user is searching for and what fields to display out of those records. There are two types of queries:

  • Simple query – looking for data in one field only.
  • Complex query – looking for data in multiple fields and across linked tables.

The example of a simple query below uses a Microsoft Access database table and a query to find the names of all students who are in form 10B. This involves the following steps:

  • Adding the Students table to the query;
  • Selecting the fields to display;
  • Adding the search criteria into the correct field.
A simple query - showing the design view and the result
A simple query – showing the design view and the result

A query can also be written using a query language such as SQL (Structured Query Language). The above example would be written as:

SELECT Surname, Forename, Form FROM Students WHERE Form = "10B"

For the GCSE Computing examination, this could be simplified to

Form = "10B"

Further examples of queries on this data could be:

  • (Form = "10A") AND (Gender = “Female”)
  • (Form = "10A") OR (Form = "10B")
  • (Gender = "Male") AND (Date of Birth < 01/01/1995)
  • (Student ID > 1100) AND (Student ID < 1200)

What are reports?

A simple report from a database
A simple report from a database

A report is used for data OUTPUT. It can either be displayed on a screen or more typically be printed.

A printed report is therefore a snapshot of the data in the database at a particular point in time.

Reports offer a number of advantages over just displaying or printing out a database table or query result:

  • Query results or data from linked tables can be presented in a professional manner, using different fonts and colours;
  • Reports can be used to calculate totals, averages and other aggregated data;
  • Reports can group, sort and summarise data to give information that is useful for the decision makers in organisations;
  • Reports can include images and and charts generated from the data in the database.

Most relational databases include facilities for reports to be generated from table data or from the results of queries on data held in the database.

Reports can include charts and formatting to make their information easy to assimilate. As with forms, reports can be customised to include the logos, colours and font styles of an organisation.

Different types of reports from databases
Different types of reports from databases

In the GCSE Computing exam, you might well be asked to design a layout for a report. Make sure you take note of the following:

  • always include a title that clearly describes the purpose and time/date of the report;
  • each field forms a column, a bit like in a database table. Label each column clearly, then position each field underneath it;
  • if the report uses data from multiple tables, make sure you include all the fields relevant to the report;
  • do not include fields that do not relate to the report, just because they are in the tables described in the question;
  • show any grouping or sorting that you think should be built into the report;
  • make sure you fill the box you are given to create your design in, spread things out.
A typical examination question and design sketch answer
A typical examination question and design sketch answer

What data types are there in a database?

When a database is designed, all the fields are set to accept a particular data type. This helps check for the wrong type of data being entered (validation). For example, a database will not allow you to store text in a field of data type: ‘integer’.

Data types also help to make sure that the data is stored as efficiently as possible. For example, a field of data type ‘integer’ requires less storage space to hold the value 2 compared a field of data type “real”.

Data types also help ensure that data can be sorted correctly. For example, if the integer values 1, 2, 3 and 20 are stored in a number field then they can easily be sorted correctly. However, if the same values were stored in a text/alphanumeric field as text then they would sort in alphabetical order: 1, 2, 20, 3. In the same way, date fields allow data to be sorted by year, then month and then day.

Text/String/Alphanumeric

A field of data type text will accept both numbers and text.

  • When to use: Use text data types if you want to enter text or a mixture of text and numbers into a field.
  • When NOT to use: if you want to store only numbers or only dates. You can enter numbers or dates into text fields but they would get sorted as if they were words and could not easily be used in number or date calculations. They would also be more difficult to validate using range checks.
  • Note, telephone ‘numbers’ are always stored as text, this allows a leading zero, spaces, brackets etc.
Number

Numeric fields are used to store numbers. This is because:

  • The data will be sorted correctly: the numbers 2, 3, 20, 30 would be sorted as: 2, 20, 3, 30 if they were stored in a text field.
  • Validation rules such as range checks can used (e.g. a range check such as BETWEEN 10 and 20)
  • Arithmetic operations can be carried out on number fields.

There are two types of numeric field you need to know about: Real and Integer.

  • Real: This is for numbers which can have decimal places. For example:
    • 132.01
    • 45.7
    • 9.18652493
  • Real numbers can be formatted as currency (e.g. £5.67) and/or to a fixed number of decimal places (e.g. entering 3.1427 into a real number field formatted to two decimal places would mean it appeared as 3.14)
  • When to use the real data type:
    • use the real data type if the field is going to be used to store numbers where precision is important. For example:
      • Height of people (e.g. 1.56m)
      • Length of a room (e.g. 3.7m)
      • Price of goods (e.g. £4.25) (Note: this data would be formatted as currency but would still be a real number)
  • Integer: This for whole numbers with no decimal places. For example:
    • 44
    • -763
    • 9321389
  • When to use the integer data type:
    • Integers are used when accuracy is not of vital importance or when the value allocated is specifically a whole number (e.g. a count of something).
    • For example:
      • Number of car doors
      • Large amounts of money (e.g. Average house price £155000)
      • Number of children
Logical/Boolean
  • A field of data type Boolean is one which can contain only one of two values: true and false. These two values can be used to represent anything containing two states.
  • When to use a Boolean data type – when you need to store a data item that has only one of two possible values. It is often phrased as a question:
    • Has the pet been vaccinated?
    • Does the property have a garden?
    • Are you female?
Date/Time

A field of this type stores days, months and years so that records can be sorted correctly (by year, then month and finally day) and be easily validated using range checks. Date fields can display the date information in different formats such as the full name of the day/month (28th August 1961) or the numerical versions (28/08/61).

  • When to use a date/time field data type – when you need to store dates or times and have the data easily validated and sorted.

What are database modules?

Modules are packages of code that database developers can use to add extra functionality to a relational database. Each module is a collection of procedures that are stored together as a unit in the database. Modules can be associated with specific forms or reports in which case the procedures tend to be specific to the form or report to which they are attached.

Modules can be used to present a company’s business rules in a user-friendly interface. An example business rule might state that no credit check is to be performed on existing customers when processing loan applications. In a large organisation, some modules can contain general procedures so that they can be reused in different sections of the company. This saves different users from having to write their own versions of the same queries on databases and helps to ensure that company standards and practices are followed. Well written modules tend to be optimised to run faster and more efficiently than queries written by end-users.

Modules can be presented to users as applications tailored to the users’ needs. For example, the user interface can be formed for use by a dental receptionist. She can use these to query the appointments database in order to find and book a slot for a patient. By such means, modules save end users from having to understand the tables, fields and relationships of a database.

In Microsoft Access, modules are produced using a language called Visual Basic for Applications. Other database vendors similarly have their own languages for producing modules. Some vendors have specialised in producing professional modules for common business functions. For example, you can purchase individual systems from Oracle that are composed of the databases and modules for:

  • Inventory Management
  • Payroll
  • Human Resources (i.e. staff management)
  • Marketing
  • Sales
  • Purchasing Order Management

How are logical operators used in framing database queries?

A complex query looks for data in two or more fields and uses the logical operators OR, AND or NOT.

The following example uses a complex query to find all of the pupils in Form 10B who were born before 1995. This query uses the logical operator AND and is written as:

(Form = “10B”) AND (Date of Birth < 01/01/1995)

The following operators can be used to refine search:

  • = (equals)
  • < (less than)
  • ≤ (less than or equal to)
  • > (greater than)
  • ≥ (greater than or equal to)
  • <> (not equal to)

A query design in MS Access is shown below. Note that there are two entries in the search criteria row and that the query has been given a meaningful name (“Select Query – Pupils in 10B born before 1995”).

A complex query using more than one field
A complex query using more than one field

Below is a complex query that uses the logical operator OR to find pupils who are in Form 10A OR Form 10C:

(Form = “10A”) OR (Form = “10C”)

This time, in the query definition there will be two criteria rows. The query and its results are shown below:

A complex query using OR
A complex query using OR

What are wildcards in queries?

Wildcard characters can be used in database queries. For example, you may want a list of all pupils born in November or all of the pupils whose surname starts with a ‘C’. Wildcard searches allow you to specify the part of the data that you know and leave the data handling software to fill in the blanks.

Note that wildcard characters do vary between different relational databases. For example, the wildcard characters you would use in a Microsoft Access database differ from the wildcard characters used in an internationally recognised standard called ANSI-92. ANSI-92 is used by several popular relational databases such as ORACLE and MYSQL. The following table shows some wildcards you can use when you compose query criteria using Microsoft Access.

Wildcards in database queries
Wildcards in database queries

How can data be validated?

A validation check is a rule that is built into a database to check that the data entered is:

  • sensible
  • reasonable
  • within acceptable boundaries
  • complete

It does NOT mean that the data is actually correct, that requires verification.

There are a number of different validation rules that can be used in a database:

Type Checks

Field data types provide a basic method of validation. Field data types are assigned to fields during the creation of the database table and data types such as Numeric, Boolean, Date/Time and Image restrict what can be entered.

  • Example – if a user tries to enter text in a date field or alphabetic characters in a numeric field, their entry will be rejected.

Range checks

These are used with numerical data to limit the range of numbers a user can enter.

Examples – The ‘day’ part of a date must be in the range 1 to 31.

Further examples of range check validation rules
RuleWhat is being checkedValid dataInvalid data
>20If a numerical entry is greater than 202120
>=20If a numerical entry is greater than OR equal to 202019
BETWEEN 20 AND 30If a numerical entry is greater than OR equal to 20 AND less than OR equal to 302531
>=20 AND <=30If a numerical entry greater than OR is equal to 20 AND less than OR equal to 302518

Check digits

This type of check is used with numbers. An extra ‘check digit’ is calculated from the numbers to be entered and added to the end. The numbers can then be checked at any stage by re-calculating the check digit from the other numbers and seeing if it matches the one entered. One example where a check digit is used is in the 10 digit ISBN number which uniquely identifies books. The last number of the ISBN is actually the check digit for the other numbers, for example – the ISBN 0192761501.

  • The Modulus-11 system is one example of a check digit system and can apparently detect 99% of input errors (there is still a small chance that more than one error occurs and the resulting number has the same check digit).

How the Modulus-11 system works:

An ISBN stored using a European Article Number (EAN) barcode
An ISBN stored using a European Article Number (EAN) barcode
  • Each number input is ‘weighted’ by multiplying it by it’s position (see the diagram below)
  • The weightings are adding together (the checksum) and this is divided by the prime number 11
  • If there is a remainder then it is subtracted from 11 to get the check digit*, otherwise the check digit is 0.
  • The check digit is then added to the end of the number.
  • To check if the number entered is correct the calculation is repeated but this time the check digit is included in the calculation.
  • Because of this the remainder should now be 0 and it it is then there is around a 99% chance the other numbers were entered correctly.

*if the check digit is a 10 then this is printed as an x to keep it as a single character.

i.e. the number 019276150 would have a check digit of 1 as shown below.

Check
digit
Position10987654321
ISBN0192761501Checksum with
check digit

Checksum with check digit / 11

(Number) x (position)09721442304150118717 remainder 0
  • The weighted checksum without the check digit is 186. 186 divided by 11 = 16 remainder 10. 11 minus the remainder therefore gives a check digit of 1
  • When the check digit is included in the calculation in position 1 then the checksum is 187. 187 divided by 11 = 17 remainder 0 so the check digit matches the rest of the numbers which means they were probably entered correctly.

Presence checks

A presence check
A presence check

These simply check that an entry has been made in a particular field before the user can continue i.e. a null value (empty field) is not permitted. Usually, not every field in a record needs to be filled in, however, there are likely to be some that must have a value and the presence check means that the system will not allow the record to be saved until an entry is made.

  • Example – An application for a passport must have the applicant’s surname.
  • Example – When recording the details of a new employee, presence validation will cause rejection of a record that has nothing entered in the ‘Job Title’ field.

Length Checks

All alphanumeric data has a length. A single character has a length of 1 and a string of text such as “Hello World” has a length of 11 (spaces are counted in text strings). A length check ensures that such data is either an exact length or does not exceed a specified number of characters.

  • Examples – Bank account numbers are stored as text and should be 8 characters in length. Postcodes consist of no more than 8 characters. Mobile phone numbers are stored as text and should be 11 characters in length.

Lookup

A lookup check takes the value entered and compares it against a list of values in a separate table. It can then return confirmation of the value entered or a second list based on the value.

Dropdown list
Dropdown list

One use of lookups restricts users to pre-defined input using drop-down lists. A user is forced to use a list box to select from a predetermined list of valid values. For example, they may have to select Male or Female or an exam grade that should be in the range ‘A’…’G’ or ‘U’. , etc.

An example is frequently encountered when entering a postcode in an online form. The database looks up the postcode and then returns a list of possible addresses for the user to select from.

A database system that manages airline seat bookings could use this check to limit the entries available in a list box selection. Rather than checking for double-bookings after data has been entered, travel agents could be presented with a list box that only shows the seats that are still available. This would eliminate double-bookings and is an example of more complicated, ‘real-time’ lookup validation.

Validation using referential integrity in a database
Validation using referential integrity in a database

Referential integrity is a special form of lookup. It is used in relational databases to enforce valid inputs and involves checking entries in certain fields against values in other tables.

For example, in an Orders table, when a new record is entered, the database could check the ‘Customer_ID’ field of the new order against the existing ‘Customer_ID’ data already in the Customers table to prevent the entry of orders for customers that do not exist.

Input Masks

Certain alphanumeric fields in a database may require entry in a particular format, e.g. a mixture of numbers and letters. Simple examples of this are date structures or text of a specific length. More complex examples include data items such as postcodes, National Insurance Numbers, driving licence numbers or product codes.

Most databases allow formats to be defined for database fields by an input mask, which defines the valid characters permitted in a field. For example, some item codes in a factory may look like:

  • TY856J
  • FS871K
  • AP120N

The first two characters are letters, the next three characters are numbers and the last character is a letter.

This would allow a format check to be applied to the field, for example LLNNNL where L is any upper-case letter and N is any number. This makes sure that only letters can be entered for the first two and the last characters and only numbers entered for the middle three.

Common input mask codes
Na digit between 0 and 9 must be entered
#an entry is optional, but it must be a digit between 0 and 9
La letter between a to z must be entered
an entry is optional, but it must be a letter from a to z
a letter or digit must be entered
an entry is optional, but it must be a letter or digit

How can data be verified?

Typing data into a computer from a data capture form is called transcribing. If an error is made then it is called a transcription error.

Verification is used to eliminate transcription errors. Data that has been verified should therefore be totally correct. There are two main methods of verification:

  • Visual checking – data that has been entered is displayed on a screen or printed out for the user to check it is correct before it is finally submitted for entry. Data that is not correct is then re-entered correctly This is often seen when completing an online order or when transferring funds in online banking. Before the order is confirmed or the transfer is made, all the details are displayed for the user to check.
Pasword verification
Pasword verification

Double Entry – Double entry is when data is entered twice, sometimes by two different people. The computer compares the two versions and does not accept the entry if there is a difference between them. A familiar example of this type of verification is encountered when setting a new password: users are usually asked to key the password a second time to ensure that they did not make a mistake the first time. As passwords are not displayed on the screen, this check is very useful in ensuring that the right password has been set.

SUMMARY

  • Validation cannot check that data is actually correct, only that it is reasonable or sensible data.
  • The process of checking that data is actually correct is called verification.

Further Readings: