Table of Contents
- Developing a Quality Assurance Plan
- Skill Drill: Setting up your Workspace
- The Elements of Geospatial Data Quality
- Creating a quality assurance checklist
- Implementing your Quality Assurance Plan
- Implementing Quality Control with Tabular Data
- Skill Drill: Implementing Quality Control with Tabular Data
Implementing Quality Control with Tabular Data
Even with a quality assurance plan for collecting data, additional quality control is sometimes necessary. Readers may encounter data that is useful but not quite usable, and a little clean-up may be needed. In particular, data tables, sometimes referred to as tabular data, often require some quality control before they are usable in a GIS.
Recall from previous coursework that a significant component of geospatial data resides in a database. A database is a collection of individual entities stored in a highly structured way. Entities are unique objects or features represented in the database, and attributes are the characteristics that describe them. A database stores entities and attributes as tabular data. For example, a database storing information about land ownership would have entities such as land parcels and owners. The attributes for the land parcels may include the size, zone, and municipality. The attributes for the owners may include first name, last name, and address.
When discussing database tables, one refers to a row in the table as a record. Each record represents a single entity. For example, in a database table representing land parcels, each row in the table represents a single specific piece of land. You would not see a single record representing multiple land parcels. You will also not see the same land parcel appearing more than once in the table. The rule is one record for each individual entity (Figure 1).
When one encounters a table with more than one record for a single entity (Figure 1), the correction would be to merge the two records into one record (Figure 2).
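The merge described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (PARCEL_ID, ZONE, ACRES) and sample values are hypothetical and are not taken from the figures, and the rule for combining duplicates (keep the first record and fill its empty fields from later duplicates) is one possible choice. Records with conflicting values would still need review by a person.

```python
def merge_duplicate_records(records, key):
    """Collapse records that share the same key value into one record each."""
    merged = {}
    for record in records:
        entity_id = record[key]
        if entity_id not in merged:
            # First time this entity appears: keep a copy of the record.
            merged[entity_id] = dict(record)
            continue
        # Duplicate entity: only fill fields that are still empty in the kept record.
        kept = merged[entity_id]
        for field, value in record.items():
            if kept.get(field) in (None, ""):
                kept[field] = value
    return list(merged.values())


# Hypothetical land parcel table in which parcel P-001 appears twice.
parcels = [
    {"PARCEL_ID": "P-001", "ZONE": "", "ACRES": 1.5},
    {"PARCEL_ID": "P-002", "ZONE": "Commercial", "ACRES": 0.8},
    {"PARCEL_ID": "P-001", "ZONE": "Residential", "ACRES": 1.5},
]

cleaned = merge_duplicate_records(parcels, key="PARCEL_ID")
for row in cleaned:
    print(row)
```

After the merge, each parcel appears once, and the empty ZONE value for P-001 has been filled from its duplicate record.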
Another common error when working with tabular data relates to the attribute type. An attribute type describes the kind of value an attribute holds and how the database stores it. Attribute types typically include numbers, strings, and dates. A string is a data type that represents text.
Using the previous example with land parcels, you may encounter a table that has an Area attribute which describes the area in acres in both number format and text format (Figure 3). This issue would violate the rules for database tables because there can be only one attribute type for each attribute.
When one encounters a table with more than one attribute type for a particular attribute (Figure 3), the correction would be to convert the values in the field so that they have a uniform attribute type (Figure 4).
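As a rough illustration of this correction, the Python sketch below converts a mixed Area column to a single numeric type. The sample values and the helper name to_acres are hypothetical, and the assumption that text values look like "2.25" or "0.8 acres" may not match the table in Figure 3. Values that cannot be parsed are returned as None so they can be checked by hand rather than guessed.

```python
def to_acres(value):
    """Return the area as a float, or None if the value cannot be parsed."""
    if isinstance(value, (int, float)):
        return float(value)
    # Assumed text clean-up: drop a trailing unit label and thousands separators.
    text = str(value).strip().lower()
    text = text.replace("acres", "").replace("acre", "").replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical Area column mixing numbers and text.
area_values = [1.5, "2.25", "0.8 acres", "3", "unknown"]

uniform_values = [to_acres(value) for value in area_values]
print(uniform_values)
```

Every parsed value in the result is a float, so the field has one attribute type. The None entry marks the record that still needs attention.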
There are also a few other database table constraints that one may need to address when cleaning up tabular data. The top row of the table, called the header row, must contain the field names. A field name generally describes the attribute stored within the field and has strict limitations, such as a maximum of ten characters, no spaces, no special characters, and not starting with a number. Additionally, there must not be any blank rows between records or blank columns between fields.
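These constraints can be checked before loading a table into a GIS. The Python sketch below is a minimal example, not a definitive validator: the function names and sample data are hypothetical, and it assumes that the underscore is permitted in field names, which the rules above do not state explicitly.

```python
import re

# Assumed rule: a letter first, then letters, digits, or underscores,
# for a total length of at most ten characters.
FIELD_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,9}")


def invalid_field_names(header):
    """Return the field names in the header row that break the naming rules."""
    return [name for name in header if not FIELD_NAME_PATTERN.fullmatch(name)]


def drop_blank_rows(rows):
    """Remove rows in which every cell is empty or only whitespace."""
    return [row for row in rows if any(str(cell).strip() for cell in row)]


# Hypothetical header row and records.
header = ["PARCEL_ID", "Zone", "2020_VALUE", "Owner Name", "MUNICIPALITY"]
rows = [
    ["P-001", "Residential"],
    ["", "  "],
    ["P-002", "Commercial"],
]

bad_names = invalid_field_names(header)
clean_rows = drop_blank_rows(rows)
print(bad_names)
print(clean_rows)
```

In this example, 2020_VALUE is flagged because it starts with a number, Owner Name because it contains a space, and MUNICIPALITY because it is longer than ten characters. The blank row between the two parcel records is removed.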