Entity Modelling

www.entitymodelling.org - entity modelling introduced from first principles - relational database design theory and practice - dependent type theory


When we call data to mind, we think of names, quantities, monetary values, addresses, dates, temperatures, geographical coordinates, and so on. Such items of data convey information only within specific contexts and when attributed to the subjects at hand. A temperature, a colour, a price, a height, a distance - all these tell us nothing unless they be the temperature, the colour, the price, the height or the distance of some thing. Paraphrasing in the language of entity modelling, they tell us nothing unless they be attributed to an entity.

As stored within information systems, the individually represented items (the names, colours, quantities, monetary amounts, dates, etc.) are said, in the language of entity modelling, to be the values of attributes. Thus an actual name like "John Smith" is said to be the value of a 'name' attribute of a person entity. That a 'person' entity may be attributed a 'name' and a 'date of birth' we model on the diagram by representing the type 'person' with 'name' and 'date of birth' attribute annotations, like this:

To show that the name is a mandatory part of a person message we may use a square in place of the circle:
To show that the person attribute within a message is the identifying attribute we underline the annotation:
If it is necessary to give both a person's name and their date of birth to uniquely identify them then we underline both of these attributes on the diagram:
Generally, systems will hold and communicate many different attributes of each type of entity and these attributes are shown beneath the identifying attributes:

It is clear that computer programs are effective only in so far as the data items they manipulate are intended and understood as attributes of subject entities. It follows that to have an effective information system we must first have agreed the types of subject entity and the attributes of each of these types. In this agreement we agree the data content of the program and, in turn, its subject matter.

Now attribute values are not stored alone and independent of one another but are grouped together by the subject entities to which they apply. So it is that, to be able to model the data held or communicated within an information system, we need a third element - that of an attribute. The concept of attribute sits alongside those of entity and relationship to form the basis of entity modelling. Each attribute is posited as a named property associated with a specific or general type of entity (i.e. with a particular species or with all species within a genus).
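As an illustration only, the idea of attributes declared on an entity type, some of them mandatory and some identifying, might be sketched in Python as follows. The class names `Attribute` and `EntityType`, the two flags and the 'height' attribute are assumptions made for this sketch and are not part of the entity modelling notation itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A named property declared on an entity type (hypothetical class)."""
    name: str
    mandatory: bool = False    # drawn with a square instead of a circle
    identifying: bool = False  # drawn underlined


@dataclass
class EntityType:
    """An entity type together with the attributes declared on it."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)

    def identifier(self) -> list[str]:
        """Names of the attributes that together identify an entity."""
        return [a.name for a in self.attributes if a.identifying]


# 'person' identified by name and date of birth taken together.
person = EntityType("person", [
    Attribute("name", mandatory=True, identifying=True),
    Attribute("date of birth", mandatory=True, identifying=True),
    Attribute("height"),  # illustrative non-identifying attribute
])

print(person.identifier())
```

Here the identifying attributes are listed first and the remaining attributes after them, mirroring the layout used on the diagrams.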

It is common to distinguish the following types of attributes:

  • Text valued
  • Numeric valued
  • Date valued
  • Boolean valued
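A minimal sketch of these four kinds of attribute, assuming Python's built-in types stand in for the value types, is given here. The names `ValueType` and `conforms` are hypothetical and chosen only for this illustration.

```python
from datetime import date
from enum import Enum


class ValueType(Enum):
    """The four commonly distinguished kinds of attribute value."""
    TEXT = "text"
    NUMERIC = "numeric"
    DATE = "date"
    BOOLEAN = "boolean"


def conforms(value: object, value_type: ValueType) -> bool:
    """Return True if a Python value is acceptable for the given kind."""
    if value_type is ValueType.TEXT:
        return isinstance(value, str)
    if value_type is ValueType.NUMERIC:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if value_type is ValueType.DATE:
        return isinstance(value, date)
    return isinstance(value, bool)


print(conforms("John Smith", ValueType.TEXT))
print(conforms(True, ValueType.NUMERIC))
```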

It is a rule of entity modelling that for a single attribute an entity may only be attributed a single value. For this reason if a person can have multiple phone numbers then 'phone_number' is not an attribute of a 'person' entity type per se but an attribute of a 'phone' entity type that stands in relation to (is owned by) the 'person' entity:
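In code, the same arrangement might be sketched as follows. This is only one possible rendering: the class names and the list-valued `phones` field are assumptions of the sketch, and the phone numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Phone:
    phone_number: str  # single-valued attribute of the 'phone' entity


@dataclass
class Person:
    name: str  # single-valued attribute of the 'person' entity
    # A relationship to many 'phone' entities, not a multi-valued attribute.
    phones: list[Phone] = field(default_factory=list)


john = Person("John Smith")
john.phones.append(Phone("01632 960001"))
john.phones.append(Phone("01632 960002"))

numbers = [p.phone_number for p in john.phones]
print(numbers)
```

Each 'phone' entity still carries exactly one 'phone_number' value; the multiplicity lives in the relationship between person and phone.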

In any given situation, getting to the right blend of entity types, attributes and relationships may be an iterative process as demonstrated in the following example.

Chemical Elements Example

For this example we start with two entries for chemical elements from a scientific dictionary [1]:

oxygen (Chem.). A nonmetallic element, symbol O, at. no. 8, r.a.m. 15.994, valency 2. It is a colourless, odourless gas which supports combustion and is essential for the respiration of most forms of life. M.p. -218 ℃, b.p. -183 ℃, density 1.42904 g/dm3 at s.t.p., formula O2. An unstable form is ozone, O3. Oxygen is the most abundant element, etc.
chlorine (Chem.). Element, symbol Cl, at. no. 17, r.a.m. 35.453, valencies 1-, 3+, 5+, 7+, m.p. -101 ℃, b.p. -34.6 ℃. The second halogen, chlorine is a greenish yellow gas, with an irritating smell etc.

From these entries, and with some expansion of abbreviations, I surmise that each element has a name, a symbol, an atomic number, a relative atomic mass, one or more valencies, a melting point, a boiling point and, perhaps optionally, a density. I am left unsure whether all elements have a formula or whether, as is probably closer to the truth, they have forms which in turn have formulae. I provisionally model this situation on an entity model diagram like so:

I found this preliminary model to be inadequate once I read the entry for sulphur:

sulphur (Chem.). A nonmetallic element occurring in several allotropic forms. Symbol S, at. no. 16, r.a.m. 32.06, valencies 2, 4, 6. Rhombic (α-) sulphur is a lemon yellow powder; m.p. 112.8 ℃, rel. d. 2.07. Monoclinic (β-) sulphur has a deeper colour than the rhombic form; m.p. 119 ℃, rel. d. 1.96, b.p. 444.6 ℃. Chemically, sulphur resembles oxygen etc.

Reading this third entry, I learn that different forms of sulphur have different melting points and that my provisional model has mispositioned the 'melting point' attribute, in that chemical elements, in and by themselves, cannot be attributed melting points. After a little further research I learn that these forms taken by chemical elements are, technically speaking, allotropes and that it is these that have melting points, boiling points and densities. I reach this model:

In tabular form the dictionary entries for oxygen, chlorine and sulphur can be structured like this [2]:

name      symbol  atomic no  r.a.m.   valency         allotrope
                                                      name        m.p.    b.p.
oxygen    O       8          15.994   2
                                                      dioxygen    -218    -183
                                                      ozone       ???     ???
chlorine  Cl      17         35.453   1-, 3+, 5+, 7+
                                                      dichlorine  -101    -34.6
sulphur   S       16         32.06    2, 4, 6
                                                      rhombic     112.8   ???
                                                      monoclinic  119     444.6

In this table:

  • the columns correspond to attributes - each column heading is the name of an attribute,
  • the rows correspond to subject entities,
  • each cell presents the value of an attribute of a subject entity,
  • some rows have other rows inside them representing the parts of subject entities, i.e. the compositional substructure.

The overall structure of each row, considered as a message, we represent by a message structure definition:
element(name, symbol, atomic no, relative atomic mass, valency*, allotrope*)
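To make the nesting concrete, here is a hedged sketch of how such a message might be held in a program. The class and field names, the use of `None` for values the dictionary does not give, and the `to_message` helper are all assumptions of this sketch; the oxygen values are those of the dictionary entry quoted earlier.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Allotrope:
    name: str
    melting_point: Optional[float] = None  # degrees Celsius; None if not given
    boiling_point: Optional[float] = None  # degrees Celsius; None if not given


@dataclass
class Element:
    name: str
    symbol: str
    atomic_no: int
    relative_atomic_mass: float
    valencies: list[str] = field(default_factory=list)         # valency*
    allotropes: list[Allotrope] = field(default_factory=list)  # allotrope*


oxygen = Element("oxygen", "O", 8, 15.994, ["2"], [
    Allotrope("dioxygen", -218.0, -183.0),
    Allotrope("ozone"),  # melting and boiling points not given in the entry
])


def to_message(e: Element) -> str:
    """Render an element in the shape of the message structure definition."""
    valency = ", ".join(e.valencies)
    allotropes = ", ".join(a.name for a in e.allotropes)
    return (f"element({e.name}, {e.symbol}, {e.atomic_no}, "
            f"{e.relative_atomic_mass}, [{valency}], [{allotropes}])")


print(to_message(oxygen))
```

The starred items of the definition become lists, while every other item holds exactly one value per element.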


Given a knowledge of mathematical foundations, what is surprising about the attribute concept is that we need it at all. In Principia Mathematica, Russell and Whitehead showed that the entire mathematical corpus could be deduced from a small number of starting concepts, axioms and rules of deduction. Other such systems followed, most notably Zermelo-Fraenkel set theory. In all of these the concepts of relationship and, as a special case, function are logically antecedent to the concept of number; which is to say that, in mathematical foundations, number is a type of entity like any other and is modelled as such. Argued in this way, the concept of attribute is redundant: numeric attributes, for example, are relationships with a type number, text-valued attributes are relationships with a type text, and so on.

Figure 34
Attributes of Person Entity - this time shown as relationships with a type number.

If modelling data representation is the goal, we need to give special account of these relationships with the given types number, text, boolean and so on. We refer to them as attributes and, in fact, use certain of them to code for relationships, for numbers and sequences of numbers, rather than relationships, are the fundamental means of coding data: sequences of binary digits code for numbers and letters, sequences of codes code for words and text, and these in turn code relationships.
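Both halves of this argument can be illustrated with a small sketch, offered under stated assumptions rather than as a definitive account. All of the data and the names `height_cm`, `is_single_valued`, `owner_name` and `phones_of` are hypothetical.

```python
# An attribute viewed as a relationship with the type number: a set of
# (entity, value) pairs in which each entity has at most one value.
height_cm = {("alice", 170), ("bob", 182)}


def is_single_valued(pairs: set[tuple[str, int]]) -> bool:
    """True if no entity is related to more than one value."""
    entities = [entity for entity, _ in pairs]
    return len(entities) == len(set(entities))


# A relationship coded by an attribute: each phone record carries the
# identifying 'name' value of the person that owns it.
phones = [
    {"phone_number": "01632 960001", "owner_name": "alice"},
    {"phone_number": "01632 960002", "owner_name": "alice"},
]


def phones_of(owner_name: str) -> list[str]:
    """Follow the coded 'owned by' relationship from a person to their phones."""
    return [p["phone_number"] for p in phones if p["owner_name"] == owner_name]


print(is_single_valued(height_cm))
print(phones_of("alice"))
```

In the second half, the text value stored in `owner_name` is what stands in for the relationship itself.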

Personal Computer Example

In the following example, based on the personal computer, all files and folders are shown as having 'name' and 'date modified' attributes. In addition, files, but not folders, have a 'size' attribute:

Figure 35
Model of a Personal Computer.
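One way to sketch this model in code, assuming a common supertype for files and folders, is shown here. The name `Node`, the `contents` relationship on folders and the sample values are assumptions of the sketch rather than details stated in the model.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Node:
    """Attributes shared by files and folders (hypothetical supertype)."""
    name: str
    date_modified: date


@dataclass
class File(Node):
    size: int  # files only; the unit (bytes) is an assumption


@dataclass
class Folder(Node):
    # Assumption: a folder contains files and other folders.
    contents: list[Node] = field(default_factory=list)


notes = File("notes.txt", date(2024, 1, 15), 2048)
docs = Folder("Documents", date(2024, 1, 15), [notes])

print(hasattr(notes, "size"), hasattr(docs, "size"))
```

Declaring 'name' and 'date modified' once on the general type, and 'size' only on the specific type, reflects the earlier remark that an attribute may belong to a species or to all species within a genus.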

Logical versus Physical Models

From the perspective of Information Systems, we distinguish between logical and physical entity models, the difference being in the level of detail that they contain.

A logical entity model is one having sufficient detail to describe the data requirement for some aspect of an information system and having the characteristic that:

  • each attribute of an entity gives information about the entity above and beyond the information given by its relationships.

A physical entity model has more detail than a logical model, in a way that we subsequently describe; it specifies a message structure and a message context for each type of entity.

[1] Chambers Dictionary of Science and Technology, 1974, ISBN 0 550 13202 3
[2] The dictionary does in fact have such a table as an appendix, but it contains only a single allotrope for each element.