Entity Modelling

www.entitymodelling.org - entity modelling introduced from first principles - relational database design theory and practice - dependent type theory


When we call data to mind, we think of names, quantities, monetary values, addresses, dates, temperatures, geographical coordinates, and so on. Such items of data convey information only within specific contexts and when attributed to the subjects at hand. A temperature, a colour, a price, a height, a distance - all these tell us nothing unless they are the temperature, the colour, the price, the height or the distance of some thing. We can paraphrase in the language of entity modelling and say that they tell us nothing unless they are attributed to an entity.

As stored within information systems, the individually represented items - the names, colours, quantities, monetary amounts, dates, etc. - are said, in the language of entity modelling, to be the values of attributes. Thus an actual name like "John Smith" is said to be the value of a 'name' attribute of a person entity. To express that information about a person is communicated or stored as a message with two components, name and date of birth, we write

person => name,
date of birth
or otherwise we might say that a 'person' entity may be attributed a 'name' and a 'date of birth' and show an entity model diagram representing type 'person' with 'name' and 'date of birth' attribute annotations, like this:

Alternatively, to say that date of birth is optional, we may add a question mark, like so:

person => name,
date of birth?
or on a diagram we may use a circle in place of the square:
To show that the name attribute within a message is the identifying attribute, we underline the annotation in the diagram:
or, equivalently, in the message structure:
person => name,
date of birth
If it is necessary to give both a person's name and their date of birth to identify them uniquely, then we underline both of these attributes on the diagram:
Generally, systems will hold and communicate many different attributes of each type of entity and these attributes are shown beneath the identifying attributes:

It is clear that computer programs are effective only in so far as the data items they manipulate are intended and understood as attributes of subject entities. It follows that to have an effective information system we must first have agreed the types of subject entity and also what may be attributed to entities of these types. In reaching this agreement we agree the data content of the program or system, i.e. its subject matter.
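Such an agreement can be sketched in code. The following is a minimal illustration only, assuming Python dataclasses as the notation; the class name Person, the field names and the "identifying" metadata key are our own hypothetical choices and not part of entity modelling notation.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Person:
    # Identifying attribute (the one underlined on a diagram).
    # The "identifying" metadata key is a hypothetical convention.
    name: str = field(metadata={"identifying": True})
    # Optional attribute, written `date of birth?` in the message notation.
    date_of_birth: Optional[date] = field(
        default=None, metadata={"identifying": False}
    )


def identifying_attributes(entity_type) -> list[str]:
    """Return the names of the attributes flagged as identifying."""
    return [f.name for f in fields(entity_type) if f.metadata.get("identifying")]


john = Person(name="John Smith")
print(identifying_attributes(Person))  # ['name']
```

Here the agreed entity type and its attributes are fixed once, in the class definition, and every program that handles a Person value shares that understanding of what a name or a date of birth is attributed to.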

Now attribute values are not stored alone and independent of one another but grouped together by the subject entities to which they apply. So it is that to be able to model the data held or communicated within an information system there is this third element - that of an attribute. The concept of attribute sits alongside those of entity and relationship to form the basis of entity modelling. Each attribute is posited as a named property associated with a specific or general type of entity (i.e. with a particular species or with all species within a genus).

Diagrams that show entity types and relationships but not attributes we shall say are conceptual.

It is common to distinguish the following types of attributes:

  • Text valued
  • Numeric valued
  • Date valued
  • Boolean valued
but these different types of attribute are not particularly germane to the presentation here and so generally we omit mention of them.

In a message about a person two or more phone numbers may be communicated. This is shown like this:

person => name,
date of birth,
phone number*
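The message notation used so far can be read mechanically. The sketch below is an illustration under stated assumptions: the function names parse_scheme and conforms are hypothetical, a message is represented as a dictionary from component name to a list of values, an unmarked component is taken to occur exactly once, `?` at most once, and `*` any number of times (including none).

```python
def parse_scheme(text: str) -> tuple[str, dict[str, str]]:
    """Parse e.g. 'person => name, date of birth?, phone number*'.

    Returns the entity name and a mapping from component name to a
    marker: '1' (exactly one), '?' (optional) or '*' (repeated).
    """
    entity, _, body = text.partition("=>")
    components: dict[str, str] = {}
    for part in body.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing comma
        if part[-1] in "?*":
            components[part[:-1].strip()] = part[-1]
        else:
            components[part] = "1"
    return entity.strip(), components


def conforms(message: dict[str, list], components: dict[str, str]) -> bool:
    """Check that each component occurs an allowed number of times."""
    if set(message) - set(components):
        return False  # the message carries a component the scheme does not name
    for name, marker in components.items():
        count = len(message.get(name, []))
        if marker == "1" and count != 1:
            return False
        if marker == "?" and count > 1:
            return False
    return True


entity, components = parse_scheme("person => name,\ndate of birth?,\nphone number*")
message = {"name": ["John Smith"], "phone number": ["555-0100", "555-0101"]}
print(entity, conforms(message, components))  # person True
```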

It is a rule of entity modelling that for a single attribute an entity may only be attributed a single value. For this reason if a person can have multiple phone numbers then 'phone_number' is not an attribute of a 'person' entity type per se but an attribute of a 'phone' entity type that stands in relation to (is owned by) the 'person' entity:

The existence and ownership of particular phone numbers may be communicated or stored independently, in which case they have the message structure:

particular_phone_number => phone number,
The world as a whole is now communicated as scheme (1):
world => person*,
person => name,
date of birth
particular_phone_number => phone number,

This world view given by the message scheme (1) is an example of a relational schema. In summary, we have two possible ways of describing the world view in messages, the relational one (1) and the hierarchic[1] one shown in full in (2).

world => person*
person => name,
date of birth,
phone number*
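The two views carry the same information, which can be illustrated by converting between them. This is a sketch only, under assumptions of our own: the function names flatten and nest are hypothetical, 'name' is taken to identify a person, and ownership of a phone number is recorded in a hypothetical 'owner' component that the schemes above do not spell out.

```python
def flatten(world: list[dict]) -> tuple[list[dict], list[dict]]:
    """Hierarchic to relational: split nested person messages into
    a flat list of persons and a flat list of phone numbers."""
    persons, phones = [], []
    for person in world:
        persons.append(
            {"name": person["name"], "date of birth": person.get("date of birth")}
        )
        for number in person.get("phone number", []):
            # 'owner' is an assumed reference back to the identifying name.
            phones.append({"phone number": number, "owner": person["name"]})
    return persons, phones


def nest(persons: list[dict], phones: list[dict]) -> list[dict]:
    """Relational to hierarchic: regroup phone numbers under their owners."""
    world = []
    for person in persons:
        owned = [p["phone number"] for p in phones if p["owner"] == person["name"]]
        world.append({**person, "phone number": owned})
    return world


hierarchic = [
    {"name": "John Smith", "date of birth": "1970-01-01",
     "phone number": ["555-0100", "555-0101"]},
    {"name": "Jane Doe", "date of birth": None, "phone number": []},
]
persons, phones = flatten(hierarchic)
print(nest(persons, phones) == hierarchic)  # True
```

The round trip succeeds only because each phone number message carries a reference to its owner; without some such reference the relational form would lose the ownership that nesting expresses in the hierarchic form.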


With knowledge of mathematical foundations, what is surprising about the attribute concept is that we need it at all. In Principia Mathematica, Russell and Whitehead showed that the entire mathematical corpus could be deduced from a small number of starting concepts, axioms and rules of deduction. Other such systems followed, most notably the system of Zermelo-Fraenkel set theory, and in all of these the concepts of relationship and, as a special case, function are logically antecedent to the concept of number. That is to say that, in mathematical foundations, number is a type of entity like any other, and is modelled as such. Argued in this way, the concept of attribute is redundant: numeric attributes, for example, are relationships with a type number, text-valued attributes are relationships with a type text, and so on.

Figure 23
Attributes of Person Entity - this time shown as relationships with a type number.
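The reading of attributes as relationships can be sketched as follows, assuming that a relationship is represented as a set of (entity, value) pairs; the names is_functional, height_cm and phone are hypothetical and the data is invented for illustration. On this reading an attribute is simply a relationship with a given type, such as number, that relates each entity to at most one value.

```python
def is_functional(relationship: set[tuple[str, object]]) -> bool:
    """True if each source entity is related to at most one target value."""
    seen: dict[str, object] = {}
    for source, target in relationship:
        if source in seen and seen[source] != target:
            return False  # the same entity is related to two different values
        seen[source] = target
    return True


# A relationship between persons and the type number: usable as an attribute.
height_cm = {("john", 180), ("mary", 165)}

# A person related to two phone numbers: not single-valued, so not an
# attribute of person under the single-value rule.
phone = {("john", "555-0100"), ("john", "555-0101")}

print(is_functional(height_cm), is_functional(phone))  # True False
```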

If modelling data representation is the goal, we need to give special account of these relationships with the given types number, text, boolean and so on. We refer to them as attributes and, in fact, use certain of them to code for relationships, for numbers and sequences of numbers, rather than relationships, are the fundamental means of coding data: sequences of binary digits code for numbers and letters, sequences of codes code for words and text, and these in turn code relationships.

Logical versus Physical Models

From the perspective of Information Systems, we distinguish between logical and physical entity models, the difference being in the level of detail that they contain.

A logical entity model is one having sufficient detail to describe the data requirement for some aspect of an information system and having the characteristic that:

  • each attribute of an entity gives information about the entity above and beyond the information given by its relationships.

A physical entity model has more detail than a logical model, in a way that we subsequently describe; it specifies a message structure and a message context for each type of entity.

[1] In this context, the term hierarchical and the term relational are generally used as contrasting terms, but in actual fact a relational schema can be seen to be a special case of a hierarchical schema - one in which the hierarchical structure is minimised.