2NF There are two Definitions which is right

Post by Jens Haase
1: A relation R(A,F) is in 2NF, when every attribute not belonging to
the primary key of R is fully functionally dependent on the primary key of R

Can you give a hint as to the significance of the parenthesized "(A,F)";
neither A nor F is referenced in the definition quoted. I don't think
it alters the answer, but I'm curious. (The comma after 2NF should be
omitted too.)

Post by Jens Haase
2: A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R

Assuming 'nonprime attributes' are the attributes that do not belong to
the primary key, it seems to me that the two definitions are the same.

Post by Jens Haase
According to the first definition the following schema would not be in
R = (A,B,C,D)
A->D
AB->C
C->D
D->A
According to the first definition when the primary key is AB, then A->D
violates 2NF, because D is not part of the primary key.

I agree that AB is the primary key - or at least a valid primary key.
A->D itself cannot violate 2NF; it is a functional dependency, and
relation schemas are what satisfy or violate 2NF. And the reasoning the
that D is not part of the primary key is bogus too.

As I see it, AB->C and C->D implies that AB->D so the primary key
determines both C and D, as required. Granted, it is neither in 3NF nor
BCNF - the mnemonic is "the key, the whole key, and nothing but the
key", and the functional dependency A->D means that 'the whole key' part
of the rule is violated. (The C->D part is also problematic, but doesn't
cause the relation schema to violate 2NF.)

Post by Jens Haase
According to the second definition R is in 2NF because the candidate
keys are AB, BC, BD and since that, every attribute is prime attribute,
so there is no violation of 2NF.

Now you introduce a term - candidate key - not mentioned in the
definitions of 2NF that we're dissecting. I would dispute that BC is a
candidate key anyway; likewise BD. Given a relation schema R { A, B, C,
D, E } with candidate key AB, by definition, AB->CDE (and, by the rules
of trivial functional dependencies (or FDs), AB->ABCDE). That is, a
candidate key functionally determines every attribute in the relation
schema. How do you apply Armstrong's Axioms to the relation schema and
list of FDs to deduce that BC->AD?

Post by Jens Haase
Which definition is rigth?

Both - they're the same.

--
Jonathan Leffler #include <disclaimer.h>
Email: ***@earthlink.net, ***@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

Jens Haase

2005-07-21 07:40:36 UTC

Post by Jonathan Leffler
Can you give a hint as to the significance of the parenthesized "(A,F)";
neither A nor F is referenced in the definition quoted. I don't think
it alters the answer, but I'm curious. (The comma after 2NF should be
omitted too.)

R(A,F) means a Relation R with attributes A and functional dependencies F

Post by Jonathan Leffler
Assuming 'nonprime attributes' are the attributes that do not belong to
the primary key, it seems to me that the two definitions are the same.

This definition (the second) is from Elmasri/Navate Fundamentals of
Database Systems Second Edition Page 411.
On Page 408 in the same book the authors state:

An attribute of relation schema R is called a prime attribute of R if it
is a member of some candidate key of R. An attribute is called nonprime
if it is not a prime attribute—that is, if it is not a member of any
candidate key.

A superkey of a relation schema is a set of attributes S part of R with
the property that no two tuples t1 and t2 in any legal relation state r
of R will have t1[S] = t2[S]. A key K is a superkey with the additional
property that removal of any attribute from K will cause K not to be a
superkey any more. The difference between a key and a superkey is that a
key has to be minimal; that is, if we have a key K={A1,A2,....Ak} of R,
then K - {Ai} is not a key of R for any i, l <= i <= k.

If a relation schema has more than one key, each is called a candidate
key. One of the candidate keys is arbitrarily designated to be the
primary key, and the others are called secondary keys

According to this definition there is a difference between the two
definitions.

Post by Jonathan Leffler
I agree that AB is the primary key - or at least a valid primary key.
A->D itself cannot violate 2NF; it is a functional dependency, and
relation schemas are what satisfy or violate 2NF. And the reasoning the
that D is not part of the primary key is bogus too.

A->D tells us that D is only dependant on A and so only on a part of the
primary key.

Post by Jonathan Leffler
As I see it, AB->C and C->D implies that AB->D so the primary key
determines both C and D, as required. Granted, it is neither in 3NF nor
BCNF - the mnemonic is "the key, the whole key, and nothing but the
key", and the functional dependency A->D means that 'the whole key' part
of the rule is violated. (The C->D part is also problematic, but doesn't
cause the relation schema to violate 2NF.)

According to Ullmans definition of 3NF:
A relation R is in third normal form in whenever X -> A holds in R and A
is not in X, then either X is a candidate key for R, or A is prime attribute

it is in 3NF since:

A->D holds 3NF since D is prime attribute
AB->C holds 3NF because AB is candidate key
C->D holds 3NF because D is prime attribute
D->A holds 3NF because A is prime attribute

Post by Jonathan Leffler
Now you introduce a term - candidate key - not mentioned in the
definitions of 2NF that we're dissecting. I would dispute that BC is a
candidate key anyway; likewise BD. Given a relation schema R { A, B, C,
D, E } with candidate key AB, by definition, AB->CDE (and, by the rules
of trivial functional dependencies (or FDs), AB->ABCDE). That is, a
candidate key functionally determines every attribute in the relation
schema. How do you apply Armstrong's Axioms to the relation schema and
list of FDs to deduce that BC->AD?

Well
since C->D and D->A with Armstrongs 3. Axiom we get C->A
that gives us C->DA and with Armstrongs 2. Axiom we get BC->BDA, so BC
is a candidate key.

since D->A with Armstrongs 2. Axiom we get BD->AB and with AB->C with
Armstrongs 3. Axiom we get BD->C so we get BD->ABC, so BD also is a
candidate key.

The hole problem i am facing, is that the first definition is from a
german script and the second is from an english book (Elmasri/Navathe)
and my thougt is, that there is a confusion between "Primary key" and
"prime".

Jens

Jonathan Leffler

2005-07-22 07:59:44 UTC

I've included most of Jens original question and rather more of my
initial response to Jens question in this response, so that the
discussion is more nearly complete in one message.

Post by Jens Haase
1: A relation R(A,F) is in 2NF, when every attribute not
belonging to the primary key of R is fully functionally dependent
on the primary key of R

Can you give a hint as to the significance of the parenthesized
"(A,F)"; neither A nor F is referenced in the definition quoted. I
don't think it alters the answer, but I'm curious. (The comma after
2NF should be omitted too.)

A beter version of this definition would be "A relation R is in 2NF when
every attribute that is not part of a candidate key of R is fully
functionally dependent on the candidate key(s) of R".

Post by Jens Haase
R(A,F) means a Relation R with attributes A and functional
dependencies F

Fine - as expected, this makes no difference to the answer.

Post by Jens Haase
2: A relation schema R is in 2NF if every nonprime attribute A in
R is fully functionally dependent on the primary key of R

Assuming 'nonprime attributes' are the attributes that do not
belong to the primary key, it seems to me that the two definitions
are the same.

This definition (the second) is from Elmasri/Navate Fundamentals of
Database Systems Second Edition Page 411.
An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called
nonprime if it is not a prime attribute—that is, if it is not a
member of any candidate key.

It certainly helps to know the extra background. As I noted in a
separate response to Jon Heggland's comment, neither of the definitions
clearly allows for multiple candidate keys, though given this definition
of non-prime attribute, you can argue that this version does. However,
even so, it would be IMNSHO be better phrased if it finished 'dependent
on each candidate key of R'.

Post by Jens Haase
A superkey [...]

Superkeys are not relevant to most of this discussion, I think.

Post by Jens Haase
If a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily designated to
be the primary key, and the others are called secondary keys

Secondary key is not a common term for the non-primary candidate keys
(though it is more logical in some respects than 'alternative key',
which is the term I'm more familiar with). These days there is no
formal reason to distinguish one candidate key as the primary key and
the others by another name. Normalization theory is best defined in
terms of candidate keys (see Date 'The Primacy of Primary Keys: An
Investigation' in 'Relational Database: Writings 1991-1994').

Post by Jens Haase
According to this definition there is a difference between the two
definitions.

Just about - and the second is the better definition, but not perfect.

Post by Jens Haase
According to the first definition the following schema would not
R = (A,B,C,D)
A->D
AB->C
C->D
D->A
According to the first definition when the primary key is AB, then
A->D violates 2NF, because D is not part of the primary key.

I agree that AB is the primary key - or at least a valid primary key.

AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD
(union); and AB->ABCD (reflexivity); and therefore AB determines all
attributes of R, which makes it a candidate key.

Post by Jonathan Leffler
A->D itself cannot violate 2NF; it is a functional dependency, and
relation schemas are what satisfy or violate 2NF. And the reasoning
the that D is not part of the primary key is bogus too.

Clearly, 'the that' is a typo - sorry. The first sentence is logic
chopping, and may not be fair since I suspect English is not Jens's
first language; nevertheless, it is also accurate. "According ... AB,
then the FD A->D means that R is not in 2NF" would be concise - but
wrong. However, the trouble is that the first definition precludes the
possibility of multiple candidate keys, and therefore cannot be applied
to this relation schema where, as shown below, there are 3 candidate
keys. If the first definition is adapted to fix the problem, it becomes
equivalent to the second, and the relation R is in 2NF.

I don't think my second sentence makes as much sense as I'd like it to.
However, the reason that R is not in 2NF is not because of the status of
D but because of the status of A -- A on its own is not a candidate key,
and the fact that A alone determines D is what prevents R from being in
2NF.

Post by Jens Haase
A->D tells us that D is only dependant on A and so only on a part of the
primary key.

Yes. (And this is a change in stance from yesterday's post.)

Post by Jonathan Leffler
As I see it, AB->C and C->D implies that AB->D so the primary key
determines both C and D, as required. Granted, it is neither in 3NF
nor BCNF - the mnemonic is "the key, the whole key, and nothing but
the key", and the functional dependency A->D means that 'the whole
key' part of the rule is violated. (The C->D part is also problematic,
but doesn't cause the relation schema to violate 2NF.)

By implication, yesterday I said "but R is in 2NF"; today, I think that
was wrong.

Post by Jens Haase
A relation R is in third normal form in whenever X -> A holds in R
and A is not in X, then either X is a candidate key for R, or A is
[a] prime attribute
A->D holds 3NF since D is prime attribute
AB->C holds 3NF because AB is candidate key
C->D holds 3NF because D is prime attribute
D->A holds 3NF because A is prime attribute

Yes. To understand this, we need to establish that AB, BC and CD are
candidate keys - which is discussed below. One of the strengths of BCNF
over 3NF is its greater simplicity.

Date summarizes this nicely (An Introduction to Database Systems, 7th
Edn, p375):

3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of
the attributes of R, and let A be any single attribute of R. Then R is
in 3NF if and only if, for every FD X->A in R, at least one of the
following is true:
1. X contains A (so the FD is trivial);
2. X is a superkey;
3. A is contained in a candidate key of R.

BCNF is obtained from this definition of 3NF by dropping condition 3.
(And I note that superkeys make an appearance here.)

This does not depend on the definition of 2NF; some other
characterizations of 3NF have an "if the relvar [relation] is in 2NF and
..." structure.

Since every single attribute on the RHS of an FD in your set satisfies
requirement 3, R must be in 3NF. It is surely not in BCNF, though.

[
Somewhat tangentially - at p400, Date gives the following neat
characterization of BCNF, 4NF and 5NF:

BCNF: A relvar is in BCNF if and only if every FD satisfied by R
is implied by the candidate keys of R.

4NF: A relvar is in 4NF if and only if every MVD satisfied by R
is implied by the candidate keys of R.

5NF: A relvar is in 5NF if and only if every JD satisfied by R
is implied by the candidate keys of R.
]

Post by Jens Haase
According to the second definition R is in 2NF because the
candidate keys are AB, BC, BD and since that, every attribute is
prime attribute, so there is no violation of 2NF.

Now you introduce a term - candidate key - not mentioned in the
definitions of 2NF that we're dissecting. I would dispute that BC
is a candidate key anyway; likewise BD. Given a relation schema R {
A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
by the rules of trivial functional dependencies (or FDs),
AB->ABCDE). That is, a candidate key functionally determines every
attribute in the relation schema. How do you apply Armstrong's
Axioms to the relation schema and list of FDs to deduce that
BC->AD?

OK; I sit corrected - twice.

So, all the attributes are prime attributes, which means that the
relation is in 2NF by either definition (or, at least, by either
definition as amended) because both definitions only refer to non-prime
attributes and your relation schema has no non-prime attributes.

Post by Jens Haase
The [w]hole problem i am facing, is that the first definition is from
a german script and the second is from an english book
(Elmasri/Navathe) and my thoug[h]t is, that there is a confusion
between "Primary key" and "prime".

I still think the difference between the two definitions is mainly in
that the first implicitly assumes that there is just one candidate key
and the second implicitly permits there to be multiple candidate keys,
and therefore the second definition is more general and better.

--
Jonathan Leffler #include <disclaimer.h>
Email: ***@earthlink.net, ***@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

Jonathan Leffler

2005-07-22 08:11:06 UTC

Post by Jonathan Leffler
I've included most of Jens original question and rather more of my
initial response to Jens question in this response, so that the
discussion is more nearly complete in one message.

Darn it - I review and review and still the obvious errors escape!

Post by Jens Haase
1: A relation R(A,F) is in 2NF, when every attribute not
belonging to the primary key of R is fully functionally dependent
on the primary key of R

Can you give a hint as to the significance of the parenthesized
"(A,F)"; neither A nor F is referenced in the definition quoted. I
don't think it alters the answer, but I'm curious. (The comma after
2NF should be omitted too.)

A beter version of this definition would be "A relation R is in 2NF when
every attribute that is not part of a candidate key of R is fully
functionally dependent on the candidate key(s) of R".

Post by Jens Haase
R(A,F) means a Relation R with attributes A and functional
dependencies F

Fine - as expected, this makes no difference to the answer.

Post by Jens Haase
2: A relation schema R is in 2NF if every nonprime attribute A in
R is fully functionally dependent on the primary key of R

Assuming 'nonprime attributes' are the attributes that do not
belong to the primary key, it seems to me that the two definitions
are the same.

One of the 'be's on either side of "IMNSHO" is, IMNSHO, redundant.

Post by Jens Haase
A superkey [...]

Superkeys are not relevant to most of this discussion, I think.

Post by Jens Haase
According to this definition there is a difference between the two
definitions.

Just about - and the second is the better definition, but not perfect.

I agree that AB is the primary key - or at least a valid primary key.

AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD
(union); and AB->ABCD (reflexivity); and therefore AB determines all
attributes of R, which makes it a candidate key.

Clearly, 'the that' is a typo - sorry. The first sentence is logic
chopping, and may not be fair since I suspect English is not Jens's
first language; nevertheless, it is also accurate. "According ... AB,
then the FD A->D means that R is not in 2NF" would be concise - but
wrong. However, the trouble is that the first definition precludes the
possibility of multiple candidate keys, and therefore cannot be applied
to this relation schema where, as shown below, there are 3 candidate
keys. If the first definition is adapted to fix the problem, it becomes
equivalent to the second, and the relation R is in 2NF.
I don't think my second sentence makes as much sense as I'd like it to.
However, the reason that R is not in 2NF is not because of the status of
D but because of the status of A -- A on its own is not a candidate key,
and the fact that A alone determines D is what prevents R from being in
2NF.

My 'However' sentence is just wrong. R is in 2NF (as stated at the end
of the preceding paragraph). You can tell I did some circular editing.

Post by Jens Haase
A->D tells us that D is only dependant on A and so only on a part of
the primary key.

Yes. (And this is a change in stance from yesterday's post.)

Post by Jonathan Leffler
As I see it, AB->C and C->D implies that AB->D so the primary key
determines both C and D, as required. Granted, it is neither in 3NF
nor BCNF - the mnemonic is "the key, the whole key, and nothing but
the key", and the functional dependency A->D means that 'the whole
key' part of the rule is violated. (The C->D part is also
problematic, but doesn't cause the relation schema to violate 2NF.)

By implication, yesterday I said "but R is in 2NF"; today, I think that
was wrong.

Yes. To understand this, we need to establish that AB, BC and CD are
candidate keys - which is discussed below. One of the strengths of BCNF
over 3NF is its greater simplicity.
Date summarizes this nicely (An Introduction to Database Systems, 7th
3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of
the attributes of R, and let A be any single attribute of R. Then R is
in 3NF if and only if, for every FD X->A in R, at least one of the
1. X contains A (so the FD is trivial);
2. X is a superkey;
3. A is contained in a candidate key of R.
BCNF is obtained from this definition of 3NF by dropping condition 3.
(And I note that superkeys make an appearance here.)
This does not depend on the definition of 2NF; some other
characterizations of 3NF have an "if the relvar [relation] is in 2NF and
..." structure.
Since every single attribute on the RHS of an FD in your set satisfies
requirement 3, R must be in 3NF. It is surely not in BCNF, though.
[
Somewhat tangentially - at p400, Date gives the following neat
BCNF: A relvar is in BCNF if and only if every FD satisfied by R
is implied by the candidate keys of R.
4NF: A relvar is in 4NF if and only if every MVD satisfied by R
is implied by the candidate keys of R.
5NF: A relvar is in 5NF if and only if every JD satisfied by R
is implied by the candidate keys of R.
]

Post by Jens Haase
According to the second definition R is in 2NF because the
candidate keys are AB, BC, BD and since that, every attribute is
prime attribute, so there is no violation of 2NF.

Now you introduce a term - candidate key - not mentioned in the
definitions of 2NF that we're dissecting. I would dispute that BC
is a candidate key anyway; likewise BD. Given a relation schema R {
A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
by the rules of trivial functional dependencies (or FDs),
AB->ABCDE). That is, a candidate key functionally determines every
attribute in the relation schema. How do you apply Armstrong's
Axioms to the relation schema and list of FDs to deduce that
BC->AD?

OK; I sit corrected - twice.
So, all the attributes are prime attributes, which means that the
relation is in 2NF by either definition (or, at least, by either
definition as amended) because both definitions only refer to non-prime
attributes and your relation schema has no non-prime attributes.

--
Jonathan Leffler #include <disclaimer.h>
Email: ***@earthlink.net, ***@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

Jon Heggland

2005-07-21 16:41:50 UTC

Post by Jens Haase
1: A relation R(A,F) is in 2NF, when every attribute not belonging to
the primary key of R is fully functionally dependent on the primary key of R
2: A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R

[8<]

Post by Jens Haase
Which definition is right?

The second. Normalisation theory doesn't deal with the arbitrary primary
key distinction. The first definition probably makes the assumption that
there is only one key.

--
Jon

Jonathan Leffler

2005-07-22 05:59:26 UTC

Post by Jon Heggland

[8<]

Post by Jens Haase
Which definition is right?

The second. Normalisation theory doesn't deal with the arbitrary primary
key distinction. The first definition probably makes the assumption that
there is only one key.

I'm probably just being obtuse, but ... both definitions talk about 'the
primary key of R'. Given that prime attributes are components of some
candidate key (and therefore non-prime attributes are not a component of
any candidate key), the second is marginally more correct, but would be
more convincingly so if it finished 'dependent on each candidate key of R'.

I can't help but feel that the distinction between the two statements is
trivial - they are near enough equivalent.

--
Jonathan Leffler #include <disclaimer.h>
Email: ***@earthlink.net, ***@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

Jon Heggland

2005-07-24 13:22:26 UTC

Post by Jonathan Leffler
I'm probably just being obtuse, but ... both definitions talk about 'the
primary key of R'.

Oops, sorry. Neither is much good, then. But if you replace "the
primary" with "a", they are fine. :)

--
Jon

Dan Guntermann

2005-07-22 08:01:27 UTC