I've included most of Jens original question and rather more of my
initial response to Jens question in this response, so that the
discussion is more nearly complete in one message.
Post by Jens HaasePost by Jonathan LefflerPost by Jens Haase1: A relation R(A,F) is in 2NF, when every attribute not
belonging to the primary key of R is fully functionally dependent
on the primary key of R
Can you give a hint as to the significance of the parenthesized
"(A,F)"; neither A nor F is referenced in the definition quoted. I
don't think it alters the answer, but I'm curious. (The comma after
2NF should be omitted too.)
A beter version of this definition would be "A relation R is in 2NF when
every attribute that is not part of a candidate key of R is fully
functionally dependent on the candidate key(s) of R".
Post by Jens HaaseR(A,F) means a Relation R with attributes A and functional
dependencies F
Fine - as expected, this makes no difference to the answer.
Post by Jens HaasePost by Jonathan LefflerPost by Jens Haase2: A relation schema R is in 2NF if every nonprime attribute A in
R is fully functionally dependent on the primary key of R
Assuming 'nonprime attributes' are the attributes that do not
belong to the primary key, it seems to me that the two definitions
are the same.
This definition (the second) is from Elmasri/Navate Fundamentals of
Database Systems Second Edition Page 411.
An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called
nonprime if it is not a prime attribute—that is, if it is not a
member of any candidate key.
It certainly helps to know the extra background. As I noted in a
separate response to Jon Heggland's comment, neither of the definitions
clearly allows for multiple candidate keys, though given this definition
of non-prime attribute, you can argue that this version does. However,
even so, it would be IMNSHO be better phrased if it finished 'dependent
on each candidate key of R'.
Superkeys are not relevant to most of this discussion, I think.
Post by Jens HaaseIf a relation schema has more than one key, each is called a
candidate key. One of the candidate keys is arbitrarily designated to
be the primary key, and the others are called secondary keys
Secondary key is not a common term for the non-primary candidate keys
(though it is more logical in some respects than 'alternative key',
which is the term I'm more familiar with). These days there is no
formal reason to distinguish one candidate key as the primary key and
the others by another name. Normalization theory is best defined in
terms of candidate keys (see Date 'The Primacy of Primary Keys: An
Investigation' in 'Relational Database: Writings 1991-1994').
Post by Jens HaaseAccording to this definition there is a difference between the two
definitions.
Just about - and the second is the better definition, but not perfect.
Post by Jens HaasePost by Jonathan LefflerPost by Jens HaaseAccording to the first definition the following schema would not
R = (A,B,C,D)
A->D
AB->C
C->D
D->A
According to the first definition when the primary key is AB, then
A->D violates 2NF, because D is not part of the primary key.
I agree that AB is the primary key - or at least a valid primary key.
AB->C (given); C->D (given); hence AB->D (transitivity); hence AB->CD
(union); and AB->ABCD (reflexivity); and therefore AB determines all
attributes of R, which makes it a candidate key.
Post by Jens HaasePost by Jonathan LefflerA->D itself cannot violate 2NF; it is a functional dependency, and
relation schemas are what satisfy or violate 2NF. And the reasoning
the that D is not part of the primary key is bogus too.
Clearly, 'the that' is a typo - sorry. The first sentence is logic
chopping, and may not be fair since I suspect English is not Jens's
first language; nevertheless, it is also accurate. "According ... AB,
then the FD A->D means that R is not in 2NF" would be concise - but
wrong. However, the trouble is that the first definition precludes the
possibility of multiple candidate keys, and therefore cannot be applied
to this relation schema where, as shown below, there are 3 candidate
keys. If the first definition is adapted to fix the problem, it becomes
equivalent to the second, and the relation R is in 2NF.
I don't think my second sentence makes as much sense as I'd like it to.
However, the reason that R is not in 2NF is not because of the status of
D but because of the status of A -- A on its own is not a candidate key,
and the fact that A alone determines D is what prevents R from being in
2NF.
Post by Jens HaaseA->D tells us that D is only dependant on A and so only on a part of the
primary key.
Yes. (And this is a change in stance from yesterday's post.)
Post by Jens HaasePost by Jonathan LefflerAs I see it, AB->C and C->D implies that AB->D so the primary key
determines both C and D, as required. Granted, it is neither in 3NF
nor BCNF - the mnemonic is "the key, the whole key, and nothing but
the key", and the functional dependency A->D means that 'the whole
key' part of the rule is violated. (The C->D part is also problematic,
but doesn't cause the relation schema to violate 2NF.)
By implication, yesterday I said "but R is in 2NF"; today, I think that
was wrong.
Post by Jens HaaseA relation R is in third normal form in whenever X -> A holds in R
and A is not in X, then either X is a candidate key for R, or A is
[a] prime attribute
A->D holds 3NF since D is prime attribute
AB->C holds 3NF because AB is candidate key
C->D holds 3NF because D is prime attribute
D->A holds 3NF because A is prime attribute
Yes. To understand this, we need to establish that AB, BC and CD are
candidate keys - which is discussed below. One of the strengths of BCNF
over 3NF is its greater simplicity.
Date summarizes this nicely (An Introduction to Database Systems, 7th
Edn, p375):
3NF (Zaniolo's definition): Let R be a relvar, let X be any subset of
the attributes of R, and let A be any single attribute of R. Then R is
in 3NF if and only if, for every FD X->A in R, at least one of the
following is true:
1. X contains A (so the FD is trivial);
2. X is a superkey;
3. A is contained in a candidate key of R.
BCNF is obtained from this definition of 3NF by dropping condition 3.
(And I note that superkeys make an appearance here.)
This does not depend on the definition of 2NF; some other
characterizations of 3NF have an "if the relvar [relation] is in 2NF and
..." structure.
Since every single attribute on the RHS of an FD in your set satisfies
requirement 3, R must be in 3NF. It is surely not in BCNF, though.
[
Somewhat tangentially - at p400, Date gives the following neat
characterization of BCNF, 4NF and 5NF:
BCNF: A relvar is in BCNF if and only if every FD satisfied by R
is implied by the candidate keys of R.
4NF: A relvar is in 4NF if and only if every MVD satisfied by R
is implied by the candidate keys of R.
5NF: A relvar is in 5NF if and only if every JD satisfied by R
is implied by the candidate keys of R.
]
Post by Jens HaasePost by Jonathan LefflerPost by Jens HaaseAccording to the second definition R is in 2NF because the
candidate keys are AB, BC, BD and since that, every attribute is
prime attribute, so there is no violation of 2NF.
Now you introduce a term - candidate key - not mentioned in the
definitions of 2NF that we're dissecting. I would dispute that BC
is a candidate key anyway; likewise BD. Given a relation schema R {
A, B, C, D, E } with candidate key AB, by definition, AB->CDE (and,
by the rules of trivial functional dependencies (or FDs),
AB->ABCDE). That is, a candidate key functionally determines every
attribute in the relation schema. How do you apply Armstrong's
Axioms to the relation schema and list of FDs to deduce that
BC->AD?
Well since C->D and D->A with Armstrongs 3. Axiom we get C->A that
gives us C->DA and with Armstrongs 2. Axiom we get BC->BDA, so BC is
a candidate key.
since D->A with Armstrongs 2. Axiom we get BD->AB and with AB->C with
Armstrongs 3. Axiom we get BD->C so we get BD->ABC, so BD also is a
candidate key.
OK; I sit corrected - twice.
So, all the attributes are prime attributes, which means that the
relation is in 2NF by either definition (or, at least, by either
definition as amended) because both definitions only refer to non-prime
attributes and your relation schema has no non-prime attributes.
Post by Jens HaaseThe [w]hole problem i am facing, is that the first definition is from
a german script and the second is from an english book
(Elmasri/Navathe) and my thoug[h]t is, that there is a confusion
between "Primary key" and "prime".
I still think the difference between the two definitions is mainly in
that the first implicitly assumes that there is just one candidate key
and the second implicitly permits there to be multiple candidate keys,
and therefore the second definition is more general and better.
--
Jonathan Leffler #include <disclaimer.h>
Email: ***@earthlink.net, ***@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/