Derek Ignatius Asirvadem
2021-06-14 04:35:12 UTC
I have started a new thread for two issues:
-- CASCADE
-- Benchmark Considerations
that were raised in this thread/post:
https://groups.google.com/g/comp.databases.theory/c/pCxJKwKMsgc/m/ZPCw002wAQAJ
Note that one should be used to that [DELETE+INSERT instead of UPDATE], because UPDATE of any Key component is not permitted; it must be DELETE+INSERT, in a Transaction of course, that moves the entire hierarchy belonging to the Key.

That is because in your experience, cascading updates (UPDATE ... CASCADE) are inefficient, right?
“Inefficient” is putting it mildly; it misses the relevant consideration.
CASCADE is simply not allowed; it isn’t even available on commercial SQL platforms.
Why not? God help me, an explanation is demanded.
Because high-end SQL Platforms are heavily OLTP-oriented.
And CASCADE would be a disaster in an OLTP context.
It is not about my experience; it is about knowledge of the server: server resources; maximising concurrency; minimising contention; overall configuration and monitoring, first as a general task on every server, and second, for the particular app and database.
Theoreticians in this space, in their total abdication of responsibility, say that the server is a black box, and that they should not concern themselves with *ANY* implementation concerns. That is like saying the engineer of the car should not concern himself with the exercise of driving. It is the asylum, where total incapacity is allowed, propagated to “science”. It is acceptable only for classroom exercises, utterly bankrupt outside the classroom.
Implementation concerns are the everyday fare for a DBA, and certainly the concerns of a Data Modeller in the latter stages. The black box is a physical reality, it is grey and blue, it fits into a rack in the computer room. All requests to the server pass through it. It is not an abstraction in the classroom that can be ignored.
From this post:
__ https://groups.google.com/g/comp.databases.theory/c/Uwc_w8HbBfw/m/trHkR1F8Mx8J
Take “business rule” in the referenced post as your “cascade” here. Cut-paste:
“
Ok, that means you do not understand the world of implementation.
1. On one side, where the business gives us "business rules", they are not to be taken as implementation imperatives. If taken as such, we would be merely clerks, implementing their requirements, without using the skills that they hired us for. Eg. we would implement a "business transaction" that updated six million rows, that hung the users up for 15 minutes in the middle of the day, and we would take no responsibility, because the business "told us to do it".
1.a Obviously, we do not do that. We exercise the skills we were hired for. Part of which is to implement OLTP Standard-compliant transactions. We do not view the business requirements as imperatives, we view them as initial requirement statements. We work back and forth, such that the requirements are modified, then accepted, and then implemented, such that they do not crash the system; such that the database does not have circular references; etc; etc; etc.
1.b So the example "business transaction" would be converted into a batch job that runs in a loop and executes six million OLTP Standard-compliant single-row transactions. The batch job keeps track of its position; is restartable; etc. So the business gets the requirement they want, but not in the METHOD that they initially stated it. Ie. Just tell me what you want, don't tell me how to do it.
1.c On this one side, in no case is a business rule to be taken as an imperative.
“
<<<<
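For the flavour of it, here is a minimal sketch of the batch job described in [1.b], in Sybase-style SQL. The table and column names (Account; BatchControl; AccountNo; LastKey) and the UPDATE itself are hypothetical, purely for illustration; the point is the loop of single-row Transactions, with the position saved inside each Transaction, which is what makes the job restartable.

DECLARE @Key CHAR(8)

-- Restart from the saved position (assumes a control row exists,
-- seeded below the first Key)
SELECT @Key = LastKey
  FROM BatchControl
 WHERE BatchName = 'AccountReprice'

WHILE EXISTS ( SELECT 1 FROM Account WHERE AccountNo > @Key )
BEGIN
    -- Next row, in Key order
    SELECT @Key = MIN( AccountNo )
      FROM Account
     WHERE AccountNo > @Key

    BEGIN TRAN
        UPDATE Account               -- the "business rule", one row at a time
           SET Price = Price * 1.10
         WHERE AccountNo = @Key

        UPDATE BatchControl          -- save position inside the Transaction
           SET LastKey = @Key
         WHERE BatchName = 'AccountReprice'
    COMMIT TRAN
END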
Also this post:
__ https://groups.google.com/g/comp.databases.theory/c/qqmnhu036FQ/m/RLh9D5Ue1kUJ
please read this section:
__ III - Batch Transaction
<<<<
The most contentious object in the database is the Transaction Log (or its equivalent on MVCC systems, and by any name, in any location).
__ On “2PL” systems, it is a small, hot object, and we try to keep it small (I am not giving you the entire science here, but there are clear, defined rules, such as OLTP Standards).
__ On MVCC systems, it is huge and spread across the entire file space, and the developers are unaware of it, thus trying to constrain its use is not even a remote possibility. Everyone is programmed to pretend that the database is single-user and that they are the single user (the Stonebraker insanity). MVCC does not have ACID.
Note: Functions do not have to be declared Transactional or NotTransactional; that is a stupid artefact of some pretend-SQLs. In ACID, Transactions are declared by the caller, not by the called object.
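To make that concrete, a minimal sketch (the proc names are hypothetical): the procs contain no BEGIN TRAN or COMMIT TRAN; the caller declares the Transaction, and the called objects simply execute within it.

BEGIN TRAN
    -- Neither proc opens or closes a Transaction itself
    EXEC Part_Add      @PartCode = 'X123', @Name = 'Widget'
    EXEC Assembly_Add  @AssemblyCode = 'A01', @PartCode = 'X123'
COMMIT TRAN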
By virtue of implementation Standards (Software Gems in particular, because we guarantee high concurrency and zero deadlocks), here OLTP Standards, the limit on the number of rows affected in an OLTP Transaction is 100; for batch Transactions, 500.
So think that out. Write an ACID Transaction (no, I am not being silly; I realise you can’t on your non-SQL platform, so you have to think in SQL terms, for an SQL platform). A stored proc that:
- navigates the *levels* of the Tree,
- and loops,
- executing max 100 INSERTs per BEGIN::COMMIT for the new Key,
- then executing max 100 DELETEs per BEGIN::COMMIT for the old Key.
Whereas CASCADE or the equivalent will update six million rows under the covers and hang up the database (prevent other updaters from progressing) for a week or two, this OLTP Transaction will execute in minutes, without hanging anyone else up.
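For the flavour of it, a minimal sketch in Sybase-style SQL, for a two-level Tree only (Part::Component, with hypothetical table and column names); the real proc navigates every level of the real Tree, INSERTing top-down and DELETEing bottom-up. SET ROWCOUNT caps the rows affected per Transaction at the OLTP Standard limit of 100.

CREATE PROC Part_MoveKey
    @OldKey  CHAR(8),
    @NewKey  CHAR(8)
AS
    DECLARE @Rows INT

    SET ROWCOUNT 100    -- OLTP Standard: max 100 rows per Transaction

    -- Level 1: the Parent row, under the new Key
    BEGIN TRAN
        INSERT Part ( PartCode, Name )
            SELECT @NewKey, Name
              FROM Part
             WHERE PartCode = @OldKey
    COMMIT TRAN

    -- Level 2: INSERT the children under the new Key, max 100 rows per Transaction
    SELECT @Rows = 1
    WHILE @Rows > 0
    BEGIN
        BEGIN TRAN
            INSERT Component ( PartCode, SequenceNo, Qty )
                SELECT @NewKey, c.SequenceNo, c.Qty
                  FROM Component c
                 WHERE c.PartCode = @OldKey
                   AND NOT EXISTS ( SELECT 1
                           FROM Component n
                          WHERE n.PartCode   = @NewKey
                            AND n.SequenceNo = c.SequenceNo )
            SELECT @Rows = @@rowcount
        COMMIT TRAN
    END

    -- Level 2: DELETE the children under the old Key, max 100 rows per Transaction
    SELECT @Rows = 1
    WHILE @Rows > 0
    BEGIN
        BEGIN TRAN
            DELETE Component
             WHERE PartCode = @OldKey
            SELECT @Rows = @@rowcount
        COMMIT TRAN
    END

    -- Level 1: DELETE the old Parent row (its children are gone, the FK is satisfied)
    BEGIN TRAN
        DELETE Part
         WHERE PartCode = @OldKey
    COMMIT TRAN

    SET ROWCOUNT 0      -- reset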
<<
I’d like to do some benchmarks.

Great idea. Excellent for obtaining experience in real-world issues. But whoa, that means you have to acquire knowledge of the physical implementation: both what you want to do in your database and how that translates into physical objects, and how your pretend-platform implements it. Eg. indices; types of indices; a Transaction Log, or additional rows on every touched page plus a garbage collector. Etc, etc, etc.

*Benchmark* generally implies not one platform but two, and a comparison. So I would plead that you obtain a “2PL” platform for the second. The commercial SQL Platforms all provide a “developer version”, which is free and limited (eg. number of simultaneous users; max table size; etc). There is no point in comparing one MVCC monster with yet another MVCC monster; you will learn nothing in the benchmark category (except the difference in internals of the two freeware suites).
*Benchmark* strongly implies an ability to monitor the server, all metrics that pertain to performance (in general), and the particular metrics that pertain to the particular benchmark. Freeware has no such thing, so you will be working in the dark, with little idea re what is happening under the covers.
For freeware & Oracle, which do not have a Server Architecture, and instead deploy hundreds or thousands of programs running on Unix, the first avenue of “performance monitoring”, and even “problem diagnosis”, is Unix monitoring:
-- top
-- vmstat
For contrast, take a glimpse of what is available in commercial SQLs. The server is a genuine Server Architecture, so the monitoring regards internal metrics, and very little of Unix (set up correctly once, and forgotten).
0. Monitoring products
Additional licence fee, not shown. Feel free to search the internet.
1. Raw Stats: text, various forms. Free.
Voluminous, not shown. Can be ascertained from the following.
2. Raw Stats Formatted, especially for various types of comparison.
Uses a simple script to produce [2] from [1].
Here, I fixed an error that the SAN configuration boys made. The comparison is Before::After the change: virtually the same as a benchmark, except that this is monitoring the production server at the largest teaching hospital in America. With permission, of course.
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Delta.pdf
“Div” is the division of the metric by the relevant denominator.
“Dlt” is the delta between that column-of-stats and the first column-of-stats, as a percentage, flagged with {-|+|x}.
“RUt” is Resource Utilisation, shown for unit resources, as a percentage of the ResourceGroup (which is shown above the unit resources).
“Engine” is a single Unix Process. The Sybase ASE server comprises a grand total of FOUR Engines in this instance, and serves hundreds of active connections (doctors; nurses; medical images; client histories; etc). In contrast, freeware and Oracle would have hundreds of Unix Processes and no concept of Engine.
The entire server config is set up by me, not the local DBA, whom I support. That means the resource allocation strategy is proprietary, so all the resources are shown, but the names [eg. cache names] have been obfuscated.
- IIRC the freeware, and Oracle, have no concept of a cache, let alone control of it.
- Disks [SAN objects] are the slowest link in the chain, and the hardest to change, so they are set up correctly, once and for all.
--- Until some idiot in the SAN team made an undeclared change, that had a negative effect, that I had to diagnose, without knowledge of the change.
This is a “2PL” server, you may find the Lock Manager stats interesting.
Likewise the Transaction section.
The app is not a very good one; it is stupid, because Parallelism has to be turned off in order for the app to work. It is not strict OLTP, but good enough, in the sense that it does not cause problems. Its ACID profile is hopeless. Nevertheless, via resource management, I have managed to reduce problems and virtually eliminate deadlocks.
3. Executive reports
Pretty pictures for those with a short attention span.
Uses a similar simple script to produce a CSV file from [1], and then Excel/Numbers to erect the charts.
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20Xact%20Lock.pdf
__ https://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20DiskGroup.pdf
“Monitor” here is a database name, where a monitoring product [0] stores its info, which the execs use. As you can see, the overhead is huge, and they can’t diagnose anything in months. I do not use such filth; I use the raw stats, and I can diagnose any problem in minutes. So ignore “Monitor” here, and compare the rest.
“Dbcc” is also a database name, it is storage for Database Consistency Check, which is an overnight batch job, that comes with the product. It performs its namesake, plus it collects table/index/column statistics (as distinct from server statistics).
“Data” means the user database.
“Log” means the Transaction Log.
“APF” means Asynch Pre-Fetch, a read-ahead facility of the server, in which either 8 pages [Extent] or 256 pages [Alloc Unit] is read during index or table scans. Integrated with the caches of course.
All disk access is Asynch, on Raw Partitions, not filesystem files.
So, how do you code them?
start transaction;
update Grand1_Grand2...GrandN_Child;
update Grand1_Grand2...Grand(N-1)_Child;
...
update Child;
update Parent;
commit;
I think I have given adequate directions above; you have to “move” the whole Tree. Please try again with those in mind.
Cheers
Derek