I'll make a prediction: There will be more crashes. The next one, post "retrofit", will end the company.
(Reuters) - A review by a U.S. Federal Aviation Administration panel into Boeing Co’s grounded 737 MAX aircraft found a planned software update and training revisions to be “operationally suitable,” the agency said Tuesday, an important milestone in getting the planes back in the air.
You can't fix the problems the 737MAX has with software alone.
First, the limits of authority for the MCAS system are entirely inappropriate. An aircraft that needs an "anti-stall" feature that can consume half of the stabilizer trim authority with one actuation is horribly unbalanced. The design is dangerous.
The 0.6 degree original design is reasonable; the revision to 2.5 degrees is not. Rather than find the reason(s) for the requirement to quadruple that authority and correct them in the design so that the 0.6 degree authority is sufficient, Boeing intentionally concealed that authority change from the FAA.
This is the root of the issue.
When you're off by a factor of roughly four in your engineering predictions, as disclosed by testing, you have a dangerous situation. The correct thing to do in such a circumstance is to go back, figure out why that happened and change whatever you need to so it doesn't happen anymore. That would have meant a redesign of material components of the 737MAX (likely involving flight surfaces such as the wing geometry itself, where it's attached to the aircraft, etc.) which Boeing was unwilling to do for time and cost reasons; today it would likely mean literal scrapping of all the existing hulls and starting over. That's not going to happen either, as it might well bankrupt the company.
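For anyone who wants to check the arithmetic on that authority change, here is a quick sketch in Python. The two figures come straight from the text above; the function name is mine and purely illustrative.

```python
# Figures taken from the text above; everything else is plain arithmetic.
ORIGINAL_AUTHORITY_DEG = 0.6  # MCAS trim authority in the original design
REVISED_AUTHORITY_DEG = 2.5   # authority found necessary in flight testing


def authority_change(original, revised):
    """Return (ratio, percent_increase) between two authority limits."""
    ratio = revised / original
    percent_increase = (revised - original) / original * 100.0
    return ratio, percent_increase


ratio, pct = authority_change(ORIGINAL_AUTHORITY_DEG, REVISED_AUTHORITY_DEG)
print(f"revised / original     = {ratio:.2f}x")
print(f"increase over original = {pct:.0f}%")
```

Either way you state it, the revised limit is a bit more than four times the one the failure analysis was built on.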
The reason for this set of facts is quite simple -- there is a set of both probabilities and outcomes that fall either inside or outside of the window of acceptability for a transport aircraft. A failure that will lead to discomfort but not death (of anyone) is one you can accept occurring, since at worst it causes inconvenience.
One that could kill some number of people (but not crash the airplane) is much more severe.
One that can crash the airplane is the most severe.
The problem with all of these as the FAA defines them is that there is also a probability table associated with them. This is a fundamental fuck-up in the FAA's legal mandate and it must be changed.
Let me give you an example of the complete horseshit that MTBF and "error rate" figures present to people:
Furthermore, Deskstar NAS hard drives incorporate a rotational vibration sensor and achieve reliability of 1M hours MTBF.
One million hours MTBF (that is, the mean time between failures) is the common specification for computer hard drives.
There are 8,760 hours in a year. This means that if you have one such disk you should expect it to last, on average, 114 years, far longer than you will.
This abuse of statistics is utter and complete horseshit. First, a disk drive is a mechanical thing. Like all mechanical things with moving parts, those parts wear when used. Specifically, the mechanism that positions the heads has moving parts that can wear (get "sloppy") and so does the motor that drives the platters (the "disks" inside). Neither will last 114 years while operating under any rational set of conditions, ever, period.
So let's say you have 100 of these drives. Well now, see, the claim isn't that each will likely last 114 years. No, and no. It's that the manufacturer predicts that if you run 100 of them around the clock you'll lose about 0.9 of them every year (100 drives x 8,760 hours / 1,000,000 hours). And guess what -- most of these fail at somewhere around, or perhaps a bit better than, those numbers. Nobody seriously expects the one disk you buy to last 114 years, and it won't. You can count on that.
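Here is the MTBF arithmetic as a small Python sketch. It assumes the drives run 24 hours a day and that failures arrive at a constant rate (the usual exponential model behind an MTBF figure); both are my assumptions, not anything the drive maker states, and the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760


def mtbf_in_years(mtbf_hours):
    """Naive reading of MTBF: how many years one unit 'should' last."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures_per_year(n_units, mtbf_hours):
    """What the figure really predicts: failures per year across a fleet."""
    return n_units * HOURS_PER_YEAR / mtbf_hours


def annual_failure_probability(mtbf_hours):
    """Chance one unit fails within a year, assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


print(f"{mtbf_in_years(1_000_000):.1f} years")                          # about 114
print(f"{expected_failures_per_year(100, 1_000_000):.3f} per year")     # under 1 per 100 drives
print(f"{annual_failure_probability(1_000_000):.4f} per drive-year")
```

The number is a statement about a population of drives over their rated service life, not a promise about the one drive sitting in your machine.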
1 in a million is 1 x 10^-6. This sounds very improbable but in fact as you can see it really isn't at all.
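To see why a small rate stops being small once you pile up exposure, here is a sketch that treats 1 x 10^-6 as a per-hour rate (my assumption) and asks how likely at least one event is over a given number of operating hours. The hour counts are illustrative round numbers, not data from any fleet.

```python
import math


def prob_at_least_one(rate_per_hour, exposure_hours):
    """P(at least one event) for a constant hourly rate over the exposure."""
    # -expm1(-x) is 1 - exp(-x), computed accurately for small x
    return -math.expm1(-rate_per_hour * exposure_hours)


RATE_PER_HOUR = 1e-6  # "1 in a million", read as a per-hour rate

for hours in (1_000, 100_000, 1_000_000, 10_000_000):
    p = prob_at_least_one(RATE_PER_HOUR, hours)
    print(f"{hours:>12,} hours -> {p:.4f}")
```

At a million accumulated hours the odds of having seen at least one event are already better than even.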
There's a lot of engineering judgement that goes into these analyses and the severity ratings that follow from them. The original MCAS design was permitted because with only 0.6 degree of authority it was judged that if it failed it would not crash the plane. Quadrupling the authority without re-analyzing the outcome was intentional -- even if by omission -- because under the rules any modification that may impact severity must be re-analyzed.
Making the system less likely to make a mistake (e.g. by forcing the use of both AOA sensors) does not solve the problem in the general sense -- because it can't.
This is the comment I have just transmitted to the FAA on their coddling of Boeing, willful dereliction of duty and disregard of the FARs governing transport aircraft in proposing to "accept" Boeing's changes:
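A toy sketch of what a two-sensor cross-check does and does not buy you. This is not Boeing's logic; the function name and the disagreement threshold are hypothetical, chosen only to illustrate the point.

```python
from typing import Optional

DISAGREE_LIMIT_DEG = 5.0  # hypothetical threshold, for illustration only


def cross_checked_aoa(left_deg: float, right_deg: float,
                      limit_deg: float = DISAGREE_LIMIT_DEG) -> Optional[float]:
    """Return the averaged AOA if both sensors agree, else None (inhibit)."""
    if abs(left_deg - right_deg) > limit_deg:
        return None  # sensors disagree: automation should not act
    return (left_deg + right_deg) / 2.0


# One sensor fails high: the disagreement is caught.
print(cross_checked_aoa(4.0, 74.5))   # None

# Both sensors wrong in the same direction (common-mode fault): the check
# passes and a bad value goes straight through to whatever consumes it.
print(cross_checked_aoa(40.0, 41.0))  # 40.5
```

The check lowers the odds of a bad input getting through. It does nothing about how much authority the consumer of that input has when one does, and it does nothing about failures downstream of the sensors.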
Gentlemen;
The 737MAX "software revision" is, as has been described in publicly-available documents, insufficient as a fix for the root cause of the two crashes and loss of more than 300 lives that occurred.
During the original design of the MAX it has been disclosed that the MCAS system was implemented into the flight control "law" software to alleviate a materially-larger "pitch up" moment that could arise as a result of the larger, higher-bypass engines fitted to the MAX series of aircraft. The original design specifically called for this software to have 0.6 degrees of stabilizer trim authority and the failure consequences and probabilities were analyzed on that basis.
Flight testing before certification disclosed that 2.5 degrees, or approximately half of the total range from neutral to the stop in either direction for the stabilizer trim jackscrew, was actually required for MCAS to perform the desired function. In addition the data recorder graphs from the Ethiopian crash appear to show that MCAS is capable of, and does, drive the jackscrew at roughly double the rate of a yoke command from the pilots, making its application of negative ("nose down") trim extremely violent on a comparative basis with that applied by the pilots.
This change was not reflected back into the design documents, and failure analysis, including both probabilities and outcomes of failures, was thus not performed with this roughly fourfold increase in automated command authority.
Had that analysis been re-run it would have disclosed, as we now know from the two hull losses, that an unrecoverable aircraft attitude at moderate altitudes (< 10,000 ft AGL) could occur due to erroneous activation of the system. Two such failures did occur and in at least one, it has been disclosed that the checklist was run for that failure by the pilot and first officer and failed to restore the aircraft to controllable flight.
Patching the software, assuming the 2.5 degree limit of authority remains as it is required for the MCAS system to function, cannot resolve the root issue. While improving the reliability of sensor input (e.g. by requiring both sensors to be "always hot" and, if there is a disagreement, not engaging MCAS) at first blush appears to be sufficient to remove the failure mode, it is not and accepting same as a sufficient remedy must not be allowed.
The published 737NG manual pages and block diagrams I have been sent copies of appear to show that all flight computer access to the stabilizer trim runs through only the right-side disconnect switch for stabilizer trim. The 737MAX emergency procedure that the pilots in the Ethiopian Airlines crash used, however, specifies that both switches are to be pulled in the event of a runaway and remain off for the remainder of the flight. This strongly implies that on the MAX aircraft computer trim authority can be exerted even if only the left-side, or "master", stabilizer trim power switch is on.
Since we have had demonstrated twice that loss of ability to operate stabilizer trim can and will result in the loss of the hull and significant or all life onboard it is not acceptable for there to be any operational part of the flight envelope, where the aircraft remains intact and controllable, that leaves the pilots with no means of adjusting stabilizer trim without an outside direction, in this case an insane computer irrespective of the root cause of the machine's insanity, overriding their input.
The CVR from the Ethiopian crash documented that the checklist procedure, which called for handwheel operation of the trim in the event of a runaway, could not be carried out because of aerodynamic forces the pilots could not manually crank against.
This is not acceptable and software fixes cannot resolve a hardware problem.
Assuming MCAS or any other part of the flight computer complex requires sufficient stabilizer trim authority to place the airframe in jeopardy should it malfunction, it must be able to be disconnected from said system. Since functional stabilizer trim adjustment is required for the aircraft to be airworthy under all conditions of the flight envelope, there must always be two operational means of changing same. MCAS is by no means the only possible failure in said flight control system; not only is that software highly complex, with other stabilizer-trim functionality (e.g. Mach variation, autotrim related to flap position, etc.), it also has both sensor input and physical output (e.g. power FETs to drive the output circuits, contactors, etc.) which can fail as well, some of which are single-path failure points.
Therefore, at minimum the following is required:
1. The flight control computer, including but not limited to MCAS, must be able to be disconnected from the stabilizer trim electrical drive circuit without impacting the electrical trim switches on the command yokes. The NG block diagram appears to show that this is the case, while the MAX emergency procedure strongly implies otherwise. Either the MAX procedure is wrong and must be corrected or the physical wiring in the MAX must be modified so that the flight control computer can be positively severed from stabilizer electrical trim control by disconnecting the right-side switch, leaving the master enabled and the yoke switches available.
2. Due to the fact that stabilizer trim must always be able to be modified at all times in the flight envelope under pilot command for the aircraft to remain controllable a condition where the manual trim wheels are inoperative due to aerodynamic loads is unacceptable. Therefore a second minimum change is for the gearing ratio to be modified such that under any set of flight conditions where catastrophic hull damage has not yet occurred the hand cranks must be able to be actually operated by any person of sufficient physical capacity to have either a pilot or first officer flight certificate.
To return the 737MAX to certified status before both of these changes are implemented is, in my opinion as a person who has been writing software for approximately 30 years, including embedded software that controls potentially dangerous machinery, unwise and appears to be in violation of the FARs governing transport aircraft. Such a decision is, in my opinion, likely to result in additional lost hulls and loss of life.
I have no faith that the FAA will in fact insist on the above two changes, as both cost money and involve physical, not software modifications. However, absent both the 737MAX will remain 100% reliant on its flight control computer never suffering insanity irrespective of cause, failure of which has a high probability of killing everyone on board.