" Recent SCO/Linux News

Path: g2news2.google.com!news2.google.com!newsfeed2.dallas1.level3.net!news.level3.com!newsfeed-00.mathworks.com!enigma.xenitec.ca!jpradley.jpr.com!via-email

From: Bela Lubkin <fi...@armory.com>
Subject: Re: Kernel Link Fails
Date: Wed, 8 Jul 2009 17:09:58 -0700

<p...@naleco.com> wrote:

> Bela Lubkin wrote:
> > <rob...@unetix.net> wrote:
> >>THANK YOU SCO FOR YEARS OF MISERY!!!
> >
> > I have to count this one as an own goal. If it is representative of
> > your approach to system management over those years, a certain amount of
> > misery would be expected...
>
> So, you give a magistral lecture with your awsome SCO UNIX kung-fu
> skills, and then tell the poor chap it is his fault for not being up to
> your guru level...!

The error message clearly identified two filenames:

" i386ld: Symbol Sdsk_no_tag in
" /var/opt/K/SCO/link/1.1.1Ga/etc/conf/pack.d/blad/space.o is multiply defined.
" First defined in /var/opt/K/SCO/link/1.1.1Ga/etc/conf/pack.d/alad/space.o

If Robert had looked in the corresponding directories he would have
seen two space.c files, each containing definitions of Sdsk_no_tag and
Sdsk_no_sg. Knowledge of C would not be required, they are visible as
mere text without any interpretation of the surrounding syntax. If he
knew C he would have immediately seen what was wrong and been able to
fix it. If not, he would have had a fine question to ask this group or
his local advisor.

Even without that level of analysis, he had the original error message.
He even did a sensible thing and posted it here. But then he was given
bad advice by "RedGrittyBrick":

" Apparently there's a flaw in RS506a that overwrites the space.o
" installed by a SCSI driver BTLD. The solution seems to be to re-install
" the drivers from the BTLD.

and then followed it incorrectly:

" To make matters even worse I uninstalled the alad and blad drivers.

That did indeed succeed in making matters worse. RGB didn't say
"uninstall", he said "re-install ... from the BTLD". Since alad & blad
both come in the 506 system and don't need to be installed from BTLDs,
this was misleading advice (besides being wrong).
RGB shouldn't have said that (if there is a rs506a-vs-BTLD problem, he
didn't describe it correctly and should have provided a reference); RGB
should have realized that BTLDs were irrelevant. Then Robert translated
the bad advice into "remove and reinstall using `custom`", which was
disastrous.

Another thing Robert or any of the respondents could have done is spend
30 seconds doing a search on the error messages, which would have
turned up at least the following, all of which have correct solutions:

  <http://groups.google.com/group/comp.unix.sco.misc/msg/28ca07b02a113e98>
  <http://groups.google.com/group/comp.unix.sco.misc/msg/c2aba177296c9f8b>
  <http://groups.google.com/group/comp.unix.sco.misc/msg/ac394092930d8450>

> I object to your reprimand of him. The problem here is that the system
> SHOULD NOT fail doing a required by design operation (the relinking),
> whatever the driver combination might be. There should be a fail-safe
> condition built-in there someplace. And, given the failure to relink, at
> least a meaningful explanation should be given in the error messages to
> the user.

I object to people not doing the simplest investigation, instead
falling back on the MS standard of "reboot, then if that doesn't work,
reinstall". I always avoid "remove and reinstall" as far as possible
because it's always an open question whether these operations will be
safe in an already compromised system. If that is "awesome kung-fu"
then so be it.

Kernel relinks "should" not fail, but it's a procedure with hundreds of
steps -- compiling various bits, linking it all together, creating
device nodes, etc. It's complex enough that corner cases can be missed.

A current Linux system can be made unbootable by simply deleting or
corrupting /boot/grub/menu.lst (or any of several GRUB binaries). There
are dozens if not hundreds of other single points of failure. No system
can be without these.
You may wish to object that you could simply boot any of dozens of
"live" or "recovery" CDs to repair this. Well yeah. And Robert could
have taken any of a number of simple steps to fix his problem.
Reinstalling your Linux system to fix the missing GRUB config would be
overkill, don't you think?

> Now that I come to think of it, maybe this SCO UNIX condition is by
> design itself: so the customer always is dependent on the expensive
> official SCO support, even to change the IP address! No wonder then the
> company is where it is now.

Yeah, snarky conspiracy theories are so helpful.

Changing an IP address does not require talking to SCO Support.
Repairing a system whose kernel relink fails _might_, if the affected
user does no troubleshooting and doesn't ask anyone competent for help.

There is a certain roundabout truth to your conspiracy theory; a causal
chain can be constructed, but the overall path does not exist for the
suggested reason.

SCO Unix has a "link kit", consisting of precompiled linkable drivers +
space.c files to control tunables. This is because, as a
non-open-source company, SCO can't or won't ship the kernel source. In
order for such a system to be sufficiently configurable in the field,
many parameters need to be encoded in those space.c files.

Contrast to Linux where many of the same sort of parameters are encoded
in the kernel's overall .config file. Linux also has the /proc
filesystem, a collection of per-driver kludges to allow certain
configuration details to be controlled in a running system. (Now being
replaced with /sys, which is a little better.)

Anyway, since the link kit exists and requires complex space.c files to
set tunable parameters, many opportunities exist for lousy code in a
space.c file to cause kernel link failure.
For instance, here is the problem code in alad/space.c and blad/space.c
(irrelevant details removed):

  #ifdef SCSI_NSDSK
  extern int Sdsk_num_disks;
  extern int Sdsk_num_rb;
  #else
  int Sdsk_no_tag = 1;
  int Sdsk_no_sg = 1;
  #endif

`#ifdef SCSI_NSDSK' is shorthand for `if the Sdsk (SCSI disk) driver is
linked in'. So if it's linked in, we get extern declarations of two
variables that are completely unused by this driver; otherwise, we
_define_ two variables that actually belong to the Sdsk driver. This is
a namespace violation, and we end up paying for it. Since all 4 of
{alad, blad, ad160, ad320} drivers have this code, linking in two of
them at once with no SCSI disks causes a link failure.

This is, as I described it, lousy code in a space.c file. SCO receives
this stuff from vendors -- Adaptec in this case -- and does various
testing on it. Apparently that does not include installing multiple
drivers and no SCSI disks. Nor a very careful close look at the
proposed contents of the space.c files.

Setting aside questions like "should these drivers care about internal
Sdsk variables", how could it have gotten access without breaking this
way? Here are two possibilities:

- accessor function:

    #ifdef SCSI_NSDSK
    int alad_Sdsk_no_tag() { extern int Sdsk_no_tag; return Sdsk_no_tag; }
    int alad_Sdsk_no_sg() { extern int Sdsk_no_sg; return Sdsk_no_sg; }
    #else
    int alad_Sdsk_no_tag() { return 1; }
    int alad_Sdsk_no_sg() { return 1; }
    #endif

- indirect reference (allows [shudder] _write_ access to these
  Sdsk-private variables):

    #ifdef SCSI_NSDSK
    extern int Sdsk_no_tag, Sdsk_no_sg;
    int *alad_Sdsk_no_tag = &Sdsk_no_tag;
    int *alad_Sdsk_no_sg = &Sdsk_no_sg;
    #else
    int alad_Sdsk_no_tag_dummy = 1;
    int alad_Sdsk_no_sg_dummy = 1;
    int *alad_Sdsk_no_tag = &alad_Sdsk_no_tag_dummy;
    int *alad_Sdsk_no_sg = &alad_Sdsk_no_sg_dummy;
    #endif

Should the end user (system administrator) need to know about this
stuff? No -- and generally they don't.
Sometimes there's a problem, then someone needs to do some
troubleshooting. Not reinstalling things with a shotgun.

Stepping even further back: why must you take out your hate of the
software's current owners on the software itself? OpenServer was
written and even shipped long before the Evil Overlords^W^W^W Caldera
took over. It isn't to blame for any ill-advised lawsuits. People
_using_ the software certainly aren't to blame. Chill out.

>Bela<
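
For readers who want to try the accessor-function idea from the post, here
is a minimal sketch in C. It is an illustration under stated assumptions,
not the actual Adaptec or SCO code. To stay self-contained it puts what
would be two separate files (alad/space.c and blad/space.c) into one
translation unit and leaves SCSI_NSDSK undefined to model a kernel with no
Sdsk driver linked in. The blad_* accessors and link_kit_selfcheck() are
hypothetical names introduced only for this example. Because each driver
exports only names carrying its own prefix, neither section defines the
global Sdsk_no_tag or Sdsk_no_sg, so the two can coexist without a
"multiply defined" symbol.

```c
#include <assert.h>

/*
 * Hypothetical single-file illustration. In a real link kit the two
 * sections below would be separate files compiled into separate
 * space.o objects and then linked together.
 *
 * SCSI_NSDSK is deliberately left undefined here, which models the
 * "no Sdsk driver linked in" case discussed in the post.
 */

/* ---- section that would live in alad/space.c ---- */
#ifdef SCSI_NSDSK
int alad_Sdsk_no_tag(void) { extern int Sdsk_no_tag; return Sdsk_no_tag; }
int alad_Sdsk_no_sg(void)  { extern int Sdsk_no_sg;  return Sdsk_no_sg; }
#else
/* Fallback: same value (1) the original space.c assigned. */
int alad_Sdsk_no_tag(void) { return 1; }
int alad_Sdsk_no_sg(void)  { return 1; }
#endif

/* ---- section that would live in blad/space.c (names assumed) ---- */
#ifdef SCSI_NSDSK
int blad_Sdsk_no_tag(void) { extern int Sdsk_no_tag; return Sdsk_no_tag; }
int blad_Sdsk_no_sg(void)  { extern int Sdsk_no_sg;  return Sdsk_no_sg; }
#else
int blad_Sdsk_no_tag(void) { return 1; }
int blad_Sdsk_no_sg(void)  { return 1; }
#endif

/*
 * Hypothetical self-check, not part of any SCO interface. It is only
 * meaningful when SCSI_NSDSK is undefined: both drivers can be queried
 * side by side and each reports its private fallback value.
 */
void link_kit_selfcheck(void)
{
    assert(alad_Sdsk_no_tag() == 1);
    assert(alad_Sdsk_no_sg() == 1);
    assert(blad_Sdsk_no_tag() == 1);
    assert(blad_Sdsk_no_sg() == 1);
}
```

Calling link_kit_selfcheck() from a small main() exercises all four
accessors. By contrast, putting `int Sdsk_no_tag = 1;` in both sections
would be rejected, as a redefinition within one file or as a multiply
defined symbol across two objects, which is the failure the post describes.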
