From: Bela Lubkin <belal@sco.com>
Subject: Re: Xwindow hang on osr507
Date: Wed, 8 Oct 2003 08:39:36 GMT
References: <3F81C954.12D82EC5@tenzing.org> <20031006203426.GL672@jpradley.jpr.com> <20031006232654.GA714@sco.com> <3F83582E.4C0B941C@tenzing.org>
Roger Cornelius wrote:
> > > | I have two dissimilar 5.0.7 systems which exhibit the same problem.
> > > | When exiting from a console X session, X hangs approximately 75% of the
> > > | time. It appears to be exiting, but I end up with a blank root window
> > > | with the crosshatch pattern and an "x" as the mouse pointer. I can move
> > > | the pointer but nothing else. Alt-Fkey or ctrl-prtscreen will switch
> > > | away, but I just get a blank screen. Attempting to switch to another
> > > | tty again results in a beep.
> > > |
> > > | The systems:
> > > | IBM x345
> > > | SCO odt window manager
> > > | On board video identified by mkdev graphics as:
> > > | ATI RAGE PRO/LT-PRO/XL/Mobility (P/M/M1)
> > > | Also tried an ATI Xpert@Play card with same results.
> > > |
> > > | Dell Precision 330
> > > | fvwm2 window manager
> > > | Matrox Millennium G200 (configured for Matrox G100/G200/G400 series
> > > | adapters)
> > > |
> > > | Both systems have osr507mp and osr507up installed.
> > > |
> > > | I've tried various resolution configurations in mkdev graphics but no
> > > | change in the problem.
> > > |
> > > | After the hang and from another login, I can kill the X process which
> > > | results in a black or sometimes garbled screen. I can log in again,
> > > | though I can't see what's happening on the screen. On the Dell box, I
> > > | can then log out and the screen returns to normal. On the IBM box,
> > > | logging out just gives me another blank screen.
I asked you to try editing each entry in the active grafinfo file to
add:
> > MEMORY(VID, 0x000A0000,0x0020000); /* Standard VGA video memory window */
after the existing "MEMORY" line(s) in each mode. You say:
> This changed the behaviour on the IBM system and possibly fixed it on
> the Dell. For the latter, the couple of opportunities I've had to exit
> X worked correctly.
Perhaps you could cycle it a few more times for confidence? If it's as
random as it seemed, just running the X server and exiting as quickly as
possible ought to be a decent "smoke test".
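
As a rough guide to how many cycles are enough, the arithmetic can be sketched as below. This assumes each exit is independent and that an unfixed system still fails about 75% of the time (the rough figure from the original report); the function name is made up for illustration.

```python
def chance_of_clean_streak(n_exits, fail_rate=0.75):
    """Probability that n_exits exits in a row are all clean on an UNFIXED system.

    Assumes independent exits and the rough 75% failure estimate
    from the original report (an assumption, not a measured value).
    """
    return (1.0 - fail_rate) ** n_exits


for n in (2, 4, 8):
    print(n, "clean exits in a row by luck:", chance_of_clean_streak(n))
```

Under those assumptions, two clean exits in a row would happen by luck about 6% of the time, and four in a row well under 1%, so a handful of extra cycles on the Dell is already informative.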
> For the former, I exited X three times today. The
> first time, I was returned to the shell prompt as should be normal. The
> second time, I got a blank, black screen, like JPR described, which I
> used to log in blind, then ran clean_screen which got the video back.
> The third time, I got a kernel panic and reboot.
So previously the X server was hanging on exit (not affecting the whole
machine) about 75% of the time. I assume that 75% is a very rough
estimate. Now, out of 3 samples, one exited cleanly and two went wrong
(in different ways). So without further examination of the failure
modes, I would tend to conclude that whatever was causing the problem
is still happening, and only the failure modes have changed. That is,
if you were to run 100 cycles under the new setup, you would see about
25 successful exits and about 75 failures -- same as before.
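
To see why three samples can't distinguish "fixed" from "unchanged", here is a small sketch of the binomial arithmetic. It assumes independent exits and takes the rough 25% clean-exit rate at face value; the helper name is illustrative only.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_clean = 0.25  # assumed: rough clean-exit rate from the original report
n = 3           # exits observed after the grafinfo change

for k in range(n + 1):
    print(k, "clean exits out of", n, ":", round(binom_pmf(k, n, p_clean), 4))
```

With an unchanged 25% clean-exit rate, exactly one clean exit out of three comes up about 42% of the time, so the observed result is entirely consistent with the underlying problem not having moved at all.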
Since the new failure modes include worse options (panic vs. a mere
unusable screen), you should probably undo the patch on the IBM.
Repeating part of the original message:
> > > | After the hang and from another login, I can kill the X process which
> > > | results in a black or sometimes garbled screen. I can log in again,
> > > | though I can't see what's happening on the screen. On the Dell box, I
> > > | can then log out and the screen returns to normal. On the IBM box,
> > > | logging out just gives me another blank screen.
Let's go back to the original grafinfo file. After a "bad" exit, you
seem to be saying the X server is still running. You can see this from
a network login, so the rest of the system is fine.
I don't quite understand from this description what happens on the IBM
when you run a new X server. Are you saying that it too is blank, or
that it displays normally? In other words, has the console become
totally unusable at this point, or are you able to return to a usable X
server as often as you want, but not to text mode?
Anyway, next time the exit hang happens, examine that X server's process
tree. In particular, does it have a subprocess called `vbiosd`? What
happens if you kill _that_ rather than the X server -- does X then
finish exiting in a more normal manner?
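
One way to pick the subprocess out of a long process listing is sketched below. It assumes `ps -ef`-style output (UID PID PPID C STIME TTY TIME CMD) with STIME as a single token; the sample listing, the PIDs and the X server path in it are all made up for illustration and are not taken from either machine.

```python
def find_children(ps_text, parent_pid, name):
    """Return PIDs of processes called `name` whose PPID is parent_pid.

    Expects `ps -ef`-style text: UID PID PPID C STIME TTY TIME CMD.
    Assumes STIME is one token (true for processes started the same day).
    """
    pids = []
    for line in ps_text.splitlines()[1:]:          # skip the header line
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        argv0 = cmd.split()[0].rsplit("/", 1)[-1]  # basename of the command
        if ppid == parent_pid and argv0 == name:
            pids.append(pid)
    return pids


# Hypothetical listing, for illustration only.
SAMPLE = """\
     UID   PID  PPID  C    STIME     TTY        TIME CMD
    root  1201  1150  0 08:30:01   tty02    00:00:04 /usr/bin/X11/X :0
    root  1207  1201  0 08:30:02   tty02    00:00:00 vbiosd
    root  1302  1290  0 08:41:10   ttyp0    00:00:00 -sh
"""

print(find_children(SAMPLE, 1201, "vbiosd"))
```

The PID it reports is the one to kill first, leaving the X server itself alone to see whether it then finishes exiting.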
I'm thinking that you may end up with a still blank or trashed screen,
but at least your ability to flip multiscreens should return. It might
be that you can flip, but still can't see what you're doing. But you
should be able to distinguish between, e.g., a multiscreen that was
sitting at a shell prompt (typing `echo '\07'` blind will beep) and one
that was sitting at a login prompt.
Once the X server has exited relatively gracefully, try to get to a
shell prompt and run /etc/clean_screen. If you can't get to a shell
prompt on the console, run it from the network login as
`clean_screen < /dev/tty02` (substituting the name of the tty on which
X was running -- or, if you've flipped multiscreens, the one you think
is currently "displayed").
I'm trying both to develop a viable workaround for temporary use and to
better understand the problem, so that we can solve it permanently
without a clumsy workaround. So please describe the results very
carefully.
Now, back to the panic:
> Here are [what I think
> are] the important parts of the output of crash's panic command:
>
> Unexpected trap in kernel mode:
> cr0 0x8001003B cr2 0x0011001C cr3 0x00002000 tlb 0x00000000
> ss 0x00000001 uesp 0x0080A2CC efl 0x00010286 ipl 0x00000000
> cs 0x00000158 eip 0xF005919A err 0x00000002 trap 0x0000000E
> eax 0x00002000 ecx 0x00000001 edx 0x00000014 ebx 0xE0000E1C
> esp 0xE0000DE0 ebp 0xE0000E0C esi 0x00000001 edi 0x00000000
> ds 0x00000160 es 0x00000160 fs 0x00000000 gs 0x00000000
> cpu 0x00000001
...
> Kernel Stack before Trap:
> STKADDR FRAMEPTR FUNCTION POSSIBLE ARGUMENTS
> e0000de0 e0000e0c v86vint (u+0xe1c,0)
Hmmm. Well, it panic'd while running code under an interrupt that was
being serviced in virtual 8086 mode. Presumably that would be an
interrupt that was provoked by something the adapter's BIOS did while
coming down from graphics mode, and it should have been handled by code
within the BIOS. The panic was a trap E (a page fault, i.e. an illegal
memory reference); the bad reference address was 0x11001C (CR2). That
address isn't a sensible address for BIOS code to be accessing. We have
no basis to determine whether this is a BIOS bug or a bug in the
simulated 8086 environment under which the Unix kernel is running the
BIOS.
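
For anyone who wants to read the register dump themselves, here is a minimal sketch that decodes the values quoted above. The bit meanings are the standard i386 ones (page-fault error code bits 0-2, EFLAGS bit 17 for virtual-8086 mode); the function names are illustrative, and none of this settles the BIOS-versus-kernel question.

```python
def decode_page_fault_error(err):
    """Decode the low bits of an i386 page-fault (trap 0x0E) error code."""
    return {
        "page_present": bool(err & 0x1),  # 0 = page not present, 1 = protection violation
        "write_access": bool(err & 0x2),  # 0 = read, 1 = write
        "user_mode": bool(err & 0x4),     # 0 = supervisor, 1 = user
    }


def max_real_mode_address():
    """Highest linear address reachable as segment:offset = FFFF:FFFF."""
    return (0xFFFF << 4) + 0xFFFF


# Values copied from the panic output quoted above.
trap, err, cr2, efl = 0x0E, 0x00000002, 0x0011001C, 0x00010286

EFLAGS_VM = 1 << 17  # virtual-8086 mode flag in EFLAGS

print("error code:", decode_page_fault_error(err))
print("VM flag set in saved eflags:", bool(efl & EFLAGS_VM))
print("cr2 beyond FFFF:FFFF:", cr2 > max_real_mode_address())
```

Read this way, the fault was a write to a page that was not present, and 0x11001C lies just past 0x10FFEF, the highest address an ordinary real-mode segment:offset pair can form. That fits the remark that it is not an address BIOS code would sensibly touch, but it does not say which side produced it.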
This does remind me of another thing that you should try, though. In
fact something that all three of the original posters should try. Many
modern systems have a BIOS setup item that boils down to "Should an
interrupt vector be assigned to the video board?". In most cases this
should be set to "no" for Unix. To be precise, I do not know of any
case where it needs to be "yes", but I could easily believe that some
video BIOSes might require it and I simply haven't run into one. This
is another one of those things that you'll learn about right away: if
you turn it off and the board/BIOS really need it, getting _into_ X will
fail and you'll back out the change.
Yet a third thing that you could try is to disable the high-precision
timer interrupts that were first introduced in OSR506. To do this, boot
with "defbootstr clock.disable_short_timers=1". The BIOS code may be
getting an unexpectedly high-speed stream of timer interrupts, which
could get it in trouble.
> I'll post again as I have more details, but I won't have console access
> to the IBM again until Thursday.
I've given you several conflicting ideas to try. When you have access,
you'll have to decide what to fiddle with. I don't think it would be
wise to try more than one of these ideas at the same time, because you
wouldn't be able to tell which behavior changes were caused by what.
I think my order of attack would be:
1. Revert to the original grafinfo -- the change didn't help in this
case, and made the failure mode worse at times
2. Disable VGA IRQ in BIOS setup; test
3. Unless that made X unusable, leave it off even if it didn't help,
because it leaves more IRQs free for other devices
4. Try "defbootstr clock.disable_short_timers=1"; test
5. If that doesn't fix the problem, reboot without it and forget about
that setting
6. If neither of those fixes the problem, work towards a workaround
based on killing `vbiosd` and running `clean_screen`
7. Comment on all the steps you took so we learn what was really
relevant...
>Bela<
/Bofcusm/2346.html copyright 1997-2004 Bela Lubkin All Rights Reserved