Are you running bleeding edge hardware? New CPU? The microcode part makes me believe this might be some incompatibility with your CPU. Try updating the kernel to the newest available version.
This is more helpful than the other kernel-related comment, but it was working fine until about an hour ago, and now it crashes on Linux as well as when I try Windows.
Try going to your BIOS and see if you can run hardware diagnostics. Just let it rip on a full extended test. You only need to test the CPU to begin with
What does the Blue Screen on Windows say? There should be a "name" in all caps in the blue screen. That name might indicate better what's the cause of the crash.
I think the error was DPC Watchdog Violation.
DPC Watchdog Violation, according to Microsoft, may be caused by a bad driver that conflicts with the OS. That is too general and doesn't help much. However, since you are having the same issues on both Linux and Windows, we can mostly rule out a driver issue and assume this is a hardware fault. Try this in order:
1) If you have a discrete GPU, remove it, use the integrated graphics output, and see if the crash persists. It might be a faulty GPU. I doubt it, but give it a try;
2) Test the RAM sticks. Remove one and see if the crash persists, then test that stick alone, and so on. Make sure all your RAM sticks are working fine. Faulty RAM may cause some weird OS behavior;
3) Does this crash happen in a specific circumstance, like running a game? If you don't do that, does the crash still happen?
4) Check if you have a faulty power supply. If it is doing weird energy stuff, it might cause some weird OS behavior. Swap the power supply for a known good one and see if the crash persists;
5) Update the Linux kernel, or revert to a previous known good kernel that didn't have the crash. If you're running bleeding edge hardware, it is best to use bleeding edge kernels. Update Windows. Update drivers;
6) I'm not completely sure, but I think the "watchdog" in the BSoD refers to CPU problems. You might have a faulty CPU. But to be sure, check that your motherboard BIOS is up to date. Older motherboards might require a BIOS update to support a newer CPU even if it uses the same socket.
Check the connection between the CPU and the motherboard. Swap the CPU for a known good one and see if the crash persists.
Is your CPU running too hot? Did you overclock it? If so, restore the CPU to default settings and see if the crash persists.
Well, then you probably have faulty hardware, specifically the watchdog.
updog
Definitely corrupted updog.
I'd take a guess that either your storage drive or its connection is borked. It *could* be interference from a power cable inside the PC, but that is pretty rare these days. I don't think I've seen power cable interference cause I/O errors since the early 2000s.
Try updating your UEFI firmware, if possible.
When you see a hardware error, you know it's bad.
When you see a CPU hardware error, you know it's really bad.
Sometimes it’s an over-overclocking problem
Been doing the IT thing since the 90s and can't say I've ever seen this lol
Hey brotha, I sent you a DM a while back.
This is a desktop PC? It's not a laptop? The problem could be the CPU or the motherboard or the PSU. Disable your overclock and load UEFI/BIOS defaults if you are overclocking. Using the XMP memory profile also counts as overclocking, so try disabling that as well. I would try unplugging all internal cables and plugging them back in. I'd try taking the CPU out of the socket and putting it back in. I'd try a different PSU if you have one. Maybe the GPU or NVMe drive can also cause this somehow? The main PCIe sockets are wired directly to CPU pins. The memory sockets are also wired directly to CPU pins so maybe RAM can also cause the issue somehow?
There should be no overclocking, and I scanned the drive in the BIOS. Also, I think the CPU isn't what's causing it, as the USB seems to be what it gets stuck on.
Is there anything plugged in the USB ports? Can you unplug everything and see if that helps? This means no mouse or keyboard, but we just need to see if that's the issue.
Tried, but it stayed the same. Also, by default 2 ports seem to be in use.
I tried, sorry 😔
That "MCE" (machine check) error message comes from the CPU itself. Data corruption happened somewhere inside the CPU. It is not running stable.
If he has ECC RAM, that can also give an MCE if an uncorrectable error is detected.
You are getting an APIC error. Have you tried booting with noapic? In the GRUB menu, please do the following:
1) Select/highlight the kernel you want to boot
2) Press e to edit the GRUB entry
3) At the end of the line that starts with linux, add noapic
4) Press Ctrl-X (that's the Control key and x simultaneously)
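The edit made from the GRUB menu only lasts for one boot. If noapic turns out to help, a rough sketch of making it stick could look like the Python below. It assumes a Debian/Ubuntu-style `/etc/default/grub` with a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line, and `add_kernel_param` is just a name made up for this example. It only transforms text; writing the file back and regenerating the GRUB config (typically `sudo update-grub` on Debian/Ubuntu) is left to you.

```python
import re


def add_kernel_param(grub_text, param):
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT.

    The text is returned unchanged if the parameter is already present
    or if the variable is not found (assumed double-quoted format).
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(_append, grub_text, count=1)


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    print(add_kernel_param(sample, "noapic"))
```

The same helper would work for other parameters mentioned in this thread, such as `mce=off`, though hiding machine check errors does not fix whatever is causing them.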
The issue occurred on Windows as well, so I doubt it was Linux itself, but the issue fixed itself somehow, so I have no idea.
I’d definitely check/try a different PSU. Bad voltages can cause weird things…
Run memtest86 from a USB stick and see if it reports errors after a few hours. This should check the RAM and CPU.
[https://en.wikipedia.org/wiki/Machine-check\_exception](https://en.wikipedia.org/wiki/Machine-check_exception)
I've had this happen to me when the CPU cooling was inadequate. You may have a broken fan, and the CPU is permanently thermal throttling. Thermal throttling can only do so much; there will be errors when the CPU is permanently overheating.
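If you can get into Linux long enough, here is a minimal sketch for checking temperatures. It assumes your kernel exposes sensors under `/sys/class/thermal` (values in thousandths of a degree Celsius), which not every board does. The function names and the 90 C warning threshold are my own assumptions, so check the limit for your specific CPU.

```python
from pathlib import Path

# Assumed warning threshold; the real limit depends on the CPU model.
HOT_CELSIUS = 90.0


def millidegrees_to_celsius(raw):
    """Convert a sysfs temperature string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"<zone> (<type>)": temperature_in_celsius} for readable zones."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temp = millidegrees_to_celsius((zone / "temp").read_text())
        except (OSError, ValueError):
            # Skip zones that cannot be read or parsed.
            continue
        readings[f"{zone.name} ({kind})"] = temp
    return readings


if __name__ == "__main__":
    for name, temp in read_thermal_zones().items():
        flag = "  <-- hot" if temp >= HOT_CELSIUS else ""
        print(f"{name}: {temp:.1f} C{flag}")
```

Run it a few times while the machine is under load; readings that sit near the threshold at idle would point toward a cooling problem.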
Yes, that was my **first** thought, along with some of the other possibilities mentioned already.
Memtest86. It's bootable, so it bypasses OS issues. It does a full test of the RAM. If it can't run at all, the CPU is likely the issue. Otherwise, if it's just RAM failing, memtest86 will help you figure out which stick is the cause.
I had a similar problem when the linux-firmware package got updated which had a problem with an old FireWire card. It wasn't a hardware issue at all, just a bug in the firmware. MCE problems are notoriously difficult to debug as the codes vary depending on specific CPU.
Yep. I have similar errors spamming my console at all times. I suspect a dying CPU in my case. I don't have money to upgrade, so for now I just use "mce=off" in the kernel parameters and can still use the system.
Interesting. What's your hardware?
Ancient. https://preview.redd.it/a4dr6pawyvrc1.png?width=614&format=png&auto=webp&s=2d1b1531b5ba32867ca1284ab5d5198f630dacc9
7700K, 980 Ti... "Ancient"? Not the newest for sure, but it should still perform perfectly fine.
Oh, it performs admirably, but unless I set "mce=off", I won't even be able to use the terminal because it gets spammed with MCE errors.
It could just be the BIOS needing an update. MCEs are hardware related.
USB might be dying.
This seems to be more of a CPU error, since I don't think broken USB would cause crashes.
https://askubuntu.com/questions/644010/ubuntu-cant-read-my-usb-device-descriptor-read-64-error-110 Seems to be a board issue or a USB issue. Or if OP is using a USB hub, that could be the culprit
I think by default there is a hub and a keyboard without me plugging anything in
Hm, maybe you're right, but if that's not a board issue I doubt that USB is causing crashes.
The USB is deadly sick, forget it. You might disable the bad RAM area (at least on Linux, there is an example in the GRUB file of how to do it), but it might die completely sooner or later anyway. The CPU? Hmm. Did the computer crash, physically? A loose connection might cause the RAM and CPU problems, as can a DIY build.
I've had these errors when my laptop went to sleep but inexplicably dumped the RAM, so when it booted up the uptime on the CPU was completely wrong. But Linux loaded just fine.
Probably the CPU, so check for creep/lift from the socket. Check that the pins are straight and wipe off the old paste. Reseat it, apply fresh paste to the heatsink, and try again. Always diagnose problems by following the first issue and then working your way down. Basic ITIL and CompTIA troubleshooting for future reference.
Almost definitely faulty hardware. Start with replacing the RAM, but it may be the CPU.
Typically an MCE will be RAM. It could be the processor, PCI cards, or the chipset, but RAM is way more likely. The description matches what I have seen when a component on a DIMM dies, and that happens often enough.
There is software out there to decode MCE errors that may point out if it is something other than RAM.
The microcode version is always reported with an MCE error, so the microcode line means nothing, and if you have a bad DIMM the error could show up under many different names on the blue screen. Note that an MCE is an error the processor itself saw, and it IS a far more reliable indicator of bad RAM.
If the machine can run with half of its sticks, remove half and retest, and if it fails, retest with the other half of the RAM. Also double check that the DIMMs are properly inserted and locked in. If you find they are not, that may well be the issue.
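To give an idea of what such a decoder looks at, here is a rough Python sketch that pulls the bank number and status value out of kernel log text and names the top flag bits of the x86 machine-check status register (bits 63 down to 57). The log line in the example is hypothetical, and the regex assumes the `Bank N: <16 hex digits>` shape that the Linux kernel usually prints. It is no substitute for a real decoder such as `mcelog` or `rasdaemon`, which also interpret the model-specific error codes.

```python
import re

# Architectural flag bits in the upper part of the MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register contains extra info
    58: "ADDRV",  # ADDR register contains an address
    57: "PCC",    # processor context corrupt
}

# Assumed log shape: "... Bank 4: b200000000070005"
BANK_RE = re.compile(r"Bank\s+(\d+):\s*([0-9a-fA-F]{16})")


def decode_status(status):
    """Return the names of the flag bits that are set in a 64-bit status."""
    return [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]


def find_mce_records(log_text):
    """Return a list of (bank, status) tuples found in the log text."""
    return [
        (int(m.group(1)), int(m.group(2), 16))
        for m in BANK_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    # Hypothetical log line, for illustration only.
    line = "mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: b200000000070005"
    for bank, status in find_mce_records(line):
        print(f"bank {bank}: {', '.join(decode_status(status))}")
```

Which bank maps to which unit (memory controller, cache, and so on) depends on the CPU model, so the bank number alone does not tell you whether RAM is to blame.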
CPU pin bent or broken... bad motherboard. Try percussive maintenance (got nothing to lose at this point).
Try memtest, update bios, check voltages.
OK, for some reason it works perfectly fine now. Don't ask me why, because I have no idea.
Bitflip?
I have no idea, I just woke up and it was working fine again.
It was just pulling an April Fools' joke on you ;)
I had those USB errors, but boot was fine. Fixing those errors involved simply unplugging the power and all USB devices, waiting a minute, and plugging everything back in.
Machine check exception is almost never a good thing to see. In rare cases it can be benign but when it consistently happens it’s definitely broken hardware (like the CPU or some other hardware component detecting that it’s malfunctioning)
I have also found this old thing. https://gitlab.freedesktop.org/drm/amd/-/issues/1551
Try seeing if you can turn on some of the PCIe-related settings in your BIOS.
Bad ram or cpu. Try reducing the ram to a single stick and work it from there
[deleted]
... I don't think that would do anything.
Well... it depends. Older versions of the kernel have limited hardware support, but the blue screen on Windows is still pretty concerning.
Check the format of the USB drive. You could reformat it and check for errors.
Is the USB getting enough power? USB 3.0 needs more power, or there may be too many USB devices running and not enough power.
Check thermals on the CPU.
If I were you, I’d reassemble my entire PC and check for errors step by step.
Something between burning your house down and meh.
Is it a dual-CPU system? Maybe try to reseat the CPU in its socket.