diff --git a/content/pmos-boot-recovery.rst b/content/pmos-boot-recovery.rst new file mode 100644 index 0000000..6228ca6 --- /dev/null +++ b/content/pmos-boot-recovery.rst @@ -0,0 +1,218 @@ +Recovering non-booting PostmarketOS +@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + +:slug: pmos-boot-recovery +:date: 2025-02-05 19:52 +:tags: linux, how-to +:category: technology +:keywords: PostmarketOS, recovery, initrd +:lang: en +:translation: false +:status: draft + +The problem +=========== + +I rebooted my PmOS phone at FOSDEM, and it didn't boot up. I got stuck with the penguins. Basically a brick, but a soft one. Probably an early kernel panic, but I can only tell it was stuck (no logs whatsoever). + +This is a story of how I tried to restore my working stuff w/o internet (this is my only internet-capable device). + +.. TODO: what happened and what I think has happened + +The tools +========= + +Obvs, Kr (1+6). That has a bootloader and maybe some quirks we will see later. + +Several Linux (x86_64, Arch Linux) machines that have a rather sizeable amount of tools (file, binwalk, poke, adb, fastboot, …), but *almost no internet*. + +The second day of FOSDEM (with free WiFi), 16 hours on the train home and more time at home :-D + +Some experience with flashing firmware to phones, including the 1+6. + +The idea/goal +============= + +Rebuild the initramfs and whatnot from an ad-hoc obtained postmarketOS install, reboot and fix stuff. + +The process +=========== + +1. Realization we can fastboot +------------------------------ + +This was rather obvious, given I flashed PmOS there in the 1st place. However, ``fastboot --help`` does not show too many useful options… + +… apart from one: switching slots. I know I had LineageOS (or something similar, I don't remember) in the other slot. + +However, the second slot did not work (I got a bootloop instead). So, in slot A there is a broken PostmarketOS, in B there is an absolutely broken Lineage. + +2. 
Recovery +----------- + +I know about recoveries, but I was not able to boot into one (it just rebooted). + +However, after a sleep (and losing access to internet), I realised that the recoveries might also be slotted. And they were, so I got access to a basic LineageOS recovery in slot B. And that means an ADB shell. + +3. Backup +--------- + +Basically like this:: + + adb shell cat /dev/block/by-name/system_a | pv > system-a + chmod a-w * # to make sure I don't break my own backups + +Took quite some time, I think mostly because I only have USB 2.x cables. The good thing is that the partitions are named here. + +Also, the block devices ``sda`` to ``sde`` can be copied as a whole. I don't know whether that includes the recovery… + +4. Partitions +------------- + +``fdisk`` tells me that all of the partitions have 512B sectors and a 2TB protective MBR. Really dummy stuff. + +After some experimenting, I realised that the sectors are in fact 4096B. ``dd if=sda bs=4096 skip=1 | file -`` however tells me that while this is a GPT data structure (in a nonstandard way, since that is at LBA 0), ``fdisk -l`` was not willing to tell me the details. Poking it would give me the data (the GPT pickle seems to be part of the standard set), but that was quite unpleasant (I have little experience with Poke and wasn't able to convince it to show me the labels as strings and not byte values). ``hexdump`` is a friend, though :-) + +As for the partitions themselves, I don't understand much, but: +- ``boot_a`` seems to be the thing that the bootloader can load. According to ``file`` it is ``Android bootimg, kernel (0x8000), ramdisk (0x1000000), page size: 4096, cmdline (console=ttyMSM0,115200 pmos_boot_uuid=4724a893-210e-41b4-b89e-464252ae295f pmos_root_uuid=d1c201af-1ece-46f6-8e21-eb89acb494ff)``, which sounds good +- ``userdata`` contains another DOS partition table (with wrong sector size again) referencing partitions labeled ``pmOS_boot`` and a LUKS one. 
This is well known from the pmOS userspace +- ``system_a`` does not seem to be of interest, it is an ext2 filesystem with an Android-style directory structure. (Maybe recoveries live in here? But I did not try to dis/prove this) + +5. Initramfs from pmos +---------------------- + +The recovery's shell is minimal, but it has ``cpio`` and ``chroot``. (I was hoping for kexec, but to no avail). So we can at least get the initramfs and hopefully get a better environment… + +The initramfs is however compressed with zstd and lives on nested partitions with wrong partition tables, so find the file with a bit of ``dd``-ing and ``testdisk``-ing, get the cpio archive with ``zstdcat`` or whatever, then ``adb push`` it to ``/tmp`` on the phone; there we can do ``cpio -i < ../initramfs`` and chroot into a random directory under ``/tmp``. The tmpfs's are quite big (4G), so space is not an issue. + +Then the ordinary: fix ``$PATH``, install ``busybox``'s utils and try running init:: + + PATH=/bin:/sbin + busybox --install /bin + /init # and pray + +… didn't work. Somehow, ``/init`` uses syntax too advanced for this busybox, so it crashes on setting up the log :-/ + +Also: we can run ``mdev -s`` to populate ``/dev``, but we don't get partition names. Of course, ``/dev/block/by-name/*`` in the recovery fs is just a bunch of symlinks, so we can learn that (in my case) ``userdata`` is just ``/dev/sda17``. + +However, there is ``cryptsetup`` in ``/sbin``, so we should be able to do something. The question is: how to open and mount that. + +The biggest issue is still with the partition tables: we *do* have ``losetup`` (actually, two implementations of it, since the recovery also has one), even with partition auto-find, but the tables are wrong, so the ``loop2p*`` devices would point to bad places. And the working devices I have are Arch, not Gentoo, so I don't have the source for that. 
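A side note on why the ``loop2p*`` devices end up in the wrong place: a DOS partition table stores each start and size as a *sector count*, and whoever reads it multiplies by the sector size it believes in. Below is a minimal sketch of that arithmetic, assuming the classic MBR layout (four 16-byte entries at offset 446, little-endian start and count in the last 8 bytes of each entry). The helper names and the demo numbers are made up for illustration, they are not values from my phone.

```python
import struct

SECTOR_SIZES = (512, 4096)


def parse_mbr(mbr: bytes):
    """Return (type, start_lba, sector_count) for every non-empty entry."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not an MBR: missing 0x55AA signature")
    entries = []
    for i in range(4):
        raw = mbr[446 + 16 * i : 446 + 16 * (i + 1)]
        # Entry layout: status(1) CHS start(3) type(1) CHS end(3)
        #               start LBA(4, LE) sector count(4, LE)
        ptype = raw[4]
        start, count = struct.unpack_from("<II", raw, 8)
        if ptype != 0:
            entries.append((ptype, start, count))
    return entries


def byte_ranges(entries, sector_size):
    """Translate sector units into (byte offset, byte length) pairs."""
    return [(start * sector_size, count * sector_size)
            for _ptype, start, count in entries]


def make_demo_mbr():
    """Build a synthetic MBR with two hypothetical Linux partitions."""
    mbr = bytearray(512)
    demo = [(0x83, 256, 65536), (0x83, 65792, 1000000)]
    for i, (ptype, start, count) in enumerate(demo):
        struct.pack_into("<B3xB3xII", mbr, 446 + 16 * i, 0, ptype, start, count)
    mbr[510:512] = b"\x55\xaa"
    return bytes(mbr)


if __name__ == "__main__":
    entries = parse_mbr(make_demo_mbr())
    for sector_size in SECTOR_SIZES:
        print(f"assuming {sector_size}B sectors:")
        for offset, length in byte_ranges(entries, sector_size):
            print(f"  partition at byte {offset:#x}, {length:#x} bytes long")
```

The same table gives byte offsets that differ by a factor of 8 depending on the assumed sector size, so a tool that assumes 512B on a table written for 4096B looks for the filesystems far too early on the device.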
+ +Writing this paragraph (above) helped me realize that I can do one of two things: either I can fix the partition tables (if there is a bit for the size, and that would be written in the Pickles), or, since I have a PinePhone (now that I am already at home) that is Aarch64, too, I can copy binaries from that! (It does not have a working modem, so still no internet, lol) + +The latter is a nice thing, but I don't think it is very useful, because the configuration is still located on the encrypted partition, which is still inaccessible, and I am not sure whether I can fit all the required stuff into the tmpfs in order to build the kernel without decrypting (once I can chroot into the decrypted partition, I think I have won, so then using other tools would not be an advantage). + +6. Fixing (some of) the partition tables +---------------------------------------- + +We have ``xxd``, which can be used as a makeshift hex editor: dump what you need, ``sed`` or ``vi`` the fix (or just copy that into the PC and use more comfortable tools) and use ``xxd -r`` to write it back. Hopefully. (Using a hex editor and piping the result into ``adb shell cat '>' /dev/block/by-name/userdata`` might also be an option, but transferring the whole partition sounds painful) + +Is this safe? Hopefully, because we are only modifying the inner table in ``userdata``, so it should not matter to the bootloader and should not affect the recovery. + +Reading into the pickle (``/usr/share/poke/pickles/mbr.pk`` on my machine), we see that there is no field for sector size, everything just is 512B big :-/ And also CHS, not LBA. But a DOS partition table means we can just extract the MBR (``dd if=userdata of=userdata.mbr bs=512 count=1``) and hack on that without needing the rest. 
And since there is nothing of interest in the header, we only need to change the Partition Table Entries (PTEs). + +So let's poke the partition table. The entries are counted in 4096B sectors while the tools assume 512B ones, so every start and size gets multiplied by 4096/512 = 8:: + + shell$ poke userdata.mbr + (poke) .set obase 16 + (poke) load mbr + (poke) var ptes = (MBR @ 0#B).pte + (poke) ptes[0].lba *= 8 + (poke) ptes[0].sector_count *= 8 + (poke) ptes[1].lba *= 8 + (poke) ptes[1].sector_count *= 8 + +Poke applies that right away, so we can check the final sizes with ``fdisk -x userdata.mbr`` and do sth like ``dd if=userdata bs=512 skip=START count=SECTORS | file -`` to check that the rest is OK. We will silently hope that the CHS data is just not used anyway, so we will not touch that. + +Also note how we changed just the extracted MBR but checked against the original read-only backup. + +Push that to the device and use *recovery's* ``dd`` to patch the partition table:: + + pc$ adb push userdata.mbr /tmp + recovery# dd if=/tmp/userdata.mbr of=/dev/block/by-name/userdata + +Fun fact: initramfs's ``fdisk`` does assume 4k sectors correctly, but ``losetup`` somehow still uses 512B ones, leading to mismatches… Tip: for ext\* filesystems (like ``pmOS_boot``), there is a block of 0x400 zeroes at the start and the label is at offset 0x478 from the start of the filesystem; LUKS2 has the ``LUKS`` magic at the start and a JSON config at offset 0x1000. + +Set up the loop device, run ``mdev`` to detect it and verify the partitions got detected correctly:: + + initramfs# losetup -P -f /dev/sda17 + initramfs# mdev -s + initramfs# losetup -a + /dev/loop4: 0 /dev/sda17 + initramfs# xxd /dev/loop4p1 | less + initramfs# xxd /dev/loop4p2 | less + + +7. Final touches^W^W Generating the correct-er initramfs +-------------------------------------------------------- + +Whoo! But this is the part where I actually got most paranoid: postmarketOS v24.12 uses kernel 6.12 (ish, it differs a bit across devices, the OnePlus 6 actually uses a -rc kernel), but the recovery has 4.9.337. 
That means that pmOS's userspace might be using system calls that are too new for this kernel, and random stuff might start failing at random stages. (So far we have been mostly treating the partitions like data, so everything was using the 4.9 calls and was more-or-less compatible) + +Anyway:: + + # cryptsetup open /dev/loop4p2 cry + Enter passphrase for /dev/sda17: + # mdev -s + # mkdir /mnt + # mount -t ext4 /dev/mapper/cry /mnt + mount: mounting /dev/mapper/cry on /mnt failed: Invalid argument + +… fuck. :: + + # dmesg + [100726.519997] EXT4-fs (dm-0): couldn't mount RDWR because of unsupported optional features (10000) + +Oh, here come the incompatibilities. OK, then:: + + # mount -t ext4 -o ro /dev/mapper/cry /mnt + +There we go! And the usual:: + + # for x in proc sys dev; do mount --rbind /$x /mnt/$x; done + # chroot /mnt /bin/bash + localhost:/# + +We're in! :: + + # mount -a + mount: /boot: /dev/loop1 already mounted or mount point busy. + dmesg(1) may have more information after failed mount system call. + +Uh, the single partition I need… Oh, that was a previous attempt that led nowhere, so I can just umount that and retry successfully. :: + + # export PATH=/usr/sbin:/sbin:/usr/bin:/bin:/usr/local/bin + # mount -t tmpfs none /tmp + # export TMPDIR=/tmp + # mkinitfs + […] + ==> Not flashing boot in chroot + + +Ugh, so, did this work or not? I think the ``pmOS_boot`` partition is correct, but I cannot tell whether the boot failed before or after loading this partition from ``boot_a``. But anyway, we can check that the ``boot_a`` partition and ``/boot/boot.img`` are in fact the same thing, so we can flash it with ``dd`` from the recovery later. (And it is likely that the actual kernel will not like our changes to the partition table anyway, so we might be doing a second round of the process…) + +8. Let's boot, take 1 +--------------------- + +A quick ``sync`` and ``reboot``, followed by the slot switching trickery, and… + +… the situation is still the same, just penguins and nothing more. 
+ +Let me just quickly flash the boot partition and revert the partition table (both from the recovery shell using ``dd`` and some of the magic from above). + +9. Let's boot, take 2 +--------------------- + +And… wait for it… it worked! + +Now, naturally, I regenerate the initramfs again just in case the ad-hoc environment had some issues, and I am done. + +Alternative and faster solution +=============================== + +Of course I could just find someplace with an internet connection and re-flash the device. But I chose not to: I am not sure whether I would be able to re-flash just part of the ``userdata`` partition (I want to keep my data) and this seemed to be more of an adventure. + +Other thoughts / observations +============================= + +So far, I have no idea how “Android bootimgs” are created, though I think there was some note about ``avbtool`` in the image.
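For what it is worth, the header that ``file`` decoded for ``boot_a`` in section 4 can be read by hand. The sketch below assumes the legacy boot image header layout (versions 0 to 2) as I remember it from AOSP's ``bootimg.h``: an ``ANDROID!`` magic, ten little-endian 32-bit fields, a 16-byte name and a 512-byte cmdline. Newer header versions (3 and up) are laid out differently, and I have not checked which version pmOS writes for this device. The function names and the demo values are hypothetical.

```python
import struct
from typing import NamedTuple

# magic, 10 x u32, name[16], cmdline[512]; all little-endian
HEADER = struct.Struct("<8s10I16s512s")


class BootImgHeader(NamedTuple):
    kernel_size: int
    kernel_addr: int
    ramdisk_size: int
    ramdisk_addr: int
    page_size: int
    header_version: int
    cmdline: str


def parse_bootimg_header(data: bytes) -> BootImgHeader:
    """Decode the start of a legacy (v0-v2) Android boot image."""
    if len(data) < HEADER.size:
        raise ValueError("too short for a boot image header")
    (magic, kernel_size, kernel_addr, ramdisk_size, ramdisk_addr,
     _second_size, _second_addr, _tags_addr, page_size, header_version,
     _os_version, _name, cmdline) = HEADER.unpack_from(data)
    if magic != b"ANDROID!":
        raise ValueError("missing ANDROID! magic")
    text = cmdline.split(b"\0", 1)[0].decode("ascii", "replace")
    return BootImgHeader(kernel_size, kernel_addr, ramdisk_size,
                         ramdisk_addr, page_size, header_version, text)


def payload_offsets(h: BootImgHeader):
    """Kernel starts on the page after the header, ramdisk after the kernel."""
    def round_up(n):
        return -(-n // h.page_size) * h.page_size
    kernel_off = h.page_size
    ramdisk_off = kernel_off + round_up(h.kernel_size)
    return kernel_off, ramdisk_off


def make_demo_header() -> bytes:
    """Synthetic header with made-up sizes, only for trying the parser."""
    return HEADER.pack(b"ANDROID!", 5000, 0x8000, 3000, 0x1000000,
                       0, 0, 0x100, 4096, 0, 0, b"",
                       b"console=ttyMSM0,115200")


if __name__ == "__main__":
    header = parse_bootimg_header(make_demo_header())
    print(header)
    print("kernel/ramdisk offsets:", payload_offsets(header))
```

On a real backup, passing the first 4096 bytes of the ``boot_a`` dump to ``parse_bootimg_header`` should be enough, and the offsets then tell ``dd`` where to carve out the kernel and the ramdisk.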