To quote Rick and Morty, "Gentlemen, there's an option here you're not seeing":
You can have a different boot block for floppies than for profiles/widgets. This will simplify a lot of things, and it's likely why LOS does it this way.
A single boot block to boot them all is a lofty goal, but might not be necessary. Sure, you'll need to build an installer that, once running, knows whether the device it's writing to is a floppy, twiggy, profile, widget, SCSI drive, or whatever, and it will have to store 2 or 3 different sets of boot blocks (and optionally boot loaders).
This way, once the boot block loads, it already knows whether it's booting off a floppy or off a profile. Really, the boot block just needs a loop that says: load the next N blocks from the device in some specific sequence, and once you've successfully got them all, execute them.
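To make that loop concrete, here's a rough sketch in Python (just to show the logic; the real thing would be 68000 assembly calling the boot ROM). The names `load_fixed` and `read_block` are made up for illustration, with `read_block` standing in for whatever ROM read routine the boot block calls, and the 512-byte data size is an assumption that ignores tag bytes.

```python
BLOCK_SIZE = 512  # assumed data bytes per block; tag bytes are ignored here


def load_fixed(read_block, first_block, count, memory, load_addr):
    """Copy `count` consecutive blocks into `memory` and return the entry address.

    `read_block(n)` is a hypothetical stand-in for the boot ROM read routine;
    it must return exactly BLOCK_SIZE bytes for block number n.
    """
    addr = load_addr
    for n in range(first_block, first_block + count):
        data = read_block(n)
        memory[addr:addr + BLOCK_SIZE] = data
        addr += BLOCK_SIZE
    # A real boot block would now jump to load_addr.
    return load_addr
```

The device-specific part is entirely inside `read_block`, which is the point: a floppy boot block and a profile boot block can share this loop and differ only in the read call.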
If the boot block is for a 400k floppy (or Twiggy), it needn't do anything but call the boot ROM routines in a loop to read from the drive. When you get far enough to worry about Twiggies, then you can add code to differentiate between upper and lower by looking at 1b3. And I really wouldn't worry about porting this to a Lisa 1. I mean, it's a great thing if you manage to do it, but there are so very few of them out there and likely you're not going to get access to one anytime soon, so don't worry about it until it comes up.
In terms of twiggies upper/lower, I suspect that the shared floppy RAM address that holds the DRIVE number (0x00FCC003) doesn't change after reading a block. So when booting off Twiggies, as long as you don't overwrite it (the BOOT ROM will set it for you based on the user picking upper/lower), it will continue to read from the same drive if you call DOREAD.
So you wouldn't need to worry about upper vs lower in the boot block or in the loader. The OS may need to know which device it booted from, but it can just read that drive parameter once it's up and knows it booted off a floppy. Looking at the code in the TWGBOOT routine, it even copies the drive number to address 0x535 for you, so if you need it, you can look there, though we don't know whether other boot ROMs do exactly the same thing.
I was wrong about one thing: the Lisa's BOOT ROM (at least the H source) doesn't accept VIA address + I/O port spacing parameters, so you can only call the PROREAD function to read off the motherboard parallel port and nothing else. To boot from expansion slot devices, you'll have to use the expansion slot ROM routines that get copied to RAM.
You do have to make the boot block code position independent, as the boot ROM will copy status/boot routines from expansion slots to 0x20000 if they exist.
But you could also cheat a little bit and load your loader blocks to a fixed address in memory that's high enough to be above all the expansion slot ROM copies. Off the top of my head it's either 4K or 2K for each ROM (I forget), so multiply that by 3 and add it to 0x20000 and it should be fine as a loader address.
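Working that arithmetic out for both candidate sizes (the per-ROM size is still the unknown here, so treat both results as tentative until someone checks the ROM source):

```python
ROM_COPY_BASE = 0x20000  # where the boot ROM copies expansion slot routines
SLOTS = 3                # number of expansion slots


def loader_address(rom_copy_size):
    """Lowest address that sits above all three slot ROM copies."""
    return ROM_COPY_BASE + SLOTS * rom_copy_size


for size in (0x0800, 0x1000):  # 2K and 4K, whichever it turns out to be
    print(f"{size // 1024}K per slot -> load at {loader_address(size):#x}")
```

So that's 0x21800 if the copies are 2K each and 0x23000 if they're 4K each. Picking 0x23000 covers both cases at the cost of a few KB.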
Once your loader is done loading the kernel (or in your case BDOS/BIOS), the boot loader can be discarded and the memory reclaimed.
With hard drives, it's a bit more complex because of expansion slots. But it's fine to hard code the assumptions for the hard disk vs floppy in the boot block itself. You could even add a third type of boot block to boot off expansion slot devices if that makes it easier. But the boot block itself should be as simple as possible.
I wouldn't be surprised if some code in the routines to boot off a specific expansion slot parallel port leaves some crumbs behind somewhere to let you know what device to continue reading the next blocks from. You'd need to disassemble them to see, however.
In terms of how many blocks to load, you can either hard code the number of blocks in the boot block to read off the device, or rely on the AAAA tags convention. That is: read block 1; if it has AAAA, read block 2; if that has AAAA, read block 3 into memory; and so on. When you read block N+1 and it doesn't have AAAA, discard it and start executing the boot loader you just loaded.
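Here's roughly what the AAAA walk looks like, again as a Python sketch of the logic rather than real boot block code. Assumptions: `read_block` is a hypothetical routine returning a (tag, data) pair, the AAAA marker sits at `TAG_MARK_OFFSET` within the tag (I've guessed an offset, check it against real tags), and `max_blocks` is a cap I added so a disk full of AAAA tags can't run you out of RAM.

```python
BLOCK_SIZE = 512          # assumed data bytes per block
BOOT_MARK = b"\xAA\xAA"   # the AAAA marker
TAG_MARK_OFFSET = 4       # assumed position of the marker inside the tag


def load_tagged(read_block, first_block, max_blocks=64):
    """Load consecutive blocks until one is not tagged AAAA.

    `read_block(n)` is hypothetical and returns (tag_bytes, data_bytes).
    The first unmarked block is read but discarded.
    """
    image = bytearray()
    block = first_block
    while True:
        tag, data = read_block(block)
        if tag[TAG_MARK_OFFSET:TAG_MARK_OFFSET + 2] != BOOT_MARK:
            return bytes(image)  # unmarked block: stop, loader is complete
        if len(image) >= max_blocks * BLOCK_SIZE:
            raise RuntimeError("loader is larger than max_blocks")
        image += data
        block += 1
```

The nice part of this approach is that the installer can grow or shrink the loader without touching the boot block; it just tags however many blocks it writes.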
This will ignore interleaving, but that probably doesn't matter all that much as you're not going to load more than maybe 20k-30k worth of loader.
If you want to get fancy, you can use the other tags to tell you the next block number to load, and you can address interleaving that way, but the boot block doesn't have to worry about that.
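If the loader (not the boot block) does follow next-block links, it could look something like this. Everything about the tag layout here is an assumption for illustration: a 3-byte big-endian link at a made-up offset `NEXT_OFFSET`, with 0xFFFFFF marking the end of the chain. `read_block` is again a hypothetical routine returning (tag, data).

```python
NEXT_OFFSET = 0          # hypothetical position of the link inside the tag
END_OF_CHAIN = 0xFFFFFF  # assumed end-of-chain marker


def load_chain(read_block, first_block, max_blocks=64):
    """Follow next-block links stored in the tags, in chain order.

    The blocks can sit anywhere on the disk (e.g. interleaved); the
    max_blocks cap guards against a corrupted chain that loops forever.
    """
    image = bytearray()
    block = first_block
    for _ in range(max_blocks):
        tag, data = read_block(block)
        image += data
        block = int.from_bytes(tag[NEXT_OFFSET:NEXT_OFFSET + 3], "big")
        if block == END_OF_CHAIN:
            return bytes(image)
    raise RuntimeError("chain is too long or loops back on itself")
```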
You could add some sanity checking - if you run out of RAM or load too many blocks or get a read error, then you can put up an error message via the boot ROM code. But you don't have to add all the logic in the boot block itself, and it's ok to have 2-3 different types of boot blocks, and optionally even different loaders depending on how complex/flexible vs small and simple you want to make it.
The boot loader itself can add a checksum routine to ensure it wasn't modified or corrupted, or if you have a bit of space left over in the boot block you can call VFYCHKSM before executing it.
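If you roll your own check in the loader, a simple scheme is to have the installer append one 16-bit word that makes the whole image sum to zero. To be clear, this is a generic sketch and not what VFYCHKSM does; I haven't checked the ROM's algorithm, and the function names are made up.

```python
def sum16(image):
    """Sum the image as big-endian 16-bit words, modulo 0x10000."""
    total = 0
    for i in range(0, len(image), 2):
        total = (total + int.from_bytes(image[i:i + 2], "big")) & 0xFFFF
    return total


def seal(body):
    """Installer side: pad to an even length, then append a balancing word."""
    if len(body) % 2:
        body += b"\x00"
    fix = (-sum16(body)) & 0xFFFF
    return body + fix.to_bytes(2, "big")


def verify(image):
    """Loader side: a sealed, unmodified image sums to zero."""
    return len(image) % 2 == 0 and sum16(image) == 0
```

A plain sum like this catches accidental corruption, not deliberate tampering, and it misses swapped words, so it's a floor rather than a ceiling.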
The loader then has to be smart enough to find the other files (BDOS/BIOS/or Kernel in the case of a unixy system) it needs to load and know where to put them before it passes control to them. Some are smart enough to know about whole file systems (such as modern EFI ones).
Others have stringent requirements, or use parameters saved on the disk (or hardcoded in the loader itself) to find the things they need to load. They wouldn't need an entire file system driver, just enough to find the directory blocks, find the files they need, and figure out the sequence of blocks to read and where to put them in memory before passing control to them (e.g. a unixy loader that knows where to find the kernel in the root directory by a given name that the user can provide/override).
My advice would be to take Tom's boot loader project and library routines, and add to them as needed, but keep the resulting code as flexible/generic as possible so it can continue to be used for other things as well as providing the features you need for GEM.
Since the boot block and loader are going to be released from RAM after the OS is up, it's ok to make them a bit larger than is optimum to make them more flexible, but they still need to be able to fit on a 400k floppy along with whatever OS/installer/apps you want to provide.
Then whatever formatter/installer program you'll provide can install bespoke boot blocks and loaders to the media once the OS is up. So this is where the complexity will be taken care of.
You can even do fun things like map the boot block and loader blocks to a file name inside the OS but mark it somehow as untouchable (in FAT12/16 this would be the System flag + Hidden + Read only Flag in DOSy terms). That way you don't have to duplicate one of the 2 or 3 types of loaders/bootblocks and save some space on the installation/boot disk. Or the boot block/loader can be outside the bounds of the file system and not normally accessible as a file, as UniPlus does (it reserves the 1st 100 blocks for the boot block + loader and they're totally inaccessible from the OS once it's up.)
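For the FAT analogy, the three flags are single bits in the directory entry's attribute byte, so "untouchable" just means all three are set (0x07). A quick sketch, with `is_untouchable` being a name I made up:

```python
# Standard FAT directory entry attribute bits
ATTR_READ_ONLY = 0x01
ATTR_HIDDEN = 0x02
ATTR_SYSTEM = 0x04

UNTOUCHABLE = ATTR_READ_ONLY | ATTR_HIDDEN | ATTR_SYSTEM  # 0x07


def is_untouchable(attr):
    """True when an entry is read-only, hidden, and system all at once."""
    return attr & UNTOUCHABLE == UNTOUCHABLE
```

Whatever file system you end up with would need its own equivalent flag; this only shows the DOS-style version.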
Those are all valid options.
I'd guess the way IDLE does things for device picking is valid, or else it would have trouble booting anything. If in doubt you can play around with LisaEm as well to verify. And feel free to recompile these emulators and add code to print out what's in memory where, etc.
So tl;dr - have the boot block save a flag in low memory that you pass to the command interpreter.
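In sketch form, assuming one boot block variant per device type: each variant writes its own constant to an agreed address, and the OS side reads it back. The address and the device codes below are placeholders I invented; you'd need to pick a low-memory location the boot ROM doesn't use.

```python
BOOT_DEVICE_FLAG = 0x0500  # hypothetical address, NOT a known-free location

# Hypothetical device codes, one per boot block variant
FLOPPY, PROFILE, SLOT_DEVICE = 1, 2, 3


def floppy_boot_block(memory):
    """The floppy variant knows what it is, so it just records that."""
    memory[BOOT_DEVICE_FLAG] = FLOPPY


def profile_boot_block(memory):
    """The profile/widget variant records its own code the same way."""
    memory[BOOT_DEVICE_FLAG] = PROFILE


def describe_boot_device(memory):
    """OS / command interpreter side: read the flag back."""
    names = {FLOPPY: "floppy", PROFILE: "profile", SLOT_DEVICE: "slot device"}
    return names.get(memory[BOOT_DEVICE_FLAG], "unknown")
```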