
Cycle-counting for tight code

Started by stepleton, November 25, 2025, 12:02:16 PM


stepleton

Let's say you're writing some speed-critical Lisa code. Every cycle counts, and you have choices to make between different implementation options. There are various guides like this one for counting M68K instruction timings, but how can you know how long things really take if you need to access memory or one of the peripheral devices (e.g. the SCC or a VIA)?

I have to admit that the Lisa's complicated memory system makes this a bit confusing to me. I know the full story must be derivable from the Lisa Hardware Manual (e.g. section 4.2) and general timing information about the 68000 itself, but I still find it hard to follow. The timing guide linked above refers to read and write bus cycles: do those count differently on the Lisa? Do the MMU or accesses to peripheral devices introduce wait states?

I'm happy to keep working to figure it out on my own, but I wonder if someone might know these things off the top of their head...

sigma7

I use the tables in the Motorola MC68000 User's Manual (typically a PDF of it); the Quick Reference Guide is also handy for this. Presumably third-party websites have reproduced the data accurately, so if their presentation makes more sense to you, you might as well use them.

For the unfamiliar, the specified number of clock cycles for a 68000 instruction includes the fetching of the instruction and the internal operations performed, but does not include the clock cycles consumed by any memory accesses performed by the instruction (which vary depending on what type of memory access it is as well as how many the instruction needs). So one looks up the "effective address" cycle timing(s) and adds that (those) to the instruction's cycle count.
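As a minimal sketch of that lookup-and-add bookkeeping, here is the word-sized ADD <ea>,Dn case in Python. The numbers are transcribed from the MC68000 User's Manual tables as I recall them (base timing 4(1/0) for ADD.W <ea>,Dn, plus the byte/word effective-address calculation times, written as cycles(reads/writes)), so check them against your own copy. The names `EA_WORD`, `ADD_W_TO_DN` and `total_timing` are purely illustrative, and long-word operands use a different EA table that is not shown here.

```python
# Byte/word effective-address calculation times as (clocks, bus reads, bus writes).
# Transcribed from the MC68000 User's Manual EA table; verify before relying on it.
EA_WORD = {
    "Dn": (0, 0, 0),
    "An": (0, 0, 0),
    "(An)": (4, 1, 0),
    "(An)+": (4, 1, 0),
    "-(An)": (6, 1, 0),
    "d16(An)": (8, 2, 0),
    "d8(An,Xn)": (10, 2, 0),
    "abs.W": (8, 2, 0),
    "abs.L": (12, 3, 0),
    "d16(PC)": (8, 2, 0),
    "d8(PC,Xn)": (10, 2, 0),
    "#imm": (4, 1, 0),
}

# Base timing of ADD.W <ea>,Dn: the single read is the prefetch of the next word.
ADD_W_TO_DN = (4, 1, 0)


def total_timing(base, ea_mode):
    """Add the EA calculation time to an instruction's base timing."""
    ea = EA_WORD[ea_mode]
    return tuple(b + e for b, e in zip(base, ea))


for mode in ("Dn", "(An)", "d16(An)", "abs.L"):
    cycles, reads, writes = total_timing(ADD_W_TO_DN, mode)
    print(f"ADD.W {mode},D0: {cycles} cycles ({reads}/{writes})")
```

For example, ADD.W (A0),D0 comes out at 8(2/0): one read for the operand and one for the prefetch.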

IIRC, regular bus cycles (terminated by DTACK) operate in the documented number of clock cycles on the Lisa.

I believe the MMU and video are synchronized with CPU operation without adding wait states, but now that you've asked I'm not confident I can say there are never any wait states. A video access has to occur without fail, so it would have a higher design priority than the CPU. I suppose if there is an occasional wait state, you probably wouldn't be able to do anything about it. I recognize that in this case you want to be able to predict it more than avoid it.

Wait states do occur when trying to access the memory shared with the 6504 if the 6504 has locked out the 68000 to perform timing critical floppy disk access.

The AM9512 coprocessor circuitry also generates wait states, but it's unlikely anyone will run into that.

An expansion card may generate wait states; I don't recall any that do. A missing expansion card will cause wait states until bus timeout.

VPA/VMA bus cycles are different from (and slower than) DTACK cycles, but they are atypical: they are used for the SCC, as well as for the VIAs on the Lisa 1/2 I/O Board, but not for those on the 2/10 I/O Board or expansion cards.

Adding an XLerator changes timing as accesses to the original hardware require synchronization with the CPU Board.

This may not be very helpful; do you have a specific circumstance you are considering? Perhaps I can measure it for you.
Warning: Memory errors found. ECC non-functional. Verify comments if accuracy is important to you.

stepleton

Thanks for this very instructive response! It's good to know that regular bus cycles are (probably) dependable in their timing.

I note that MOVE.x Dy,Dz is listed as taking four cycles and as carrying out one bus read cycle. I was confused by the read cycle bit until I realised that this must simply be for the instruction fetch. This is confirmed in a document called the 68000 "Yet Another Cycle Hunting Table" (YACHT), which I had never heard of before. It conveniently breaks down what the bus cycles are for.
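Because a 68000 read or write bus cycle takes 4 clocks when no wait states are inserted (the normal DTACK case on the Lisa, per the reply above), the (reads/writes) counts in the tables can be used to split an instruction's time into bus activity and internal work, which is the kind of breakdown YACHT spells out. A small sketch under that no-wait-state assumption; `internal_clocks` is a hypothetical helper name, and the ADD.W -(An),Dn figure of 10(2/0) is 4(1/0) base plus 6(1/0) for predecrement addressing.

```python
BUS_CYCLE_CLOCKS = 4  # minimum length of a 68000 read or write bus cycle (no wait states)


def internal_clocks(cycles, reads, writes):
    """Clocks left over after accounting for zero-wait-state bus cycles."""
    spare = cycles - BUS_CYCLE_CLOCKS * (reads + writes)
    if spare < 0:
        raise ValueError("timing entry has fewer clocks than its bus cycles need")
    return spare


# MOVE.W Dy,Dz: 4(1/0). The one read is the instruction prefetch; no internal time.
print(internal_clocks(4, 1, 0))   # 0
# ADD.W -(An),Dn: 10(2/0). Two clocks are spent off the bus (the predecrement cost).
print(internal_clocks(10, 2, 0))  # 2
```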

It would be interesting to know about how transactions via VPA/VMA cycles differ. It might be easiest, and good practice, for me to work out a way to measure this myself. For example, it would be interesting to get an idea of the theoretical upper bound on parallel port communication.

sigma7

Quote: "It would be interesting to know about how transactions via VPA/VMA cycles differ."

Theoretical consequences of the 6800 VPA/VMA bus cycle operation from perusing the 68000 User Manual...

The VPA/VMA bus cycle option was provided by Motorola to simplify interfacing the 68000 with peripheral chips originally designed for use with the 6800 series.

VPA/VMA bus operation is synchronized with the "E clock" signal generated by the 68000. E is the CPU clock divided by 10, with a 60/40 duty cycle (low for 6 CPU clocks, high for 4). That means the E clock can be in one of 10 different alignments with the CPU clock at the start of a particular instruction.
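To make the "10 different alignments" concrete, here is a small Python model of E as a function of the CPU clock count. Counting from a falling edge of E is an arbitrary reference chosen for the sketch, and `e_level` and `alignment` are illustrative names only.

```python
E_DIVISOR = 10              # one E period = 10 CPU clocks
E_HIGH = 4                  # E is high for 4 of them...
E_LOW = E_DIVISOR - E_HIGH  # ...and low for the other 6 (60/40 duty cycle)


def e_level(cpu_clock):
    """E level (0 or 1) during a CPU clock, counting from a falling edge of E."""
    return 1 if cpu_clock % E_DIVISOR >= E_LOW else 0


def alignment(start_clock):
    """Which of the 10 possible E phases an access starting at this clock sees."""
    return start_clock % E_DIVISOR


print("".join(str(e_level(t)) for t in range(20)))  # 00000011110000001111
print(sorted({alignment(t) for t in range(37)}))    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```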

The User Manual provides a "Best Case" "MC68000 to M6800 Peripheral Timing Diagram" which shows 7 clock periods of wait states added to a bus cycle. If the E clock were in the worst-case alignment with the start of the cycle, an additional 9 clock cycles would be needed, for a total of 16 added clock periods.

So the penalty of a VPA/VMA cycle is somewhere between 7 and 16 clock cycles, averaging around 12 (11.5 if all 10 alignments are equally likely).

(That's assuming a peripheral doesn't need to add more wait states, which is the case in the Lisa aside from the AM9512. The circuitry immediately starts the VPA bus cycle response when the corresponding I/O address is decoded.)
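The 7-to-16 range and the average can be checked with a few lines of Python. This assumes a simple linear model (each clock of misalignment from the best case adds one wait state) and that all 10 alignments are equally likely; neither assumption comes from the User's Manual itself.

```python
E_DIVISOR = 10
BEST_CASE_WAIT = 7  # wait states in the User's Manual "Best Case" diagram

# Assumed model: every clock of misalignment from the best case adds one wait state,
# so the ten possible alignments give 7, 8, ..., 16 extra clocks.
penalties = [BEST_CASE_WAIT + offset for offset in range(E_DIVISOR)]

print(penalties)                        # [7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
print(sum(penalties) / len(penalties))  # 11.5
```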

Since all 68000 instructions take a multiple of 2 clock cycles (as far as I can tell), I think one might be able to optimize block-move code so that only 8 wait states are added for each successive VPA/VMA cycle. To do this, successive instructions using VPA/VMA would need to be a multiple of 10 clock cycles apart (including all the clock cycles consumed by bus cycles, not just the instruction cycle time). The first VPA cycle would suffer the average ~12 cycle penalty, but each of the following VPA cycles could suffer the minimum penalty (7 cycles + 1, since 68000 instructions are multiples of 2).
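The "+1" comes from a parity argument: if the code between VPA/VMA accesses always shifts the E alignment by an even number of clocks, only 5 of the 10 alignments are reachable from a given starting point. The sketch below illustrates just that argument, using the same assumed linear penalty model (7 + offset from the best alignment). Whether a real Lisa ends up on odd or even offsets is not established here, and `reachable_offsets` and `min_wait` are hypothetical helper names.

```python
E_DIVISOR = 10
BEST_CASE_WAIT = 7


def reachable_offsets(first_offset, step=2):
    """Offsets from best-case alignment reachable by shifting in multiples of `step` clocks."""
    return sorted({(first_offset + k * step) % E_DIVISOR for k in range(E_DIVISOR)})


def min_wait(first_offset):
    """Smallest penalty achievable when alignment can only move in 2-clock steps."""
    return BEST_CASE_WAIT + min(reachable_offsets(first_offset))


print(reachable_offsets(1))  # [1, 3, 5, 7, 9]
print(min_wait(1))           # 8  (stuck on odd offsets: best case is 7 + 1)
print(min_wait(4))           # 7  (even offsets can reach the best alignment)
```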

The E clock is about 500 kHz in a stock Lisa, and with a 16 MHz XLerator installed, it is 1.6 MHz. The XLerator 12.5 and 18 retain the stock E clock frequency.
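Since E is the CPU clock divided by 10, the stock CPU clock works out to 5 MHz and the 16 MHz XLerator keeps the same 10:1 ratio, so the penalty stays at 7 to 16 CPU clocks but shrinks in wall-clock time. A quick conversion sketch, covering only those two cases (the XLerator 12.5 and 18 keep the stock E clock, so the 10:1 assumption does not hold for them); `cycles_to_ns` is an illustrative name.

```python
def cycles_to_ns(cycles, cpu_hz):
    """Convert a count of CPU clock periods into nanoseconds."""
    return cycles * 1e9 / cpu_hz


# CPU clock = 10 x E clock, so these follow from the E frequencies quoted above.
CLOCKS = {"stock Lisa": 5_000_000, "16 MHz XLerator": 16_000_000}

for name, hz in CLOCKS.items():
    best, worst = cycles_to_ns(7, hz), cycles_to_ns(16, hz)
    print(f"{name}: VPA/VMA penalty {best:.0f} to {worst:.0f} ns")
```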

VPA/VMA bus cycles are used to access the 8530 SCC, and the Parallel Port and Keyboard 6522 VIAs on the Lisa 1 aka 2/5 I/O Board. IIRC, they are also used for interrupt acknowledge cycles on some expansion cards as that simplifies the circuitry.

The 6522 VIAs on the 2/10 I/O Board and the Dual Parallel Expansion Card don't use VPA/VMA bus cycles.
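For the parallel-port upper bound raised earlier, one rough approach on the 2/5 I/O Board (where the port VIA does use VPA/VMA) is to divide the CPU clock by the loop's cycle count plus the VPA/VMA penalty. The loop below is hypothetical: A1 pointing at the VIA's output register is an assumption, the cycle counts (MOVE.B (An)+,(An) at 12(2/1), a taken DBRA at 10(2/0)) are from the User's Manual tables as I recall them, and handshaking is ignored entirely, so a real transfer would be slower.

```python
CPU_HZ = 5_000_000  # stock Lisa: 10 x the ~500 kHz E clock

# Hypothetical inner loop, one byte per iteration:
#   loop: MOVE.B (A0)+,(A1)   ; 12 cycles, 2 reads / 1 write (the write goes to the VIA)
#         DBRA   D0,loop      ; 10 cycles when the branch is taken
LOOP_CYCLES = 12 + 10


def bytes_per_second(loop_cycles, vpa_wait, cpu_hz=CPU_HZ):
    """Upper bound on transfer rate if each iteration makes one VPA/VMA access."""
    return cpu_hz / (loop_cycles + vpa_wait)


for wait in (7, 8, 12, 16):
    print(f"{wait:2d} wait states: {bytes_per_second(LOOP_CYCLES, wait):,.0f} bytes/s")
```

On the 2/10 I/O Board, where the VIAs don't use VPA/VMA cycles, the `vpa_wait` term would drop out of this estimate.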