General Category > LisaList2

Cameo/Aphid benchmarking


stepleton:
I remembered that thing about the faster clock on the parallel card, though I'm not sure how it affects I/O with the Lisa. But if you're willing to try your X/ProFile on the parallel port, @Lisa2, that would be an excellent test of the difference.

I'm girding myself for the news that Cameo/Aphid really is slower! I've never tried to optimise it for speed, and there's a fair amount you could do to make it go faster. (Ditching Python for C might be a start.) But I'm betting that most Lisa applications aren't really disk I/O bound. The main thing I'm worried about is compatibility, so that bit about it not working internally is what grabs my attention. Do you have the inline 100-ohm terminating resistors fitted? (I'm guessing you do.) These are what enable Widget replacement for me, but I haven't had any other 2/10s to try it on besides my own.

It's worth saying that when James Denton was working on preparing this product derived from Cameo/Aphid, he needed to make some tweaks to get it working on the internal connector. Note the presence of Rev. A and Rev. B options on his page. I'm not sure how his Rev. B is made, but I know he was investigating a simpler replacement for the TXS0108 bidirectional level adaptor chips that I use in my design, which I think is a pretty good idea. James was planning to share his designs and may have already done so somewhere.

jamesdenton:

--- Quote from: stepleton on September 08, 2021, 04:04:40 pm ---It's worth saying that when James Denton was working on preparing this product derived from Cameo/Aphid, he needed to make some tweaks to get it working on the internal connector. Note the presence of Rev. A and Rev. B options on his page. I'm not sure how his Rev. B is made, but I know he was investigating a simpler replacement for the TXS0108 bidirectional level adaptor chips that I use in my design, which I think is a pretty good idea. James was planning to share his designs and may have already done so somewhere.

--- End quote ---

Thanks for the nudge. I've pushed the schematics, board, and related BOM to my forked repo here.

FWIW, the Rev A board uses the same components found in Tom's design: TXS0108s and 100-ohm resistors (where necessary). It works well with the Lisa 2/5 and Apple ///. It's been a while since I tested with a parallel card. I was not able to get it to work reliably with my 2/10 using the internal cable, and the same goes for the original Cameo/Aphid I built.

Which brings me to Rev B: I used a number of BSS138s in place of the two TXS0108s. This has proven to work really well on the 2/10, along with the 2/5 and Apple ///. However, I have found the PocketBeagles themselves to be a little flakier with this board. Some work great 100% of the time; others are more temperamental, and I see checksum errors and various oddities.

stepleton:
Thanks James! Looks like I might need to find some different 2/10s for testing besides my own. About the only thing I can think of that might be different is that I'm not using any adaptor cable between my own device and the Widget cable, which I plug straight into the 26-pin header at the back of the Aphid board.

It's been about three years since I designed the Aphid board. It's possible that trying to use fancy all-in-one automatic bidirectional level adaptor ICs for the data and signaling lines was just too clever by half. (ProFile emulation was only meant to be one application for Aphid --- I wanted to have the board be useful for other things too, like GPIB. But I've never tried any other use.)

I think there are bidirectional level shifter ICs that require you to toggle the data direction with a pin; alternatively, you could do what the ProFile did and have two tri-state buffers, one for in and one for out. The R/~W line picks the active one directly, if I recall correctly. I suspect either would be more dependable. In the meantime, I wish I knew why my 2/10 was more tolerant than other folks' machines.
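To illustrate the two-buffer idea, here is a toy Python model of how a single R/~W line could drive the enables of an "in" and an "out" tri-state buffer. It is only a sketch: the polarity (R/~W high meaning the host reads and the device drives the bus) is an assumption, and `select_buffers` and `BufferEnables` are hypothetical names, not part of Aphid or the ProFile.

```python
# Toy model of the "two tri-state buffers" approach.
# ASSUMPTION: R/~W high = host reads (device drives the data lines),
#             R/~W low  = host writes (host drives the data lines).
# Check the real ProFile/Widget signal polarity before relying on this.
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferEnables:
    device_to_host: bool  # output enable of the "out" buffer
    host_to_device: bool  # output enable of the "in" buffer


def select_buffers(r_nw: bool) -> BufferEnables:
    """Derive both buffer enables from the single R/~W line.

    One enable is the complement of the other, so the two buffers can never
    drive the data lines at the same time. An auto-direction-sensing part
    like the TXS0108 instead infers direction from whichever side is driving.
    """
    return BufferEnables(device_to_host=r_nw, host_to_device=not r_nw)


if __name__ == "__main__":
    for level in (True, False):
        print(f"R/~W={int(level)}: {select_buffers(level)}")
```

The point of the design is that direction is decided by an explicit signal the host already provides, rather than by edge-sensing logic that may misbehave on long or noisy cables.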

rayarachelian:
Don't get misled. That "1.25MHz speed" thing is about the T1 and T2 timers that are inside the VIAs, which can cause an interrupt, or be used to generate a square wave output, etc. It's not about the transfer rate to/from a ProFile or Widget.

"625K bytes/second maximum data transfer rate" is pure bullshit. They got there like this: 68000 at 5MHz, so 5,000,000 cycles/second. Each memory access is 8 cycles. 5,000,000 / 8 = 625,000, or 625KB/s.

However, that's a lie. If you have a tight 68000 assembly loop that reads from port A on that VIA, then turns around and writes to memory and increments an address register, you need at least two memory accesses (each 8 CPU cycles): one for the read and one for the write.

But wait! There's more! The 68000 doesn't really have a cache (well it has the IR register, which is a single word, so might as well have none), so it needs some bus cycles to read those opcodes, and then, to execute them.

The smallest opcode can be read in a single shot (2 bytes/1 16-bit word) - so 8 CPU cycles.

So if you use register-based (address register indirect) addressing, you can do something like this:


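--- Code: ---       LEA    VIA_portA,A0       ; A0 -> VIA port A data register (IRA)
       LEA    BUFFER,A1          ; A1 -> destination buffer in RAM
       MOVE.W #512+20-1,D0       ; 532 bytes (512 data + 20 tag); DBRA runs count+1 times
loop:  MOVE.B (A0),(A1)+         ; read a byte from port A, store it, advance A1
       DBRA   D0,loop            ; decrement D0, loop until it passes -1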

--- End code ---

This guy, MOVE.B (A0),(A1)+, will take at least 24 CPU cycles, likely more: 8 cycles to read the opcode, 8 cycles to read from VIA port A (IRA), 8 cycles to write the port's data to memory at (A1), and possibly a few more to post-increment A1. Then DBRA will take at least another 8 cycles just to read its opcode, likely more for the decrement of D0.

So, right off the bat you're looking at minimum 32 cycles.
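The tally above can be written down as a small Python sketch that sums bus accesses per loop iteration. It uses the post's simplifying assumption of 8 CPU clocks per bus access on the Lisa; the names are hypothetical. It also ignores DBRA's 16-bit displacement word (another fetch) and any internal cycles, so like the estimate above it undercounts.

```python
# Rough per-byte cycle budget for the MOVE.B (A0),(A1)+ / DBRA loop.
# ASSUMPTION (from the post): every bus access costs 8 CPU clocks.
# Ignores DBRA's displacement word and internal cycles, so this is a floor.
from typing import Mapping

CYCLES_PER_ACCESS = 8

# Bus accesses per loop iteration, as enumerated in the post.
LOOP_ACCESSES = {
    "MOVE.B opcode fetch": 1,
    "read VIA port A": 1,
    "write byte to (A1)+": 1,
    "DBRA opcode fetch": 1,
}


def cycles_per_iteration(accesses: Mapping[str, int],
                         cycles_per_access: int = CYCLES_PER_ACCESS) -> int:
    """Lower bound on CPU clocks per loop iteration (one byte moved)."""
    return sum(accesses.values()) * cycles_per_access


if __name__ == "__main__":
    print(cycles_per_iteration(LOOP_ACCESSES))  # 32 under these assumptions
```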

On top of that, half the bus cycles are used by the video state machine, during which time the 68000 is just sitting there waiting. Sure, it could perform internal operations, but that's unlikely as there's no cache. So you're already starting with [ 5,000,000 cycles/sec / 2 (video state steal)  / 32 (cycles for those two instructions) ] at the absolute minimum.

So doing that division gives us 78.125KB/s at the absolute best case. And yes, I cheated: I was lazy and didn't assemble this and look up each instruction's exact cycle count in the 68000 user's manual. Most likely it's more cycles than I said, so slower than that 78KB/s, possibly even as slow as half of that.
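For anyone who wants to redo the arithmetic, here is a minimal Python sketch comparing the marketing figure with the loop estimate. Every input (5 MHz clock, 8 clocks per access, half the bus lost to video/refresh, 32 clocks per byte) is one of the post's assumptions, not a measured Lisa value, and KB means 1000 bytes as in the post.

```python
# Throughput estimates from the figures in this thread (assumptions, not
# measurements). KB = 1000 bytes, matching the post.
CPU_HZ = 5_000_000
CYCLES_PER_ACCESS = 8
VIDEO_SHARE = 2        # CPU gets every other bus slot
CYCLES_PER_BYTE = 32   # MOVE.B + DBRA loop, lower bound


def naive_rate() -> float:
    """The 'one byte per bus access' figure: 5,000,000 / 8 bytes/s."""
    return CPU_HZ / CYCLES_PER_ACCESS


def loop_rate(cycles_per_byte: int = CYCLES_PER_BYTE) -> float:
    """Best case for a software copy loop once video steals half the bus."""
    return CPU_HZ / VIDEO_SHARE / cycles_per_byte


if __name__ == "__main__":
    print(f"naive:     {naive_rate() / 1000:.3f} KB/s")  # 625.000
    print(f"best loop: {loop_rate() / 1000:.3f} KB/s")   # 78.125
    print(f"pessimist: {loop_rate(2 * CYCLES_PER_BYTE) / 1000:.3f} KB/s")
```

The gap between the first two lines is the whole argument: the 625KB/s figure counts one bus access per byte and ignores both instruction fetches and video contention.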

Sure you can play tricks like loop unrolling (which LOS 3.1 does) or whatever, but there is some limit that you can't go past.
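A rough Python sketch of where that limit sits, under the same simplified model: unrolling by N places N copies of MOVE.B (A0),(A1)+ (24 clocks each, per the post) in front of one DBRA (8 clocks), so the branch overhead is spread across N bytes. The function name and constants are illustrative assumptions, not LOS 3.1's actual code or measured timings.

```python
# Effect of loop unrolling under the post's simplified timing model.
# ASSUMPTIONS: 5 MHz CPU, half the bus lost to video, MOVE.B = 24 clocks,
# DBRA = 8 clocks. Not measured Lisa figures.
CPU_HZ = 5_000_000
VIDEO_SHARE = 2
MOVE_CYCLES = 24
DBRA_CYCLES = 8


def unrolled_rate(n: int) -> float:
    """Bytes/s when each loop iteration moves n bytes (n MOVEs + one DBRA)."""
    cycles_per_byte = MOVE_CYCLES + DBRA_CYCLES / n
    return CPU_HZ / VIDEO_SHARE / cycles_per_byte


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 532):
        print(f"unroll x{n:<3}: {unrolled_rate(n):9.0f} bytes/s")
    # As n grows, the rate approaches CPU_HZ / VIDEO_SHARE / MOVE_CYCLES
    # (about 104,167 bytes/s): the MOVE itself becomes the limit.
```

Even fully unrolling a 532-byte block only removes the DBRA overhead; the per-byte MOVE cost is the ceiling.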

Perhaps if they had rigged up a DMA controller it might have gotten closer, but it would still take about the same number of cycles, since the 68000 would sit waiting for the bus while the DMA controller does its thing. To get that fast you'd need a separate bus to the DRAM, out of the CPU's way, tied into the DRAM refresh: instead of spending half the cycles just refreshing the DRAM (the other half of the video state machine's job), the DMA controller could push data from the ProFile directly into a buffer the OS had set up in RAM. Which is really hard to do.

The Mac does something similar with its video state machine, but uses faster DRAM so it can run the CPU at the full 8MHz and still do the DRAM refreshes while the CPU isn't using the bus (I think it uses !CLK vs. CLK, so both the high and low clock phases are used). Steve Chamberlain had a nice writeup on this when he was building his Mac clone: https://www.bigmessowires.com/category/plustoo/ or here https://www.bigmessowires.com/category/68katy/ . I remember he had another post with the bus cycle timings and whatnot that I can't find right now.

The Lisa actually alternates between 8 cycles for the CPU, and then 8 cycles for the video/DRAM refresh - somewhere in the Lisa HWG they show this with all the bus cycles 0-7 for CPU, then 0-7 for video. That's why it's much slower. So it's closer to 2.5MHz in reality.

So tl;dr there's no way to get anywhere near 625KBPS. I'll go away and shut up now. :)

Lisa2:
As a followup to my post yesterday:

1. I confirmed that the SD card in the Aphid I used for the test is a 16GB UHS Speed Class 1 card (equivalent to Speed Class 10).

2. My interface uses TXS0108s and 100ohm resistors

3. I did try to test my real 5MB ProFile, but while it was working last weekend, it wasn't cooperating last night. Those things are quite temperamental.

4. I tested the same X/ProFile on both the internal 2/10 port and a Dual Parallel Card. The result is that the Dual Parallel Card is faster than the internal port in this very unscientific test. Long, unprofessional video of the testing here:

https://youtu.be/63BF9FbOylU

Rick
