ROAST « Category - Programming

ROCoding #4: ARM Wrestling

Saturday, April 4, 2026

[Edit 2026-04-18: Fixed incorrect byte ordering of colour channels.]
It’s been 25 years — blimey! — since I did any serious ARM coding. The last was fixing some bugs in my Lisp interpreter in 2001, after which my Iyonix died and I moved over to Linux.

Back with RISC OS now, and in the intervening years the ARM processor has had some substantial changes. Originally it was the Acorn RISC Machine, of course, and it first appeared in the Acorn Archimedes computer in 1987. It was a genuine breakthrough at the time, a custom-designed (by Sophie Wilson et al) 32-bit processor running at 8MHz. I’m now using a 4té², a repackaged Raspberry Pi 4b containing an ARM Cortex-A72, running at 1.8GHz — over 200 times faster.

And while the original ARM chips were indeed Reduced Instruction Set Computers, with only about 25 instructions¹, these days it’s something of a misnomer. So what’s been added? SIMD and NEON, mostly. This article is a simple introduction to using some SIMD instructions; we’ll cover NEON² later.
[Read more…]

ROCoding #3: Ellipses and rings

Monday, February 23, 2026

The previous post in this series covered circular or radial blends and gradients. With a few simple modifications the code can be adapted to create elliptical and annular fills. Or eggs and doughnuts, if you prefer 😉.

Again, all these examples will be using various procedures from the previous posts.

RISC OS provides some graphics primitives to draw ellipses, and from BASIC this is:

ELLIPSE [FILL] centreX,centreY, semiWidth,semiHeight [, angle]

We won’t be using this however, and we won’t be including the angle setting, which draws a rotated ellipse. Let’s not make it too complicated…
[Read more…]

ROCoding #2: Circles

Tuesday, January 20, 2026

The first post in this series covered filling a rectangle with blends and gradients. This time we’ll look at generating circular or radial blends and gradients, which is a bit more complicated. All these examples will be using various procedures from the previous post.

RISC OS provides graphics primitives to draw outline and filled circles. From BASIC:

PROCinit
SYS CT_SetGCOL%,&40dd4000
CIRCLE 64,192,60
CIRCLE FILL 192,64,60

The parameters are the centre coordinates and the radius, in OS units. As an aside, here’s the additive RGB triplet as used in the PhotoDesk manual, showing the complementary cyan, magenta and yellow colours. The circles are blended with the OR operation:

PROCinit
SYS CT_SetGCOL%,red%
CIRCLE FILL 128,160,80
SYS CT_SetGCOL%,blue%,,,,1
CIRCLE FILL 170,88,80
SYS CT_SetGCOL%,green%,,,,1
CIRCLE FILL 92,88,80

So, can we use these primitives to create a radial blend? Here’s a first attempt:
[Read more…]

ROCoding #1: Rectangles

Friday, December 12, 2025

Coming back to RISC OS has been interesting. I’ve had to re-learn a number of things, like using the WIMP and, in particular, graphics programming. I hope this article will be the first of a series explaining my learning curve, and hopefully providing some useful programs. It’s fairly basic — in all senses! — but does assume some familiarity with RISC OS and BBC BASIC.

We’ll start with some things you can do when drawing rectangles. While writing the Solar application I wanted to provide some better backgrounds for the graphs. A plain background is easy, of course: just use RECTANGLE FILL with some appropriate colour, like this.

GCOL 0,&ff,&dd,&ff
RECTANGLE FILL 0,0,256

…which draws a 256×256 square at the graphics origin. More generally, RECTANGLE accepts both width and height, which can be negative.
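As a rough illustration of that last point, a negative width and height make the rectangle extend left and down from the given point rather than right and up. The coordinates below are arbitrary, picked only for the example:

```basic
REM Illustrative values only; not taken from the Solar application
GCOL 0,&ff,&dd,&ff
REM 256 wide, 128 high, drawn up and to the right of the origin
RECTANGLE FILL 0,0,256,128
REM Negative sizes: extends left and down from (768,512)
RECTANGLE FILL 768,512,-256,-128
```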

First up, we’ll generate a blend between two colours, from the bottom to the top of the rectangle. We’re assuming a full-colour display here, which is necessary for displaying blends with any fidelity. The Solar application graphs are built up by redirecting all graphics output to a sprite, then letting the Wimp handle displaying them on screen; this takes care of any mismatch between the screen and graph colour depths.
[Read more…]

BASIC 5 vs BASIC64 [updated]

Saturday, November 29, 2025

[Updated 5 Dec 2025. Now includes BASIC FPA timings, and text updated. DIV is now included.]

While writing the Gradgrind program (qv) I wondered if it would benefit from running under BASIC64 (or VI), as opposed to BASIC 5 (V). So I ran up a few simple speed tests (and they are simple — don’t take the results as definitive benchmarks). Although the program runs acceptably fast on my setup (a 4té², which is based on a Pi 4b), it uses a fair amount of real arithmetic. The chief difference between the two versions is that BASIC 5 stores reals in 5 bytes, while BASIC64 uses 8 bytes. BASIC64 itself has two variants, VFP using the vector floating point operations available on the Pi’s ARM processor, and FPA using software floating point.

To use each variant:

basic <file>: Runs <file> using BASIC V. This is also the default setup for double-clicking a BASIC file.
basic64 <file>: Runs <file> under BASIC VI, which will select the VFP variant if available, otherwise uses FPA.
basicvfp <file>: Forces the VFP variant, if available.
basicfpa <file>: Forces the FPA variant.

The program to generate these results is available from the Downloads page.

System details:
R-Comp’s 4té² (based on the Raspberry Pi 4b) running RISC OS 5.31 (21-Jul-24) at 1.5GHz
BBC BASIC V 1.85 (03 Oct 2022)
BBC BASIC VI 1.85 (03 Oct 2022) VFP
BBC BASIC VI 1.85 (03 Oct 2022) FPA

Notes on the tests

All tests were run in single-tasking mode (not in a task window).
Each test (other than the WHILE/REPEAT ones) is enclosed in a FOR…NEXT loop, with the index variable running from 1 to 100,000,000. So they are run 100 million times. This was chosen so that an empty loop runs in about 1 second. Other than tests 3, 4 and 6, the index variable is an integer.
The description specifies what operation is inside the loop.
All timings are in seconds with centisecond resolution, other than the total time which is in minutes:seconds.
The “Assignment a%=long…” tests are done because I often use systematic prefixes for variable names, and I wondered if having many variable names with identical prefixes would slow things up. The testing program defines 300 variables called “VeryLongVariableNameWithIncrementingSuffix001%” to “VeryLongVariableNameWithIncrementingSuffix300%”.
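For anyone wanting to try something similar without downloading the program, the general shape of each test might look like the sketch below. It isn't the actual test program: the variable names are made up for illustration, and it assumes only that TIME counts in centiseconds.

```basic
REM Sketch of a single timing test (not the actual test program)
REM loops%, start% and elapsed% are illustrative names
loops% = 100000000
b% = 100 : c% = 50
REM Baseline: empty FOR loop with an integer index
start% = TIME
FOR I% = 1 TO loops%
NEXT
elapsed% = TIME - start%
PRINT "Empty FOR loop (int): "; elapsed% / 100; " s"
REM Same loop with one operation inside it
start% = TIME
FOR I% = 1 TO loops%
  a% = b% + c%
NEXT
elapsed% = TIME - start%
PRINT "Integer maths a%=b%+c%: "; elapsed% / 100; " s"
```

Subtracting the empty-loop time from the second figure gives a rough idea of the cost of the operation itself.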

Test	BASIC V	BASIC VI VFP	BASIC VI FPA
Empty FOR loop (int)	0.97	1.02	0.98
Empty FOR loop (int, no spaces)	1.01	1.15	1.00
Empty FOR loop (real)	3.09	1.11	65.07
Empty FOR loop (real, no spaces)	3.10	1.10	65.05
Empty FOR loop, NEXT I% (int)	3.02	3.04	2.97
Empty FOR loop, NEXT I (real)	5.12	3.76	67.05
Assignment A%=100	4.17	4.13	23.61
Assignment a%=100	4.14	4.14	30.69
Assignment a%=b% (100)	4.39	4.39	23.63
Assignment a=0.5	4.19	4.02	53.65
Assignment a=b (0.5)	4.89	3.82	53.78
Assignment a%=Very…x001%	4.38	4.39	23.63
Assignment a%=Very…x300%	4.39	4.39	23.62
Integer maths a%=b%+c% (100+50)	6.62	6.63	26.16
Integer maths a%=b%-c% (100-50)	6.62	6.63	26.26
Integer maths a%=b%*c% (100*50)	6.78	6.78	26.28
Integer maths a%=b%/c% (100/50)	16.47	8.44	181.51
Integer maths a%=b%DIVc% (100DIV50)	7.96	7.62	27.55
Real maths a=b+c (0.5+0.2)	7.93	5.85	154.60
Real maths a=b-c (0.5-0.2)	8.43	5.88	135.51
Real maths a=b*c (0.5*0.2)	7.90	5.85	134.98
Real maths a=b/c (0.5/0.2)	17.88	6.69	171.36
Real maths a=b^c (0.5^0.2)	46.09	90.33	1651.64
Real maths a=SQRb (0.5)	15.96	5.40	103.40
Trig a=RADb (0.5)	6.24	4.52	85.88
Trig a=SINb (0.5)	19.28	46.44	768.88
Trig a=COSb (0.5)	28.86	47.25	794.56
Trig a=TANb (0.5)	26.89	74.45	757.27
WHILE loop (int, I%=I%+1)	10.89	10.80	31.46
WHILE loop (int, I%+=1)	9.39	9.38	9.38
REPEAT loop (int, I%=I%+1)	11.58	11.16	32.17
REPEAT loop (int, I%+=1)	9.52	9.49	9.52
Total (m:s)	5:18	6:50	92:43

Takeaways

BASIC VI FPA is slow. If you run the test program be prepared to cook and eat your dinner while it shuffles along. And possibly include a nap.
I’m puzzled why integer operations under BASIC VI FPA are so slow; I wasn’t expecting these to vary much. Addition, subtraction, multiplication and the DIV operator are about 4 times slower than BASIC V. Using / to divide is even worse, 11 times slower. And even simple integer assignments are about 6 times slower.
Note that using a real variable as the index in a FOR…NEXT loop in BASIC VI FPA is about twenty times slower than BASIC V (and nearly sixty times slower than BASIC VI VFP).
Don’t use variables after NEXT; it’s three times slower, and they’re very rarely needed. This can make a considerable difference for nested FOR loops.
I’ve done both “A%=” and “a%=” assignments to check if so-called ‘resident integer variables’ (A% to Z%) still give a speed benefit. But it seems there’s no advantage now to using them.
Ancient history note: this feature dates back to the Acorn Atom (my first computer back in 1980 — I soldered it together!), when A-Z were the only variables you had (integer only); if you wanted more you had to use arrays (just AA to ZZ) or indirection. BBC BASIC evolved from Atom BASIC — indeed, you could get it on the Atom as an add-on board — and the resident integer variables on the BBC Micro’s original BASIC were a legacy.
The long variable name tests don’t appear to make much difference. But it turns out that both versions use a fairly sophisticated caching strategy for variable names, so this isn’t really a valid test. I’d be interested to know if the same strategy is used for procedure/function names (which I haven’t tested here).
Integer maths is pretty much identical between BASIC V and BASIC VI VFP, with the exception of division which is twice as fast in BASIC VI VFP.
Simple real maths is faster in BASIC VI VFP than BASIC V, by about 25-30%; division does even better, at more than two and a half times as fast. It’s interesting that BASIC VI VFP’s real maths operations are slightly faster than their integer equivalents. Of course, different values may give a different result.
Square root — a common operation — is three times faster in BASIC VI VFP.
Avoid exponentiation if you can — it’s a very expensive operation, in all versions.
I was a bit surprised that BASIC VI VFP’s trig functions weren’t better. Although they are, of course, much more accurate.
Using += and -= to increment/decrement variables gives a small but useful speed increase, particularly under BASIC VI FPA.
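Putting a few of these points together, a loop written along the following lines should avoid the slower forms measured above. It's only a sketch, with made-up variable names and loop counts:

```basic
REM Sketch only: total%, q% and the loop limits are illustrative
total% = 0
FOR I% = 1 TO 1000
  FOR J% = 1 TO 1000
    REM += rather than total% = total% + 1
    total% += 1
  REM Plain NEXT rather than NEXT J%
  NEXT
NEXT
REM DIV rather than / when an integer result is wanted
q% = total% DIV 50
PRINT total%, q%
```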

Conclusion

No, it wouldn’t really be worth Gradgrind using BASIC VI VFP. For much of the real arithmetic involved I’ve used pre-calculation, and the speed increase would be minimal.

Downloads

Download the program from the Downloads page.
