Netgear support over the days following my report of the problems I was having with the RN104 was awesome. They tried everything, kept me informed at all stages and eventually asked me to return the old one and get a replacement.
The replacement now has an uptime of 26 days and hasn't skipped a beat.
Moral: don't generalise based on a sample of one. (Of course I could quibble about how the faulty unit got past QA in the first place...)
Wednesday, January 8, 2014
Thursday, November 21, 2013
ReadyNAS RN104 is a disaster.
You get what you pay for they say. In my upset about almost losing a year's data from my ReadyNAS NV+ I purchased an RN104, Netgear's latest version of "consumer level" NAS. I was kind of hoping that after all these years of development the system software would be more stable than that on the NV+.
After much googling and angst I was finally able to copy my data from the stalled NV+ onto my new 104 and I thought it was all plain sailing after that.
Three weeks later and I'm ready to demand my money back for the 104. It has stalled at least once a day for the past three weeks, sometimes twice, even three times. Thankfully redundancy in the disks has allowed resynching (if it doesn't stall first) so I haven't lost any data (yet) but basically I've got a very power hungry brick. Obviously its performance as a file server is very poor while it's resynching.
I don't think Netgear support has the slightest clue about why it's happening. The ReadyNAS forum is full of complaints about the stalling. I've filed a support request but so far none of the replies has indicated they have any idea why it stalls.
It might be snapshots, it might be automated backups, it might be power supply under load, it might be the anti-virus software (not enabled in my case), it might be CPU too hot under load...
At the moment my old NV+ is sitting next to the 104 looking very smug. At least it only has unrecoverable stalls once a year or so...
As an indicator of how clueless Netgear support people are, I posted a complaint on the forum about the iTunes server in OS6 being compiled to serve only Windows boxes. It serves up FLAC files with the bytes reversed for a Mac and all one hears is a deafening squeal. "Thanks for the heads up" was the reply from Netgear. They never tested it!
After much googling and angst I was finally able to copy my data from the stalled NV+ onto my new 104 and I thought it was all plain sailing after that.
Three weeks later and I'm ready to demand my money back for the 104. It has stalled at least once a day for the past three weeks, sometimes twice, even three times. Thankfully redundancy in the disks has allowed resynching (if it doesn't stall first) so I haven't lost any data (yet) but basically I've got a very power hungry brick. Obviously its performance as a file server is very poor while it's resynching.
I don't think Netgear support has the slightest clue about why it's happening. The ReadyNAS forum is full of complaints about the stalling. I've filed a support request but so far none of the replies has indicated they have any idea why it stalls.
It might be snapshots, it might be automated backups, it might be power supply under load, it might be the anti-virus software (not enabled in my case), it might be CPU too hot under load...
At the moment my old NV+ is sitting next to the 104 looking very smug. At least it only has unrecoverable stalls once a year or so...
As an indicator of how clueless Netgear support people are, I posted a complaint on the forum about the iTunes server in OS6 being compiled to serve only Windows boxes. It serves up FLAC files with the bytes reversed for a Mac and all one hears is a deafening squeal. "Thanks for the heads up" was the reply from Netgear. They never tested it!
Tuesday, October 29, 2013
Mounting Sparc-based ReadyNAS Drives on a Raspberry Pi
dbott at http://home.bott.ca/webserver/?p=306 gives a simple process for mounting ReadyNAS disks on a Linux box. Unfortunately, that Linux box is presumed to be a Debian Linux PC with at least a few spare disk trays.
In my case, my ReadyNAS NV+ stalled and wouldn't reboot. I installed the latest firmware (4.1.12) a couple of days ago and I am thus suspicious about the cause of the stalling...
However it was obvious from the way the boot process happened that the disks were OK. The NV+ simply refused to read them. I was fed up. A similar event happened just over a year ago and I am tired of this potentially complete loss of data. In the light of my experience last year I at least had a full backup on my old Drobo 1 but the backup itself took so long I was only running it monthly.
I googled for info about reading ReadyNAS Sparc disks and found the above link. I don't have a spare Linux box sitting around. I do, however, have a Raspberry Pi which was running my backups to the Drobo and was thus doing nothing.
Courtesy of dbott's notes, I realised I had to install fuse-ext2 and lvm2 on the RPi in order to read the disks. I used apt-get to install lvm2 but there didn't seem to be a .deb package for fuse-ext2.
I googled once more and found this:
http://loveraspberrypi.blogspot.com.au/2013/07/modify-raspberian-image-file-on-mac-osx.html which is actually the reverse of what I want. I want to read an ext2 disk on an RPi. This article tells how to read an RPi SD card on a Mac. However it contains the critical link to the fuse-ext2 source:
http://alperakcan.net/?open=projects&project=fuse-ext2. And then I had to install the GNU devel toolchain (apt-get install autoconf automake libtool libfuse-dev ext2fs-dev).
Finally the configure/make/install process succeeded and I thought I had plain sailing from there. I put disk 1 from the NV+ into a USB docking station and connected it to the RPi via a powered USB hub connected to one of the USB ports. Then I followed dbott's instructions. Unfortunately vgscan told me there were two disks missing from the set. Which was correct of course. I needed at least three of the four disks from the NV+ to make the virtual volume.
After a hasty visit to my local computer parts shop, I installed the remaining two disks using SATA to USB cables connected to the hub and was delighted to run vgscan and see the three disks.
Finally I ran
fuse-ext2 -o ro -o sync_read /dev/c/c /mnt/media
and was delighted to see the "unreadable" volume fully accessible with all my files listed.
In my disgust :) I went out and bought a ReadyNAS RN104. I'm quite happy with the Netgear hardware. I just hate their crappy (firm|soft)ware. So I'm hopeful that the latest firmware (OS6) will be a bit more stable than that on the NV+. I'm currently letting the RN104 format some disks and then I hope to be able to transfer all the files from the USB disks to the RN104. I'll add an update if/when it completes.
Update: Initially I tried a simple 'cp -r' from the llvm filesystem on the RPi to the RN104 filesystem. It stalled. So, realising that there was quite a high probability of the copy stalling again, I'm using rsync because it can pick up where it stopped if the llvm volume disappears, which for some reason, it seems to do occasionally. (Actually I suspect the cause of the stalling is the WD Green disks spinning down to "save power" and then not recovering quick enough when they are needed. This might also have caused the NV+ problems. Not sure how to verify this. The WD Greens are definitely not on the recommended list for the RN104.) I'm about 1/3rd through the data transfer and am listening to some of my music from the new server so it looks like I'll be able to pick up without data loss. Phew! what a relief!
Update2: I finally discovered that one of the power packs for the SATA to USB cables was faulty. Actually, the pack was OK. But two of the connecting cables were faulty. What can I expect for $20? Anyway, I swapped the mains cable and fixed the moving pin in the 12v/5v connector. And the uploads have been much more stable since. But the RPi still occasionally stalls. And I have to pull the power plug to restart the RPi and twice the SD card was corrupted. I had to re-copy the image to the SD card using dd. Restore still not completed but the most important files have copied.
In my case, my ReadyNAS NV+ stalled and wouldn't reboot. I installed the latest firmware (4.1.12) a couple of days ago and I am thus suspicious about the cause of the stalling...
However it was obvious from the way the boot process happened that the disks were OK. The NV+ simply refused to read them. I was fed up. A similar event happened just over a year ago and I am tired of this potentially complete loss of data. In the light of my experience last year I at least had a full backup on my old Drobo 1 but the backup itself took so long I was only running it monthly.
I googled for info about reading ReadyNAS Sparc disks and found the above link. I don't have a spare Linux box sitting around. I do, however, have a Raspberry Pi which was running my backups to the Drobo and was thus doing nothing.
Courtesy of dbott's notes, I realised I had to install fuse-ext2 and lvm2 on the RPi in order to read the disks. I used apt-get to install lvm2 but there didn't seem to be a .deb package for fuse-ext2.
I googled once more and found this:
http://loveraspberrypi.blogspot.com.au/2013/07/modify-raspberian-image-file-on-mac-osx.html which is actually the reverse of what I want. I want to read an ext2 disk on an RPi. This article tells how to read an RPi SD card on a Mac. However it contains the critical link to the fuse-ext2 source:
http://alperakcan.net/?open=projects&project=fuse-ext2. And then I had to install the GNU devel toolchain (apt-get install autoconf automake libtool libfuse-dev ext2fs-dev).
Finally the configure/make/install process succeeded and I thought I had plain sailing from there. I put disk 1 from the NV+ into a USB docking station and connected it to the RPi via a powered USB hub connected to one of the USB ports. Then I followed dbott's instructions. Unfortunately vgscan told me there were two disks missing from the set. Which was correct of course. I needed at least three of the four disks from the NV+ to make the virtual volume.
After a hasty visit to my local computer parts shop, I installed the remaining two disks using SATA to USB cables connected to the hub and was delighted to run vgscan and see the three disks.
Finally I ran
fuse-ext2 -o ro -o sync_read /dev/c/c /mnt/media
and was delighted to see the "unreadable" volume fully accessible with all my files listed.
In my disgust :) I went out and bought a ReadyNAS RN104. I'm quite happy with the Netgear hardware. I just hate their crappy (firm|soft)ware. So I'm hopeful that the latest firmware (OS6) will be a bit more stable than that on the NV+. I'm currently letting the RN104 format some disks and then I hope to be able to transfer all the files from the USB disks to the RN104. I'll add an update if/when it completes.
Update: Initially I tried a simple 'cp -r' from the llvm filesystem on the RPi to the RN104 filesystem. It stalled. So, realising that there was quite a high probability of the copy stalling again, I'm using rsync because it can pick up where it stopped if the llvm volume disappears, which for some reason, it seems to do occasionally. (Actually I suspect the cause of the stalling is the WD Green disks spinning down to "save power" and then not recovering quick enough when they are needed. This might also have caused the NV+ problems. Not sure how to verify this. The WD Greens are definitely not on the recommended list for the RN104.) I'm about 1/3rd through the data transfer and am listening to some of my music from the new server so it looks like I'll be able to pick up without data loss. Phew! what a relief!
Update2: I finally discovered that one of the power packs for the SATA to USB cables was faulty. Actually, the pack was OK. But two of the connecting cables were faulty. What can I expect for $20? Anyway, I swapped the mains cable and fixed the moving pin in the 12v/5v connector. And the uploads have been much more stable since. But the RPi still occasionally stalls. And I have to pull the power plug to restart the RPi and twice the SD card was corrupted. I had to re-copy the image to the SD card using dd. Restore still not completed but the most important files have copied.
Wednesday, June 19, 2013
Direct Digital Synthesis (DDS) on the GA144 (Part 2)
I found out why I couldn't get two nodes to run with my start script. I was too impatient. I'd forgot that I had the serial connection to the Eval board running at only 115kbps. The command to install the code in the various nodes visits all the nodes and it was simply taking forever and I assumed the IDE had stalled.
Anyway, I took the delay word out of node 617 and put a simple counting loop in node 616 which asserts the 'right' port (which is common to the nodes) when the loop finishes. Then node 617 waits till its 'right' port is asserted, then continues with code to load the sine value into the DAC port.
This makes the sine wave much more stable. With the value of 500 in the go loop the sine wave is precisely 799.9Hz and stays rock solid on that value. I doubt if a crystal would make it much more stable but I'll try a crystal next.
With the following code, make sure 866 and 868 are loaded in block 200 to ensure they are compiled. Then '870 load' will start the app.
866 list
Anyway, I took the delay word out of node 617 and put a simple counting loop in node 616 which asserts the 'right' port (which is common to the nodes) when the loop finishes. Then node 617 waits till its 'right' port is asserted, then continues with code to load the sine value into the DAC port.
This makes the sine wave much more stable. With the value of 500 in the go loop the sine wave is precisely 799.9Hz and stays rock solid on that value. I doubt if a crystal would make it much more stable but I'll try a crystal next.
With the following code, make sure 866 and 868 are loaded in block 200 to ensure they are compiled. Then '870 load' will start the app.
866 list
sine wave generator
,
617
node
0
org
,
0
,
40
,
80
,
120
,
170
,
,
220
,
280
,
370
,
511
,
,
hart 3300; -.00433 .07943
-.64589 1.57079
,
cos
tri
2* 2* . triangle
dup *.
2
poly
-281
,
5203
,
-42329
,
37407
,
push drop pop *. + ;
scaled
2/
8000
. +
8191 12
interp ;
dac!
io b!
155
or !b ;
start
1f
128
phaseinc
dup dup or
begin
dup cos scaled
synch
right b! @b drop
dac!
over . +
end
868 list
timer count cycles
,
616
node
0
org
,
hold
for . . unext ;
,
ms
for
27063
for
. . . unext
next;
start
08 right b!
1
ms
,
go
500
hold
!b go ;
870 list
sine wave loader
host load loader load
using default ide paths
kill boots
0 708
hook
0
-hook
setup application
616
+node
616
/ram
8
/p
617
+node
617
/ram
1f
/p
visit sine path
panel pause
2
ship
Tuesday, June 18, 2013
Direct Digital Synthesis (DDS) on the GA144
I wanted a simple demo of the GA144 producing sound. I found a sample DDS app for a SeaForth chip (predecessor to the GA144) but the code syntax is sufficiently different that I couldn't simply copy it.
The concept is simple enough. Calculate the values of a sine wave over a fixed time interval and load those values into a digital-to-analog converter (DAC). Connect the DAC to a loudspeaker and listen to the pretty tone.
Rather than look up tables of (co-)sine values, which can take up lots of memory, this app chooses to approximate the values at each interval using relatively fast calculations. The code takes much less room and easily fits into the limitations of the F18 nodes in the GA144.
The concept is simple enough. Calculate the values of a sine wave over a fixed time interval and load those values into a digital-to-analog converter (DAC). Connect the DAC to a loudspeaker and listen to the pretty tone.
Rather than look up tables of (co-)sine values, which can take up lots of memory, this app chooses to approximate the values at each interval using relatively fast calculations. The code takes much less room and easily fits into the limitations of the F18 nodes in the GA144.
But it took me a couple of days to understand the code. It all resolved to two words: cos and scaled. cos in turn uses three other words: triangle, poly and *. while scaled uses interp. Now I knew that each of those words is part of the GA144 ROM library. There was also an intriguing comment in cos: 'Hart, 3300'.
So I googled for 'Hart cosine' and one of the first hits was a page by Chuck Moore about sines and cosines in colorForth. This page also explained the 'Hart, 3300' comment, viz., Computer Approximations by John F Hart, Wiley 1968; function 3300. I googled for 'Computer Approximations Hart' and found that the only copy I could buy was hardcover from Amazon. However I also found a link to an electronic copy which allowed me to read the theory behind the magic numbers that turn up in the code.
cos
f-f'
hart 3300
-0.0043 0.0794 -0.6459 0.5708
2* 2* . triangle dup *.
2
poly
-281
,
5203
,
-42329
,
37407
,
push drop pop *. + ;
The final code was this:
866 list
sine wave generator
,
617
node
0
org
,
0
,
40
,
80
,
120
,
170
,
,
220
,
280
,
370
,
511
,
,
hart 3300; -.00433 .07943
-.64589 1.57079
,
cos
tri
2* 2* . triangle
dup *.
2
poly
-281
,
5203
,
-42329
,
37407
,
push drop pop *. + ;
scaled
2/
8000
. +
8191 12
interp ;
dac!
io b!
155
or !b ;
delay
700
1
for . . unext ;
start
22
128
phaseinc
dup dup or
begin
dup cos scaled
delay
dac!
over . +
end
A couple of explanations. The delay word in the original was another node running a single timing loop and raising a signal on a comms port at the end. This in turn allows the main node to synchronise with a (relatively) accurate clock signal. However I ran into problems starting multiple nodes on the chip so I brought the synch delay into the node, preferring to sacrifice some accuracy for simplicity. The constant, 700, controls the amount of delay between DAC updates and, hence, the frequency. 500 gives around 550Hz, 700 gives around 447Hz.
interp expects an interpolation table at address 0. In this case it's not quite a linear interpolation. (Refer to DB001 F18A Technology Reference for details.) scaled halves the cos value, adds 0.5 (to scale into positive numbers only) and calls interp with s and m calculated for L=16 bits and n=3 (a 9-entry table i.e. 2**3+1) to interpolate the values between 0 and 511. dac! sets the DAC value on the node.
The output of the DAC needs to connect to a resistive load. The EVB001 kit includes some 47ohm resistors so I soldered one between the DAC pin and earth and then I was able to see a (relatively) clean sine wave output on my 'scope. It also sounded like a sine wave when I connected a speaker.
Make sure block 200 contains a load of block 866 so 'compile' will include it. The block to start the app on the GA144 (copied from 9.4 of the Users Guide) is:
872 list
demo ide boot
empty compile serial load
customize
-canon
0
fh orgn !
a-com sport ! a-bps bps ! !nam
run
talk
0 617
hook
0 64 617
boot
upd ?ram panel
22
call ;
To load and run type '872 load' and 'run' in the IDE.
My next task is to get a second node working as a timer to increase the accuracy of the sine wave (possibly using a crystal?). Then I want to port another VentureForth app, Musicbox, to the GA144.
Tuesday, June 11, 2013
Combsort for Softsim
Chuck Moore has put a bubblesort script on one of his colorForth pages:
bubble nt a! for @+ @ - + - -if
@ a -1 + a! @ push !+ pop ! then drop next ;
bubble nt a! for @+ @ - + - -if
@ a -1 + a! @ push !+ pop ! then drop next ;
Simple and compact. I was interested to see if such a simple sort could be speeded up without using too many more instructions. I found the Wikipedia entry for Combsort which calculates a decreasing gap between the two comparison values with the aim of getting the smallest values moved into place as quickly as possible.
So I started transliterating the Wiki algorithm into colorForth. The gap is shrunk by a factor of 1.3 which is apparently a result of some extensive testing with varying shrink factors. Initially I wrote a subroutine to multiply by 10 then divide by 13 (plus lots of shuffling registers around):
shrinkgap
a push dup dup or over
10
*
-13
--u/mod
pop a! push
drop drop drop pop ;
And then it hit me that 1.3 is only an approximation. There's nothing absolute about it. It occurred to me that if would be much more efficient to use factors of 2 where possible. So my shrink calculator is simply:
2/ dup 2/ +
In other words take (1/2+1/4) (i.e. 0.75) of the gap. 1/1.3 = 0.77 which is amazingly close to 0.75.
In the end my Combsort still took lots of code and only saved a few hundred cycles compared to Chuck's bubblesort.
In the end my Combsort still took lots of code and only saved a few hundred cycles compared to Chuck's bubblesort.
comb sort
,
504
node
0
org
,
4
,
19
,
7
,
1
,
12
,
13
,
0
,
,11
,
3
,
9
,
,
17
,
16
,
13
,
6
,
,2
,
1ffff
,
5
,
,
shrink
11
2/ dup 2/ + if ;
then - 2* - ;
comb
tn
push dup dup
,
...
begin
,
......
16
pop dup push a!
,
......
dup or
swpd
push
,
......
shrink
,
......
over over - . +
,
......
for dup a + b!
,
......
@ -
1
+ @b + -if
,
......
gt
20
@ @b ! !b pop pop
,
......
dup or - push push
,
......
le
then drop @+ drop next
,
...
26
dup dup dup or - +
,
......
if dup or - then
gap1?
,
......
pop
swpd?
,
......
over - and or -
,
...
until pop ;
,
2e
17 0
comb
32
r---
Friday, June 7, 2013
Quicksort and GreenArrays' arrayForth simulator from a beginner's perspective.
I purchased an EVB001 eval board for the GA144 chip from GreenArrays a year ago. I wasn't sure what I wanted to do with it at the time but eventually I realised the GA144 might be more appropriate for low power digital signal processing apps than the FPGA apps I'd been developing.
A couple of months ago I started trying to program it. Conceptually the GA144 seems simple: 144 asynchronous cores, CPU instruction set is a variant of Forth called colorForth, multiple ADCs, DACs and I/O ports. What could go wrong? :)
Well, for starters I hadn't written in Forth before. One of my first jobs (about 35 years ago) was writing assembly language code for 8080 and Z80 chips and I remembered many of the problems with low-level programming. Memory management was paramount. Speed of operation was next most important.
I later became a C programmer. Once again 90% of my time was spent managing memory. When I discovered a language called Perl in 1989 which had automatic memory management I was in Heaven.
A recent blog reminded me of the sheer elegance Perl is capable of. Here is a minimal Quicksort script that had me simply gazing in admiration:
The algorithm takes full advantage of Perl's automated memory management. Copies of lists created by the greps and the recursive calls to quicksort probably eat up huge chunks of RAM. But the "simplicity" and "obviousness" of the algorithm are outstanding.
So could I implement Quicksort in colorForth in the arrayForth IDE simulator?
A quick scan of the online literature brought up this. The problem for me is that it is written in "standard" Forth. I hadn't looked at Forth until about a year ago and to complicate matters colorForth and Forth have almost nothing in common syntactically. Even when the instructions are the same (e.g. 'or') they do different things ('or' is "inclusive OR" in Forth, "exclusive OR" in colorForth). So unless I was prepared to thoroughly learn Forth in addition to colorForth there was no way I could use Will Baden's algorithm.
So I decided to implement the algorithm from scratch using the GreenArrays manuals and user guides.
I quickly discovered I was back in memory management hell.
Programming the GA144 is mostly about memory management. Each core (GreenArrays call it a 'node') has only 64 18-bit words of RAM plus 64 18-bit words of ROM library. The 32 colorForth instructions are sufficient to implement a viable version of Forth. There is a 10-word data stack and a 9-word return stack, a general purpose register, A, and a write-only B register used mostly to talk to the I/O ports but which can address all of RAM and ROM.
There are 144 nodes in the GA144 chip running asynchronously but able to synchronise with the other nodes via the I/O port communication mechanism.
So the first decision I had to make was whether to use one node or multiple? The Perl example used only 13 elements in the list and since Quicksort uses O(log N) stack elements it seemed possible to fit the code and the data in the same node. (The example below runs out of stack space with 18 or more elements in the data array. It also runs out of instruction space with more than 19 elements.)
The second decision was to follow the wikipedia example using in-place element exchanges rather than creating multiple sub-lists for later merging as used in the Perl example.
The following code is mostly a line-for-line transliteration of the wikipedia example. I did steal one idea from Will Baden's example. I used his midpoint pivot selection code to bypass Quicksort's weakness when sorting already sorted lists.
The arrayForth IDE is a PITA. It's a steep learning curve for a total beginner like me. And it can't import code from outside sources. Every character has to be hand-typed. So "literate" documentation is virtually impossible and largely a waste of time. But eventually I found that the ability to quickly change an instruction and see its effect in the Softsim simulator provided me with all I needed to learn the colorForth instruction set. Admittedly the IDE won't teach me the idioms Forth programmers have used for decades. Nor will it teach me the idioms colorForth programmers have discovered. But, again, the IDE contains the full source code for colorForth, arrayForth, polyForth and a heap of tests displaying how one uses colorForth in "real life".
I also found useful idioms in Chuck Moore's notes about colorForth's instructions.
And GreenArrays' email support has been superb. Every question of mine, no matter how dumb, has been answered quickly and comprehensively.
The program code below is in coloured text (red for method definitions, green for instructions and yellow for data) and I've interspersed my comments in black text.
The code and data fill 56 of the available 64 words of RAM. Not much room left for experimentation. The starting line (at the end of the listing) loads the data stack with the beginning and ending address of the data array and calls qsort which calls part to divide the data array into partitions of smaller and larger elements around a pivot value and then recursively calls itself on each of these subarrays.
part partitions the array into three sections. The "left" section contains all array elements less than or equal to the pivot value. The "pivot" section contains the pivot value. And the "right" section contains all array elements greater than the pivot value. At completion the data stack contains (left, right, pivot) addresses. (Pivot address is in T).
But most of my time was spent trying to keep the use of the data stack to a minimum because of the recursive calls to qsort. So often I had to store data in A or R rather than leaving it on the data stack for fear that a recursive call would push previous start and end addresses off the end of the 10-element stack. In fact this does happen when there's more than 18 elements in the data array.
I picked node (CPU) 503 randomly. Nothing special about it.
I put the data array at the start because I always know its start address (0).
Duplicate S and T then move T into B and S into A.
Array addresses are in S and T. Call setab to load A and B with array element addresses, then swap array elements pointed to by A and B.
Increment/decrement T.
Exchange the pivot element (address in T) with the end element (address in S). The pivot address in T can now be dropped from the stack as it is no longer needed.
over over - . + 1+ - push
S and T contain start and end addresses of array to sort. Calculate length of array less 1 for loop counter. Push loop cntr into R register.
setab puts array start address (S) into A and end address (T) into B. The end array element contains the pivot value. And at start of loop S is also the address of where values less than or equal to the pivot value will be swapped into (referred to as sx, switchindex, hereafter) so put a copy of sx into T (over).
Save sx onto return stack (nowhere else to safely store it) and load A into data stack
Load contents pointed to by A onto data stack, load contents pointed to by B into stack, subtract.
If T is negative *A (array element pointed to by A) is less than or equal to *SX (pivot value, pointed to by SX) so pop SX into T, exchange *A and *SX, increment SX and push it back into R. The dup creates a dummy value for the next line to drop
Drops either result of subtract or dummy value.
Drop leaves A in T. Increment T and store it in A so A points to next element in array. T now holds the value of B. Make a copy and pop it into B because the element exchange destroys the previous contents of B. Then pop SX into T.
When loop finishes T holds pointer to pivot value and S holds SX pointer. So can swap pivot value into its correct place prior to return. After exchange stack holds (left, right, sx).
Duplicate the start and end addresses and subtract the start address from the end address.
If the array addresses are equal there is nothing more to sort so return with stack effectively unchanged by dropping the result of the subtraction.
If the array addresses are unequal there is something to sort. After the subtraction T contains the length of the array so halve it with a left shift (2/) and add it to the start address giving the address of the midpoint of the array.
part returns (left, right, sx) so pop SX into A, push right index into R, retrieve A, decrement it, retrieve another copy of A, increment it and then pop R into T. This leaves (left, pivot-1, pivot+1, right) in stack.
First call to qsort will sort right partition. Returns when right partition is (recursively) sorted. So upon return drop pivot+1 and right. Second call to qsort will (recursively) sort left partition. Upon return whole array is sorted.
Call qsort with start and end addresses of data array.
Code without comments:
I wonder what a multi-core version of this algorithm would look like? And it's also pretty obvious that a recursive algorithm is not feasible with such a limited data stack. Might try a comb sort next.
A couple of months ago I started trying to program it. Conceptually the GA144 seems simple: 144 asynchronous cores, CPU instruction set is a variant of Forth called colorForth, multiple ADCs, DACs and I/O ports. What could go wrong? :)
Well, for starters I hadn't written in Forth before. One of my first jobs (about 35 years ago) was writing assembly language code for 8080 and Z80 chips and I remembered many of the problems with low-level programming. Memory management was paramount. Speed of operation was next most important.
I later became a C programmer. Once again 90% of my time was spent managing memory. When I discovered a language called Perl in 1989 which had automatic memory management I was in Heaven.
A recent blog reminded me of the sheer elegance Perl is capable of. Here is a minimal Quicksort script that had me simply gazing in admiration:
sub quicksort {
@_ and return (
quicksort (grep { $_ < $_[0] } @_),
grep ({ $_ == $_[0] } @_),
quicksort (grep { $_ > $_[0] } @_)
);
();
}
$, = ','; say quicksort(10,2,5,3,1,9,7,4,2,3,4,8,6);
The algorithm takes full advantage of Perl's automated memory management. Copies of lists created by the greps and the recursive calls to quicksort probably eat up huge chunks of RAM. But the "simplicity" and "obviousness" of the algorithm are outstanding.
So could I implement Quicksort in colorForth in the arrayForth IDE simulator?
A quick scan of the online literature brought up this. The problem for me is that it is written in "standard" Forth. I hadn't looked at Forth until about a year ago and to complicate matters colorForth and Forth have almost nothing in common syntactically. Even when the instructions are the same (e.g. 'or') they do different things ('or' is "inclusive OR" in Forth, "exclusive OR" in colorForth). So unless I was prepared to thoroughly learn Forth in addition to colorForth there was no way I could use Will Baden's algorithm.
So I decided to implement the algorithm from scratch using the GreenArrays manuals and user guides.
I quickly discovered I was back in memory management hell.
Programming the GA144 is mostly about memory management. Each core (GreenArrays call it a 'node') has only 64 18-bit words of RAM plus 64 18-bit words of ROM library. The 32 colorForth instructions are sufficient to implement a viable version of Forth. There is a 10-word data stack and a 9-word return stack, a general purpose register, A, and a write-only B register used mostly to talk to the I/O ports but which can address all of RAM and ROM.
There are 144 nodes in the GA144 chip running asynchronously but able to synchronise with the other nodes via the I/O port communication mechanism.
So the first decision I had to make was whether to use one node or multiple? The Perl example used only 13 elements in the list and since Quicksort uses O(log N) stack elements it seemed possible to fit the code and the data in the same node. (The example below runs out of stack space with 18 or more elements in the data array. It also runs out of instruction space with more than 19 elements.)
The second decision was to follow the wikipedia example using in-place element exchanges rather than creating multiple sub-lists for later merging as used in the Perl example.
The following code is mostly a line-for-line transliteration of the wikipedia example. I did steal one idea from Will Baden's example. I used his midpoint pivot selection code to bypass Quicksort's weakness when sorting already sorted lists.
The arrayForth IDE is a PITA. It's a steep learning curve for a total beginner like me. And it can't import code from outside sources. Every character has to be hand-typed. So "literate" documentation is virtually impossible and largely a waste of time. But eventually I found that the ability to quickly change an instruction and see its effect in the Softsim simulator provided me with all I needed to learn the colorForth instruction set. Admittedly the IDE won't teach me the idioms Forth programmers have used for decades. Nor will it teach me the idioms colorForth programmers have discovered. But, again, the IDE contains the full source code for colorForth, arrayForth, polyForth and a heap of tests displaying how one uses colorForth in "real life".
I also found useful idioms in Chuck Moore's notes about colorForth's instructions.
The program code below is in coloured text (red for method definitions, green for instructions and yellow for data) and I've interspersed my comments in black text.
The code and data fill 56 of the available 64 words of RAM. Not much room left for experimentation. The starting line (at the end of the listing) loads the data stack with the beginning and ending address of the data array and calls qsort which calls part to divide the data array into partitions of smaller and larger elements around a pivot value and then recursively calls itself on each of these subarrays.
part partitions the array into three sections. The "left" section contains all array elements less than or equal to the pivot value. The "pivot" section contains the pivot value. And the "right" section contains all array elements greater than the pivot value. At completion the data stack contains (left, right, pivot) addresses. (Pivot address is in T).
But most of my time was spent trying to keep the use of the data stack to a minimum because of the recursive calls to qsort. So often I had to store data in A or R rather than leaving it on the data stack for fear that a recursive call would push previous start and end addresses off the end of the 10-element stack. In fact this does happen when there's more than 18 elements in the data array.
503 node 0 org
I picked node (CPU) 503 randomly. Nothing special about it.
19 , 4 , 7 , 1 , 12 , 13 , 0 , 11 , 3 , 9 ,
17 , 16 , 10 , 6 , 2 , 13 , 5 ,
I put the data array at the start because I always know its start address (0).
setab over over b! a! ;
Duplicate S and T then move T into B and S into A.
exch setab @ @b ! !b ;
Array addresses are in S and T. Call setab to load A and B with array element addresses, then swap array elements pointed to by A and B.
1+ 1 . + ; 1- -1 . + ;
Increment/decrement T.
part exch drop
Exchange the pivot element (address in T) with the end element (address in S). The pivot address in T can now be dropped from the stack as it is no longer needed.
over over - . + 1+ - push
S and T contain start and end addresses of array to sort. Calculate length of array less 1 for loop counter. Push loop cntr into R register.
setab over
setab puts array start address (S) into A and end address (T) into B. The end array element contains the pivot value. And at start of loop S is also the address of where values less than or equal to the pivot value will be swapped into (referred to as sx, switchindex, hereafter) so put a copy of sx into T (over).
begin
push a
Save sx onto return stack (nowhere else to safely store it) and load A into data stack
@ @b - . +
Load contents pointed to by A onto data stack, load contents pointed to by B into stack, subtract.
-if drop pop exch 1+ dup push
If T is negative *A (array element pointed to by A) is less than or equal to *SX (pivot value, pointed to by SX) so pop SX into T, exchange *A and *SX, increment SX and push it back into R. The dup creates a dummy value for the next line to drop
then drop
Drops either result of subtract or dummy value.
1+ a! dup b! pop
Drop leaves A in T. Increment T and store it in A so A points to next element in array. T now holds the value of B. Make a copy and pop it into B because the element exchange destroys the previous contents of B. Then pop SX into T.
next exch ;
When loop finishes T holds pointer to pivot value and S holds SX pointer. So can swap pivot value into its correct place prior to return. After exchange stack holds (left, right, sx).
qsort over - over . +
Duplicate the start and end addresses and subtract the start address from the end address.
-if drop ;
If the array addresses are equal there is nothing more to sort so return with stack effectively unchanged by dropping the result of the subtraction.
then 2/ push over pop . +
If the array addresses are unequal there is something to sort. After the subtraction T contains the length of the array so halve it with a left shift (2/) and add it to the start address giving the address of the midpoint of the array.
part
part is called with the data stack containing (left, right, mid) addresses.
a! push a 1- a 1+ pop
part returns (left, right, sx) so pop SX into A, push right index into R, retrieve A, decrement it, retrieve another copy of A, increment it and then pop R into T. This leaves (left, pivot-1, pivot+1, right) in stack.
qsort drop drop qsort ;
First call to qsort will sort right partition. Returns when right partition is (recursively) sorted. So upon return drop pivot+1 and right. Second call to qsort will (recursively) sort left partition. Upon return whole array is sorted.
0 16 qsort r---
Call qsort with start and end addresses of data array.
Code without comments:
503 node 0 org
19 , 4 , 7 , 1 , 12 , 13 , 0 , 11 , 3 , 9 ,
17 , 16 , 10 , 6 , 2 , 13 , 5 ,
setab over over b! a! ;
exch setab @ @b ! !b ;
1+ 1 . + ; 1- -1 . + ;
part exch drop
over over - . + 1+ - push
setab over
begin
push a
@ @b - . +
-if drop pop exch 1+ dup push
then drop
1+ a! dup b! pop
next exch ;
qsort over - over . +
-if drop ;
then 2/ push over pop . +
part
a! push a 1- a 1+ pop
qsort drop drop qsort ;
0 16 qsort r---
I wonder what a multi-core version of this algorithm would look like? And it's also pretty obvious that a recursive algorithm is not feasible with such a limited data stack. Might try a comb sort next.
Subscribe to:
Posts (Atom)