After a flurry of purchases of boards for learning about FPGAs, I've been working my way through Mike Field's free ebook "Introducing the Spartan 3E FPGA and VHDL", mentioned on Gadget Factory's wiki page for the LogicStart MegaWing. I've been using Xilinx's ISE Design Suite 14.4 (part of their Vivado Design Suite) to synthesise and implement the design, and Gadget Factory's Papilio Loader to load the resulting .bit file into the Spartan 6 FPGA.
In my newbie ignorance I ordered the LogicStart as well as the RetroCade peripheral boards and although Mike's ebook is very good, I've now got to the stage where I want to branch out a bit to see if I have learned enough of FPGA basics to be reasonably independent.
The RetroCade has a 2x16 LCD display on it, so I thought I'd write a program to display some text on it. Initially I found VHDL examples which drive the display directly, but it occurred to me that the display was never meant to be the only thing controlled by the FPGA; dedicating a Spartan 6 FPGA just to drive the display seemed wasteful in the extreme. So I looked a bit further and found Xilinx's s3e-starterkit mentioned here (clicking the download link pops up a required but free registration form).
The idea was to rewrite the s3 code to target the s6 used in the Papilio Pro, and to change the constraints file to use the pins for the display on the RetroCade rather than those on the s3 starter kit board.
The starterkit code for the s3 adds a PicoBlaze 8-bit micro-controller to the design which uses less than 5% of the available circuitry. An updated version of the PicoBlaze for the s6 uses even less circuitry.
It's a fairly convoluted path to write such an FPGA application. As well as the usual VHDL file to define the actual application at top level (pprolcd.vhd), there is also a standard processor VHDL file (kcpsm6.vhd) supplied by Xilinx which defines the micro-controller in the FPGA circuitry. Then the micro has to be programmed like any standard micro, so there is an assembler (kcpsm6.exe) to translate a standard assembly program (control.psm) into a VHDL file (control.vhd) defining a block of ROM which contains the program instructions. Finally, there is a constraints.ucf file which holds the FPGA pin definitions corresponding to the RetroCade display pins.
Files are available here.
Monday, January 28, 2013
Wednesday, November 7, 2012
Indexing epubs
I have undertaken the task of converting my library of printed books by my favorite philosopher into epub and mobi formats so I can read them on my various e-reader devices and make it easier to search them. Also, some of the printed editions are nearly 80 years old and I worry how much longer they will be readable.
I've got 21 of his books which is most of them. The task has been to scan the pages of each book, OCR it into HTML and text versions, convert the text version into Pandoc's modified Markdown format, proofread and correct typos and create links for the index. Then I use Pandoc to create an epub version and Calibre's ebook-convert to convert the epub to mobi for my Kindle.
The scanning was tedious but thankfully a one-off. All of the books are small enough to allow two pages per scan. I wrote a small Perl script which uses the ImageMagick module to rotate and split the image in two and clean off the black borders. Then I tried various OCR programs including ReadIris, Abbyy Reader and Tesseract (free). All of them worked well.
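That splitting script isn't shown here; a minimal sketch of the rotate/split/trim step using the Image::Magick module, with assumed filenames, rotation angle and border amount, might look like this:

#!/usr/bin/env perl
# Hypothetical sketch of the two-pages-per-scan splitter. The filenames,
# rotation angle and shave amount are assumptions, not the original script.
use strict;
use warnings;
use Image::Magick;

for my $scan (glob 'scan_*.png') {
    my $img = Image::Magick->new;
    my $err = $img->Read($scan);
    warn $err and next if $err;

    $img->Rotate(degrees => -90);          # scans were made sideways
    $img->Shave(geometry => '40x40');      # clean off the black borders
    my ($w, $h) = $img->Get('width', 'height');

    # Split the double-page scan into left and right pages.
    my ($base) = $scan =~ /(.+)\.png$/;
    my $left  = $img->Clone;
    my $right = $img->Clone;
    $left->Crop(geometry  => sprintf('%dx%d+0+0', int($w / 2), $h));
    $right->Crop(geometry => sprintf('%dx%d+%d+0', $w - int($w / 2), $h, int($w / 2)));
    $left->Write("${base}_L.png");
    $right->Write("${base}_R.png");
}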
Then life intervened and I've only just picked up the project in the last week or so. I got as far as converting a pamphlet and one book to epub and mobi. Big learning experience using Pandoc. The main problem is indexing. Most of the books have a hand-built index which references the printed pages by number. E-books of course don't have fixed page numbers, so I needed to add HTML links around the parts of the text corresponding to the index entry.
The first book I worked on took a couple of days to proofread then about eight days to index. It has 400 index entries. The next book I selected has nearly 800 index entries. And the task of finding the word(s) which correspond to the index entry is both repetitive and boring and thus very prone to errors from lack of attention on my part. Time to get some automated help.
I used Vim to wrap every index entry with an href tag and a unique ID (a simple macro which incremented a register, inserted it into the href and went looking for the next entry). Then I wrote a macro which loads the ID of the index entry into a register, and a keymap macro which lets me visually select the text I want the link to jump to and wraps it in a <span> tag with the ID from the register. This took a lot of the repetition out of the task.
But still there was a lot of boring repetition in finding the indexed words/phrases. And, foolishly perhaps, I had eliminated the page numbers from the Markdown text, thinking they were no longer of relevance. I had only finished 100 of the 800 entries after a week. And I hate repetition; that's what computers are for. And of course there are still the other books (and this current one isn't the longest). So I wrote another Perl script which uses the HTML file from the OCRing to extract the text of each page into a lookup table indexed by page number. (The HTML conversion wasn't as accurate as the text conversion, so I hadn't attempted to correct all its typos and formatting errors.) The script then builds a table of the start and end line numbers in the Markdown text which correspond to each page extracted from the HTML. I had to manually assist a couple of entries but mostly it worked. Finally, the script reads each entry in the index, searches for the keyword in the Markdown between the start and end line numbers given for the page of the index entry, and prints the line of text containing the keyword and its line number in the Markdown file.
This final step has taken most of the repetition out of the indexing task. With the indexing printout on one side of my display and the Markdown file open in Vim on the other, I can zoom through the entries: jump to the line of text which looks most likely to contain the keyword I'm trying to index, visually highlight the word(s), press F6 and F7, and the word is linked, and so on. I was able to input the second hundred entries in about two hours. And the best part is that I can replicate this process for the rest of the books.
Update 1: Vim key mappings for F6 and F7:
" Tag the visual selection as a destination
vnoremap <F6> :s/\(\%V.*\%V.\)/<span id="ix1">\1<\/span>/<CR>
" Increment the tag id (uses register y)
noremap <F7> :s/ix\d\+/\='ix'.(@y+setreg('y',@y+1))/<CR>
Update 2: A (meaningless to anyone but me) workflow.
cd NextBook
cp ../PreviousBook/metadata.xml .
cp ../PreviousBook/title.txt .
Edit these files to reflect the new book title and date.
Make a Markdown file from the OCR HTML file:
pandoc -S -s -f html -t markdown -oNB.md NextBook.html
which fails with:
Stack space overflow: current size 8388608 bytes.
Use '+RTS -Ksize -RTS' to increase it.
Not sure why it overflows, but I googled and found this works:
pandoc +RTS -K10000000 -RTS \
-S -s -f html -t markdown -oNB.md NextBook.html
Initialise dir as a git repo
git init
ga .
gc "Initial commit"
And copy an old .gitignore and edit it to suit:
cp ../PreviousBook/.gitignore .
Can now edit/proofread the MD file. Remove stray UTF characters (they usually upset Pandoc), wrap page numbers in HTML comment tags, use the Vim spellchecker to find obvious errors and clean up the index entries.
In Vim, wrap all index page numbers in an href tag:
:s/\(\d\+[fn]*\.*\)/<a href="#ix1">\1<\/a>/
In Vim, set register y to 2 (because we want to keep ix1 as is) e.g.
i2<esc>"yyw
or
:let @y=2
then replace every other #ix1 with an incrementing register value by running the <F7> macro once and then:
:.,$&
to run the substitute over the rest.
Can now build kwiclist using kwic.pl and use it to link index entries to where they occur in the text. Process consists of finding from kwiclist the line number of the next link word(s), going to that line with Vim, visually highlighting the word(s) and pressing F6 to wrap the word(s) in a span tag and F7 to change the id no. to that corresponding to the next link. F7 also auto-increments the id no.
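kwic.pl itself isn't listed here. As a simplified stand-in (not the real script, which works from the OCR HTML), the sketch below builds the page table from the page-number comments wrapped into the Markdown earlier in this workflow, and assumes the index has been pulled out into a tab-separated keyword/page file (index.tsv is just an illustrative name):

#!/usr/bin/env perl
# Simplified sketch only, not the actual kwic.pl.  Assumptions: page numbers
# are wrapped in HTML comments in the Markdown ("<!-- 123 -->" marking where
# that printed page starts), and the index is a file of "keyword<TAB>page" lines.
use strict;
use warnings;

my ($md_file, $index_file) = @ARGV;

open my $md, '<', $md_file or die "$md_file: $!";
my @md_lines = <$md>;
close $md;

# Build a page -> [first line, last line] table from the page-number comments.
my (%page_start, %page_end, $cur_page);
for my $i (0 .. $#md_lines) {
    if ($md_lines[$i] =~ /<!--\s*(\d+)\s*-->/) {
        $page_end{$cur_page} = $i - 1 if defined $cur_page;
        $cur_page = $1;
        $page_start{$cur_page} = $i;
    }
}
$page_end{$cur_page} = $#md_lines if defined $cur_page;

# For each index entry, show the first line on that page containing the keyword.
open my $ix, '<', $index_file or die "$index_file: $!";
while (<$ix>) {
    chomp;
    my ($keyword, $page) = split /\t/;
    next unless defined $page and exists $page_start{$page};
    for my $i ($page_start{$page} .. $page_end{$page}) {
        next unless $md_lines[$i] =~ /\Q$keyword\E/i;
        (my $text = $md_lines[$i]) =~ s/\s+\z//;
        printf "%-25s p.%-4s line %d: %s\n", $keyword, $page, $i + 1, $text;
        last;
    }
}
close $ix;

Something like perl kwic.pl NB.md index.tsv > kwiclist would then produce the kwiclist referred to above.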
Use pandoc to create an HTML version to check links:
pandoc -S -s --epub-metadata=metadata.xml -f markdown -t html \
--toc -o NB.html title.txt NB.md
When HTML version looks right, use epub_chapfix.pl to add chapters to index links. (This is a one-way process, cannot use MD file to create HTML version after this because pandoc splits epubs into chapters based on H1 headings which are no longer usable as a local HTML file):
../bin/epub_chapfix.pl NB.md > tmp1.md
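The script itself isn't reproduced here. Purely as a sketch of the kind of rewrite it performs, and assuming pandoc splits the epub at every H1 heading into files named ch001.xhtml, ch002.xhtml and so on, something like this would do it:

#!/usr/bin/env perl
# Hypothetical sketch, not the real epub_chapfix.pl.  Assumes pandoc names the
# epub chapters ch001.xhtml, ch002.xhtml, ... split at each H1 heading.
use strict;
use warnings;

my @lines = <>;

# Pass 1: note which chapter each ix anchor (<span id="ixN">) ends up in.
my %chapter_of;
my $chap = 0;
for (@lines) {
    $chap++ if /^#\s/;                      # a new chapter starts at every H1
    while (/id="(ix\d+)"/g) {
        $chapter_of{$1} = $chap;
    }
}

# Pass 2: prefix every index link with the chapter file that holds its anchor.
for (@lines) {
    s{href="#(ix\d+)"}{
        exists $chapter_of{$1}
          ? sprintf('href="ch%03d.xhtml#%s"', $chapter_of{$1}, $1)
          : qq{href="#$1"}
    }ge;
    print;
}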
Check that the tmp1.md index entries look OK. Might need to make the chapter numbers into 3-digit, leading-zero entries. When tmp1.md looks OK:
mv tmp1.md NB.md
Convert the MD file to epub (note: the --toc option is not needed for epub):
pandoc -S -s --epub-metadata=metadata.xml -f markdown \
-t epub -o NB.epub title.txt NB.md
Check epub with Calibre reader, confirm format, TOC and Index then convert to mobi:
ebook-convert NB.epub NB.mobi
Copy .mobi to Kindle and confirm format and links. Optionally use pandoc to create a PDF version.
Thursday, October 18, 2012
Automating YouTube video creation and upload with Perl and Sibelius.
I was asked by the admin of stmaryssingers.com to explore the feasibility of converting their audio practice track library to videos on YouTube. Aside from the savings in bandwidth and storage space, the admin thinks there might be some click revenue towards the Singers' costs.
I originally created the audio library of works that were being practiced by the choir by scanning, OCRing (Optical Music Recognition more accurately), importing the scans into Sibelius and exporting audio tracks for each part.
Originally it was purely to help me learn the music faster as I don't sight-read music and I don't have (music) keyboard skills. But it quickly became obvious that other choir members could use a little help so I put the tracks onto the website for easy access by members.
Creating YouTube videos from Sibelius .sib files.
After receiving the request to see if it were possible to create YT videos from .sib files, it occurred to me that a more recent version of Sibelius might have that facility. Sibelius has had Scorch for a long time, which creates music videos to be played on a website using the Scorch plugin: it shows the score together with a "bouncing ball" cursor which moves in time with the audio track. Scorch has had a very small takeup. YouTube, on the other hand, is a bit more popular, and Avid announced a "feature-limited but cheap" version of Sibelius 7, called Sibelius 7 First, which exports a music score as a YouTube video, (almost) exactly what I wanted.
The problem was that Sibelius doesn't have an automated facility to create "parts" videos. The audio practice tracks I created were re-mixed to emphasise each part. So the "Soprano" mix has the Soprano part in the centre of the sound stage, the Alto off to the right, the Tenor and Bass off to the left and the Piano even further off to the right. And so on for the other parts.
What I wanted to do was create an overlay image for each part which is mixed with the plain (SATB) video so that the particular part is highlighted while the other parts and accompaniment are slightly greyed out. Then I needed to replace the SATB audio file in the video with the audio re-mix for each part.
All of this has to run as an automated script, and the final step is to upload each part video to YouTube and update each description box to contain links to all the other parts in the work.
It took me a few days but it works now (mostly).
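The makmov script is too long to list here, but just to illustrate the overlay-and-remix step it automates, here is a rough sketch. It assumes ffmpeg does the compositing (that's my stand-in for illustration, not necessarily what the script actually calls) and reuses the song.mov, gradient_*.png and song_*.aiff names from the process notes below:

#!/usr/bin/env perl
# Rough sketch only, not the actual makmov script.  ffmpeg as the compositing
# tool is an assumption; filenames follow the process notes below.
use strict;
use warnings;

for my $part (qw(s a t b)) {
    my @cmd = (
        'ffmpeg', '-y',
        '-i', 'song.mov',                     # plain SATB video exported from Sibelius
        '-i', "gradient_$part.png",           # overlay that greys out the other parts
        '-i', "song_$part.aiff",              # audio re-mix emphasising this part
        '-filter_complex', '[0:v][1:v]overlay=0:0[v]',
        '-map', '[v]', '-map', '2:a',         # keep composited video, swap in new audio
        '-c:v', 'libx264', '-c:a', 'aac', '-shortest',
        "song_$part.mp4",
    );
    system(@cmd) == 0 or die "ffmpeg failed for part $part\n";
}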
Thankfully I had stored the Sibelius (.sib) file for each work on the website. But some of them are close to five years old and my knowledge of Sibelius was pretty minuscule back then. I've had to spend a lot of time cleaning up the .sib files to make them suitable to look at, not simply listen to.
The Process (so I can remember what to do next time).
1. Download and open the .sib file in Sibelius 6. Clean it up:
- Use the Text/Delete Dynamics plugin to remove anything that changes the volume. This audio is for learning to sing the notes and it helps to be able to hear them.
- Clean up the lyric lines. I'm only putting in one verse and chorus in the Soprano lyric line. I'm removing all other lyrics.
- Open the Score Info window (File/Score Info) and set the composition name and the composer. YouTube requires these fields.
- Reset the audio mix. When creating the audio library, I often saved the .sib file with one of the parts emphasised and it ruins the SATB export.
- Show the staff rulers (View->Rulers->Staff Rulers) and adjust the inter-staff spacing: 29 from the top, 12 between each singing part, 14 between Piano staves.
- Export Audio and save it as song_satb.aiff.
- Run the Playback/SetPiano plugin to set all voices to Piano. Makes it easier to distinguish notes when learning.
- Run the Playback/SetVoicePan plugin to export an AIFF file for each vocal part in the score.
- Adjust the names of the AIFF files so there is no digit in the filename if there is only one part in that voice, e.g. song_a1.aiff should be song_a.aiff if there is only one alto part. No need to change names if there is an alto1 and an alto2 part (see the rename sketch after this process list).
- Reset the mixer levels and the voice names in the score after running SetVoicePan.
- Save the .sib file.
- Click 'File'
- Click 'Export'
- Click 'Video'
- Deselect 'Use score paper texture'
- Select Resolution as 'HD (720p)'
- Edit the filename (default should be song.mov).
- Click 'Export'
Or whatever params are appropriate.
- makmov will pause after creating a single-frame PNG from the movie and the overlay PNGs (called 'gradient_*.png').
- Use Preview to line up the single frame and the gradients. Stop the script and restart with different '-o' and '-i' values if they don't line up.
- When all is OK, press 'Return' after the pause and the process will create all the individual videos, upload them to YouTube and rewrite their description text with links to the other parts in the song.
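The AIFF renaming rule from the clean-up step above (drop the trailing 1 when a voice has only one part, leave alto1/alto2 alone) can be sketched like this; the 'song' basename is just an example:

#!/usr/bin/env perl
# Hypothetical sketch of the AIFF renaming rule: song_a1.aiff -> song_a.aiff
# when a voice has only one part, but leave names alone when there is also
# a song_a2.aiff.  The "song" basename is an assumption.
use strict;
use warnings;

my $base = shift // 'song';
for my $voice (qw(s a t b)) {
    my @parts = glob "${base}_${voice}[0-9].aiff";
    if (@parts == 1 && $parts[0] =~ /_${voice}1\.aiff$/) {
        my $new = "${base}_${voice}.aiff";
        rename $parts[0], $new or warn "rename $parts[0]: $!";
        print "$parts[0] -> $new\n";
    }
}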
Saturday, October 13, 2012
A development setup using MacVim+zsh+tmux
I was initially inspired to seriously update my devel environment by Dr Bunsen's Text Triumvirate article. In the process I discovered some great configurations for development on my MacBook Air + Thunderbolt Display setup.
I really like YADR (Yet Another Dotfile Repo). It also has some highly opinionated setups for MacVim, zsh and tmux. My main complaint with it is that I really prefer vundle to pathogen for handling Vim plugins. I forked yadr with the intention of replacing pathogen with vundle.
But yesterday I discovered The Ultimate Vim Distribution. It's not the "Ultimate" (which actually means "the last") but it uses vundle instead of pathogen and after looking a bit more carefully at yadr I realised I can use the spf13 Vim plugins without upsetting yadr. I simply had to make ~/.vim and ~/.vimrc point to the .spf13 versions, instead of the .yadr versions.
So I spent most of yesterday merging the .vimrc file from yadr with the one from spf13.
Yadr's developer had the quite brilliant insight to put most of the Vim configuration code (key mappings, config variables etc.) usually kept in .vimrc into separate files in .vim/plugin/settings because anything in .vim/plugin is automatically loaded by Vim at startup. This allowed me to reduce .vimrc from 524 sloc to 288 sloc while adding an additional 55 Bundles to the original 50.
On the other hand, spf13 has an excellent line folding scheme which I've incorporated into the keymap settings file.
Both yadr and spf13 have a preference for Ruby programming whereas I prefer Perl so I removed a lot of Ruby/Rails packages and added a couple of Perl-centric ones.
There were quite a few questionable key mappings in spf13 which I removed and/or changed. Spf13 had <C-L> to move to the right window, an extra keypress for no reason. Yadr has <C-l> and I'm using it (same for h, j and k). Spf13 stole <C-e> to open NERDTree! One of my most used keystrokes! No way. I much prefer yadr's <D-N> (Cmd-N) for the much less used NERDTree. I also disabled the changes to H and L. My muscle memory has them firmly ingrained. spf13 has some useful shortcuts for the fugitive commands (,gs => :Gstatus etc.) but omits the main one I use ,gw (=> :Gwrite). Very strange.
Of course, I wasn't happy with Yadr's tmux config either, so I proceeded to change the link to .tmux.conf from the .yadr version to my own customised version. And Yadr's zsh config only required a few additions/changes to aliases.zsh to get all the shortcuts I've been using for years now. I've not used zsh prior to using Yadr so it has been a slight learning curve from my years of bash use.
So now I have the Text Triumvirate + Ultimate Vim + YADR + My Customisations development config. Simple :)
Sunday, October 7, 2012
RPi + SiriProxy + IRToy = VoiceTV
I've been working on a voice-controlled TV remote for many months now. I almost got a native iOS version working using an IR dongle from L5 and the OpenEars voice-to-text recogniser. Then life matters intervened and I never completed the project.
In the meantime all sorts of developments in voice-controlled apps for iOS have popped up. Apple bought Siri for the iPhone 4S then crippled it. Immediately hackers got to work and developed Siri Proxy to allow a local server to intervene when certain commands were spoken to Siri.
Also, Nuance released an SDK for Dragon NaturallySpeaking with hefty licensing fees but a reasonable, free evaluation period. And Thinkflood's RedEye wifi controller is much easier to program now using URLs.
It occurred to me (and I'm obviously not the first to have thought it) that if I were to attach a USB IR Toy to a Raspberry Pi computer and add the Siri Proxy app I could turn the RPi into a voice controlled WiFi to IR TV controller.
So far I've got Siri Proxy installed on the RPI and I've added an Edimax EW-7717UN WiFi adapter. Coincidentally I had accidentally destroyed the Raspbian wheezy image on the RPi's SD card so installed a new one which just happened to have the latest drivers for the 7717. It was working as soon as I plugged it into the USB port.
Next task is to add the USB IR dongle and start adding some plugins to Siri Proxy to output commands to the dongle to control the TV.
Thursday, October 4, 2012
ReadyNAS NV+ backup to Amazon Glacier Part 1
My previous posts about planning to back up my ReadyNAS NV+ to DreamHost all came unstuck when I read the DH page explaining what they mean by "Unlimited storage". I'm sure this page was added after I agreed for my account to be moved to one of their new servers in exchange for "unlimited storage and bandwidth" back in 2007. Anyway, I'm not in a position to argue.
DH does not allow me to use my account for storing "non-website-related" files, such as copies of ebooks, videos and music files I've purchased or, in my case, created myself. For "personal files", they offer up to 50GB of free but not-backed-up storage and then charge 10c/month for each GB over 50.
My current storage needs are around 1TB, so this makes DH's "personal files" storage an expensive $95/month (roughly 950GB over the free 50GB, at 10c/GB).
Amazon's announcement of Glacier in August didn't register at the time as a cheap form of backup for anyone. I didn't read the fine print and simply assumed it was only for S3 users, which I am not. As I read further today I realised it's ideal for my requirements as part of a "belts and braces" backup strategy. 1TB would cost me $10/mth (about the same as I'm paying for my "unlimited" DH account). It's not important if it can take up to 4 hours to retrieve files; I will already have the local backup on my Drobo. Only if a double disaster/failure hits my home systems will I need to retrieve from Glacier, and it'll take me more than four hours to replace my home hardware if there's a fire etc.
During the installation I learned that Duplicity can back up to Amazon S3 as one of its backends. It seems a pretty obvious addition to allow it to back up to Glacier, but when I Googled, I found lots of requests for such an addition and no sign of it happening.
However I did discover glacier-cli which is built on an equally intriguing project called git-annex. Git-annex uses the git filesystem and commands without actually having to check the files into git. Git is really fast at identifying file differences and there's a lot of user command line knowledge that annex can take advantage of.
Work in progress and worth watching.
ReadyNAS NV+ backup to DreamHost Part 3
Mystery of why scp stalled solved!
But it's taken me many days to understand the solution. I've had to learn how to use Wireshark and tcpdump (both useful to know at any time). I ran tcpdump on the NV+ while running scp to upload to DreamHost, then did the same thing on my MacBook Air, and the only difference I could see when I loaded the pcap files into Wireshark was that the NV+ was being forced to split packets and eventually the ACK response from DH got lost. But why was it splitting packets? The MTU of the modem router is 1492. The MBA has no problem working this out and sends correctly sized packets. Why can't the NV+?
It still didn't make any sense to me. I tried a number of times switching off jumbo packets/frames on the NV+ and setting the MTU to 1492, but scp was still forced to split packets. In my frustration I started looking for other backup solutions and discovered Amazon Glacier (more on that in a later blog).
So I started closing all the tabs I had open in Chrome concerning scp stalling and noticed a link I hadn't seen before on this page. I presume I didn't understand it when I saw it and ignored it. The link is to an FAQ for the "snail book", and as I read through it I finally understood what I was seeing in the tcpdump file.
So I tried to change the NV+ MTU to 576 as suggested, but Frontview limits the value to between 1400 and 1500, so I entered 1400, re-ran scp and it works! It seems the default 1492 in the NV+ Frontview input box is simply wrong for ADSL, but OK for internal LAN transfers (1492 + the 8-byte PPPoE header = 1500, the default Ethernet MTU).
Updated command to run Duplicity (the following should be one line):
duplicity --exclude-globbing-filelist=$HOME/exclude.list
--archive-dir=/c/temp/cache/duplicity
--tempdir=/c/temp/tempsig
--use-scp /media/Music scp://dh/backup/nas
Note that I've specified the archive and temp directories to be on /c/. It is critical not to use the default directories on the root filesystem for these; trying to squeeze a backup of 50GB into a 2GB filesystem is sure to lead to disaster.