Alex's blog Alex Martin XSite + XBlog https://www.alm.website/blog/ 2022-09-26 https://www.alm.website/blog/2022-09-26-note-1 Note 1 from 2022-09-26 2022-09-26 2022-09-26

In reply to Aaron Parecki

Don't forget the mostly-red frame where the camera was destroyed mid-scan!

I almost think that's the coolest part.

https://www.alm.website/blog/2022-09-17-note-1 Note 1 from 2022-09-17 2022-09-17 2022-09-17

<em> and <strong> are not "the correct way to do italic and bold," despite what WYSIWYG editors may have you believe. <em> is for emphasis, but if you need different text, say, the name of a ship, or a Latin word, <i> is still the right tool. Both elements exist because there is more than one reason to italicize things. If you want italics for purely stylistic reasons, there is yet another tool you should be using: <span style="font-style: italic">.

This is why I don't like WYSIWYG, and why I don't write my site in Markdown...
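To illustrate the distinction (the class-free markup here is just an example):

```html
<!-- Emphasis that changes how the sentence reads -->
<p>You <em>must</em> not do that.</p>

<!-- A ship name or a Latin term: italic by convention, not emphasis -->
<p>The <i>Titanic</i> sank. He muttered <i lang="la">mea culpa</i>.</p>

<!-- Purely stylistic italics, carrying no semantic weight -->
<p><span style="font-style: italic">fancy text</span></p>
```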

https://www.alm.website/blog/2022-09-12-note-1 Note 1 from 2022-09-12 2022-09-12 2022-09-12

I had to fiddle with systemd for a while but I fixed a problem where ejabberd's /run directory wasn't getting created. The LDAP server starts after whenever it would normally happen, and ejabberd's account lives in LDAP for... reasons. Now it waits for the LDAP server to start and then creates it as part of the daemon startup.
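The shape of that fix, as a systemd drop-in; the unit names are illustrative (slapd stands in for whatever the LDAP server's unit is actually called), not my exact configuration:

```ini
# /etc/systemd/system/ejabberd.service.d/override.conf
[Unit]
# Don't start ejabberd until the LDAP server is up
After=slapd.service
Wants=slapd.service

[Service]
# Have systemd create /run/ejabberd as part of daemon startup
RuntimeDirectory=ejabberd
RuntimeDirectoryMode=0750
```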

https://www.alm.website/blog/2022-09-11-various-stuff Various stuff I've been up to 2022-09-11 2022-09-11

Since I last blogged about my personal infrastructure, I've been up to a few things. I meant to write longer blog posts about each of these, but it's been a bit and I've forgotten quite a few details. Sorry. Here's some quick summaries, though.

IPv6

Perhaps most exciting, I got IPv6 up and running! Hurricane Electric was satisfied with their ability to ping me, and I was able to set up an SIT/6in4 tunnel through their tunnel broker service, with the router holding down this end; with a little minor tweaking, I got it routing packets and making router announcements for our /64, and incredibly my desktop just picked up an address immediately.

Unfortunately, our ISP happened to go down just as I was starting to work on routing and firewalling for IPv6, which led me to chase a ghost for a while. Once I realized the problem, I was in the interesting situation of knowing IPv6 was working over the Internet and over the LAN, but not knowing if the two would be able to talk to each other. Once the ISP did come back, it did in fact Just Work, so the whole network now has IPv6!

Except that I couldn't actually talk to my server from the Internet (though I could talk to my desktop). It was a slog to figure out what was wrong, and I fought with two firewalls pointlessly for several hours before finally realizing they were both actually passing the packets. It turns out they were being dropped by a Linux feature I wasn't familiar with, rp_filter. rp_filter is a bit of kernel code that implements RFC 3704; specifically, it checks incoming packets against the routing table, and if they're not coming from where the kernel's routing code would expect to send packets going to that origin, they get dropped. It turned out that my old Wireguard tunnel, which I had been using for IPv6 on that machine in particular, was still running, and the kernel had decided it was a better route to the Internet than the Ethernet link; thus, it would only accept packets on Ethernet if they came from LAN addresses. I shut down the tunnel, and everything started working.
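For reference, rp_filter is exposed per interface as a sysctl; this is a generic sketch of the knobs, not my machine's exact configuration (these sysctls cover IPv4; reverse-path checks for IPv6 are typically done in the firewall instead):

```
# /etc/sysctl.d/50-rpfilter.conf
# 0 = no source validation
# 1 = strict mode (RFC 3704): drop if the reply wouldn't leave this interface
# 2 = loose mode: accept if *any* route back to the source exists
#     (loose mode would have tolerated the stale Wireguard route)
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
```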

ACME and mod_md

One of the other things I wanted to accomplish was to finish the process of getting rid of the stand-alone unpackaged Go programs I was using for a couple things. I had previously removed Gitea in favor of cgit (which, by the way, dropped my load averages considerably), which left only Caddy as a target. With Gitea removed, Caddy's only job was to terminate TLS and reverse-proxy everything to Apache, which seemed rather pointless. The main reason I'd been using Caddy was due to its extremely convenient ACME support, which allows it to get certificates from Let's Encrypt or another similar certificate authority automatically.

It turns out Apache can actually do this too, and has been able to for quite some time. mod_md provides integrated ACME support, as well as a few convenience features such as automatically redirecting unencrypted HTTP requests to HTTPS. It's still more typing than Caddy, but I was able to get my six sites with their mildly varying configurations up in a couple hours, with a couple of hitches. And with that, I deleted Caddy and am now back to entirely packaged software that gets upgrades with zero effort.
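For the curious, a minimal mod_md site looks roughly like this; example.org is a placeholder, and a real multi-site configuration is of course longer:

```apacheconf
# Ask mod_md to obtain and renew a certificate for this domain via ACME
MDomain example.org www.example.org
MDCertificateAgreement accepted
# Redirect plain-HTTP requests to HTTPS once a certificate is in place
MDRequireHttps permanent

<VirtualHost *:443>
    ServerName example.org
    # No certificate files configured; mod_md supplies them to mod_ssl
    SSLEngine on
</VirtualHost>
```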

XMPP

Another thing I've wanted to do is switch to my own XMPP server; I've heard running one isn't very difficult, so using someone else's server doesn't strike me as all that necessary. I installed ejabberd and fiddled around with the config file to make it talk to LDAP and PostgreSQL (why not, right? I have the database server anyway), then after some firewall fiddling I was able to connect with my LDAP account and send and receive messages to/from my account on my previous server. After that, I had to hook up Apache to proxy its HTTP(S) ports. I still haven't completely finished XMPP setup yet; I have to fix a certificate issue with Apache for the XMPP site, and make some minor additions for full compliance with the recommended suite of XEPs (mostly to support Web-based clients, which I don't use, but 100% is a shiny number). Once I have all of this done, I'll switch over to the new XMPP account (you'll know when I do because I'll change the XMPP address on my profile, and I'll probably post a note).

https://www.alm.website/blog/2022-09-07-note-1 Note 1 from 2022-09-07 2022-09-07 2022-09-07

Is there any significant difference between ul {display: flex;} and li {display: inline;}? The latter seems to require less additional work to remove the list markers, pad the items out, etc.

I might switch my navbar from flexbox to inline; quick testing doesn't seem to show any problems stemming from doing that...
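The two approaches side by side, for a hypothetical navbar list (class names invented for the example):

```css
/* Flexbox version: markers and the ul's default padding still have to
   be removed explicitly, but you get gap, alignment, and equal-height
   items for free */
ul.nav-flex { display: flex; gap: 1em; list-style: none; margin: 0; padding: 0; }

/* Inline version: inline boxes render no markers, so only the ul's
   default padding needs attention; spacing comes from normal text flow */
ul.nav-inline { margin: 0; padding: 0; }
ul.nav-inline li { display: inline; margin-right: 1em; }
```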

https://www.alm.website/blog/2022-08-29-reconfiguring-home-network-2 Reconfiguring our home network, part 2 2022-08-29 2022-08-29

To follow up on part 1: in brief, we got it working.

When last I left off, I was hoping to use hestia as a router. While that probably would have worked, it would have needed some VLAN fiddling to isolate the outside and inside of the network properly, and it would have meant considerable trouble any time the server itself went down, since someone has to connect to it via SSH to enter its disk encryption key before it will finish booting. Instead, we opted to buy a dedicated router. After a false start with a router that turned out to be from an apparently-defunct company, and which strangely didn't support IPv6 despite being a gigabit Ethernet router with quite a few advanced features, we bought a Ubiquiti EdgeRouter X SFP, which was very well-reviewed. This unit worked quite well, and within a few hours I had everything rearranged with it at the core of the network (I kept the switch, even though the router has enough ports; the wired hosts connect to the switch, and the Wi-Fi APs directly to the router, since they're topologically switches). While I was setting up the router, I discovered it has configurable passive (i.e. non-negotiating, thereby highly Etherkill-capable) PoE support; unfortunately, only for 24 volt devices, and our main access point, though it supports PoE, wants 12 volts. Our outdoor AP, though, turned out to be 24 volts, so we were able to take the PoE injector previously powering it out of the equation (salvaging a much-needed extra Ethernet cable).

While the router was comparatively easy, even given that I was trying to avoid disrupting the existing functioning of the network where I could, I couldn't get the new modem working; I still had the old modem/router/AP combo unit serving as the modem, with its DHCP server and Wi-Fi functions disabled and its "forward everything" setting pointed at the new router's static address on its segment, then the router set with a static route to it for 0.0.0.0/0. This worked perfectly, but I wasn't happy to keep the old hardware. I was on hold with the ISP for an hour and a half or so, but once I got on the phone with an actual person, we got it resolved pretty quickly; all that needed to happen was to whitelist the new modem's MAC address (a process slightly frustrated by a typographical error). As soon as that was done, it rebooted itself and the router got a DHCP lease from the ISP. Naturally, though, it was a different address from our previous one, so I had to go update my DNS entries, which was itself a difficult process because I had created a routing table entry (in an attempt to be able to talk to the modem's administrative interface) that ended up directing swathes of traffic to entirely the wrong place. With that removed, I was able to get to the DNS control panel and fix the name entries, putting my various personal services back into reachability.

Regrettably, the ISP specifically told me when I asked (since I was on the phone anyway) that they don't currently support IPv6, so, for the moment, I remain on my previous IPv6 solution (a Wireguard tunnel to a VPS in New York that has an embarrassingly small block of IPv6 addresses, which at the moment only covers hestia). I'd like to set up a better tunnel, preferably one that would give me a whole /64 so the whole network could have IPv6. Some time ago, the ISP was dropping ICMP within their network somewhere, which prevented setting up a Hurricane Electric IPv6 tunnel, but they must have stopped; with a firewall rule to allow ICMP traffic, they were able to ping us. So a project for sometime soon will be setting up tunnelled IPv6 for the whole network (I've already got the tunnel itself running; I just have to figure out routing and advertise the prefix through NDP so hosts on the network can get addresses).
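The NDP-advertisement piece of that plan can be as small as one radvd stanza; the interface name and prefix below are placeholders, not my actual allocation:

```
# /etc/radvd.conf
interface br-lan {
    AdvSendAdvert on;
    prefix 2001:db8:1234:abcd::/64 {
        AdvOnLink on;
        AdvAutonomous on;   # hosts self-assign addresses via SLAAC
    };
};
```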

Anyway, the entire pile of hardware has now been shoved atop a convenient high place where the AP has good sight lines, and hopefully nobody will have to think about it too much unless they (I) want to. Victory!

https://www.alm.website/blog/2022-08-25-note-1 Note 1 from 2022-08-25 2022-08-25 2022-08-25

I like CGI. I know it's slow, but I never run at that scale. It's so simple. I can just write a response to standard output. I don't need to learn a framework, I don't need routing or HTTP parsing, I can just put a normal program on my server and it will work.

https://www.alm.website/blog/2022-08-20-note-2 Note 2 from 2022-08-20 2022-08-20 2022-08-20

I think the only things left to add to Lillybooks now are a better tokenizer than just splitting on spaces, and a Misskey driver.

I've got CW-based filtering now, including filtering out any post with a CW (through a slight hack...). If someone decides they want plagiarism avoidance... Well, I guess I'll take patches, but it hasn't been that copy-happy in my testing.

https://www.alm.website/blog/2022-08-20-note-1 Note 1 from 2022-08-20 2022-08-20 2022-08-20

I'm working on Lillybooks again, I've implemented filtering based on CWs, and I'm moving token filtering into the database instead of an external filter file (this is better because it means less bouncing between SQL and Python).

https://www.alm.website/blog/2022-08-19-note-1 Note 1 from 2022-08-19 2022-08-19 2022-08-19

More than once the Rust compiler has demonstrated a better understanding of good software architecture than me, and as a result of its complaints I ended up with a program that was not just safer but cleaner and more performant.

https://www.alm.website/blog/2022-08-17-shoot More foot-shooting 2022-08-17 2022-08-17

I've updated one of my most popular pages, Shooting yourself in the foot in various programming environments, with a number of new additions, and a few changes to more accurately reflect how they make me feel.

https://www.alm.website/blog/2022-08-17-v4.1-xslt-only Version 4.1: Going XSLT-only 2022-08-17 2022-08-17

I decided to redesign my static site generator, XSite, because the way it worked previously was kind of limiting. So I threw out XSlots, my custom templating language, and switched to using XSLT for everything. My entire site is now generated with XSLT.

In order to do this, I ended up implementing all of XSlots' unique features as XPath functions (and to do that, I added support for XPath extension functions and XSLT extension elements (so far unused) to XSite's plugin system). As a consequence of having done this, I was able to implement a couple of things I've wanted to do for a while: Previews on the blog index page, and summaries and full content in the Atom feed.

There was a lot of back-and-forth necessary to get all of this working, and most of it doesn't show up in the Git history; most of the commits involved in this hide at least two or three things that didn't work.

While I was making changes to the templates, I also added a bunch of microformats2 markup, specifically h-feed on the blog index, h-entry on blog posts themselves (including for the summary entries in the index), and h-cite on mentions. I want to add a p-author h-card for myself on blog posts, but I can do that later.

One interesting possibility with XSLT is that, because it can do much more complex processing, it would be feasible to actually make the input documents full HTML documents (with <head> elements and such), then pick them apart and put them back together with template content. I don't know if that's actually a good idea, but it seems interesting.

Anyway, I'm glad to be away from the world of juggling what was basically two different templating languages, each with separate capabilities. I am thinking about the idea of bringing back the XSlots language, compiled to XSLT, for conciseness, but not too seriously; I'm personally fine with verbosity, so unless people who aren't fine with it complain, I probably won't go to the effort of building something I probably won't use much.

This is a major engineering change to the site, but the content and structure are basically the same, so: Version 4.1.

https://www.alm.website/blog/2022-08-13-note-1 Note 1 from 2022-08-13 2022-08-13 2022-08-13

If I could revise the way keyboards work, they would have another two modifier keys, a Lock Screen key, and a Secure Attention key. Also Num Lock would be removed because the presumption would be that you always want to use the numpad as a numpad.

Would I keep the arrow keys? Maybe, or maybe I would add arrows to hjkl... That might be mean to non-Vim users though (I even still use the arrow keys in Vim, but maybe having Vim hjkl keys would help me learn).

Also Shift+Caps Lock would toggle a Caps Lock Lock that prevents Caps Lock from being toggled.

https://www.alm.website/blog/2022-08-07-reconfiguring-home-network-1 Reconfiguring our home network, part 1 2022-08-07 2022-08-07

We embarked on a project to rebuild our home network today; here's what's happened so far.

The person responsible has resisted a rebuild for a while, though it's been necessary, because they wanted to see if fast satellite (the new kind, like OneWeb and Starlink) would become viable; they've now decided it's not going to happen soon. So, we bought three new pieces of hardware: A generic 5-port switch, a generic 2-port gigabit DOCSIS 3.1 cable modem (the second port is unnecessary, but it's what they had), and a fancy Wi-Fi access point I had good experiences with at work. It's designed for commercial deployments, with SNMP, RADIUS/WPA-Enterprise, the ability to configure a whole bunch of SSIDs, and all the rest; it's not a router, just an access point. We don't need all of this, but we decided to go with it because I already know it can cover the area and get good performance.

I set everything up without touching the existing network at first, leaving the existing leased modem-router-AP in place and just moving my machines over so I could make sure they would be stable on the new network. I also turned off the radios on the new AP before I configured the SSID so it wouldn't disrupt the existing network. After a bit of troubleshooting, I was reasonably confident the new setup should work, so I unplugged the ISP box and moved the cable line to the new modem.

Unfortunately, it didn't work; I couldn't get a DHCP address and I couldn't talk to the Internet. My main suspicion is that the ISP will only talk to a machine with a fixed MAC address, so we'll need to call them and set that up. So, I left the new AP's radios turned off and turned the ISP box back on.

On top of that, I made a rather interesting discovery: The new modem appears to only act as a DHCP server while the WAN link is disconnected; once the WAN comes up, it drops out. I suspect this is for troubleshooting; if your WAN link goes down and you rely on the ISP to provide your single machine with its IP address via DHCP, you wouldn't be able to talk to the modem to troubleshoot, so it gives you an address in the same block as its Web server. Annoyingly, there is no way to disable this behavior, so it will become a rogue DHCP server automatically in the event that the WAN link dies and you do have a proper DHCP server on the network. Or maybe that's why it waits so long to send offers...

So we'll need a DHCP server, and probably a router; the modem doesn't seem to have any NAT settings, so the packets need to go through somebody else first to do NAT if we don't want to give every machine on the network a public IPv4 address (which I doubt the ISP would be up for)... I think hestia (the server that also runs my Web site, my LDAP directory, my Git hosting, my personal file sharing, and hopefully someday my mail and certain other things) can take up that role.

My hope is that it can be set up like this:

ISP <-> Modem <-> Ethernet switch <-> hestia

Where hestia gets either a DHCP lease or static assignment of a public address from the ISP as well as being 192.168.0.1, and NATs IPv4 for everyone else.
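If hestia does take the router role, the IPv4 NAT piece could be a few lines of nftables; the interface name here is an assumption for illustration:

```
# Hypothetical nftables NAT on hestia; eth0 stands in for the WAN-facing link
table ip nat {
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # Rewrite the source address of everything leaving toward the modem
        oifname "eth0" masquerade
    }
}
```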

I don't actually know much about how a DOCSIS CPE is supposed to work, but I know that at work, where we have a similar modem and the same ISP, the firewall, and not the modem, has the public IP address the rest of the world sees; it's connected to the modem by an Ethernet cable. I presume this means either that the ISP can see the Ethernet traffic on the other end of the modem (a little (a lot) spooky and probably inefficient) and can answer DHCP requests or that anybody with the globally-routable address assigned statically will just be able to send IP packets over Ethernet to a static gateway address. Hopefully, a switch in the middle won't cause any trouble.

I'm also hoping IPv6 (once enabled on the ISP side, they say they support it but the modem reported IPv4-only address provisioning) won't need anything special?

Read the conclusion in part 2.

https://www.alm.website/blog/2022-08-07-note-5 Note 5 from 2022-08-07 2022-08-07 2022-08-07

This note is just a test.

https://www.alm.website/blog/2022-08-07-note-3 Note 3 from 2022-08-07 2022-08-07 2022-08-07

y'know if we'd just gone ahead and created ~/etc, ~/var, etc. as part of the standard homedir layout we might not be in this situation

https://www.alm.website/blog/2022-08-07-note-2 Note 2 from 2022-08-07 2022-08-07 2022-08-07

I really dislike the recent trend of taking command-line tools that were previously implicitly accessible to people with visual disabilities and piling unnecessary TUIs or terminal pseudographics on top so that the actual text stream becomes unreadable or nearly so.

https://www.alm.website/blog/2022-08-07-note-1 Note 1 from 2022-08-07 2022-08-07 2022-08-07

I love how simultaneously incredibly boring and very exciting RISC-V is.

https://www.alm.website/blog/2022-08-06-note-3 Note 3 from 2022-08-06 2022-08-06 2022-08-06

Follow-up to Note 2: It appears certain variable substitutions are allowed in cgitrc, including one called $HTTP_HOST. I think that solves this.
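Meaning the configuration can be split per hostname with something like this (paths illustrative):

```
# /etc/cgitrc
# $HTTP_HOST expands to the Host header of the request, so each
# hostname gets its own repository list from its own include file
include=/etc/cgit/$HTTP_HOST.conf
```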

https://www.alm.website/blog/2022-08-06-note-2 Note 2 from 2022-08-06 2022-08-06 2022-08-06

Hmm, problem attempting to switch to cgit: How do I virtualhost?

I need to serve two different sets of repositories from the same machine, under different hostnames.

https://www.alm.website/blog/2022-08-06-note-1 Note 1 from 2022-08-06 2022-08-06 2022-08-06

I'm gonna have to get rid of these two Go programs. Just download the binary and move it to /usr/local/bin sounds nice until you realize you have to do that by hand every time they release a new version.

Having to run 3-4 commands every time an RSS entry shows up is infinitely more difficult than doing literally nothing, which is how I get upgrades for literally every single other application on this server. And no, I will not be writing a script to poll the RSS feeds and install new versions automatically. I have a package manager already.

https://www.alm.website/blog/2022-08-03-note-1 Note 1 from 2022-08-03 2022-08-03 2022-08-03

I should probably swear less.

https://www.alm.website/blog/2022-08-01-note-2 Note 2 from 2022-08-01 2022-08-01 2022-08-01

Has anyone found a good piece of software yet? Just one?

https://www.alm.website/blog/2022-08-01-note-1 Note 1 from 2022-08-01 2022-08-01 2022-08-01

It's very satisfying to install 365 package upgrades and then only have to restart Firefox which takes like 15 seconds

https://www.alm.website/blog/2022-07-31-note-1 Note 1 from 2022-07-31 2022-07-31 2022-07-31

Proposing: dm-cloud

Block device storing data in cloud storage services. Blocks are encrypted and written in rotation, and only deleted once quota is reached, minimizing analyzability. Mirroring and striping used to exploit capacity of multiple services while still withstanding lost accounts (e.g. if they decide you're breaching ToS). Only account credentials and the encryption key need to be stored locally.

https://www.alm.website/blog/2022-07-19-note-3 Note 3 from 2022-07-19 2022-07-19 2022-07-19

it would be fun to play with the ntoskrnl source some day if intellectual property is abolished or Microsoft goes under

https://www.alm.website/blog/2022-07-19-note-2 Note 2 from 2022-07-19 2022-07-19 2022-07-19

that's a relief - my backup drive is working again

I should still probably set up network backup though, and probably tarsnap on the server

https://www.alm.website/blog/2022-07-19-note-1 Note 1 from 2022-07-19 2022-07-19 2022-07-19

Having a personal domain under com implies you consider your personal pursuits a commercial endeavor.

https://www.alm.website/blog/2022-07-18-note-1 Note 1 from 2022-07-18 2022-07-18 2022-07-18

Does anyone know if it's safe to share a Guix /gnu/store directory between two operating systems on the same disk? It feels like it should be fine, but this kind of thing makes me nervous.

https://www.alm.website/blog/2022-07-16-note-1 Note 1 from 2022-07-16 2022-07-16 2022-07-16

I found Wine inside the WSL image on my work laptop yesterday. ???

https://www.alm.website/blog/2022-07-15-note-1 Note 1 from 2022-07-15 2022-07-15 2022-07-15

Okay so microformats2 right

Expanding brain meme:

  1. p-sex
  2. p-gender
  3. e-gender
  4. h-gender and u-gender

How about it? :>

https://www.alm.website/blog/2022-07-15-more-evil-sqlite More evil SQLite 2022-07-15 2022-07-15

Here's some more evil use of SQLite.

CREATE TRIGGER sqlar_update
  INSTEAD OF UPDATE
    ON sqlar
  BEGIN
    -- ...
  END;

(Explanation: sqlar is the name of the table used for a SQLite Archive. Here I've implemented it as a view rather than a table, with INSTEAD OF triggers that translate UPDATEs (and INSERTs and DELETEs too) into operations on the radically differently-arranged tables the view pulls from. So, as a side feature, my rather complex format is also accessible as an archive through the SQLite shell, or through the standalone sqlar program.)
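The general mechanism, reduced to a toy schema; this two-table layout is an illustration of mine, not the actual format:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- The "radically differently-arranged" backing storage
    CREATE TABLE blobs(id INTEGER PRIMARY KEY, body BLOB);
    CREATE TABLE meta(id INTEGER PRIMARY KEY, name TEXT UNIQUE);

    -- A view presenting the two tables as one flat, archive-like table
    CREATE VIEW archive(name, data) AS
        SELECT meta.name, blobs.body FROM meta JOIN blobs USING (id);

    -- Views are read-only until INSTEAD OF triggers supply the writes
    CREATE TRIGGER archive_insert INSTEAD OF INSERT ON archive
    BEGIN
        INSERT INTO blobs(body) VALUES (NEW.data);
        INSERT INTO meta(id, name) VALUES (last_insert_rowid(), NEW.name);
    END;
""")

# Writing to the view fires the trigger, which populates both tables
con.execute("INSERT INTO archive(name, data) VALUES ('hello.txt', x'68690a')")
row = con.execute("SELECT name, data FROM archive").fetchone()
print(row)  # ('hello.txt', b'hi\n')
```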

https://www.alm.website/blog/2022-07-13-note-1 Note 1 from 2022-07-13 2022-07-13 2022-07-13

Occasionally I get a desire to install like a dozen Windows 2000 VMs and set them up in an Active Directory forest just to see how it worked before they piled 20 years of Innovation on top.

https://www.alm.website/blog/2022-07-10-hif Some non-criminal database design: HIF 2022-07-10 2022-07-10

I made a non-evil database design. This one is an instance of SQLite as an Application File Format, because I've always wanted to do that.

It's called HIF (HERO Interchange Format), and it's for character sheets used in my personal favorite role-playing game system.

https://www.alm.website/blog/2022-07-09-note-1 Note 1 from 2022-07-09 2022-07-09 2022-07-09

The best sign of reliable information is URIs matching /wp-content/*.pdf.

https://www.alm.website/blog/2022-07-08-crime-against-database-design A crime against database design 2022-07-08 2022-07-08

I present, a crime against database design, in interface definition form.


Inputs consist of one or more rows inserted into a table resembling the following:

CREATE TABLE input_data (
  seq INTEGER PRIMARY KEY AUTOINCREMENT
);

The input_data table must have that name exactly, but may (and should) have additional columns after seq to contain the data.

Once rows are inserted into input_data, the database should generate output accessible via a table or view resembling the following:

CREATE TABLE output_data (
  seq INTEGER PRIMARY KEY AUTOINCREMENT,
  done INTEGER CHECK(done IN (0, 1))
);

Again, output_data must have that name exactly, and may have additional columns after done, comprising the output.

After inserting each input row, output_data should be checked for new rows with a greater seq value than previously seen; these are the output rows. The database is not required to preserve old output rows when new input is inserted; however, seq must continue to increase throughout the process. Once all input data has been inserted, if the second column (done) of the last output row is 0, rows of all NULL should continue to be inserted into input_data until a row with done 1 appears at the end of output_data.

input_data should be cleared (truncated/deleted from unconditionally) before any new inputs. Once this occurs, the database is permitted to reset or rewind the seq values used for the next output, and the same is true of the next input.
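To make the calling convention concrete, here is a minimal conforming "database program" in SQLite, driven from Python. The task (doubling each input) and the n/result columns are invented for the example; a real program under this scheme could be arbitrarily baroque:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE input_data (
      seq INTEGER PRIMARY KEY AUTOINCREMENT,
      n INTEGER             -- caller-defined payload column
    );
    CREATE TABLE output_data (
      seq INTEGER PRIMARY KEY AUTOINCREMENT,
      done INTEGER CHECK(done IN (0, 1)),
      result INTEGER        -- caller-defined payload column
    );
    -- The "program": each input row immediately yields one finished output row
    CREATE TRIGGER run AFTER INSERT ON input_data
    BEGIN
        INSERT INTO output_data(done, result) VALUES (1, NEW.n * 2);
    END;
""")

last_seen = 0
outputs = []
for n in (3, 4):
    con.execute("INSERT INTO input_data(n) VALUES (?)", (n,))
    # Per the interface: poll for output rows with a seq greater than any seen
    for seq, done, result in con.execute(
            "SELECT seq, done, result FROM output_data WHERE seq > ?",
            (last_seen,)):
        last_seen = max(last_seen, seq)
        outputs.append(result)
print(outputs)  # [6, 8]
```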

https://www.alm.website/blog/2022-07-06-note-2 Note 2 from 2022-07-06 2022-07-06 2022-07-06

MediaWiki is one of the best Web applications, because the "Web" part is the focus, not the "application" part.

https://www.alm.website/blog/2022-07-06-note-1 Note 1 from 2022-07-06 2022-07-06 2022-07-06

https://www.alm.website/blog/2022-07-03-broadcast-pipes 'Broadcast' pipes on Linux 2022-07-03 2022-07-03 A little while ago, I wrote a couple of Fediverse posts about wanting a sort of broadcast pipe and subsequently a solution for implementing one without coordination of a list of recipients. (Those are Misskey links, so unless you have a high-powered machine I recommend copy-pasting the URLs into another Fediverse server's search box to load them there instead of opening them directly...)

At the time, I said I would write a blog post about the topic maybe the next day. Well, that didn't happen then, but now it is happening. In this post I'll explain what a broadcast pipe is, the use case for such a thing, my solution, and the weaknesses in it that I'm aware of.

I didn't actually write or test code implementing this, so there could be a problem in the design I'm unaware of, but it should work in theory.

The problem

A pipe, on Unix, is a one-way stream of bytes between two processes. One end of the pipe acts like a write-only file, and the other end like a read-only file. Anything written to the write end can later be read from the read end. This provides a simple but powerful inter-process communication mechanism that Unix uses mainly to implement pipelines, which are chains of processes where the output of each is directed to the input of the next. Another kind of pipe is called a FIFO file, or named pipe. This is a file in the filesystem, but when opened it provides you with one end or the other of a persistent pipe (depending on whether you open it for reading or for writing). A FIFO is usually used to provide a way of sending commands to a process, a queue of data to be processed, or other similar constructions.

What I want is the ability to use FIFOs, or pipes, to send messages to multiple receivers. My main use case is the implementation of an event system for LENS. I don't want receivers to have to register themselves with a central broker process or something similar, which eliminates possibilities such as a directory full of FIFOs, where a message is written to all of them. In practice, that exact solution is probably the best way to implement this, but I'd like to see what the alternative looks like. Since LENS runs atop Linux (the kernel), I don't need to worry about other Unix systems and will be focusing on the facilities provided by Linux.

The solution

The critical discovery to make it possible to implement this is a Linux system call named tee. The tee system call performs a (logical) copy from one pipe to another, but doesn't remove the data from the first pipe. This means that multiple readers can get the same data from the same pipe. Each reader creates a second, private pipe, which it uses tee to fill with data from the broadcast pipe. It can then read the data from its private pipe at its convenience.

Unfortunately, there is a problem: If readers don't remove data from the broadcast pipe, the data is never removed at all. As a result, it gradually fills up with data. A pipe has a maximum capacity, which, when reached, will prevent more data from being written. Also, all of the historical data in the pipe remains there forever and will be read by readers who join later, which might be undesirable if the data expires. To solve this, somebody needs to read the data from the pipe, and the sensible participant to have this responsibility is the writer, since it put the data there in the first place.

While I wanted to be able to have a completely unaware writer, which doesn't even need to know it's doing anything special, this isn't feasible, at least with the tee approach. The writer will need to have some special logic. Additionally, because we don't want a race condition between readers trying to get the data and the writer trying to clear out the pipe, there needs to be some kind of locking in the picture.

The sub-problem

The lock we need is one that allows only one writer at a time (because we only need one at a time, and it simplifies the implementation), but allows many readers. The writer needs to clear out the data from the pipe only after all readers have done their tee calls, otherwise some readers will miss it; the lock will control erasing the data, not the actual write. But, readers may try to take the lock again quickly after they release it, so bad timing could lead to the writer never getting it. This would result in the readers all getting duplicate data until the writer eventually won the race to take the lock. So the lock needs to ensure a writer always wins the race.

To summarize, we need a reader-writer lock that ensures that if a writer is waiting, readers will never take the lock before it gets it. There isn't a built-in lock in Linux with these semantics, so we need to build it ourselves.

The sub-solution

Fortunately, Linux has a dedicated feature for building your own locks. A futex is a synchronization primitive provided by the kernel, consisting of a 32-bit futex word, access to which is synchronized using the futex system call. This system call can do one of a few things, depending on flags. For our purposes, there are two operations:

  • Atomically check that the futex word has an expected value (and return if not), then block until the futex is signaled to indicate a change in its value. The check ensures that the value isn't changed by another process/thread between when the user-space locking logic decides to wait and when the futex call runs, which could result in waiting on an available lock or similar problems.
  • Signal that the futex word has changed value, waking up a specified number of waiters.

Futexes are generally used in a loop, where you repeatedly check if the lock is in a state that allows you to take it, calling futex to wait if not, and then eventually using atomic CPU instructions like compare-and-exchange to update the lock to indicate that you have it. If someone else touches the lock while you're trying to take it, you restart the loop.

We'll implement our reader-writer lock using a futex. The futex word will be UINT32_MAX if a writer holds the lock (because nobody else can take it while a writer has it), and otherwise the low 31 bits will indicate the number of readers holding it, while the uppermost bit will indicate that at least one writer is waiting to take the lock (and therefore that no new readers should take it). Zero indicates no writer waiting, and no readers holding the lock, meaning the next comer gets the lock immediately.

A reader will follow these steps:

  1. Wait for the high bit to be unset. If a writer holds the lock, UINT32_MAX has a set high bit. If a writer is waiting, this avoids starving it.
  2. Atomically increment the futex word. If the lock is unheld (zero), this takes the lock for readers. If it's held already, it increases the number of readers holding it. If the increment fails, start over.
  3. When done reading, atomically decrement the futex word. If the decrement leaves zero, or leaves only the high bit set (meaning we were the last reader out and a writer is waiting), then wake all waiters.

A writer, meanwhile, will follow these steps:

  1. Atomically set the high bit (if it is already set, that's fine, continue).
  2. Wait for the futex word to be 2147483648, or 0x80000000 in hex (all zeroes except the high bit, indicating a writer (us) is waiting, and no readers or writers hold the lock). On each wake, ensure the high bit is still set.
  3. Atomically set the futex word to UINT32_MAX. If the atomic set fails, start over (another writer took the lock).
  4. When done, set the futex word to zero (doesn't need to be atomic because the writer holds exclusive control of the lock, and the most that will happen is atomic sets of the high bit by other writers waiting, which don't change the value), then wake all waiters.

The problems with this lock are that it frequently falls victim to a thundering herd, and it will sometimes give writers two turns in a row, which for our purposes could result in readers missing a write. This can probably be mitigated to some extent by writers waiting briefly after unlocking before trying to lock again, but not if there is more than one writer.


Now that we have our lock, we can actually implement a broadcast pipe. The broadcast pipe consists of the pipe itself and a shared-memory region holding the futex (in retrospect, we could probably just put the data into the shared memory... it's always going to have to use up at least PAGESIZE of memory). A single writer can follow these steps (multiple writers would step on each other, necessitating a second simpler lock ensuring there is only one writer):

  1. Write the data, without locking.
  2. Immediately try to take the lock. Readers will be holding the lock, and the writer will wait until they all release it, signalling they have read the data.
  3. Read the data using read, removing it from the pipe. Do something with it or throw it away.
  4. Release the lock.

Meanwhile, readers follow these steps:

  1. Create a private pipe to store retrieved data for later reading.
  2. Take the lock.
  3. Use tee to peek the data into the private pipe. Read it later when it's convenient.
  4. Release the lock, signalling that the data has been read.

Obviously, a reader can re-use the private pipe from one read to another.

Is this useful?

No.

Well, the lock might be. But the broadcast pipe is too complicated to be a savings over a directory full of FIFOs/sockets, a broker process, or lock-controlled shared memory. I don't plan to actually use this anywhere, but it was interesting to design.

https://www.alm.website/blog/2022-07-02-note-2 Note 2 from 2022-07-02 2022-07-02 2022-07-02 So I've talked about this before, but Java as a language is, like... okay.

Its APIs are kinda verbose, but overall it's a decent language.

The biggest problem is its build system and runtime environment. class files are a pain and JARs only paper over that.

https://www.alm.website/blog/2022-07-02-note-1 Note 1 from 2022-07-02 2022-07-02 2022-07-02 Design Patterns Considered Harmful, TBH

https://www.alm.website/blog/2022-06-30-note-1 Note 1 from 2022-06-30 2022-06-30 2022-06-30 If anyone has a recommendation for a Matrix client or a way of using Matrix via XMPP, I'd appreciate it. I've already ruled out Element, nheko, Quaternion, and FluffyChat, and I'd like to be able to handle encrypted conversations and rooms. I realize there's probably not one but I figured I should ask.

https://www.alm.website/blog/2022-06-29-note-1 Note 1 from 2022-06-29 2022-06-29 2022-06-29 Need to make my webmention scripts better, they still pull the wrong author from Mastodon replies, and some other problems.

https://www.alm.website/blog/2022-06-28-note-2 Note 2 from 2022-06-28 2022-06-28 2022-06-28 what it is: memory access violation

what it's called: segmentation fault


great job

https://www.alm.website/blog/2022-06-28-note-1 Note 1 from 2022-06-28 2022-06-28 2022-06-28 I have far too many project ideas for a person of my level of executive function.

https://www.alm.website/blog/2022-06-27-note-2 Note 2 from 2022-06-27 2022-06-27 2022-06-27 *sees a commit in my site repository named send/receive mentions*

*gets excited*

*commit is just recreating an empty moderation queue file because I deleted it*

:/

https://www.alm.website/blog/2022-06-27-note-1 Note 1 from 2022-06-27 2022-06-27 2022-06-27 What do you mean database? Everyone around here just uses a directory full of XML files.

https://www.alm.website/blog/2022-06-26-note-2 Note 2 from 2022-06-26 2022-06-26 2022-06-26 I need to do something quite urgently about the unpaginated, low-content index and Atom feed on this site.

https://www.alm.website/blog/2022-06-26-note-1 Note 1 from 2022-06-26 2022-06-26 2022-06-26 no points for correctly guessing what the opcode mnemonic iaddi8 means

https://www.alm.website/blog/2022-06-25-bittorrent About BitTorrent 2022-06-25 2022-06-25 I want to stress to those unaware that BitTorrent can be, and often is, used for entirely legitimate purposes. It is a protocol for downloading large files in a scalable way. It's a good choice any time you want to make a big download available to lots of people simultaneously.

Personally, my only use of BitTorrent has been downloading ISO images for live-boot or installation of Linux distributions, which is not piracy.

ISPs: Please stop indiscriminately blocking protocols just because they have been used for piracy. HTTP is also used for piracy. Are you going to block the entire Web?

https://www.alm.website/blog/2022-06-25-note-2 Note 2 from 2022-06-25 2022-06-25 2022-06-25 Why is it called a saucepan when it's clearly a pot?

https://www.alm.website/blog/2022-06-25-note-1 Note 1 from 2022-06-25 2022-06-25 2022-06-25 If you couldn't tell, I'm starting to try and move towards POSSE, using this site.

This is partially just because I think that's a good way to do things, and partially because I feel like the sheer rate that social media like the fediverse comes at is giving me anxiety and making it hard to focus on anything else.

https://www.alm.website/blog/2022-06-25-linux-filenames Linux filenames (and legal characters within them) 2022-06-25 2022-06-25 The following are legal in Linux (and most other Unix) filenames: ()[]<>$:;&@?`'"!=+*^%#~| and space. (Note here that I do mean Linux, the kernel, not any of the operating systems built on it.)

Some of them may be difficult to correctly input in some contexts, but you can use them. Linux, at least with most filesystems, only prohibits / and NUL bytes, allowing any other byte sequence, including invalid UTF-8.

However, given that filenames are supposed to be… names, it is usually best to restrict yourself to things that can actually be decoded by a person as meaningful language, or that at least can be easily input in contexts such as a shell command, which means sticking to normal word characters plus punctuation such as (English examples) .,-_ and space, used sanely.

If you do want to name a file $*@! though, nothing will stop you.

https://www.alm.website/blog/2022-06-24-note-1 Note 1 from 2022-06-24 2022-06-24 2022-06-24 For some reason I automatically look at my PineTime whenever my phone buzzes in my pocket, even though I don't have notification forwarding and have never had a smartwatch that does it.

https://www.alm.website/blog/2022-06-06-note-1 Note 1 from 2022-06-06 2022-06-06 2022-06-06 I'm thinking about dropping Caddy from my Web server setup and switching to using Apache httpd as the reverse proxy.

I'd have to use certbot or another ACME client, but I'll need one eventually for mail and any other non-HTTP TLS protocols anyway so...

https://www.alm.website/blog/2022-05-30-note-2 Note 2 from 2022-05-30 2022-05-30 2022-05-30 Hm, I wonder if Adrian will get this correctly or if I'm going to need to fiddle with it.

https://www.alm.website/blog/2022-05-30-note-1 Note 1 from 2022-05-30 2022-05-30 2022-05-30 Replying to Jacky Alciné

Ah, I should probably be replying via WebMention, shouldn't I? :P

https://www.alm.website/blog/2022-05-25-note-1 Note 1 from 2022-05-25 2022-05-25 2022-05-25 I just noticed that several of my styles were broken by CSP. Now the search box should have proper proportions.

https://www.alm.website/blog/2022-05-23-note-1 Note 1 from 2022-05-23 2022-05-23 2022-05-23 I'm almost getting addicted to typing scripts/publish, it's just so satisfying to have it automated.

https://www.alm.website/blog/2022-05-21-note-2 Note 2 from 2022-05-21 2022-05-21 2022-05-21 Okay well there's definitely a lot to fix, like there's no way to find the Atom feed I think, but I'm done for today, that was like six hours of work.

https://www.alm.website/blog/2022-05-21-note-1 Note 1 from 2022-05-21 2022-05-21 2022-05-21 As you might have guessed, it wasn't quite as simple as typing scripts/publish to roll out the new version. I had to do a lot of troubleshooting of my deployment scripts, and my Apache configuration. But it seems to be working now! :)

https://www.alm.website/blog/2022-05-21-version-4-and-xblog Version 4.0 and XBlog 2022-05-21 2022-05-21 On-and-off over the last little while, I've been rearranging and reworking a lot of this site, trying to turn it into an IndieWeb-style personal Web site instead of the previous no-particular-purpose site.

The new version includes a personal profile page, blog categories, an Atom feed (albeit not a very good one), more information about my projects, support for the Webmention protocol, some Microformats2 markup, and considerable improvements to the workflow. There's a lot of work still to be done (as there always is), especially with IndieWeb stuff, but I expect to actually be able to use the site for something now.

2021-10-26

This was the first day I actually started seriously working on the site itself. I enabled XBlog's Webmention support, overriding the default blog template to move the mentions section to the main template so they'd show on all pages. Once I had that working, I set up a "canonical social profile," to act as a single place to point people to with all my information about me, using Microformats2's h-card to mark it up. Fighting briefly with XML whitespace significance in some of my inline markup got me introduced to the xml:space="preserve" attribute value, which is a tool I'll hopefully remember in the future.

I also got started on the structural reorganization of the site, and some markup changes. I switched headings from <h1>, <h2>, etc. to <h1> only with <section>, <article>, etc. nesting to communicate hierarchy instead, which is a poorly-adopted but in my opinion very nice feature of HTML5. I removed the top-level site title <h1> and changed the "Home" entry on the navbar to use the site title, instead. I just did this in the CSV file for the moment, but it would probably work better as part of the XSLT.

2021-10-29

After a couple of days, I rebuilt the front page as a sort of summary, with (mostly manual) extracts and links to other pages. I included a DuckDuckGo site search box, though it was less than perfect since I didn't want a CGI script involved and I couldn't get the site: parameter to be set without actually putting it in the form input. I went back and forth for a bit on using a sort of "dashboard" layout, with a flexbox-arranged set of bordered blocks, but I ended up scrapping that, at least for the front page. I kept the CSS for later, though.

I also rearranged the "about site" page, reversing the history section to most-to-least recent, along with revising the text, writing some preliminary discussion of 4.0, and adding a link to an "administrative" page to absorb the previous domain policies page.

I had to catch up on some housekeeping and reorganizing, committing things to Git (including the profile page) and eliminating the dreaded Miscellaneous.

I added a projects page, listing various current and past projects of mine, and a "me now" page so people can get an idea of what I'm up to at the moment (as nownownow.com recommends).

2022-03-13

I completely stopped working on the site for a long time, due purely to a random loss of interest. In the intervening time, I did at least set up my personal Git hosting. When I came back, I finished up the project pages, including moving some of my discontinued projects into place and reformatting one of them into XHTML that XSite would accept.

2022-03-27

After another break, not quite as long this time, I touched up some links and other minor issues, committed some stuff to Git, and decided to actually finish up and get ready to publish the new version. Unfortunately, decision does not always necessarily lead to immediate action.

2022-05-21

Once again, I stalled for a while, but this time it was at least partially for a reason: I needed to set up OpenPGP, and to do that in a way my paranoia would consider sufficiently secure I ended up going down a long yak-shaving chain that involved, among other things, reflashing the firmware on my PinePhone Pro. Eventually, I got all that done, and after some non-justified delay I came back and added the public key to my profile page, along with the OMEMO keys and some other revisions.

Then, I needed to revise this blog post, and the one about XBlog into what you're seeing now. I also needed to do a bunch of Git cleanup, revision to the httpd configuration (especially to add redirects for the various things that moved), and other general cleanup, as well as some setup for OpenPGP Web Key Directory, to make my PGP key auto-findable. Then, I typed scripts/publish, and hopefully you're looking at the result.

https://www.alm.website/blog/2022-05-21-xblog XBlog for XSite blogging 2022-05-21 2022-05-21 As a follow-up to XSite, I decided to split out my blogging tools into their own project and expand on them to build a proper blogging support package for XSite. I decided to keep blogging tools separate from the core XSite project out of a general pro-modularity attitude; a lot of static site generators bundle blogging in their base configurations, but there's no reason that functionality can't be separate. My approach is to keep XSite as the Python code that provides the core functionality like XSlots, XSLT support, datasets, etc., while implementing blogging support on top of it largely as XSLT and XSlots templates. Much of this work happened in the middle of the night, and often into the early hours of the next day, but I've datestamped it with the day I started each session.

2021-10-04

Right at the start of building XBlog, I hit some major frustration; there seems to be some kind of bug in certain versions of libxml2, or at least the way it's used by lxml's etree, where serializing a single element can end up serializing everything after that element too (including dangling end tags!), but only if it has children. This pretty much totally broke XSite; after several hours trying to coax lxml into doing the right thing, I ended up hacking around this problem with some rather ugly text manipulation, and simply truncated away everything that didn't belong. I'm not happy about this, but it got XSite working again and made it possible to start building XBlog.

The first day, I started by pulling my site's existing blogging-related templates out into a new repository (which I set up as a Git submodule), then symlinking them back into place. My first actual improvement was to add category support; I added a "category" column to the posts.csv file which stores information about posts, and modified the blog entries XSLT stylesheet to check it against an "index category" parameter, which is set on each blog index page. After fiddling with this for a while, I straightened out some XSLT issues (to do with parameters and conditional templates) and ended up with a system where categories look like Unix paths, with the "root" category being / and others descending from it in a /-separated hierarchy.

My other first-day improvement was to add support for generating an Atom feed; this is closely analogous to generating the root category index (previously just the index), in that it consists of a document holding pre-written content, an XSlots template plugging in configuration, and an XSLT stylesheet, fed the posts dataset by XSlots, which does the actual work of generating <entry> elements. The XSLT stylesheet also sets the <updated> value for the whole feed, basing it on the last entry's publication time; due for some improvement, but workable. For the moment, it doesn't generate a summary, instead giving a notice that no summary is available, since that would need some fairly serious intelligence to pull meaningful content from the documents themselves. Getting this part working was a lot more frustrating than it ought to have been; first, I left off the xsl: namespace prefix, which led to empty output but no errors. I was nearly ready to give up for the day when I noticed the problem. Second, once that worked, only one entry would come out. This turned out to be an XSite quirk; it can't handle multiple elements without a common parent as document content, XSLT output, etc.; this is why <main> exists, but the details of when it is and isn't needed eluded me briefly. Once I wrapped the entries in a fragment, they all appeared as they should.

2021-10-06

After a day of not working on the project, I spent my second day building deployment tooling so I could just make some changes and type scripts/publish; this kicks off a series of checks to ensure I have Git in sync, offers to pull from origin and commit any uncommitted changes, then pushes directly to the server (bypassing the origin remote; remember, Git is a distributed version control system!), builds the site there, and deploys it to the correct directory on the server, keeping a backup so I can roll back (also automated with a script) if a bad deploy happens. These scripts aren't part of XBlog or XSite; they're my own tools and are part of the alm repository, but of course other people can use them if they come in handy.

Also in the same directory of scripts and built the same day is my "note" script, which takes either standard input or a file, assigns it a number, either one or one more than the highest-numbered existing note that day, adds the appropriate header and footer, and wraps each line in <p> (blank lines become <br /> instead). Once it's done that, it has a valid XHTML file (assuming I don't screw up any XML syntax in the input text), which it adds to the posts index (categorizing it as a note), commits to the Git repository, then finishes up by running the publish script. In theory, this should let me publish my quick thoughts in a matter of moments, only a little slower than Tweeting or Tooting or whichever your preference. You can see my test note from the middle of the night for yourself.

2021-10-07

I spent my third day making tools in preparation for building a Webmention sender; specifically, I built Python modules for

  • extracting links from XML, HTML, JSON, Gemini gemtext, and plain text, and
  • fetching HTTP(S), FTP(S), and file:// URIs.

These were quite a bit of work, but I didn't have much to show for that day and there isn't really much to talk about with this.

2021-10-08

I continued on from the previous day, adding support for the Gemini protocol, fixing various issues, adding support for rel attributes (and things equivalent to them), building infrastructure to guess media types for e.g. file:// URIs that otherwise don't provide them, and tying my link extraction and fetching code together so a single function call can get the links of any supported URI. The three modules together run to 400-some lines, and in my opinion at least, they're fairly handy on their own.

For no particular reason, I ended up writing my own Gemini client; it's about 70 lines of Python and was pretty simple to get working. The single part that takes up the most lines is actually the handling for status 44 ("Slow Down"), since it has to handle a few different kinds of unacceptable wait times. That's the only status that's handled specifically, though; all the others are checked by their first digit only. Maybe the biggest bump was that I actually forgot to write any code to handle success, which led to a couple minutes of head-scratching trying to figure out why nothing seemed wrong but an empty result was coming out.

I ended up having to rewrite the XML part of the link extraction code (which also handles HTML) in order to support rel attributes, and I ended up with something a bit more complicated. I think it's still pretty nice, though; it should be easy to teach about more kinds of links if I find out about any, since it's just a couple short XPath expressions. I did end up dropping some specificity, so now any attribute named href, src, or resource will end up being seen as a link, rather than only if those attributes are on known "link elements;" this falls out of RDFa, which defines attributes without a containing XML Namespace.

My media type guessing is probably more elaborate than it needs to be; among other things, it has built-in support for Python's mimetypes module (which guesses from filename extensions) as well as libmagic, though the latter has to be enabled explicitly since its Python module isn't in the standard library. There are still situations where the media type ends up as application/octet-stream, though, so I have some further fallback options: I can supply a predicted media type when I think I know what it'll be, or react to application/octet-stream by simply trying all the link-finders and appending their results together.
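The first rung of that ladder is simple enough to sketch; the libmagic and per-scheme fallbacks would layer on top of something like this:

```python
import mimetypes

def guess_media_type(uri: str, default: str = "application/octet-stream") -> str:
    """Guess a media type from the filename extension alone (first fallback rung)."""
    guessed, _encoding = mimetypes.guess_type(uri)
    return guessed or default
```

When the extension tells mimetypes nothing, the default of application/octet-stream is what triggers the try-everything behavior described above.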

2021-10-09

Now that the weekend had arrived, I started by fixing some bugs in the FTP and plain-text searching, and doing some general cleanup.

With those sorted, I jumped over to XSite to make the extension points I needed. I did some refactoring while I was there and built a "package" system, allowing third-party additions to XSite, and a plug-in system using it, including support for a new "sidecar" plug-in type (they're passed the original source document, configuration, and parameters for each document XSite processes, so they can "ride along" with XSite's processing pass).

Back to XBlog, I added support for filtering on rel values to my link extraction code, and used all this to implement a sidecar plug-in that finds outgoing links, looks up the Webmention endpoints for them, and makes a list of Webmentions to send. After some debugging, I got this working; I had a list of valid Webmention endpoints and target URIs coming out, along with another list of URIs that had been checked but found not to have Webmention support. I still couldn't actually send any mentions, but that part is simpler, relatively speaking; it's just sending some very simple POST requests. I wanted to do this as part of the publishing workflow, though, so it needed a bit more work than you might think.
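Webmention endpoint discovery checks the HTTP Link header as well as <link> and <a> elements; the Link-header half can be sketched with some hand-rolled parsing (this is illustrative, not necessarily how my code does it):

```python
import re

def endpoint_from_link_header(link_header: str):
    """Pull the Webmention endpoint out of an HTTP Link header value, if any."""
    for part in link_header.split(","):
        # Each part looks like: <https://example.com/wm>; rel="webmention"
        m = re.search(r'<([^>]*)>\s*;\s*rel="?([^";]*)"?', part)
        if m and "webmention" in m.group(2).split():
            return m.group(1)
    return None
```

The rel value is split on whitespace because a single rel attribute can carry several space-separated values.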

2021-10-10

I made some changes so outgoing mentions would have the full URI of the source page, which required some additions to XSite in the form of library functions to handle that. I also fixed a conformance issue: the Webmention spec says to send a mention only to the first endpoint discovered among the possible link sources, and I was making to-send entries for all of them. After that, I wrote the actual script to send mentions, which was very simple. I tested against webmention.rocks and discovered that I'd completely failed to handle relative URIs in my Webmention discovery. I added code to pass around and use a base URI in my link extraction code, and on another test it got the right URIs.
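Sending really is that simple: a form-encoded POST of source and target. A sketch, including the urljoin call that fixes the relative-URI problem (the function shape is my own, not the script's actual code):

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request, urlopen

def send_webmention(endpoint: str, source: str, target: str) -> int:
    """POST source and target to the discovered endpoint; returns the HTTP status."""
    data = urlencode({"source": source, "target": target}).encode("ascii")
    req = Request(endpoint, data=data,
                  headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urlopen(req) as resp:
        return resp.status  # 200, 201, and 202 all indicate acceptance

# A relative endpoint must be resolved against the page it was discovered on:
endpoint = urljoin("https://webmention.rocks/test/1", "/test/1/webmention")
```

Without that resolution step, a relative endpoint like /test/1/webmention gets treated as a (broken) absolute URI, which is exactly the failure webmention.rocks caught.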

However, actually sending the mentions failed; looking further, I discovered this was actually a problem with webmention.rocks; it was sending an Accept header that only allowed text/html! Since XSite only outputs XML, all the documents it produces are application/xhtml+xml, and my server is configured to consider them as such. This meant when webmention.rocks came in asking for text/html and only text/html, Apache declared an HTTP 406 and refused to send it anything. webmention.rocks, naturally, reacted by failing the tests, sending me back an HTTP 400 (which seems like a strange code to use since it implies the request was actually corrupt, but that's what the Webmention spec says so I suppose it's correct).

I reported this problem to the webmention.rocks project, though I haven't yet received any response. I don't think it's likely to be a very hard fix; webmention.rocks is PHP, and I believe PHP can handle XHTML pretty easily. It might just be a matter of changing the Accept header, but that depends on the exact details of how the XHTML support in PHP works...

2021-10-11

With the webmention.rocks problem still unresolved, I decided to carry on anyway and start implementing the site-specific script. It runs the script to send mentions, then commits the list of sent mentions (along with the list of non-Webmention-supporting pages generated during build-time discovery) to Git and pushes them, together with the actual content changes, to my origin remote, from which the changes are pulled back to my desktop. That was largely it for sending support, except for updates and deletes, which I was planning to implement later. For the moment I decided to move on to receiving support. My approach is in one way very simplistic: I just store the source and target URIs to a file in /var/spool and send back a hard-coded response. Most of the CGI script that does this is code to ensure that

  • a POST request was actually used
  • the request was sent with the application/x-www-form-urlencoded media type
  • both a source and a target are specified
  • and that neither contains a tab or a newline.

If any of these checks fail, a hard-coded error response is sent back with an appropriate status code (the Webmention spec demands that only 400 be used, but I refuse to cut back on useful semantics to satisfy that requirement). Otherwise, the source, a tab, the target, and a newline are appended to the file, and the hard-coded success response is sent. It's good to keep the CGI script this simple, since it means it has no dependencies except the Python standard library. This makes it very easy to install on any CGI-supporting server, at least in theory.
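Those four checks might look something like this; the function shape is hypothetical, and the 405/415 codes are my illustration of "useful semantics" beyond the spec's blanket 400:

```python
def validate_mention(method: str, content_type: str, form: dict):
    """Return (status, message) on failure, or None if the request is acceptable."""
    if method != "POST":
        return 405, "use POST"
    if content_type.split(";")[0].strip() != "application/x-www-form-urlencoded":
        return 415, "form-encoded requests only"
    source, target = form.get("source"), form.get("target")
    if not source or not target:
        return 400, "source and target are both required"
    # Tabs and newlines are the spool file's field and record separators,
    # so they can't be allowed into either URI.
    if any(c in v for v in (source, target) for c in "\t\n"):
        return 400, "control characters not allowed"
    return None
```

On success, the spool line is then just `f"{source}\t{target}\n"` appended to the file, which is why the tab/newline check matters.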

Later, the spool file is read by a script that checks that the targets are on the site they're meant to be on and that the sources actually link to them, then writes them to a moderation queue for me to check manually. I modified my site-specific script for mentions to run this check step and commit the moderation queue as well, so it too will end up on my desktop, where I can do moderation and move the good mentions to a final accepted list. Based on that, I'll be able to do a final processing step of downloading the pages, extracting the metadata, and saving that to an XSite dataset I can use to include mentions in-line on the mentioned pages.
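The check step itself is conceptually tiny; a sketch, where site_prefix and the pre-extracted source_links are my guesses at the obvious parameters:

```python
def mention_is_plausible(target: str, source_links: list,
                         site_prefix: str = "https://www.alm.website/") -> bool:
    """Spool check: the target must be on this site, and the source must link to it."""
    return target.startswith(site_prefix) and target in source_links
```

Anything that fails this check never reaches the moderation queue, which keeps the manual step from being flooded with junk mentions.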

It's a lot of steps, but I like it because each one does just one thing, without closely tying everything together.

2021-10-14

I took a break for a few days before getting into the final stretch of tool-building so I could start actually rebuilding my site. When I came back I wrote the script to pull down the source pages of new moderated mentions, pull out (in extremely simplistic and fragile fashion) a title, author, and first-paragraph summary, and put them into a CSV file which can be used as an XSite dataset. At some point I intend to come back and make this a lot more robust, and pull out more information like a publication date if it's available.

After that, I wrote an XSLT stylesheet to generate a mentions section and added it to the bottom of the blog template. In the process I added a little code to XSite to pass the output path and document URI as parameters so XSLT could get them. But with that in place, the mentions XSLT worked with only very little fiddling! The results are relatively basic, but they work well enough, so I'm happy for now.

While I was passing through, I also added some text handling from the info extraction script to the parsing that the link extraction code uses, which should make it a bit more robust to different encodings of XML documents.

At this point, I was reasonably confident both sides of Webmention would actually work to some degree, though maybe not as nicely as might be desired. There was clearly still work to be done, but I decided to leave it there for the day and come back to improve the content extraction later.

2021-10-25

It turned out my "leaving it there" lasted over a week. When I did come back, I mostly implemented a rough, hand-made Microformats2 content extractor and made some other minor improvements. After many rounds of testing against various pages, I eventually got it parsing h-entries closely enough that it seemed good enough. Unfortunately, Microformats2 isn't used totally consistently; while there is a standard, it's fairly loose and people apply it in different ways. Since I wrote my own processing code instead of using an off-the-shelf Microformats2 parser, I ended up with something of a mess coming out of a lot of pages at first. A lot of the fighting was down to trying to get a summary out of pages that don't have a p-summary; the solution ended up being multi-pronged, looking for various things under the e-content, including a <p>, simple unwrapped text, and even a <li>! Eventually, I had covered enough common patterns that most of the pages I plugged in "just worked," so I called it good enough for the day. I also found that a lot of sites still use Microformats1 (even some entries on the Microformats2 h-entry examples list were actually Microformats1 hentry! >:( ), so I should probably implement that too, and maybe other formats like RDFa.
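Roughly, the class-based property lookup at the heart of such an extractor boils down to something like this; a simplified sketch on top of the standard library's html.parser, not the code as it actually stands (it ignores void elements and the many Microformats2 parsing rules, like implied properties):

```python
from html.parser import HTMLParser

class PropertyFinder(HTMLParser):
    """Collect the text content of elements bearing a given Microformats2 class."""

    def __init__(self, mf2_class: str):
        super().__init__()
        self.mf2_class = mf2_class
        self.depth = 0              # > 0 while inside a matching element
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.mf2_class in classes:
            self.depth += 1
            if self.depth == 1:     # entering a new matching element
                self.values.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data
```

The multi-pronged summary fallback would then be a chain of finders: try p-summary, then a <p> under e-content, then unwrapped text, and so on until one of them yields something.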

At this point, the tools felt solid enough to work on the site itself, so that's where XBlog development stopped for its first "major release". I'll likely come back to it, probably along with major refactorings to XSite itself. In particular, I'll probably rewrite my link fetching and context extraction around libraries written by people who are actually following specs like RDFa, Microdata, and Microformats closely, instead of my slap-dash "basically works" code (I will probably, however, still end up writing wrappers to fall back between multiple formats, probably including some custom "plain-old HTML" handling).

https://www.alm.website/blog/2021-10-07-note-2 Note 2 from 2021-10-07 2021-10-12 2021-10-12

This is my first test note.


It has a blank line to see if that becomes a <br />.

It also has some tag-soup HTML, like this!

If you don't know.

https://www.alm.website/blog/2021-05-18-xslt XSLT 2021-05-18 2021-05-18

As I planned in the 3.1 launch post, I've now implemented XSLT support in XSite. Using this tool, I've implemented some improvements to the site to restore some of the pieces that got cut to fit into XSite. This should improve accessibility, as well as making some things easier on myself.

I'm currently using two similar XSLT stylesheets, to handle the navbar and the blog index. Each of those is generated from what XSite calls a "dataset", in this case in the form of CSV files. These CSVs list the paths and titles of navbar entries and blog posts, along with the dates of the blog posts. XSite automatically converts the CSV data into XML for ease of processing with its tools, and the stylesheets then render them into <ul>s of links. This output can then automatically be slotted into an XSlots template.
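The CSV-to-XML conversion can be pictured like this; the element names here are my invention, and XSite's real output shape may well differ:

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(text: str, root_name: str = "dataset") -> ET.Element:
    """Turn CSV (with a header row) into a simple XML tree for XSLT to consume."""
    root = ET.Element(root_name)
    for row in csv.DictReader(io.StringIO(text)):
        entry = ET.SubElement(root, "entry")
        for field, value in row.items():
            ET.SubElement(entry, field).text = value
    return root
```

Once the data is XML, a stylesheet can iterate over the entry elements and render each one as an <li> with a link.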

While the navbar was relatively simple, I went through a few ideas on how to structure the blog index before landing on what I think is the best approach: The blog index page itself just contains the heading and the explanation at the top of the page, and the entire section containing the entries (header and <section> tag included) is generated by the XSLT. An XSlots template puts the two one after another in the <main> element and manages the job of passing the dataset input to the XSLT sheet (something that, as yet, only XSlots can do).

The main advantage provided by the change to the navbar is that it can once again update to indicate which page is current. Each of the pages on the navbar now has a path parameter, which the navbar XSLT can compare against the path attribute of each entry to figure out which (if any) is the current page. It then adds aria-current and <strong> to mark it as the current page and changes the destination to # so that navigating to the current page doesn't actually trigger a new request.

The blog index change doesn't currently provide much advantage, other than reducing how much I have to type to add blog posts to the index, but it will make it easier to take advantage of future blogging features I add to XSite; ideally, I should only have to switch the dataset from CSV to a dynamically-generated blog posts dataset to use that, with no changes to the XSLT necessary (though I'll probably want to make some changes once I have the option to, e.g., show the first paragraph as a teaser).

I do expect to continue adding features to XSite, in particular for blogging, but I think I've succeeded in creating a moderately-useful static site generator. It still isn't really ready for general use, since the documentation mostly hasn't been written yet, but I'll get around to that relatively soon (I've already partially written the XSlots documentation).

https://www.alm.website/blog/2021-04-25-v3.1-xsite Version 3.1 and XSite 2021-04-25 2021-04-25

If you're reading this, that means I've rolled out alm 3.1. This version is based on the XSite static site generator. I wrote this software myself, as is the expected rite of passage for programmers doing Web sites; it's based on XML and written in Python, with an eye towards both relative simplicity and a strict content structure. Currently, it's less than 300 lines of code, which feels small for how much it's managed to do.

XSite doesn't currently support Markdown, so the process of switching from Jekyll to XSite was fairly complicated. It took over a dozen Git commits and seven hours of work. I started by modifying the Jekyll templates to remove the Liquid tags, replacing them with XSite's equivalents or discarding features XSite can't yet implement, and to convert from HTML to XML syntax where the two differed. I also replaced <!DOCTYPE html> declarations with <?xml version="1.0" encoding="utf-8"?> declarations and XHTML namespace declarations and added XSite processing instructions for template inheritance.

The part that took the longest was converting the content. At first, I used the Markdown files as a starting point, manually converting the Markdown syntax to XHTML, but later I started using the output from Jekyll instead, which was faster, especially for big pages. I replaced the YAML front-matter Jekyll used with the XML processing instructions XSite uses, and added some meta keywords here and there while I was at it.

The reason this conversion took longer than the Jekyll conversion is largely down to the blog. XSite has no blogging features, so I had to manually implement the entire blog structure, including the blog index. Also contributing is that there is simply more content on this site than there was when I converted it to Jekyll, and XHTML is of course more verbose than Markdown, so there was more typing.

One of XSite's nice points is that it generates relatively clean output. If you look at the source of any of the pages on the current site, you won't see any of the blank lines or jumpy indentation that Jekyll and Liquid are prone to produce. This is because XSite parses the XML (using the lxml Python package), does its processing at the element level, and then reserializes back to text, rather than working at the raw-text level like Jekyll, so it can format its output nicely.
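The difference is easy to demonstrate, here with the standard library's ElementTree standing in for lxml: once the markup is a tree, stray whitespace is just text nodes the processor controls.

```python
import xml.etree.ElementTree as ET

# Text-level templating leaves stray blank lines behind; tree-level
# processing reserializes from the elements, so none need survive.
root = ET.fromstring("<main>\n\n\n  <p>hi</p>\n\n</main>")
root.text = None        # drop the leftover whitespace-only text node
root[0].tail = None     # and the trailing one after the <p>
clean = ET.tostring(root, encoding="unicode")
```

A tree-based generator can also re-indent everything uniformly, which is where the tidy output comes from.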

I wouldn't recommend using XSite for your own site just yet, though (if what I've already said doesn't dissuade you enough, it's probably buggy as well). I'll probably continue to hack on it and implement a few more of my ideas, probably including XSLT support. XSLT will make it a lot easier to implement blogging functionality, like automatically generating an index of entries, so I hope that it will be usable for the real world at some point soon.

You can see the process I went through for this migration in this GitLab merge request.

Changelog for version 3.1

  • Switched from Jekyll to XSite
  • Simplified blog index to be easy to maintain by hand
  • Removed conditional behavior from navigation (may have a negative impact on accessibility, to be addressed once XSite is capable enough)
https://www.alm.website/blog/2019-02-11-v3.0-custom-css alm 3.0 with custom CSS 2019-02-11 2019-02-11

If you're reading this, that means I've rolled out alm 3.0. There are a few big changes in this version, namely:

  • I "rebranded" from "Programmingwave" (which was only the name because that was the domain I had) to "alm," after my initials.
  • I threw out Bootstrap and wrote my own CSS for the navbar and for the site as a whole. It's much smaller, so the site has become much lighter as a result.
  • I dropped old Programmingwave features, namely the "gallery" and the LENS pages (they're to be moved to GitLab Pages).

Along with this, I've changed the domain (rather obviously) to alm.website and moved hosting to DigitalOcean.

https://www.alm.website/blog/2018-09-08-available-on-gitlab Available on GitLab 2018-09-08 2018-09-08

After quite a delay, this entire site's source is now released on GitLab here. This includes both the site's basic framework as well as its content.

https://www.alm.website/blog/2018-06-09-azure-hosting Azure hosting 2018-06-09 2018-06-09

Programmingwave is now hosted on Microsoft Azure! This move was prompted by some (months-long) DNS issues, and also me just wanting to play with Azure. I'm using Azure's Web App Service at the F1 free tier, which limits disk space, bandwidth, and features, but not so much as to actually pose a problem for a site this small (~210 KB for the entire site).

I had a few issues at first getting my error pages working; Azure's web hosting runs Microsoft IIS as a backend, which is, to be honest, not the best web server. I spent several hours over multiple days trying to figure out how to configure custom error pages; I eventually discovered the thing I was doing wrong: using / instead of \. IIS apparently doesn't do /-\ translation like most Windows components. But anyway, I got everything working* and now this site is hosted for free!

*As of writing this post, I don't have DNS working yet; I'm just going to configure the registrar's hosted DNS solution and call it a finished move**.

**It turns out one of the feature restrictions on Azure web hosting is not being able to bind a custom domain name. I'll look for solutions, but in the mean time Programmingwave will be at programmingwave.azurewebsites.net.

https://www.alm.website/blog/2018-01-24-v2.2-launch Programmingwave.com 2.2 has launched 2018-01-24 2018-01-24

If you’re reading this post, that means Programmingwave.com 2.2 has launched. This version is an upgrade to Bootstrap 4.0 with a few minor fixes. There may be some bugs with the navbar, but it should be usable.

https://www.alm.website/blog/2018-01-21-v2.2-working Now working on version 2.2 2018-01-21 2018-01-21

With the official release of Bootstrap 4, I am now working on Programmingwave.com 2.2 with Bootstrap 4. It should be a fairly simple migration, but there may be some hiccups. If you want, you can watch development of 2.2 here.

Release targets for version 2.2

  • Move to Bootstrap 4
https://www.alm.website/blog/2017-06-12-v2.1-jekyll Version 2.1 & Jekyll 2017-06-12 2017-06-12

If you're seeing this post, that means I've rolled out Programmingwave.com 2.1, based on Jekyll.

I decided recently that maintaining the site as a set of raw HTML files was too difficult. After considering the options, I went with Jekyll, a static site generator with support for Markdown and Liquid-based templating.

I am not the first person to convert a more complex, harder to maintain site to Jekyll. It was, however, an interesting experience for me.

As many others before me, I began by installing Jekyll on my local machine and creating a new blank site. I then extracted the common components of all pages on the site and converted them to a Jekyll layout (_layouts/main.html).

I then copy-pasted most of the site's content into .md files, and converted them from HTML to Markdown. It only took a couple of hours to convert all of the 'normal' content.

Once I was done converting my 'normal' content (essentially static pages that don't use a submenu), I moved the navbar from _layouts/main.html to an include (_includes/navbar.html), and added some Liquid code to handle current-page highlighting and submenus. Submenus now worked.

At this point, everything except /gallery/ and blog.programmingwave.com worked on my local machine (jekyll serve is incredibly helpful). I converted the SQL-based PHP for /gallery/ to Jekyll datafile-based Liquid, and copied the content.

Only one challenge remained: this blog. You might think that this would be the hardest part, but Jekyll is actually built to support blogging, so it was a matter of creating a layout for blog posts (_layouts/blog.html, which actually 'inherits' from _layouts/main.html) and writing a bit of Liquid to display posts on the /blog/ page.

Converting the blog content was an easy job; I just copy-pasted the post contents from the Wordpress site into .md files, fixed some formatting, and added Jekyll YAML metadata.

Note: I did not migrate the entire Wordpress blog; you can still see some of the history that used to be there at the About page.

You may notice from looking at the source of the current pages that the indentation is completely screwed up, and that there are lots of seemingly random blank lines. This is a consequence of two things: Jekyll's blind include and layout content mechanisms (which simply inject the text without regard for indentation), and Liquid control tags leaving blank lines for no apparent reason.

All in all, 2.1 was a fairly easy move, taking me only a couple days, and making it far easier to develop the site (I no longer had to copy-paste everything if I changed common code). If you have a complex site that you'd like to simplify, I'd recommend Jekyll. I may release my 'framework' (common site-agnostic components) on GitLab sometime in the near future.

Changelog for version 2.1

  • Switched to Jekyll and used templating.
  • Removed Wordpress and switched to Jekyll for blogging as well.
  • Blog moved to /blog/, from blog.programmingwave.com.
https://www.alm.website/blog/2017-04-23-v2.0-launch Programmingwave.com 2.0 has launched 2017-04-23 2017-04-23

This article was migrated from the Wordpress blog on 2017-06-12.

The long-awaited-by-no-one release of Programmingwave.com 2.0 has arrived. The new version is now out of Beta and available at the main site.

2.0 uses Bootstrap 3.x (and will be updated to 4.x once it releases), and is far more visually appealing. 2.0 also drops some parts of the site that aren't needed, and, thanks to Bootstrap, is responsively designed, capable of working just as well on mobile devices as on desktops.

https://www.alm.website/blog/2017-04-20-v2.0-confusion More confusion about version 2.0 2017-04-20 2017-04-20

This article was migrated from the Wordpress blog on 2017-06-12.

Adding further to the mild confusion around 2.0, I have decided to use Bootstrap instead of Joomla!, and continue to partly custom-build the site. You can see progress as it advances at the Beta; it will likely be somewhat awful and even worse than the current site for a while as I try to get Bootstrap working like I want it to.

However, progress will now be made, unlike with Joomla!, as Bootstrap is far easier to set up.