(Brief aside before we start: Happy six years of us talking, Caby—8/18/18 <3)
One of the less-exciting and less-visible aspects of the archives overhaul is that I'm trying to reorganize the subdomain some in the background. I made it in 2020 with purposes slightly different than what I have in mind now, and as a result, things that shouldn't be on there are on there, things are in places I prefer them not to be in, and my methodology for storing what is staying on there has changed a bit as I've developed it out more.
To give you some examples:
- archives was used to reduce the size of our seasonal Somnol site network backups in ways I don't like now. FLAC copies of my music were in a directory called /marfmusic/ on archives; I want those back on my actual site now, size of the backups be damned. Those aren't archives and have no reason to be on archives now that it's not simply a place to dump shit.
- archives has occasionally been used as, not just a file dump, but also a dump for website archives that don't pertain to Somnol. In /web/, I have an as-complete-as-was-able-to-be-grabbed scrape of the site for the Internet 1996 World Exposition that I think dcb did. It was never meant for Somnol, but it went on archives for lack of a better plan.
- Sometimes I separated out component parts of sites in ways I don't like now. My Scratchpad and dcb's 32-bit Patio, our personal blogs, went in their own directories in /web/ because they were meant to be site version agnostic. Now, I think it's just messy, and they can both be matched to the versions of mari.somnol and dcb.somnol where they best fit in the timeline.
- To reduce redundancy between grabs, I'd also split out subsites that didn't change across a series of site revisions into their own subdirectories and simply edit all the links to point to the split version. dcb's The Many Faces of Mozilla archive was like this. I hate that now, and I'd rather have the redundancy.
- Each of the file dumps was simply in its own directory in the root of archives, which is super messy to me now. The root directory, in my head, should be files that make the site itself work, and any content should be sorted accordingly.
Re-sorting and getting rid of stuff is easy, but with it comes the risk of link rot. These days, when I build a site, I try to plan everything out so there's no chance I will ever want to reorganize the files. Anything that gets moved runs the risk of breaking links someone dropped in a Discord somewhere, or in a forum post or a Reddit comment somewhere, and I don't want any of that. I want links that will still exist in ten, twenty years. For archives, this is extra important to me. All Links Are Permanent.
Thankfully, Apache (and DreamHost accordingly) supports the use of .htaccess redirects. I can simply have the old links redirect to the new locations and then manually update links as I find them to reduce the need for the redirects. For the stuff that's getting deleted, I just have it return a 410 or redirect to another page where the same information can be found, if necessary. Also nice is Notepad++'s Find in Files function, which lets me do a find and replace across an entire directory of files. I can just do a search for, say, /web/scratchpad/ and replace it with /web/mari_v3/blog/ and thousands of references are now updated to point to the right spot.
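For anyone who wants to script the same two steps instead of doing them by hand, here is a rough sketch in Python. It is not what I actually run; it just mirrors the idea above. The only real paths in it are the Scratchpad pair from this post. /old-dump/ is a made-up path standing in for something deleted outright, and the function names and the list of file extensions are my own assumptions. The redirect lines use Apache's mod_alias Redirect directive, and the link rewrite works on raw bytes so it doesn't touch line endings or odd encodings in old pages.

```python
from pathlib import Path

# Old prefix -> new prefix. Only this Scratchpad pair comes from the post.
MOVES = {"/web/scratchpad/": "/web/mari_v3/blog/"}
# Hypothetical deleted path, here only to show what a 410 rule looks like.
GONE = ["/old-dump/"]


def htaccess_rules(moves, gone):
    """Build mod_alias lines: a 301 per moved prefix, a 410 per deleted one."""
    lines = [f"Redirect 301 {old} {new}" for old, new in moves.items()]
    lines += [f"Redirect gone {path}" for path in gone]
    return "\n".join(lines) + "\n"


def rewrite_links(root, moves, suffixes=(".html", ".htm", ".css")):
    """Swap old prefixes for new ones in every matching file under root.

    Files are handled as bytes so nothing else in them changes.
    Returns how many files were actually modified.
    """
    changed = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        data = path.read_bytes()
        new_data = data
        for old, new in moves.items():
            new_data = new_data.replace(old.encode("utf-8"), new.encode("utf-8"))
        if new_data != data:
            path.write_bytes(new_data)
            changed += 1
    return changed
```

Calling htaccess_rules(MOVES, GONE) gives back the text to paste into .htaccess, and rewrite_links("path/to/site", MOVES) does the find and replace. Like Find in Files, it is a plain string replacement with no undo, so run it on a copy first.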
It takes a lot of work to maintain eternity, but technology makes it possible at all.