From my diary

The aftermath of the hack has taken a solid week to clean up.  But it is done.  Fingers crossed, it won’t happen again, or not soon.  It’s not as if my little blog attracts enough traffic to be worth their while.  At least next time I will be better prepared.

This afternoon I have started looking at John Toy, English saints in the Medieval Liturgies of the Scandinavian churches, 2009, which has a section on St Botolph material.  He lists some 42 breviaries that contain readings from the Life of St Botolph.  Ouch.  I’ve also found that there is stuff in the York and Hereford breviaries, and no doubt others.

Today I located the Lund Breviary online, at Uppsala University, shelfmark C447, and downloaded it.  It is yet another scruffily written, hard-to-read manuscript.  There’s a long section of Botolph material, none of which I can read with ease.

I think it may be necessary to rule the breviaries “out of scope.”  There’s just too much material.  Possibly it needs a separate project to look at those, although really it’s for someone far more knowledgeable than myself to undertake.  The original idea was just to create an English translation of the medieval Life of St Botolph, remember.

But after a hard week, I think I shall bunk off and play a computer game or something!  Have a good weekend, everyone.

On the Fourth Day

Behind the blog surface, the posts are stored in a database.  The pharma spam hack inserted stuff into that database, and I have spent two days cleaning it.  I did this by exporting the database to a .sql file using the phpMyAdmin interface; importing it into a local WordPress instance running on my PC; importing the last sound backup into another local WordPress instance; exporting the sound wp_posts table from that, renaming it, and importing it into the same database as the corrupt one; and then running lots of SQL queries to locate the differences.  It has been time-consuming, but not different in kind from the sort of stuff that I used to do for money, when I was working for insurance companies and fixing live problems in their databases.  I’ve found it rather relaxing.  You have to get into the right frame of mind to do it, to see the problem – and how to fix it – in terms of an SQL query.  But this I did professionally for 30 years, so it was not troublesome.  The main thing to remember is not to panic.
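For readers curious what "SQL queries to locate the differences" can look like, here is a minimal sketch. It is not the actual set of queries used here: Python's built-in sqlite3 module stands in for MySQL, the clean copy of the posts table is assumed to have been renamed valid_posts (a hypothetical name), and both tables are cut down to just ID and post_content. The two SELECT statements are plain SQL and should also run in MySQL.

```python
import sqlite3


def changed_post_ids(conn):
    """IDs of posts present in both tables whose post_content differs."""
    rows = conn.execute(
        """
        SELECT p.ID
        FROM wp_posts AS p
        JOIN valid_posts AS v ON v.ID = p.ID
        WHERE p.post_content <> v.post_content
        ORDER BY p.ID
        """
    ).fetchall()
    return [post_id for (post_id,) in rows]


def new_post_ids(conn):
    """IDs of posts in the live table that the backup does not contain at all."""
    rows = conn.execute(
        """
        SELECT p.ID
        FROM wp_posts AS p
        LEFT JOIN valid_posts AS v ON v.ID = p.ID
        WHERE v.ID IS NULL
        ORDER BY p.ID
        """
    ).fetchall()
    return [post_id for (post_id,) in rows]


def demo_connection():
    """Tiny in-memory stand-in for the real database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    for table in ("wp_posts", "valid_posts"):
        # The real wp_posts table has many more columns; two are enough here.
        conn.execute(f"CREATE TABLE {table} (ID INTEGER PRIMARY KEY, post_content TEXT)")
    conn.executemany("INSERT INTO valid_posts VALUES (?, ?)",
                     [(1, "St Botolph"), (2, "The Lund Breviary")])
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?)",
                     [(1, "St Botolph"),
                      (2, 'The Lund Breviary <a href="https://pills.example/">cheap pills</a>'),
                      (3, "A post written after the backup")])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print("changed:", changed_post_ids(conn))  # changed: [2]
    print("new:", new_post_ids(conn))          # new: [3]
```

The first query finds the poisoned posts; the second finds posts written after the backup, which have no clean copy to compare against and need checking some other way.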

After all that, I hope that the annoying pharma links are gone, and that I haven’t broken anything!

I’ve brought up a new instance of WordPress on the server.  I’ve also changed the theme, although I may change it again.

A rather impressive security plugin located three files on the disk which the hacker had left there.  One of these was so poisonous that when I downloaded it to my PC in order to inspect it, my local antivirus promptly whipped it away into quarantine.  I will do more security work on the blog tomorrow.

In the meantime, there has been a little progress on St. Botolph, which I am very keen to finish.  A kind gentleman has sent me the modern text of the Linkoping breviary.  Another commenter transcribed the manuscript that I could not read.  Finally, John Toy’s rather comprehensive book arrived, with a wealth of information on Scandinavian breviaries.  I’ve not had time to look at any of these yet, but they have been a light in the darkness.

I’ve discovered that a breviary from Hereford probably also contains a text, and that I might be able to access this through Early English Books Online (EEBO).  I have no access to EEBO, but a nearby library probably does, and probably will allow me to use it.  Maybe next week!

On the Second Day…

Today has been spent researching how to rescue my content from the spam attack that is poisoning a lot of the older articles with unwanted links.

It’s becoming clear that WordPress’s built-in export and import are barely useful, and are NOT used by professionals who run websites based around it.  What they do is use professional (and very expensive) tools to migrate the data from a “staging” copy of WordPress, where everything is developed and written, across to the “live” site.  When they do work with the content, they work directly with the database behind WordPress.  They export and import whole databases, when they’re not using migration tools.

For them, this means that an attack on the “live” site is meaningless, a minor interruption.  They just erase it and push a new copy across from their local site to the server.

This is excellent practice, and commonplace in IT in general.

So the way forward is to do all my content creation on a WordPress instance running on my PC, and then migrate it to live, if I can.  Rather a come-down from entering my posts online directly, but it would work.

But that doesn’t help me with retrieving my data.  So I have been burrowing into the database underneath.  The very simple, very obvious database, if you are a retired professional database developer, as I am.

Today I have been setting up LocalWP on my PC.  This has gone reasonably well, except that I have to pause my antivirus when creating a new WordPress instance, and then remember to re-enable it.  This is because the antivirus locks the hosts file, which LocalWP edits really rather often.  Daft design, really.  I’ve also been working locally with the command-line interface, WP-CLI.  This also has a bug: the DB_HOST setting in wp-config.php does not include the port number.  Everything works, except WP-CLI’s database commands.  A nuisance.
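A quick way to see what DB_HOST actually contains is to pull it out of wp-config.php. The sketch below is illustrative only: it assumes the usual define( 'DB_HOST', '...' ) line, and the port number in the example is invented. WordPress accepts a DB_HOST of the form host:port (or host:/path/to/socket), so a value with nothing after the host means the MySQL client falls back to its default port (normally 3306).

```python
import re

# Matches e.g.  define( 'DB_HOST', 'localhost:10011' );  (port number made up)
DB_HOST_RE = re.compile(r"""define\(\s*['"]DB_HOST['"]\s*,\s*['"]([^'"]*)['"]\s*\)""")


def parse_db_host(config_text):
    """Return (host, port_or_socket) from the text of wp-config.php.

    port_or_socket is None when DB_HOST has no ':' suffix, i.e. no port is set.
    Returns (None, None) if no DB_HOST definition is found.
    IPv6 addresses are not handled.
    """
    match = DB_HOST_RE.search(config_text)
    if match is None:
        return None, None
    host, sep, suffix = match.group(1).partition(":")
    return host, (suffix if sep else None)


if __name__ == "__main__":
    print(parse_db_host("define( 'DB_HOST', 'localhost' );"))        # ('localhost', None)
    print(parse_db_host("define( 'DB_HOST', 'localhost:10011' );"))  # ('localhost', '10011')
```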

So I created an empty WordPress locally, and then tried to import the last valid backup.  The import failed.  It’s simply not designed to take a 70 MB file.  That’s really wretched.  Come on WordPress, this is basic functionality!  I then ran the import using WP-CLI where – to my astonishment – it took a couple of hours to load 7,400 blog posts.  I learned from this that the professionals simply don’t use the WordPress Export/Import in any way.

But it did load.  Which gives me a WordPress instance with a clean copy of all the corrupted posts.

In theory, one should be able to connect to the live system database, and run a cross database update to restore the correct content fields.  I have some doubts that little old MySQL databases can handle that, unlike the Oracle monsters that I knew!  I imagine it would all time out.

But possibly I could simply start my own MySQL database, independently of WordPress; import into it, as a whole, the clean database file that I have created; and then rename all the imported tables from wp_xyz to something else, maybe valid_xyz.  Then I could somehow import an export of the “live” system into the same database, and do the update from one to the other locally?  UPDATE wp_posts FROM valid_posts or something; I don’t know what the syntax would be as yet.  Then drop the VALID tables, export the whole database, now fixed, and create a new instance on the server using the cleaned-up database.  Or something like that.
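A minimal sketch of that update step, assuming the clean table has been renamed valid_posts (a hypothetical name) and that only post_content needs restoring. It uses Python's sqlite3 module as a stand-in for MySQL, with a correlated subquery that SQLite accepts; the comment at the top gives the multi-table UPDATE form that MySQL normally uses for the same job. Posts that exist only in the live table are left untouched.

```python
import sqlite3

# Portable form using correlated subqueries. In MySQL the same step is usually
# written as a multi-table UPDATE:
#   UPDATE wp_posts AS p JOIN valid_posts AS v ON v.ID = p.ID
#   SET p.post_content = v.post_content
#   WHERE p.post_content <> v.post_content;
RESTORE_SQL = """
UPDATE wp_posts
SET post_content = (
    SELECT v.post_content FROM valid_posts AS v WHERE v.ID = wp_posts.ID
)
WHERE EXISTS (
    SELECT 1 FROM valid_posts AS v
    WHERE v.ID = wp_posts.ID AND v.post_content <> wp_posts.post_content
)
"""


def restore_from_valid(conn):
    """Copy post_content from valid_posts into wp_posts where they differ.

    Returns the number of rows changed.
    """
    with conn:  # commits on success, rolls back if the UPDATE raises
        cursor = conn.execute(RESTORE_SQL)
    return cursor.rowcount


def demo_connection():
    """Tiny in-memory stand-in for the real database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    for table in ("wp_posts", "valid_posts"):
        conn.execute(f"CREATE TABLE {table} (ID INTEGER PRIMARY KEY, post_content TEXT)")
    conn.executemany("INSERT INTO valid_posts VALUES (?, ?)",
                     [(1, "St Botolph"), (2, "The Lund Breviary")])
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?)",
                     [(1, "St Botolph"),
                      (2, 'The Lund Breviary <a href="https://pills.example/">cheap pills</a>'),
                      (3, "A post written after the backup")])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(restore_from_valid(conn))  # 1
    print(conn.execute("SELECT post_content FROM wp_posts WHERE ID = 2").fetchone()[0])
```

Running it a second time changes nothing, which makes a cheap check that the restore took. On a real database it would be wise to try it against a copy first.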

Um yes.  I’m sure most of you gulped.  I’m an old database programmer.  I think in these terms when obliged to.

It is pathetic and ridiculous that such manoeuvres should be necessary.  Not one blogger in ten thousand would think in such terms.  Blogging is for convenience.  How is all this “convenient”?  Tools that do not work, software that is insecure, timeouts all over the place?

Still, it’s clearly possible.  It will just take a ridiculous amount of time.

First attempts at recovery

It’s been an arduous afternoon, trying to work out how to recover from the hack last year that has poisoned hundreds of posts on this site.  The site currently includes 4,741 posts containing 3,096,019 words.  That’s a lot to go through manually.  And since I don’t know how the hack was done, or whether it is still active, or whether a backdoor is present, it might be futile.

I’ve done a few grep searches on the most recent backup file.  A search for “cialis” alone gave over 250 results.  Of course I have no idea of all the possible spam terms.
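Since the full list of spam terms is unknown, one option is to count the known terms and also tally every domain that the posts link to; unfamiliar domains then stand out even when the wording is new. A minimal sketch, assuming the backup can be read as one text file; the terms and domains below are examples, not a real blocklist, and the counts are raw substring matches, so a term embedded in a longer word is counted too.

```python
import re
from collections import Counter

# Domain part of href="http(s)://domain/...". The optional backslash also
# catches the escaped quotes (href=\"...\") found inside SQL dumps.
HREF_DOMAIN_RE = re.compile(r"""href=\\?["']https?://([^/"'\\\s>]+)""", re.IGNORECASE)


def scan_backup(text, terms):
    """Count case-insensitive occurrences of each term, and tally linked domains."""
    lowered = text.lower()
    term_counts = {term: lowered.count(term.lower()) for term in terms}
    domains = Counter(m.group(1).lower() for m in HREF_DOMAIN_RE.finditer(text))
    return term_counts, domains


if __name__ == "__main__":
    sample = ('Buy CIALIS <a href="https://pills.example/x">cialis</a> '
              '<a href=\\"https://pills.example/y\\">more</a> '
              '<a href="https://blog.example/botolph">St Botolph</a>')
    counts, domains = scan_backup(sample, ["cialis", "viagra"])
    print(counts)                 # {'cialis': 2, 'viagra': 0}
    print(domains.most_common())  # [('pills.example', 2), ('blog.example', 1)]
```

On the real file, the blog's own domain will dominate the tally; the interesting entries are the rare ones further down.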

But I have been a good boy, and made regular backups.  I do have a backup of the site, taken a month before the hack.  In theory I should just be able to create a new WordPress installation, and restore that, and then handle the last year bit by bit.

So I created a new, clean WordPress installation.  Unfortunately… restoring the backup times out.  It’s 70 MB, which is too big for some timeout somewhere.  Why doesn’t it batch the thing?

No worries, there’s a command-line interface to WordPress, WP-CLI.  That runs… and gets killed by something or other: possibly the site operators, more likely a robot, for running out of memory.

I’m leaving the damaged site up at the moment.  I will ponder.

PS:  It just occurred to me… maybe I should run WordPress on my PC, do the import there, and then export the contents in pieces, and load these?  What a faff.
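Splitting an export into pieces need not be done by hand. A rough sketch, assuming the backup is a standard WordPress WXR file (an RSS document whose posts are item elements inside channel): each piece keeps a copy of the channel header (site details, authors, categories) plus one batch of items. Caveats: the namespace URIs listed are those of WXR 1.2, and ElementTree writes CDATA sections back out as escaped text, which an XML parser reads identically, but this has not been checked against the WordPress importer.

```python
import copy
import xml.etree.ElementTree as ET

# Namespace prefixes used by WordPress WXR 1.2 exports. Registering them keeps
# the output readable; otherwise ElementTree writes ns0:, ns1:, ... instead.
WXR_NAMESPACES = {
    "excerpt": "http://wordpress.org/export/1.2/excerpt/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in WXR_NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def split_wxr(xml_text, batch_size):
    """Split a WXR export into smaller WXR documents of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("no <channel> element: not a WXR/RSS file?")
    items = channel.findall("item")
    # Everything in <channel> that is not a post: site title, authors, categories...
    header = [el for el in channel if el.tag != "item"]
    pieces = []
    for start in range(0, len(items), batch_size):
        new_root = ET.Element(root.tag, dict(root.attrib))
        new_channel = ET.SubElement(new_root, "channel")
        new_channel.extend([copy.deepcopy(el) for el in header])
        new_channel.extend(items[start:start + batch_size])
        pieces.append(ET.tostring(new_root, encoding="unicode"))
    return pieces
```

Each returned string could then be written to its own file, for example with open(name, "w", encoding="utf-8"), and imported one at a time, so that no single import has to swallow the whole 70 MB.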

From my diary

It seems that this blog was hacked on 22 July 2024 at 10:20, by some poor soul who poisoned a great number of the articles with spam links to pharmaceutical sites.  I gather that this is a standard attack, known as “spam link injection.”  I discovered this in an old article by accident last night, and I have spent some hours today attempting to discover the extent of the problem.  The attack was done cunningly, mainly on older articles or pages, which meant that I was oblivious.

The attack was not done by logging into the editing console, as the changed text is not present in the list of revisions.   I don’t know how it was done, in truth, which makes it hard to know how to prevent it again.  Possibly some WordPress plugin was responsible.  Possibly the theme that I use is insecure?

I don’t know how many posts are affected.  I don’t know how to fix this in any easy way.  Worse still, attempting to revert the changes through the UI has left some articles blank.

I do have backups from before the hack, one of them from the 18th of June, thankfully.  I would hope that posts written after 22 July 2024 are not affected.
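The two dates divide the posts into three groups, and a short sketch makes the split explicit. It is only a sketch: it looks at publication dates alone, so a post published before 18 June but legitimately edited afterwards would lose that edit if restored, and the "probably clean" label simply encodes the hope that posts after 22 July are unaffected. Boundary days are deliberately put in the "check by hand" group.

```python
from collections import Counter
from datetime import date

BACKUP_DATE = date(2024, 6, 18)  # last backup taken before the hack
HACK_DATE = date(2024, 7, 22)    # date of the spam link injection


def recovery_action(published):
    """Suggest what to do with a post, based only on its publication date."""
    if published < BACKUP_DATE:
        return "restore from backup"  # a clean copy exists in the backup
    if published <= HACK_DATE:
        return "check by hand"        # too new for the backup, old enough to be hit
    return "probably clean"           # written after the attack (a hope, not a guarantee)


def plan(publication_dates):
    """Count how many posts fall into each group."""
    return Counter(recovery_action(d) for d in publication_dates)


if __name__ == "__main__":
    dates = [date(2010, 5, 1), date(2024, 7, 1), date(2025, 1, 10)]
    print(plan(dates))
```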

Reading around for help, I find that WordPress is now a very insecure platform, which requires constant patching to be secure.  This is not something that I am competent to do.  Possibly a hosted solution would do this.

Likewise WordPress seems entirely uninterested in providing themes for bloggers.  All the themes are aimed at websites.  The last mainstream theme aimed at blogs, dating from 2019, does not handle mobile phones (!).

Blogging is getting increasingly difficult to do, it would seem.  The internet is changing, away from ordinary people towards something that only corporate infrastructure can handle.

I’m not quite sure what the way forward is. We’ll see.