Finally, finally, I found the time (spread over several months) to refurbish my blog. Not that it took that long, but spare time is rare these days. I decided to stick with Drupal and created a fresh, clean installation of Drupal 7 to replace the old Drupal 6. Drupal is somewhat overkill for a simple blog, but the alternatives did not convince me, for various reasons. I do not want to dive into the whys and why-nots here, but rather point out a few noteworthy things about Drupal. Setting up Drupal is straightforward, of course, and everything works fine and smoothly, but the real work is adjusting it to become what you want. Its flexibility lets you build anything; on the other hand, it is also the reason why some things are laborious to accomplish.
Layout and Modules
I chose Bamboo as the base theme and sub-themed it. The blog now looks less stale, long code snippets are displayed properly, and it has a mode for mobile devices. It required diving into Drupal theming a bit, but not too deeply. The documentation that comes with Bamboo was very helpful. The big plus of a sub-theme is that you can easily update the base theme without endlessly patching around. In theory, major changes to the base theme could still break your layout. All in all, this task was pleasant to complete. Afterwards it was about selecting and configuring modules. Mostly it was searching, installing, configuring, done. Just three things I want to point out here:
- There is no easy solution for a guest preview as offered by WordPress. You can achieve it by doing something complicated (I did not go down that road) or by using the view_unpublished module. It does not offer the same convenience, but it is good enough.
- Finally, standard elements such as captions and buttons are now localized into the most common languages, e.g. English, Spanish, Chinese, Russian, Arabic and German. Drupal does not ship translations by default.
- Avoiding blog spam. On the old version I used reCAPTCHA. I believe the only commenters it kept away were real people; meanwhile I had the dubious pleasure of moderating tons of SEO spam. Now I use a honeypot approach, and so far (in testing) it works incredibly well and does not get in the way of real people. I am very fond of this.
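The idea behind a honeypot fits in a few lines. The comment form gets an extra field that CSS hides from humans, so anything typed into it comes from a bot. Many implementations also reject forms that are submitted implausibly fast. The following shell function is only a sketch of that decision logic and is not code from the module I use. Its name and the five-second threshold are assumptions.

```shell
#!/bin/sh
# Sketch of a honeypot decision (hypothetical helper, not a Drupal API).
# $1: value of the hidden field; humans never see it, so it stays empty
# $2: seconds between rendering and submitting the form
# $3: minimum plausible fill-in time in seconds (assumed default: 5)
# Returns 0 (true) for suspected spam, 1 otherwise.
is_probably_spam() {
    hidden_value=$1
    elapsed=$2
    min_seconds=${3:-5}
    # Bots tend to fill in every field, including the invisible one
    [ -n "$hidden_value" ] && return 0
    # Writing a real comment takes a moment; instant submissions are suspicious
    [ "$elapsed" -lt "$min_seconds" ] && return 0
    return 1
}
```

Returning 0 for suspected spam follows the shell convention that 0 means true, so the check reads naturally as `if is_probably_spam "$hidden" "$elapsed"; then ...`.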
Upgrade and Maintenance
I wish I could get the latest Drupal from the repositories, either the official Ubuntu ones or a PPA. Web software evolves fast, is released often and frequently closes security holes. Unfortunately, neither option exists (there are only older packages in the 12.04 repositories), so I need to keep Drupal up to date myself. Anyone who has ever read the update instructions knows that you don't want to do this by hand: there is a lot to do. Perfect conditions for a lazy CS guy, and a good opportunity to refresh my shell scripting. I could automate a lot of the ugly and boring stuff. What is left for me is to kick off the script and to put the site into and out of maintenance mode. Even this could be done without human interaction, but for now I prefer to stay in control. After all, I need to make sure everything works as expected anyway.
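The heart of such a script is the file shuffling described in Drupal's update instructions. You back up the installation, delete the old core files while keeping sites/ (settings, contributed modules and themes, uploaded files) and copy in the new release. Below is a minimal sketch of just that step, not my actual script. The function name and arguments are hypothetical. Backing up the database, maintenance mode and running update.php are left outside of it. Top-level files you may have customized, such as .htaccess or robots.txt, are replaced by the new versions, so re-apply your changes from the backup.

```shell
#!/bin/sh
# Sketch: replace Drupal core files while keeping sites/ (hypothetical helper).
# $1: web root of the live installation
# $2: directory of the unpacked new release
# $3: backup directory to create (must not exist yet)
update_core() {
    site_root=$1
    new_core=$2
    backup=$3
    [ -d "$site_root/sites" ] || { echo "no sites/ in $site_root" >&2; return 1; }
    [ -e "$backup" ] && { echo "$backup already exists" >&2; return 1; }
    # 1. Full copy of the current installation, just in case
    cp -R "$site_root" "$backup" || return 1
    # 2. Remove the old core files, dotfiles included, but keep sites/
    for entry in "$site_root"/* "$site_root"/.[!.]*; do
        [ -e "$entry" ] || continue    # an unmatched glob stays literal
        [ "$(basename "$entry")" = sites ] && continue
        rm -rf "$entry"
    done
    # 3. Copy the new core in, again leaving our sites/ untouched
    for entry in "$new_core"/* "$new_core"/.[!.]*; do
        [ -e "$entry" ] || continue
        [ "$(basename "$entry")" = sites ] && continue
        cp -R "$entry" "$site_root"/ || return 1
    done
}
```

An example call with placeholder paths is `update_core /var/www/blog drupal-7.x /var/backups/blog-old`, followed by update.php or `drush updatedb`.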
Migration
The fun part. First, why did I not upgrade from Drupal 6 to 7 but set everything up from scratch? Because I had made some decisions in the old configuration that turned out not to be very useful. Then, some modules had been discontinued or replaced without a proper upgrade path. And somewhere in my head the idea was stuck that an upgrade was problematic or not recommended, though this is probably a goof of my own memory. Anyway, in the end almost everything was ready and just waiting for the content. Migrating the content (blog posts, static pages, comments and tags) from Drupal 6 to 7 turned out to be easy, once I had found the way and fixed what was missing. There is a module that does exactly this transfer from an old Drupal 6 installation to a new Drupal 7 one, and it provides a GUI. I really did not want to write an upgrade script, because I would have needed to dig into those details again, even though all the content types were standard ones. So the GUI was a plus. At that time there was no stable release including the GUI, though, so I took the development version. Took it, ran it, was delighted. Only a little later did I find out that the tags were not assigned and that the node and term (tag) IDs had been shuffled. Reassigning the tags worked with an INSERT ... SELECT statement across both databases.
    -- oldDatabase stands for the Drupal 6 database; nodes and terms are matched by title and name
    INSERT INTO field_data_field_tags
        (entity_type, bundle, deleted, entity_id, revision_id, language, delta, field_tags_tid)
    SELECT 'node' AS entity_type,
           'blog' AS bundle,
           0 AS deleted,
           node.nid AS entity_id,
           node.vid AS revision_id,            -- current revision ID, which need not equal the nid
           'und' AS language,
           (@jDelta := @jDelta + 1) AS delta,  -- one global counter: unique, but not counting from 0 per node
           taxonomy_term_data.tid AS field_tags_tid
    FROM taxonomy_term_data, node,
         oldDatabase.term_data, oldDatabase.node, oldDatabase.term_node,
         (SELECT @jDelta := 0) AS jDelta
    WHERE oldDatabase.term_node.vid = oldDatabase.node.vid  -- tags of the current D6 revision only
      AND oldDatabase.term_node.tid = oldDatabase.term_data.tid
      AND taxonomy_term_data.name = oldDatabase.term_data.name
      AND node.title = oldDatabase.node.title
    ORDER BY entity_id;
That left the node and term IDs. This is a problem because they are part of the URLs. From an SEO point of view, changed URLs will confuse search engines. They would likely sort it out after a while, but as a former SEO consultant you want to do it the right way. Changing the IDs back would work, but they are used everywhere and there are a lot of tables. Before I decided on the migrate module, I had considered migrating the content by simply copying it from the old to the new database. However, too much had changed, and without really digging in, the purpose of many new tables and columns remained unclear. The lazy approach was to redirect the old node IDs to the new ones.
    -- One "redirect 301 <old path> <new clean URL>" line per migrated node
    SELECT CONCAT('redirect 301 /node/', oldDatabase.node.nid, ' http://www.arthur-schiwon.de/', alias)
    FROM node, url_alias, oldDatabase.node
    WHERE node.title = oldDatabase.node.title
      AND source = CONCAT('node/', node.nid);
The query generates rules that redirect the old URLs containing the old node IDs to the clean URLs. For some reason, something went wrong with the canonical tag in Drupal 6, so the ugly URLs were used instead of the clean ones. I do not want those in the search engines, and now this is fixed as well. Somehow the result contained duplicate lines, but they could easily be dropped or the correct alias chosen. In a few cases I needed to update the alias because commas caused problems. I pasted the result at the beginning of the .htaccess file. The same had to be done for the term IDs. It is not the best approach, but given the limited time I could and wanted to spend, it is OK. In the end, it's a private blog for fun and fame, not for profit. It is essential to check whether all important old URLs are still reachable, to avoid broken links. Broken links are bad for visitors as well as for search engines. I used linkchecker, available in the Ubuntu repositories, to collect all the URLs from my old site.
    linkchecker -Fcsv/urlstate.csv --stdin -t1 -r0
This gathers a lot of data. I took all the URLs pointing to my domain, replaced the domain with my test domain, saved them in a text file and ran curl against them with a small script I wrote for this.
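    #!/bin/sh
    # Fetch the headers of every URL in urls-new-ws and log "url,status line"
    OUTPUTFILE=new-url-stats.csv
    while IFS= read -r url || [ -n "$url" ]; do
        # -s hides the progress meter, -I sends a HEAD request;
        # the first line is the status line, minus its trailing carriage return
        status=$(curl -sI "$url" | head -n 1 | tr -d '\r')
        echo "$url,$status" >> "$OUTPUTFILE"
    done < urls-new-ws
In the resulting CSV file I had each URL and its status, which was good enough for me. In LibreOffice, I auto-filtered it and sorted out the faulty or suspicious URLs, i.e. those throwing 4xx errors. Where something needed fixing, I fixed it and reran the script until I was satisfied.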
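Several of the manual steps in this section can stay in the shell as well. The following is a minimal sketch of three helpers, not the scripts I actually used.

- `prepend_redirects` drops duplicate lines from the generated redirect rules. It stops if one old path still points to two different targets, since those need a manual choice of alias. Otherwise it puts the rules in front of the existing .htaccess content.
- `extract_test_urls` pulls every URL of the live domain out of linkchecker's CSV with `grep -o`, so it does not depend on the column layout, and points the URLs at the test host.
- `failed_urls` stands in for the LibreOffice auto-filter.

The test host `test.example.org` and all function names are placeholders.

```shell
#!/bin/sh
# Sketch of helpers for the redirect and link-check steps.
# Assumption: test.example.org is a placeholder for the real test domain.
LIVE='http://www.arthur-schiwon.de'
TEST='http://test.example.org'

# Deduplicate "redirect 301 /node/<old id> <new url>" lines ($1) and put
# them at the top of the given .htaccess file ($2).
prepend_redirects() {
    unique=$(sort -u "$1") || return 1
    # Field 3 is the old path; if it still occurs twice, the targets differ
    conflicts=$(printf '%s\n' "$unique" | awk '{print $3}' | sort | uniq -d)
    if [ -n "$conflicts" ]; then
        echo "choose one alias by hand for: $conflicts" >&2
        return 1
    fi
    { printf '%s\n' "$unique"; cat "$2"; } > "$2.new" && mv "$2.new" "$2"
}

# Distinct live-site URLs from linkchecker's CSV ($1), rewritten to the test
# host. A match ends at a semicolon, a double quote or whitespace.
extract_test_urls() {
    grep -o "$LIVE/[^;\"[:space:]]*" "$1" | sort -u | sed "s|^$LIVE|$TEST|"
}

# Rows of the curl script's CSV ($1) that need attention: a 4xx/5xx status or
# no status line at all. The status is the last comma-separated field,
# because some URLs contain commas themselves.
failed_urls() {
    awk -F, '{
        n = split($NF, part, " ")
        if (n < 2 || part[2] ~ /^[45]/) print
    }' "${1:-new-url-stats.csv}"
}
```

A run could look like `extract_test_urls urlstate.csv > urls-new-ws`, followed by the curl script and `failed_urls new-url-stats.csv`. Redirects (3xx) are deliberately not flagged, since the old node URLs are supposed to redirect.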