Piperka blog

New site, first impressions

I'd say that the site transition went quite smoothly. The crawler had a hiccup where one faulty parser failed silently with the old library versions but killed the whole run with the new ones. Some redirects went nowhere but I caught on to that bug in short order. The Perl crawler code used a funny memcached key name for reporting crawler progress and I needed to switch to another library for that. This is one of those situations where I was happy to receive just a couple of emails from my users. All in all, it was comparatively boring. Boring is good.

Over the last week, I've been setting up various background jobs, testing the old site maintenance scripts against the new site and making some updates to them. One thing that didn't quite work was the submission handling code. The users with moderator rights weren't actually able to make edits. That bug was to be expected since I had written that part in a hurry, very late in the project, when I thought I was about to get the code live.

There was a problem with users' timestamps with regard to comic additions. You may have noticed that Piperka was telling you about a couple of new comic entries for most of a day last Thursday and that the notification didn't reset properly. I added a new timestamp column for that logic and now the notifications seem to work properly again.

I found out that the script I use for adding a comic's entry, after the initial crawl that adds the archive pages to Piperka's index, failed badly on comics with banners. The error looked very much like a stack trace. I didn't even know that Perl had those. I very quickly decided to not even try to fix that code and instead made the first fully new feature for the new code base: adding comics to Piperka now works via the web page. In its current form it is still only usable by me since the initial crawling needs to be done with the old maintenance scripts on a shell account. Nonetheless, it's a step towards making site maintenance less tied solely to me.

Piperka now offers HTTPS connections. Thanks to Let's Encrypt, setting it up was really easy. I could have done it with the old site already but I wanted to have one fewer moving part for the transition, despite the long delay. I still haven't set up a redirect from the HTTP side to HTTPS. I want to do it on a Sunday when there's less traffic but I was busy coding yesterday. Next week, then. Feel free to update your bookmarks to use "https://" already.

I'm afraid the unofficial Piperka Android App is currently inoperative. It's due to changes to Piperka's backend. I tried to retain compatibility but with other time constraints I didn't use all that much time on it or test it. I'm not its author and I can't as such quite offer support for it but I'd still rather keep it working. I may need a few days to find time to figure out what's going on with it.

I just figured out that Piperka's email sending was offline for the whole of last week. The SMTP process was supposed to be running and I did test it but apparently it was shut down at some point. The transition process was a whirlwind and it may have been due to some half-finished action of mine. I'll keep an eye on it. Piperka uses email for password recoveries and for confirmations of newly added comic entries, so no great harm done. I assume people would have told me if they had tried password recovery and never received the email.

I've talked a lot about OAuth2, but it didn't work right away since I still needed to refresh the configuration on the providers' side. It should work now. I'll still add at least Facebook as a login option.

The new site keeps a couple of Piperka server processes running and nginx acts as a proxy that forwards traffic to one of them. Deploying changes involves zero downtime: I replace the secondary server process with an updated one and, once that's done, just update the proxy port in nginx's config and reload it. I also have the option to instantly roll back to the previous version if the new one turns out to fail somehow. A sysadmin at work suggested this setup to me and it seems to work just fine, so far.
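A minimal sketch of the port swap described above, in Python. The two port numbers, the loopback address and the exact shape of the proxy_pass line are assumptions made for illustration, not Piperka's actual configuration. The functions only rewrite the config text; writing the file back and running "nginx -s reload" would still be a separate step.

```python
import re

# Hypothetical ports for the two server processes (not from the original post).
PORTS = (8001, 8002)

# Assumed form of the proxy line in the nginx config.
PROXY_RE = re.compile(r"(proxy_pass\s+http://127\.0\.0\.1:)(\d+);")


def active_port(config: str) -> int:
    """Return the port nginx currently forwards traffic to."""
    match = PROXY_RE.search(config)
    if match is None:
        raise ValueError("no proxy_pass line found")
    return int(match.group(2))


def standby_port(config: str) -> int:
    """Return the port of the process that is not receiving traffic."""
    current = active_port(config)
    if current not in PORTS:
        raise ValueError(f"unexpected port {current}")
    return PORTS[1] if current == PORTS[0] else PORTS[0]


def switch_proxy(config: str) -> str:
    """Return the config text with the proxy pointed at the standby port.

    Calling it a second time points the proxy back at the previous port,
    which is what a rollback amounts to in this scheme.
    """
    new_port = standby_port(config)
    return PROXY_RE.sub(rf"\g<1>{new_port};", config, count=1)
```

The updated server process would be started on the standby port first, and the swap done only once it answers requests.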

The point of no return was last Friday. I had three weeks of overlap between the server contracts. I was a bit hasty with signing up for the new one but all is well in the end. The contract period for the old server has expired and it's gone now, forever. Luckily it was soon clear that I had no need for that contingency plan. Whatever odds and ends still remain are well manageable. I was able to retain user sessions and nobody (as far as I know) was kicked out and needed to log in again. One issue I suspect is that the account page won't accept your password until you have logged in once. There's no need to even log out, it can be done in an incognito window. The account page's password check doesn't know about old style passwords.

I'll be giving a talk at work on an internal functional programming event day about Haskell and how to write a web site with it in a couple of days.

I'm considering a ticketing system as the next feature to implement. I'd still need to act on the tickets with the same old clunky scripts and raw SQL queries, and the uptake may not be all that great initially, but I know that crawler bug reports will very likely end up being just ignored if I take any more of them by email. My goal is, naturally, to have much less to ticket about in the end. It's all too easy to spot stale comic entries currently.

At first, I was simply exhausted after getting the new code live but I'm slowly starting to see the new possibilities. I'm happy about the new backend code and it's much faster to do new feature development based on it.

Mon, 19 Feb 2018 20:23:34 UTC

The new site is live!

I'll write a longer message later on, but this is just to let you know that the new site is now live. The old server is still in place in case there's still something catastrophically wrong but so far things seem to be going smoothly. My inbox is peacefully quiet so far. The crawler and other scheduled tasks are still offline but I'll enable them shortly. My own blog post will do nicely as a test for it.

Thanks to everyone who tested the server during the beta week. I know it still had some pretty obvious bugs when I announced it but I wanted to move forward with it already then.

Sun, 11 Feb 2018 23:08:00 UTC

New code, new server, next weekend

I've had a pretty intense couple of weeks of coding. I thought it would be a week but just when I was finishing up at the end of it, I discovered that I was still missing the code for processing comic info page edits and new comic submissions. I certainly had thought of it earlier but it had slipped my mind since. Nor was account creation via the usual route of giving a user name and password working. So I had another week of pretty intense coding evenings.

The new server is live! Go poke at it at. I'll allow a week for testing but that'll become the new Piperka next weekend. It's running a fairly recent snapshot of the database and anything done on it will be reset on transition. It's not running the crawler yet and it's not getting any comic updates. Go create users and click on things and tell me when you run into issues. I'll rather fix them now instead of after it's in production. I trust that the code is in a good enough state to not blatantly leak any user data.

The backend code is all new. The first time you log in there's a longer delay as it generates a new cryptographic hash for you. When the transition occurs, old sessions should still work transparently. Or even now if you set the cookie manually. I expect to have the site down for less than half an hour when the transition happens.

The outward changes are all pretty minor. The updates page had display options on it but I moved all those over to the account page. I added more sort options for the updates page. As of this writing, the new server's not yet set up to send email and you can't ask it to reset your password. The comic info pages are still missing related and subscription history sections. I'll be using this week for fixing any bugs I find and for adding any still missing things.

I've been at this for so long that I'm almost at a loss at this point. I've had variable amounts of time and energy to give to the project but that's just the way it is with a side project like Piperka. I know that some comics are missing updates as I've been giving my main attention to the rewrite. I'm not promising immediate improvement on that front as I have further development goals now, which include improving the crawler and its management interface. But I have more time to give to Piperka from now on and some of that will go to site maintenance, even if it is still more laborious than it needs to be.

Expect to see new things soon.

Mon, 05 Feb 2018 10:39:06 UTC

OAuth2 redux

I'm done with implementing OAuth2 logins for the new code base. I'm happy to say that the new implementation is a lot more robust than the one I bolted on top of my original code base three years ago.

I could well have moved forward with most everything I detailed in my last post without it, but I still feel that having OAuth2 available for user authentication is worth it. I want the barrier between potential users and actual users to be as low as possible, and not having to deal with one more password helps. When this code is in place I'll make sure to have more authentication providers than just Reddit, which is the only one there is now.

I'm using OAuth2 in two roles, of which logging in is the obvious one. The other is that account setting changes, like changing the password, can be verified by redoing OAuth2 authentication. As long as you're logged in to a provider's site, account changes on Piperka's end can be done with just a mouse click. Later on I'll add the option to use a provider for logins only, but I've tinkered with this quite enough for now as is. Going so far as to have two-factor authentication would likely be way overkill for a site like Piperka.

Implementing this at this stage did well to shape up the basics of the site code, even the parts that didn't immediately have anything to do with logging in. I had tied parts of user authentication too tightly to the template renderer and I took the chance to decouple the logic and the code is better off due to that. Likewise, the user account changes were processed in an awkward place and this too got fixed along the way.

I'm actually nearing the finish line with this project. Of course, I'll pretty much end up nearly just where I am right now, but I'll have a far better platform for future development. Most of what's left is to add a few AJAX endpoints, test everything and tie up any loose ends that I still have lying around. I'll still have to see how to add support for the unofficial Android client without too much pain, since how I did it was quite hacky in the first place. I'd welcome it if I had a client that used Google's OAuth2 for authentication instead.

I've developed the new site without CSS or JavaScript enabled so far but I finally enabled them last weekend. Not everything agreed with the HTML I'm generating but those were some very minor issues. I took a week-long vacation from work to help finish the rewrite and to get it to production. I plan to get the new code running on a new server in a few days and will allow some time for testing but I want it all done before February.

On a personal note, I made an arrangement with my employer and I'll be working as a part-timer starting next month, at least until autumn. I'll have one more day a week for working on Piperka. I may or may not make up for the part of the salary I'm not getting, but this just is something I wanted to do. I have everything set up for making Piperka the site I want to see.

Sun, 21 Jan 2018 11:08:03 UTC

Progress on the Piperka backend rewrite, part 3

This is my third progress report on my web site backend rewrite. Previous parts: One Two

I would have expected to be ready with the rewrite already. I had even reserved a part of my summer vacation for finishing what remained. Turns out that I was in need of a vacation, instead. And I decided to implement OAuth2 login code after all, which was, not unexpectedly, a major undertaking. It is one more delay for the project, but on the other hand it is much less painful to work on it now instead of later. Otherwise I would only then have run into all the design issues a second authentication method would have exposed in my code, and it would have taken another large transition to get it into use.

I had written a TODO list of needed features in my last progress report. Almost all of them are completed by now. I'm hoping to have the authentication code done in January and the rest soon after. I'm currently at the stage where I'm testing out the new code and fixing any issues I run into. The good news is that I'm still not disgusted at the new code base even after putting OAuth2 logic in it. That was the point where I decided to start a code rewrite with the current web site code.

Along with adopting the new backend code, I'll be migrating Piperka to run on a new server. Call me old fashioned but I like to use a dedicated server and the current one is seven years old and a replacement is due.

I think it's time to talk about what my plans are when I have the new code in place. I intend to implement several things that would help with maintaining the crawler and with users keeping up with the updates.

Self-learning parser

As it stands, every parser that Piperka uses to find that elusive link to the next page is written by hand, even though not every comic gets its own and there are several common ones in use. I now know that I can do a lot better than that: I want to have a piece of software that I can feed the first few archive page locations to, and it would automatically extract features from them that would help it recognize the link to the next page. I'd say that roughly only 5% of comics would still need their custom parsers, which would help the situation immensely. Moreover, this would be an interface to crawler maintenance that would be much more amenable to use by people other than me.
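To make the idea a bit more concrete, here is a rough sketch in Python of one possible reading of "extract features". It assumes the features are simply the attributes (other than href) of the anchor tag that points to the known next page, and it keeps the attributes that are shared across all example pages. All names are hypothetical and a real learner would need to look at far more than attributes, so treat this as an illustration only, not as how Piperka's parser works or will work.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchors.append(dict(attrs))


def anchors(html):
    collector = AnchorCollector()
    collector.feed(html)
    return collector.anchors


def learn_features(examples):
    """examples: list of (page_html, known_next_url) pairs.

    Returns the set of (attribute, value) pairs shared by the links
    that lead to the known next page in every example.
    """
    common = None
    for html, next_url in examples:
        for attrs in anchors(html):
            if attrs.get("href") == next_url:
                feats = {(k, v) for k, v in attrs.items() if k != "href"}
                common = feats if common is None else common & feats
    return common or set()


def find_next(html, features):
    """Return the href of the first link carrying all learned features."""
    for attrs in anchors(html):
        if features and features <= set(attrs.items()):
            return attrs.get("href")
    return None
```

A comic where nothing useful is learned (an empty feature set) would be one of those that still needs a hand-written parser.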

I'd expect that there is software like this already, though I would suspect that despite that writing my own would be advantageous. I'd be interested to hear if anyone's familiar with any existing solutions for this.

Crawler improvements

The crawler could act smarter than it does now. There are several cases where it gets stuck in situations that it could well detect and work around instead. The current version uses loop detection to avoid inserting duplicate pages into the index. Sometimes the parsers are "leaky" and catch the link to the previous page and offer it as the link to the next page, and the check is in place to avoid loops. But sometimes the archive itself has removed an old page and recycled the name for a new page. In those cases, the crawler could go and see where the next link of the old page's preceding page leads and stitch the removed page out of the index.
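The loop check can be sketched in a few lines of Python. This is not the actual Perl crawler. The next_link callable stands in for whatever parser a comic uses, and the limit parameter is an added safety net that is my own assumption.

```python
def crawl(start, next_link, known, limit=1000):
    """Follow next links from `start`, stopping at the first page
    that is already in the index or was already seen on this run.

    start:     the latest page currently in the index
    next_link: callable mapping a page to its next page, or None
    known:     pages already in the index
    """
    seen = set(known)
    found = []
    page = start
    for _ in range(limit):
        nxt = next_link(page)
        # A leaky parser may hand back the previous page as "next".
        # Stopping here keeps duplicates out of the index.
        if nxt is None or nxt in seen:
            break
        seen.add(nxt)
        found.append(nxt)
        page = nxt
    return found
```

Note that this check alone cannot tell a leaky parser apart from an archive that recycled an old page name, which is exactly the case that would need the extra handling described above.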

The current crawler does know how to do backtracking. I can optionally give it a parameter to start looking for updates from a few pages before the latest page. Any pages found from there on overwrite what was in the index previously, in addition to the newly found pages being inserted in place. This is all well and good when the pages rewritten this way only had changes like typo fixes on them and the content was the same. It is just the wrong thing to do when the page was a temporary one where the artist was telling readers that there's a delay in the scheduled update or something like that, and it would later be removed. What the crawler should be doing is to apply a text distance algorithm to decide whether to readjust users' bookmarks to show them the pages in question.
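As an illustration of that decision, here is a small Python sketch that uses the standard library's difflib as the text distance measure. The choice of difflib, the 0.8 threshold and the function name are all assumptions for the sake of the example. Which algorithm and cut-off would actually work for comic pages is an open question.

```python
from difflib import SequenceMatcher


def is_same_page(old_text, new_text, threshold=0.8):
    """Guess whether a rewritten page is still the same page.

    A high similarity ratio suggests a typo fix, so bookmarks can stay
    where they are. A low one suggests the old page was replaced by
    different content and readers should be shown the page again.
    The threshold is an arbitrary starting point, not a tuned value.
    """
    ratio = SequenceMatcher(None, old_text, new_text).ratio()
    return ratio >= threshold
```

With this, a page that changed from a "no update today" notice to the real comic page would compare as different, and the crawler could move bookmarks back so that readers see it.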

I would likely tie both of these goals to the same change. I'd rather not try to teach any more tricks to the old crawler code and instead have new code for calling the new parser.

Ticketing system

When I asked people to just write me emails about crawler updates, I had just dozens of comics listed on Piperka. I had not thought what the situation would be more than a decade later. Also, having requests in my inbox is yet another blocker for sharing the maintenance work with other people. I have a long backlog and if I haven't fixed a comic entry you told me about by the time I have a ticketing system in place then I'll have to ask you to resubmit it there. I won't be going through my old emails.

In-site messaging system

A more proactive crawler would do well to inform users about any actions it had to do with users' bookmarks. Tickets need to be replied to. Readers should be notified when a comic entry gets removed for some reason.

Crawler maintenance web interface

Finally, I would have a proper interface for letting others do actual web site maintenance work and not just request fixes from me. With a better parser and crawler it would, in most cases, enable anyone I'd like to give access to perform operations without needing to read a single line of code or enter any SQL queries by hand into the database.

But all of this is even further away. My usable time is still the major bottleneck with getting anything done with Piperka. It hasn't gone past me that there's been a recent uptick in Patreon donations. It wasn't my idea to run an ad for my own Patreon page on Piperka's ad box but I let it pass. I appreciate being compensated for what I do and I'm grateful for each one of you who contribute. That being said, I just don't have the user base in place to expect donations to reach a level where they would translate to more time for working on Piperka. There are people whose livelihood depends on Patreon and other sources like it and I'm not one of them. I have a stable day job and Piperka is still a side project. It's nice to have the server expenses covered now but even before that I've been compensated indirectly since Piperka's something I put in my resume.

I think I would be in a much better position to seek out new users when I had the basics of the site running smoother than currently. It's been my plan to seek out more income from Piperka only later on, but people asked to enable donations so I gave them that. And it does help to keep my motivation up to see that people care. It's just that I see all the things where Piperka could be improved on and I'd rather be paid for what could be instead of what is. If that makes sense.

Patreon changed their fee structure this month. They're pretty universally used among web comic artists and there's a pretty strong network effect going on. Those of my users who are likely to go for such a thing are likely to already have an account on Patreon. In case anyone's looking for an alternative, I set up an account on Liberapay. They have lower fees, especially if you're European.

Mon, 11 Dec 2017 20:13:18 UTC