Piperka blog

Towards mobile

I'm not a big user of mobile devices myself. This old thing is what I usually have in my pocket. It is much more pleasant to carry there than a large smart phone and I'm not going to worry about any scratches on it. I do have a couple of smart phones as well but they're usually at home unless I know I need to access the Internet on the go, which isn't at all often. But I suppose I wanted to learn something new. This is what my browser's tab bar looks like as of now:

I'm making a native Piperka client app. I've only just begun developing it. I've chosen Qt for the job, which I previously had only cursory knowledge of, so it's a bit of a slow start. What I have so far is a test app which downloads the list of comics and shows it as a list. For Sailfish, of all things. Qt is cross platform and I'll target Plasma Mobile or Kirigami later on but I have to start with something and I'm just fond of the platform. I still have and use the original Jolla phone. I'll bring it to Android and perhaps even iOS later on.

I know of one Piperka user who used a Jolla phone (other than me). He tested how water resistant his Jolla was. Not much at all as it turned out. I may not reach all that many users on Harbour but I like the idea of how any apps on it stand out a lot better.

There has been an unofficial Piperka client for Android for a long while. This'll be a completely separate implementation from it. I've never used it myself and I don't know if it even works currently. I was contacted once after my backend overhaul a year ago about it being broken and I took measures to unbreak it but I don't think I ever got a confirmation that it worked again. I expect to break the unofficial client for good when I'm done with my client app and I'm hoping that nobody will mind it at that point.

I'll put my work in progress code on a public git repository once I'm a bit further along with it. I'd like it to have some basic features like logging in first. Once I have the basics in place I'll have to figure out how a Piperka client app could be useful.

I can't seem to resist the niche (am I even one to talk about them). I have bought a Purism Librem 5. I suppose it'll be released in time for me to build the client app for it as well.

I've a couple of new moderators. I'm grateful to them for volunteering.

Thu, 31 Jan 2019 20:20:50 UTC

Update watch

The first new feature of the year is update watch. I've added a checkbox on the updates page and clicking it will set the page to automatically wait for updates. In practical terms, it'll wait for a signal from the hourly update run which will trigger it to reload the updates list. It'll even add an exclamation mark to the favicon to show if any new comics have been added to the list.

This feature went live a week ago already. I didn't announce it right away and I haven't checked the server logs to see whether any of you have noticed it yet. Looks like it's worked well so far. At least for me. With this, you won't need to hit F5 anymore.

I'm still mulling over the watcher's UX. When it's enabled, it gives no indication about what it does unless it finds updates. It does show an error message if it fails to re-establish connection and reconnects get a message as does the actual moment of downloading a refreshed list. A possible future feature would be to (optionally!) push desktop notifications about updates.

Other than the update watch, I've been working on the moderator interface. I'm storing past entries in a history table now and next up is some sanity checking for moderated content. As it is now, moderators can inject any content as comic descriptions and I'd rather make sure that it's at least valid HTML.
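To illustrate the kind of sanity check meant here, below is a minimal sketch in Python using only the standard library. It is not Piperka's code: the names `BalanceChecker` and `description_problems` are made up for the example, and it only checks that tags are properly nested. That is stricter than real HTML validity in some ways (HTML allows certain end tags to be omitted) and looser in others (it doesn't look at attributes or allowed elements).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Collects problems with tag nesting in an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if not self.open_tags or self.open_tags[-1] != tag:
            # Closing tag that doesn't match the innermost open element.
            self.errors.append(f"unexpected </{tag}>")
        else:
            self.open_tags.pop()


def description_problems(fragment):
    """Return a list of nesting problems; an empty list means the tags balance."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.errors + [f"unclosed <{t}>" for t in checker.open_tags]
```

A moderator form could refuse to save a description whenever `description_problems` returns a non-empty list and show the list back to the moderator.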

Editing comic entries and moderating change requests is just something I've wanted to delegate to other people. I'd rather focus my limited time on things that'd demand a bit more than just basic HTML knowledge. "Other people" has been a few local friends so far but that isn't really working out anymore. I can't blame them since they've been volunteering for the task in the first place and I'm grateful that they've been at it for so long.

I'm calling for new moderators at this time. You'd get access to the moderator queue and direct access to edit comics' info pages. Archive and crawler maintenance are still limited to just my own account. The interface for those is in a lot better shape than it used to be but there are still a few quirks to its use and I think it's still better to keep it more restricted.

If you're a long time user who'd like to help me with this then drop me a message. Or even a fresher one but I'm more comfortable giving edit privileges to a name I recognize. I'll still want to add a few more checks to the moderator interface first but I hope to have a few names to hand roles to in a week or two. I'm hoping to get to hand off editing comics' info pages and not have to think about them much. Piperka's not about comic reviews and I'm not going to try to cater to anything like that but it would do to have a few words on the site about what a comic is about. I don't have much in the way of moderatorial guidelines. I've pretty much just held a policy of rewriting a comic's description if it's using first-person pronouns. It's fine for a comic's own site but that same text won't suit Piperka.
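The one guideline mentioned above, rewriting descriptions written in the first person, is simple enough to flag mechanically. Here's a rough sketch in Python of what that could look like. The function name and the word list are assumptions made for this example, and a plain word match like this will produce false positives (say, "US" the country), so it's only good for pointing a moderator at a description worth a second look.

```python
import re

# First-person pronouns that suggest the text was copied from the comic's own site.
FIRST_PERSON = re.compile(
    r"\b(i|me|my|mine|myself|we|us|our|ours|ourselves)\b",
    re.IGNORECASE,
)


def first_person_words(description):
    """Return the first-person pronouns found in a description, lowercased, in order."""
    return [match.group(0).lower() for match in FIRST_PERSON.finditer(description)]
```

An empty result means the description has nothing that obviously needs rewriting on this account.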

Speaking of communication, Piperka has an IRC channel on OFTC. Feel free to drop by on the webchat and join #piperka to say hi. I've been idling on the channel for a long while but I don't think I've actually advertised that fact anywhere. No wonder it's just me there.

I took a break from Piperka and wrote a patch for Heist during Xmas. Though it is still tangentially related to Piperka. I'm not about to explain what a monad transformer is on Piperka's blog.

Mon, 07 Jan 2019 18:50:04 UTC

Personal recommendations

I've set up a personal recommendations page. This time I resisted the temptation to implement my own algorithm and just used R's recommenderlab library as is, with default UBCF settings. As far as I can see the results look pretty reasonable, though not particularly striking. It won't necessarily offer xkcd to you in the hundred results it shows, which I consider a success, but it's unlikely to suggest anything outside of the top 500 comics either. I'm not going to win any Netflix Prizes with this one but it's good to have some baseline.
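For readers who haven't met UBCF (user-based collaborative filtering) before, the idea can be sketched in a few lines of Python. This is not recommenderlab's implementation and not what runs on Piperka. The Jaccard similarity, the neighbour count of 25 and the function names are all assumptions picked for the illustration. The only input is each user's set of subscriptions, as on the real page.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets of comics."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(subscriptions, user, neighbours=25, top_n=100):
    """User-based collaborative filtering over plain subscription sets.

    subscriptions maps a user name to the set of comics that user follows.
    """
    own = subscriptions[user]
    # Rank every other user by how similar their subscriptions are.
    ranked = sorted(
        ((jaccard(own, comics), other)
         for other, comics in subscriptions.items() if other != user),
        reverse=True,
    )
    scores = Counter()
    for similarity, other in ranked[:neighbours]:
        if similarity == 0:
            continue  # users with nothing in common get no vote
        # Each neighbour votes for comics the target user doesn't follow yet.
        for comic in subscriptions[other] - own:
            scores[comic] += similarity
    return [comic for comic, _ in scores.most_common(top_n)]
```

Since every neighbour votes with their whole subscription list, widely read comics collect votes from nearly everyone, which is one way to see why results tend to stay inside the most popular titles.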

The only input data the algorithm uses is plain user subscriptions, with no consideration for anything like the date of addition. As such, it's unlikely to suggest anything particularly new. Currently, there are 33 comics with over 200 readers on Piperka and the last one to reach that threshold was Stand Still, Stay Silent which was added five years ago. There's only a handful of comics from the past three years that have even reached the top 500. Inactive users are dropped from the counts after half a year of inactivity so it's not just that disused accounts with old subscriptions on old comics are inflating the measures. It would do well to give a bias to comics a user doesn't necessarily know yet.

I think my per comic recommendations (with the totally custom implementation I wrote for them) do a better job at picking more specific results when they don't get suffocated by the strong nucleus of most popular comics. But I have no clear idea how to turn that into giving per user results and as much as I like having Piperka as my personal playground I'm not sure it's worth it at this point.

With respect to comic discovery, I would like to add overlays to Piperka Map. To color the comics listed on it according to some variables. Like from PCA. I would like to come up with some application for ANNs also. I'm going to need an embedding. I'm not a data scientist by any measure but they do have some cool toys.

In other news, I'm hosting an ad for December. Piperka's been adless since Project Wonderful's demise but I'm still considering what to replace it with. I'd rather have ads that target my site over more bids, with no user tracking or personalisation and there's no obvious choice for that after PW. Web comic authors are always welcome to advertise on Piperka as far as I'm concerned. I do get some pretty imaginative offers for ads from time to time, just by virtue of running a web site, but I don't think you'd care to read about gambling and whatnot. This is a one-off thing at this time and I'm setting the ad up manually but I would like to eventually have something just as convenient as PW.

If anyone else would like to run an ad then just drop me a message. I'm afraid I'm still a bit rubbish at replying to queries, especially if I'm in the middle of something.

It's been a fun month but I think I'll content myself with leaving Piperka in maintenance mode for a while. I guess I'll play Half-Life: Opposing Force next, I've had it waiting for the right moment for a while. I'm glad they still make good games.

Fri, 30 Nov 2018 16:36:15 UTC

Crawler health check page (mostly) empty

I've had more and better data on the crawler's actions since I reimplemented much of it this fall but I didn't do anything new with it until now. A week ago, I added a view to Piperka that shows the issues it has found in the log. I made it so that it shows all the comics that have had errors when the crawler tried to find the next page during the last week and no successes during the same time. I didn't want to see every transient timeout but only those that have little chance of resolving on their own. I was in for a ride. During the last week I've removed 884 comics, and for a good couple of hundred more I've reindexed them or removed disappeared pages from Piperka's index and made the crawl run again. I didn't keep an exact count of that latter group.
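The selection rule described above (errors during the last week, no successes in the same window) translates quite directly into a grouped query. Below is a sketch using Python's built-in sqlite3 module. The `crawl_log` table, its columns and the function name are invented for this example, and SQLite is only a stand-in for whatever Piperka actually runs on.

```python
import sqlite3

# Hypothetical log table: one row per crawl attempt, success is 0 or 1,
# ts is an ISO 8601 timestamp so plain string comparison orders it correctly.
HEALTH_QUERY = """
SELECT comic_id, COUNT(*) AS errors
FROM crawl_log
WHERE ts >= :since
GROUP BY comic_id
HAVING SUM(success) = 0
ORDER BY comic_id
"""


def failing_comics(conn, since):
    """Comics with at least one error and no success since the given timestamp."""
    return conn.execute(HEALTH_QUERY, {"since": since}).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crawl_log (comic_id INTEGER, ts TEXT, success INTEGER)")
conn.executemany("INSERT INTO crawl_log VALUES (?, ?, ?)", [
    (1, "2018-11-20T10:00:00", 0),  # a transient error...
    (1, "2018-11-21T06:00:00", 1),  # ...that resolved on its own
    (2, "2018-11-20T10:00:00", 0),  # failing all week
    (2, "2018-11-22T10:00:00", 0),
    (3, "2018-11-01T10:00:00", 0),  # error outside the one week window
])
rows = failing_comics(conn, "2018-11-18T00:00:00")
```

With this toy data only comic 2 is reported. Comic 1 is skipped because a later attempt succeeded, which is what keeps transient timeouts off the list.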

I didn't quite get the list empty yet as some of those reflect bugs with the crawler itself and not real issues with comics. And I still found a few cases it should have caught but didn't. I'll just need to adjust my query a bit. Even in the best case, not all crawler issues will show up on my health page so there's unavoidably still an element of waiting for Piperka's users to report any issues. But there should be far fewer of those now that I've finally set the crawler up to flag issues to me before anyone else necessarily notices.

I get to see the date of the last successful crawler action for a comic from the log too. It's invaluable to know when deciding what comics to eject or just to monitor. The base rate at which the crawler checks on a comic if it doesn't update regularly is about once every 20 hours.

I still don't have any kind of messaging functionality built in to Piperka. Removals of comics that you read ought to raise some kind of notification. That will still have to wait for another day, but I added something that should provide pretty much the same thing. My removed comics page lists any removed comics that you were subscribed to. I didn't necessarily look everywhere over the Internet for whether they had new homes somewhere. One thing I won't do is to make an entry point to a former site that still hosts an older copy of their archive with some message saying that new updates will be on a since-gone site.

I haven't even implemented my idea of running the crawler on old pages to see whether it can still find a known subsequent page. When I have that I should catch even more dead comics. Not all domain squatters are nice enough to return an easy 404 error for a former comic page.

Piperka's comic index has never been in this good a state. I got curious and took a few statistics from the database: For a removed comic, the average count of pages is 189.6 and the median page count is 91. For live comics the respective values are 394.7 and 166. Not surprisingly, comics that stay alive tend to have longer archives.

You may have noticed that I've added a bit of styling to comics in listings that have more frequent updates. I experimented a long while with CSS styles until I settled for a white corner to mark the more active comics. I try to avoid information overload but this felt like a valuable addition.

I've been coding and maintaining Piperka pretty much non-stop for three weeks. I could easily have ready plans for another month but I'll need to ease off a bit for now. I'll consider later on what to do with the thumbnail functionality I implemented early this month.

Sun, 25 Nov 2018 10:29:25 UTC

Archive thumbnails

I implemented a new feature for Piperka: archive thumbnails. So far, there's only one comic I've enabled it for: Pepper & Carrot. The page number count has been linkified and clicking it will open two dialog windows, one with a listing of archive pages and a second one with thumbnails. Thumbnails would work better with a comic with a fixed page size but this is all I have for now.

The thumbnails are generated with Selenium, which renders the page as a regular browser would (it does indeed use one behind the scenes) and saves a screenshot. The screenshot is then compressed into a smaller size, both to save space and to allow showing the whole archive in a single view. Also to make sure that this form can't actually be used to read the comic.

I got a bit enthused about this feature while planning for it and implementing it but now I'm a bit uncertain about how to proceed. I certainly would like to go ahead and download and compress thumbnails for all the 2.2 million archive pages indexed on Piperka. I think I would have a pretty good case for fair use with what I'm doing as my use is transformative and it doesn't subtract from the content's original intended use, that is, reading. But I'm subject to Finnish and EU copyright laws and practices and not US ones and they don't recognize that concept over here.

I generally like living in this part of the world but EU's increasing copyright maximalism doesn't make me feel like singing Ode to Joy. I would expect that most authors wouldn't mind that I'd generate thumbnails. It's not hard to find most of their comics copied on archive.org and they have the originals in full size. It's a nice idea that I'd ask all the authors but at this scale and with me doing it alone it's more a matter of "can't" rather than "won't". Many of them likely wouldn't even respond, even if they'd be fine with my use. Some may even be more annoyed to have me contact them at all and would rather have me do whatever I do without bothering them.

Regardless of copyrights, I'd be certain to drop the thumbnails for a comic on request. I couldn't be running Piperka without web comic artists' goodwill and that's not codified in any law. I'd just like it if I could assume that I had a better default position with fair use. If you're an author and would like to have thumbnails generated for your comic then feel free to drop me an email. I just won't get very far with this feature if I make it opt in and wait for authors to contact me.

I'd love to hear your opinions about this feature. Especially if you're an author.

Even without thumbnails, the archive dialog is now openable for all comics on the info pages. I haven't stored the titles for any of the archive pages and the text used on them is a part of the raw archive URL. It's a bit crude but it works. The same dialog was available on Reader all along and I did plan to add it for the info page but I never returned to do it until now.

My next development goal is to add more automation and better reporting to crawl issue detection. With the recent crawler update I have much more data available on its actions in an easily processable form and I would do well to have an interface for reviewing it. I should also add an extra periodical run to check on the health of those long quiet comics. It should tell plenty if downloading an old page that has a known following page and parsing it fails to find that following page.
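The health check for quiet comics could look something like the following sketch, written in Python with only the standard library. It assumes the old page has already been downloaded and simply asks whether it still links to the page known to come after it. The names `LinkCollector` and `still_links_to` are hypothetical, and looking at every `<a href>` on the page is a stand-in for the crawler's real per-comic parser, which is not described here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links so they compare equal to stored URLs.
                self.links.add(urljoin(self.base_url, href))


def still_links_to(page_html, page_url, known_next_url):
    """True if the old page still links to the page known to follow it."""
    collector = LinkCollector(page_url)
    collector.feed(page_html)
    collector.close()
    return known_next_url in collector.links
```

A parked domain that answers every request with a page full of ads would return a normal success status but fail this check, which is the case a plain 404 test misses.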

Thu, 15 Nov 2018 15:10:51 UTC