Piperka blog

Progress on the Piperka backend rewrite

First off, apologies. Recently, I let the root partition fill up. Twice. Backups take space, and the partitioning could have been better. The second time was even my own doing. I shouldn't try to do any sysadmin work when I'm already dead tired. The server itself still has plenty of space; I just used it poorly. I feel pretty embarrassed about this and promise that it won't happen again. I would like to hold Piperka to a better standard.

You may have noticed that not much has happened on Piperka's development front for quite a while now, besides the usual gradual additions of comic entries. I'm still in the process of rewriting Piperka's backend. I deliberately took the long road, and I had to start by writing my own authentication backend for Snap. It was quite a dive into the deep end of an alien environment. For now, I'm using that single repository to host the Piperka-specific code together with the rest. I may still split two other libraries out of it and have them all end up on Hackage, but that's still in the future. I'm using Heist to generate the site's HTML, and in the process I've ended up writing my own tutorial for it and started contributing to Heist itself.

I'm hoping to have a running demo in a few weeks. I feel that I've finally gotten past most of the blockers and can really get into writing a backend I'd be happy with.

I got the question: Why Haskell? There's plenty of writing all around the web about why people like it. I myself find its purity pleasing, and it has a static type system that is actually useful. I feel that programming, at its best, is all about inferring connections, and a good type system is the way to express them. I could also wax poetic about the Curry-Howard isomorphism. I have a math background, and ending up coding for a career still feels like a sidestep to me. CH means that programming and math are, at a deep level, the same thing, and Haskell is a programming language where I can really feel in touch with that relationship. Types are like propositions in math, and writing a function is like making a proof. It feels like coming back home, though I don't expect I could convey how satisfying that is to me. As I see it, the Haskell ecosystem is quite mature at this point, and it's practical to use it for writing a web site backend. I have no excuse not to go for it.
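As a small illustration of the propositions-as-types idea (a standard textbook example, not anything from Piperka's code), here is the same one-line term written twice in Lean 4: once as a proof that "p and q" implies "q and p", and once as a program that swaps the components of a pair. Haskell's Data.Tuple.swap has the second type.

```lean
-- Read as logic: from a proof of p ∧ q, build a proof of q ∧ p.
theorem and_swap (p q : Prop) : p ∧ q → q ∧ p :=
  fun ⟨hp, hq⟩ => ⟨hq, hp⟩

-- Read as a program: the same term swaps the components of a pair.
def swap {α β : Type} : α × β → β × α :=
  fun ⟨a, b⟩ => ⟨b, a⟩
```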

A lot of people have written to me, but I'm generally bad at responding. The main thing, for now, is that I'm dedicating my development efforts to the rewrite and keeping changes to the current code base to a minimum, even when they'd be quick. Several people have asked for Piperka to use HTTPS, and I hear you. I'll do it with the new backend.

I'll need to decide on an issue tracker and some other development resources. It'd be better to collect the development ideas and requests I get there, since I'm slow to implement them. I use GitHub, but I still feel lukewarm about dedicating it to Piperka.

People have asked for it: Piperka is now on Patreon! I would likely have done something like this sooner, but I was worried that it would be too much of a hassle with regard to taxation and that it would interfere with my potential unemployment benefits. I've made some queries regarding those, and I think I'm satisfied with the answers. It's still anybody's guess what it means under a particularly odd Finnish law on soliciting donations, but I've decided not to worry about that.

Please don't make any connection between Piperka's server filling its root partition twice recently and Piperka coming to Patreon. I'm certainly not breaking things on purpose just to make you appreciate Piperka more. I've been thinking about this for a long time. Piperka is a bit of an ill fit for Patreon, since Patreon emphasizes content creation so heavily, and a service like Piperka falls outside its model. Patreon collects VAT from EU residents, but Finland has a yearly limit below which you can sell things without having to apply VAT. I'd expect to stay well below that, so Patreon is doing me no favors by being proactive about it. I'll be paying income tax on whatever I receive as is. Still, Piperka's better off going where the web comics reside.

Worldcon is coming to Finland next year, and I suspect some of Piperka's users may end up going. If you're there, feel free to find me.

Mon, 12 Sep 2016 18:43:50 UTC

About crawler's source code

I get occasional requests for access to Piperka's crawler code. I think I should make a statement about that, and about just what's involved.

Piperka's source code is available under a free license in a public darcs repository. It only includes a shim database, which specifically doesn't include the crawler code. I'll need to remind you that the main reason the source code is available is that releasing it was easy for me once I had moved to a version control system and FAI, and I had a reasonable expectation that doing so would benefit Piperka. Given my background with free software and Debian, I do consider granting access to the source code the moral thing to do, but I still can't find it in myself to make the extra effort just for the sake of it. I don't much care for the attitude of some people who feel it's something to call me out on.

As far as expectations go, I'd expect to get pretty much no help from anyone even if I made the crawler code available. After all, I've released the rest of the site code and have yet to receive a single patch based on it. You'll have to excuse me, but someone would need to convince me that there's a single person out there ready to get their hands dirty with Piperka's code base before I'd fulfill any request to put out even more.

Now, free software is dead without a process. Linux has one, Debian has one, but Piperka lacks one, and therefore nothing is happening. I recognize that there is more I could do about that. For one, I should replace my email address at the bottom of the page with a link to a contact page, which would encourage people to use the public development mailing list for discussing development ideas instead of just contacting me. I tend to be a bit of a black hole as far as email goes, and if there is ever to be a thriving development community around Piperka, it can't be just about individual people talking to me. If you'd like a more forum-like experience, the Piperka subreddit is as good a place as any. It would also help if I listed some development goals I have for Piperka. I prefer showing to telling, but I'd need to let go of that if I want others to participate.

Process is exactly why I'm reluctant to show the crawler code. With the web site code, most of the process is already in place for the simple reason that it's in a version control system. Nothing like that applies to the crawler. No, patch is not a suitable tool for the job (as the most recent email I received on this topic suggested). I'm going to walk through a few examples of what my usual routine involves when I maintain Piperka.

Most of what I do is based on a few Perl scripts and on performing SQL queries by hand. The basic editing of the crawler code itself is done with the editparsers script, which opens up all of the crawlers in the database. A particular crawler instance can be shared by multiple comics, and I remember some of the most used IDs and can often pick the correct parser by looking at the site's source code alone. Most of the time I can get by with that; sometimes I have to search the parser list for something I could use, or add a case for that particular comic. Or I'll end up writing a new one. Web comic authors tend to do whatever renders in people's browsers, and I'll just have to adapt to that. Here's an example of a common parser:

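### 1
# Parser 1: find the first <a rel="next"> link on the page and take
# the last path segment of its href as the next page's name.
if ($tag eq 'a' && exists $attr->{rel} && $attr->{rel} eq 'next') {
    if ($attr->{href} =~ m</([^/]+)/?$>) {
        $next = $1;
    }
    # End parsing early, whether or not the href matched.
    $self->eof;
}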
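For readers who don't know the crawler's Perl framework, here is a rough, self-contained Python sketch of what parser 1 does, plus a loop that follows it through an archive. It uses the standard library's html.parser; the names NextLinkParser, find_next and crawl_archive are made up for illustration, and it assumes a page's URL is a fixed base, the page name and an optional tail. The real crawler also stores pages in the database and has to cope with much messier sites.

```python
import re
from html.parser import HTMLParser
from typing import Callable, List, Optional

# Same pattern as parser 1: the last path segment, ignoring a trailing slash.
LAST_SEGMENT = re.compile(r"/([^/]+)/?$")


class NextLinkParser(HTMLParser):
    """Remembers the page name from the first <a rel="next"> link."""

    def __init__(self) -> None:
        super().__init__()
        self.next_page: Optional[str] = None
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "a":
            return
        attr = dict(attrs)
        if attr.get("rel") == "next":
            match = LAST_SEGMENT.search(attr.get("href") or "")
            if match:
                self.next_page = match.group(1)
            # Like $self->eof: ignore the rest of the page either way.
            self.done = True


def find_next(html: str) -> Optional[str]:
    parser = NextLinkParser()
    parser.feed(html)
    parser.close()
    return parser.next_page


def crawl_archive(fetch: Callable[[str], str], url_base: str,
                  first_page: str, url_tail: str = "") -> List[str]:
    """Follow rel="next" links from first_page and return page names in order."""
    pages = [first_page]
    seen = {first_page}
    while True:
        next_page = find_next(fetch(url_base + pages[-1] + url_tail))
        # Stop at the newest page, or if a site's links ever loop back.
        if next_page is None or next_page in seen:
            return pages
        pages.append(next_page)
        seen.add(next_page)
```

Fetching is passed in as a function so the sketch can be tried offline with a dictionary of pages; against a real site it could wrap urllib.request, ideally with some politeness toward the comic's server.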

I could then use parser 1 and call something like

./inject_comic 1 http://www.paranatural.net/comic/ chapter-one /

and then I would finish the job with the genentry script. If I didn't get it right on the first try, I would try again with getpages_init. With existing entries, I may use SQL queries to update the comics and crawler_config tables, delete the old archive from the updates table, insert a new first page into that table, and call getpages_single to have the crawler rebuild the archive index. After that, I need to compare the old archive index with users' subscriptions to see whether any pages were left out of the comic's reorganized archive. Sometimes hiatus or delay announcement posts, or guest pages, had crept in and were cleaned out when the comic's author rebuilt the archive, and I need to account for those somehow.
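The bookkeeping in that last step can be illustrated with a small Python sketch. It assumes (hypothetically, since Piperka's schema isn't shown here) that an archive index is an ordered list of page names and that a bookmark is stored as the number of pages a reader has gone through. It lists the pages that vanished in the rebuild and moves each bookmark back to the last page that still exists, flagging readers whose last-read page was dropped so they can be checked by hand.

```python
from typing import Dict, List, Tuple


def dropped_pages(old_index: List[str], new_index: List[str]) -> List[str]:
    """Pages in the old archive index that the rebuilt index no longer has."""
    kept = set(new_index)
    return [page for page in old_index if page not in kept]


def remap_bookmarks(old_index: List[str], new_index: List[str],
                    bookmarks: Dict[str, int]) -> Tuple[Dict[str, int], List[str]]:
    """Translate "pages read" counts from the old index to the rebuilt one.

    Returns the new counts and the users whose last-read page was dropped,
    since those are the cases worth checking by hand.
    """
    position = {page: i for i, page in enumerate(new_index)}
    remapped: Dict[str, int] = {}
    needs_review: List[str] = []
    for user, read in bookmarks.items():
        if read > 0 and old_index[read - 1] not in position:
            needs_review.append(user)
        new_read = 0
        # Walk back from the last page read until we hit one that survived.
        for i in range(read - 1, -1, -1):
            if old_index[i] in position:
                new_read = position[old_index[i]] + 1
                break
        remapped[user] = new_read
    return remapped, needs_review
```

For example, if a "hiatus" post between two comic pages disappears, a reader who had stopped on it is moved back to the page before it and flagged.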

Don't worry if you didn't follow all of that. As it stands, all the scripts mentioned are included in the source, and that's enough of an example for getting a comic added to the development environment, if you have one set up. My point is that the workflow is centered on me doing things at an SQL prompt and with Perl scripts at a shell prompt. It's not perfect, and it could certainly be improved, but it's what I've ended up with and it works. If you are about to offer to help me maintain Piperka's crawler and index, then you should be very mindful of what I'm doing currently. A lot of things would need to happen before I could ever accept outside help with this. I'm not going to give anyone else direct access to the SQL database; I hope I don't need to go into detail about what a disaster that would be. For a start, I'd need to come up with a workflow that doesn't involve it. To reiterate my point, I'm not releasing any crawler code until there is a reasonable case that doing so would benefit Piperka. Quid pro quo.

Another aspect of running Piperka's crawler is that the way it accesses web comic sites affects its (that is, in the end, my) relationships with comic authors. If they perceived it as putting an unnecessary burden on their sites, they would sour on having their comics listed on Piperka at all. Warranted or not, it is something I need to be mindful of. If scores of people I had little control over, but who were associated with me, were knocking on their sites, it could hurt Piperka's reputation with comic site admins. It can be a bit of a touchy subject.

To go forward with any of this, I'd need a new interface. The natural place for it would be as part of the web site. But you know what? I've come to realize that I'm disgusted with the web site's backend code, which dates back to 2005. If I feel that way about it, what hope do I have that anyone else would want to touch it? I made an effort to rewrite it around 2008, but nothing came of that. Things improved vastly when I got it under version control and built a real development environment for it, but the code still stinks. I'm ready to toss away the old Perl code, built on top of Mason. I'm thinking of going for Snap. I have an idea of an ideal web framework I'd like to use. Snap isn't it, but it'd be a good step forward. It'll take time, but I hope to have a beta version running in a few months.

Thu, 13 Aug 2015 19:20:00 UTC

OAuth2 logins

I've implemented OAuth2 logins for Piperka. If you've seen a "Sign in with Google" or similar link on a site, Piperka now has one too.
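Such a link typically starts the OAuth2 authorization code flow (RFC 6749). Here is a minimal Python sketch of its first step, with placeholder values: the endpoint, client id, redirect URI and scope below are hypothetical, not Piperka's or any provider's real settings. The site sends the browser to the provider with a random state value, remembers the state in the session, and checks it when the provider redirects back with a one-time code. Exchanging that code for a token at the provider's token endpoint is left out.

```python
import hmac
import secrets
from typing import Tuple
from urllib.parse import urlencode


def authorization_url(endpoint: str, client_id: str, redirect_uri: str,
                      scope: str) -> Tuple[str, str]:
    """Build the URL behind a "Sign in with ..." link, plus the state to remember."""
    state = secrets.token_urlsafe(16)  # unguessable; ties the callback to this session
    query = urlencode({
        "response_type": "code",  # authorization code grant
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
    })
    return f"{endpoint}?{query}", state


def callback_is_genuine(saved_state: str, returned_state: str) -> bool:
    """Reject callbacks whose state doesn't match the one saved in the session."""
    return hmac.compare_digest(saved_state, returned_state)
```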

I made it possible to have a Piperka account with no password at all, which made the necessary changes a bit more intrusive than they would otherwise have been. I kept the password-protected section of the account page as is, but it can now optionally be protected with OAuth2.
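A hypothetical sketch of what a password-less account implies for the data model: the password becomes optional, an account can have linked OAuth2 identities instead, and the protected part of the account page accepts either kind of fresh login. The Account class, its fields and the rule against unlinking the last login method are assumptions for illustration, not Piperka's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Account:
    """Hypothetical account record: a password is optional once OAuth2 exists."""
    name: str
    password_hash: Optional[str] = None  # None means a password-less account
    oauth2_providers: Set[str] = field(default_factory=set)  # e.g. {"reddit"}

    def login_methods(self) -> Set[str]:
        methods = set(self.oauth2_providers)
        if self.password_hash is not None:
            methods.add("password")
        return methods

    def can_unlock_settings(self, fresh_login: str) -> bool:
        # The protected section accepts a fresh password check or a fresh
        # OAuth2 login through any provider linked to this account.
        return fresh_login in self.login_methods()

    def unlink(self, provider: str) -> None:
        # Assumed rule: an account must always keep at least one way to log in.
        if self.login_methods() == {provider}:
            raise ValueError("cannot remove the only login method")
        self.oauth2_providers.discard(provider)
```

Every place that used to assume a password exists now has to handle password_hash being None, which is the kind of intrusiveness described above.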

The plan was to roll out an initial version with a choice of reddit or Google as identity providers, but the Google one turned out not to work on the production server. I suspect that's because it tries to use IPv6 addresses there, and those never worked after I set up bridging for a virtual server on the same host. I'll figure that out some other day. So, for now, the only option besides a plain old password is logging in with reddit.

The update for this change didn't go quite as smoothly as I'd have liked. Piperka was down for an hour or so; sorry for the inconvenience. Please test things and let me know if you see anything broken or unexpected.

I've been working on this on and off for several months now. Right now, I feel pretty burnt out regarding Piperka. I don't think I'll be pushing for any major changes until doing so feels fun again. As you know, Piperka is a one-man effort, and I have a day job on top of it. I'd love to get others involved with the development and maintenance of Piperka, beyond asking me to fix crawlers, but I feel that making that happen would require even more work than just doing whatever I do myself, whenever I feel like it.

I think I've griped about the exact same thing before on this blog, but you'll just have to excuse it. I have a month's worth of comics in the submit queue and a number of reports about broken crawlers. I'll work through those in the near future. After that, I'll do something fun, which most likely won't involve Piperka.

Sun, 08 Mar 2015 16:50:40 UTC

Whither Piperka

Let me tell you a story. There was a stone and a pot. And everyone was well fed by the stone soup. I may have skipped over a bit.

Let me talk about motivation, that is, mine, and let me talk about Piperka. I've been running it for over eight years now, which is a long time to do anything. In many ways, it remains what it was from the start: the basic idea was there from the beginning, and much of what I have done since has been auxiliary stuff or refinements, besides the ever-present need to fix crawler issues and add a few thousand comics.

My motivation matters since I'm the only person behind Piperka, and nothing much happens unless I do it. And it's not anything as simple as turning a profit, either. I had aspirations of commercial success when I started, but I've laid that idea aside over the years. It's been a great way to practise my coding skills and accumulate proof thereof. It has helped me get jobs. I can use it to explore techniques that I wouldn't have the opportunity to use at my day job. I've been doing it for fun, out of curiosity to see how people would react, and sometimes in anticipation of positive reviews.

It's been a fun ride, but right now I'm at a bit of an impasse. I made less progress last year than I'd have liked. I changed jobs a year ago (to a much nicer one) and moved to a city where I can have a social life again. Both have made me neglect Piperka. I've had less time, and when I've had it, I haven't necessarily felt like touching Piperka. Yet I'm not about to abandon Piperka, if only because I use it myself and couldn't imagine reading web comics without it, or with someone else's software. It's frustrating to see how Piperka could be better while feeling like I'm the bottleneck preventing that from happening.

I've tried. Build it and they will come, they say. I released Piperka's source code once it became feasible and I had convinced myself to take the leap, inviting people to participate, and that has gone nowhere. One thing I should have done back then was to arrange some point of communication other than emailing me personally. I've set up a public development mailing list; feel free to join to say what you'd like to see fixed or improved, or just to follow along. A blog is fine for (sadly, infrequent) announcements, but I hope a mailing list will lend itself to more open discussion.

So far, Piperka's story hasn't been that of the stone soup. It's been all me. I could have tried to commercialize it, but that's not the route I wanted to take. I could have made it look all serious and renamed it to something less whimsical, something that actually had to do with web comics, but I didn't want to. Piperka's quirky and odd and so personal that there are days when I feel like taking it down just to avoid having others look at it, and I like it that way. I'm not sure how getting others to participate in running a web site would even work. I can release the source code, but there will still be only one Piperka, which I control. That's something I can't even try to meaningfully change until the support is there.

I'll say a few words about Piperka's crawler, comic updates and the database, to give a picture of what's involved in trying to share the burden of maintenance. It's about the plumbing, the hidden part that becomes visible only when it doesn't work properly. I take users' privacy seriously, and that alone means I'm not giving anyone direct access to the database. I could, in principle, strip it of user data and make the crawler code contained in it available, in hopes of receiving help in maintaining it. I'm not going to do that at this time. First, I'd need to be convinced that releasing the web site's source code was a good idea in the first place before I release even more. I don't want to do it only to see nobody care. Maintaining it is inglorious janitorial work, which even I find too much like work at times, and I don't expect people to jump in for that. Beyond that, there's the same problem I had with the source code itself. If I made a dump of the data and someone improved on it, there would be no easy way for them to send me the fix, nor for me to accept the changes. The data doesn't live in a version control system but in a database, and the web comic archives it tries to reflect are quite an ephemeral bunch. Moreover, it's not just a matter of fixing the comic archive index: the data is tied to users' settings, too. For example, when a page needs to be removed from Piperka's index, all the bookmarks need to be bumped down by one page to reflect the change. One future development plan I have is to allow users to make annotations on a comic's archive page, and that would only make it more necessary to consider user data when updating the comic data.
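To make the bookmark example concrete, here is an in-memory Python sketch. Piperka does this in SQL, and the representation here is an assumption: a bookmark is taken to be the number of pages the reader has read, so under that model only readers who had already passed the removed page move down by one, and everyone else keeps pointing at the same next unread page. The function name remove_page is hypothetical.

```python
from typing import Dict, List, Tuple


def remove_page(archive: List[str], bookmarks: Dict[str, int],
                ordinal: int) -> Tuple[List[str], Dict[str, int]]:
    """Drop archive[ordinal] and shift bookmarks so readers keep their place.

    A bookmark is assumed to be the number of pages read: 0 means "start
    from the first page" and len(archive) means "caught up".
    """
    if not 0 <= ordinal < len(archive):
        raise IndexError("no such page in the archive")
    new_archive = archive[:ordinal] + archive[ordinal + 1:]
    # Readers past the removed page lose one page from their count;
    # readers who hadn't reached it are unaffected.
    new_bookmarks = {user: read - 1 if read > ordinal else read
                     for user, read in bookmarks.items()}
    return new_archive, new_bookmarks
```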

What I'd really need to improve Piperka's crawler is a better user interface for the maintenance work. I currently do it all in a shell on the server, calling scripts to invoke the crawler or updating the database directly via SQL queries. It works, and updating a comic, on a comic-by-comic basis, doesn't usually take all that long. But it could be improved, and all that effort does add up. Had I spent that time building a better user interface for the work instead, I would have saved it many times over. There would be another benefit, too, if I had the interface: it would enable me to hand over parts of the maintenance to others.

Ultimately, this is a question of what kind of a site Piperka is. In a sense, its purpose is to stay out of the way. Nobody visits Piperka to see Piperka itself, but to get to read web comics. But how does it do that, how does it present itself to people, and who's working on it behind the scenes? I find myself unsatisfied with how the answers to these stand now, and I'd like to see that change. To put it simply, I want to make running Piperka feel like fun again.

Mon, 06 Jan 2014 11:22:58 UTC

Piperka Map

Force-directed graph algorithms are fun. Imagine a collection of particles, each repelling the others much like electrical charges do. Add springs between some of the particles, and let them all bounce around until they settle into a resting position.

Hence, Piperka Map. I've modelled user data as a force-directed graph and made it available on a map page. The bigger the circle, the more readers the comic has. Relative closeness means that the comics have common readers. I've deliberately de-emphasized the exact subscriber count on the map. The mouse wheel zooms, and the map can be panned by dragging. Clicking a comic opens up controls where you can open the comic's info page or, if you are logged in, subscribe to it. The quick search dialog is available on the map.
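A minimal Python sketch of how such a graph could be derived from subscriptions, assuming they are available as a mapping from each user to the set of comics they read (a hypothetical in-memory form; the real data lives in SQL). Reader counts can give circle sizes, and the number of shared readers for each pair of comics can serve as spring strength. Comics nobody reads simply never appear. Counting pairs grows quadratically with the number of comics a user reads, which matters for heavy readers.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple


def build_map_graph(subscriptions: Dict[str, Set[str]]) -> Tuple[Counter, Counter]:
    """Return (reader count per comic, shared-reader count per comic pair)."""
    readers: Counter = Counter()
    shared: Counter = Counter()
    for comics in subscriptions.values():
        readers.update(comics)
        # Each pair of comics this user reads gains one common reader.
        for pair in combinations(sorted(comics), 2):
            shared[pair] += 1
    return readers, shared
```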

I've added links to the map from comics' info pages and from users' profiles. There's also an option to highlight another user's comic picks. Only comics with readers get a place on the map.

The map viewer code is my own, and I haven't paid all that much attention to cross-browser compatibility. It's built using SVG, which can require more support from a browser than some other techniques. It works best in Chrome, with some glitches in Firefox; I've no idea how IE copes, though I'm certain nothing below version 10 would work. The map is a snapshot from today, but I'll enable daily updates later on. I may yet alter the algorithm in ways that would change its layout. There's no intrinsically right way of doing this; it all comes down to aesthetics. As for my graph layout program itself, I'm considering releasing it as an independent project.
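The layout program itself isn't published, so here is a generic stand-in: a naive spring-electrical layout in Python, in the spirit of the description at the top of this post. Every pair of nodes repels with a force proportional to 1/d², and each weighted edge acts as a spring pulling its ends toward a rest length. All constants, the even start on a unit circle and the per-step movement cap are arbitrary choices for illustration; tuning them is exactly the aesthetic judgement mentioned above. The all-pairs loop costs O(n²) per iteration, and larger graphs usually use an approximation such as Barnes-Hut.

```python
import math
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def force_layout(nodes: List[str], edges: Dict[Edge, float], iterations: int = 200,
                 repulsion: float = 1.0, spring: float = 0.05, rest_length: float = 1.0,
                 step: float = 0.1, max_move: float = 0.2) -> Dict[str, Tuple[float, float]]:
    n = len(nodes)
    # Deterministic start: spread the nodes evenly on a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        # Every pair repels like charges, with magnitude repulsion / d^2.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = repulsion / (d * d)
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        # Springs pull connected nodes toward rest_length; heavier edges pull harder.
        for (a, b), weight in edges.items():
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = spring * weight * (d - rest_length)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d
        # Move every node a little along its net force, capped for stability.
        for v in nodes:
            mx, my = step * force[v][0], step * force[v][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[v][0] += mx
            pos[v][1] += my
    return {v: (p[0], p[1]) for v, p in pos.items()}
```

With a strong edge between two comics and none to a third, the connected pair settles close together while the unconnected one drifts away, which is the "relative closeness means common readers" effect on the map.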

I suppose Piperka's subscriber data is pretty challenging from a data mining perspective. Most people read xkcd, and any algorithm would find that out first. I may yet write about the current related comics algorithm, the one that is still labelled "experimental". It, too, involves graphs, but in a more abstract sense, and something akin to a Gaussian blur from image processing. It's my original work, which is rarely a good endorsement for something like this. I may yet replace it with something else, but for now, it stays.

Piperka Map was inspired by Ruslan Enikeev's The Internet Map. When I saw it, I immediately got the idea that I had similar data in my hands and wanted to do something similar. With the amount of data I had, I could well do it all with plain old CPU code. Kudos to AMD for my FX-8350, which dutifully crunched numbers during my numerous attempts at getting something sensible-looking out. The first thing I did was to make my code run in parallel.

None of this has much to do with catching comic updates. Then again, Piperka isn't necessarily only about that. It was interesting, so I wanted to do it.

Sat, 26 Oct 2013 17:11:25 UTC