Force-directed graph algorithms are fun. Imagine a collection of particles, each repelling each other much like electrical charges do. Add springs between some of the particles, and let them all bounce around until they attain a rest position.
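To make the idea concrete, here is a minimal sketch of such a simulation. It is not Piperka's layout program: the function name, the inverse-square repulsion, the linear springs and every parameter value are assumptions picked for illustration.

```python
import math
import random


def force_layout(n, edges, steps=500, repulsion=1.0, stiffness=0.1,
                 rest_length=1.0, step_size=0.1, max_move=0.5, seed=0):
    """Return a list of (x, y) rest positions for n particles (illustrative)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(n)]

    for _ in range(steps):
        force = [[0.0, 0.0] for _ in range(n)]

        # Every pair of particles repels, with an inverse-square falloff.
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion / (dist * dist)
                force[i][0] += push * dx / dist
                force[i][1] += push * dy / dist
                force[j][0] -= push * dx / dist
                force[j][1] -= push * dy / dist

        # Each spring pulls (or pushes) its endpoints toward the rest length.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            pull = stiffness * (dist - rest_length)
            force[a][0] += pull * dx / dist
            force[a][1] += pull * dy / dist
            force[b][0] -= pull * dx / dist
            force[b][1] -= pull * dy / dist

        # Move each particle along its net force, capping the step length
        # so that two particles starting very close don't fly off.
        for i in range(n):
            dx = step_size * force[i][0]
            dy = step_size * force[i][1]
            move = math.hypot(dx, dy)
            if move > max_move:
                dx *= max_move / move
                dy *= max_move / move
            pos[i][0] += dx
            pos[i][1] += dy

    return [(x, y) for x, y in pos]
```

Calling `force_layout(4, [(0, 1), (2, 3)])` lays out two spring-connected pairs. The all-pairs repulsion loop is quadratic in the number of particles, which is fine for a toy but is the part that gets expensive on a larger graph.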
Hence, Piperka Map. I've modelled user data as a force-directed graph and made it available on a map page. The bigger a comic's circle, the more readers it has. Relative closeness means that the comics have common readers. I've deliberately de-emphasized the exact subscriber counts on the map. The mouse wheel zooms and the map can be panned by dragging. Clicking a comic opens up controls, where you can open the comic's info page or, if you are logged in, subscribe to the comic. The quick search dialog is available on the map.
I've added links to the map from comics' info pages and from users' profiles. There's the option of highlighting another user's comic picks. Only comics with readers get a place on the map.
The map viewer code is my own and I haven't paid all that much attention to cross-browser compatibility. It's built using SVG, which can require more support from a browser than some other techniques. It works best on Chrome, with some glitches on Firefox, and I've no idea how IE copes, though I'm certain nothing below version 10 would work. The map is a snapshot from today, but I'll enable daily updates later on. I may yet alter the algorithm in ways which would change its layout. There's no intrinsically right way of doing this; it's all determined by aesthetics. As for my graph layout program itself, I'm considering releasing it as an independent project.
I suppose Piperka's subscriber data is pretty challenging from a data mining perspective. Most people read xkcd and any algorithm would find that out first. I may yet write about the current related comics algorithm, the one that is still labelled as "experimental". It, too, involves graphs, but in a more abstract sense, and something akin to Gaussian blur from image processing. It's my original work, which rarely is a good endorsement for something like this. I may replace it with something yet, but for now, it stays.
Piperka Map was inspired by Ruslan Enikeev's The Internet map. When I saw it, I immediately got the idea that I had similar data in my hands and wanted to do something similar. With the amount of data I had, I could well do it all with plain old CPU code. Kudos to AMD for my FX-8350, which dutifully crunched numbers during my numerous attempts at getting something sensible looking out. The first thing I did was to make my code run in parallel.
This all doesn't have much to do with catching comics' updates. Then again, Piperka isn't necessarily about that only. It was interesting so I wanted to do it.
I'm introducing a few social networking features to Piperka. Never fear, none of this will affect you unless you choose to use them. I've found myself checking out the comics' lists of readers, to see if I can spot familiar names, and I thought that there's a case for making that kind of information more prominently available.
You can now follow other users. That means that comics that they read get highlighted (currently, with a small "F" letter) in the listings. There's also a new profile privacy setting, protected, which lets other users see your comic picks once you give them the permission.
When you follow another user, they (and only they) can see a link back to you on their followers page, unless your profile is set as private. This means that with the private setting, you can only follow public profiles. Following is an asymmetrical relation, so being followed doesn't mean that you need to reciprocate. Permissions to view a profile are separate from the followee status. Having a permission but no followee status just means that the profile is visible but it won't be highlighted.
Private profiles are as hidden as ever. Trying to see them makes Piperka neither confirm nor deny that the user exists. Viewing a protected profile without a permission lets you ask to see their profile. I've made all the existing profiles either private or public and new users will get protected as their default setting.
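The three privacy settings boil down to a small decision table. The sketch below is only an illustration of the rules as described here, not Piperka's code: the names `Privacy` and `view_profile`, the data shapes and the return strings are all made up, and letting users always see their own profile is an assumption.

```python
from enum import Enum


class Privacy(Enum):
    PUBLIC = "public"
    PROTECTED = "protected"
    PRIVATE = "private"


def view_profile(profiles, permissions, viewer, target):
    """Decide what a viewer gets when asking for target's profile.

    profiles maps a user name to a Privacy setting; permissions is a set of
    (owner, viewer) pairs the owner has granted. Both are hypothetical.
    """
    privacy = profiles.get(target)
    if viewer == target and privacy is not None:
        return "visible"  # assumption: you can always see your own profile
    # A private profile looks exactly like a user that doesn't exist.
    if privacy is None or privacy is Privacy.PRIVATE:
        return "not found"
    if privacy is Privacy.PROTECTED and (target, viewer) not in permissions:
        return "ask permission"
    return "visible"
```

Returning the same answer for a private profile and a missing user is what keeps the page from confirming or denying that the user exists.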
I'm still going to work on how Piperka lets you access this information and on the presentation, but the basic ideas should be in place already. I'll see first whether anyone else cares to use this feature besides me.
I may do some data mining on the relationship data and show the results publicly. Nothing that would identify any single user. Basically the same thing I do with the comic subscription data, or would do if that part still wasn't pretty underdeveloped. I won't be giving anything sensitive to any third parties and I hope that you already expected that of me.
As an aside, singular they is weird.
It's been a while. I've purposefully not tried to stick to any update schedule, but I guess I could have written something earlier, already. So far, there have been two larger, yet not all that visible changes to Piperka this year.
But before I go into that, I should tell of an old bug. I redid the account and login handling code two years ago. After that, the new account creation code discarded the user's email instead of storing it. Please go check your account information page and check that you have an email address there. You'd need it for password recovery.
The rest of this post is going to get, unavoidably, more technical. In March, I introduced support for banner uploads. Prior to that, they were submitted as plain old text links. It worked, but it was a rather ungainly way of doing it. The comics have banners submitted more frequently now, so it seems that it solved a problem. I had to clean up quite a bit of old code to make that work, starting with how I was calling Mason. And since I'm using AJAX submits instead of plain old Web 1.0 submits, I had to use File API. I opted for simpler code and just tell older browsers that banner uploads are not supported.
The second, more recent, change is a bit fancier. I made the listing pages (top, browse and profile) navigation use AJAX calls and History API. The (un)subscribe buttons use AJAX now, too. It was remarkably easy to leave the old code path in place for browsers without working History API support. All I needed to do was to not add event hooks to the navigation links and let the old links work as is. By my rough testing, it looks like I made page load time drop by 25% and the page loads won't flash content anymore. I suppose I could further squeeze a pageful of comics to MTU, but that's a project for some other day.
Both changes introduced some bugs that some of you ran into and thankfully told me about. They have been fixed.
I solved a few annoying bits along the way and got to practise some HTML5 techniques. If I had to work on invisible parts of Piperka, I suppose that making the crawler errors easier to manage would have been a better use of my time. But my whim took me here.
If you're interested in even more details than this, then I should remind you that Piperka's source code is available. Not all that many people are aware of that, even among those who have contacted me about technical matters. I ought to add a link to a page about the source code to Piperka's templates.
I updated the bookmarking code on Friday and it looks like that went smoothly. The front end interface is pretty much unchanged and most of what I did for that part was to touch up the messages. Pretty dull and it's something that just works. But the back end has been rewritten, fixing up a lot of old cruft and making my life easier all around. The best patches are those that remove the most lines. Since I don't have anything new to tell about this feature (it still works, just more so), I'll write about something old instead.
As I planned it originally, Piperka wasn't going to even keep a table of all the archive pages of a comic. Piperka got its start as a refinement of my personal comic grabbing scripts. You can find such programs on freshmeat.net, if you like. For that use, I only needed to store the last known page. But add more users to that, each with a different number of unread pages waiting for them, and the leap to just putting all the comic's pages in a table wasn't all that big. Using that stored information the other way around, to recognize which comic and which page a given URL was, came naturally after that. It was just too obvious a use to miss. But I didn't exactly plan on having it as a feature.
Truth be told, not much of the initial Piperka was planned. I just started coding and features fell into place. I was surprised at having people trying to feed a comic's home page to the bookmarking code, but it made sense to add that.
Piperka stores archive pages in three parts. There's a common base containing the protocol, "http://", the domain and a part of the path, and a tail, which is something like ".html" or just empty. Between them, the content part holds the variable part, be it a date, a number or something else. The bookmarking code does other preprocessing too, like stripping out the protocol and any initial "www." from the URL. That would fail if somebody hosted different comics under "www.mycomic.com" and "mycomic.com".
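As a rough sketch of that three-part scheme (not Piperka's actual code; the function names and the example URLs are made up for illustration), splitting a page URL against a comic's base and tail could look like this:

```python
import re


def normalize(url):
    """Strip the protocol and any leading "www." from a URL."""
    return re.sub(r"^(?:https?://)?(?:www\.)?", "", url, flags=re.IGNORECASE)


def split_page(url, base, tail):
    """Return the variable (content) part of url, or None if it doesn't fit.

    base and tail are the comic's stored common parts, e.g.
    "http://example.com/comic/" and ".html" (hypothetical values).
    """
    rest = normalize(url)
    prefix = normalize(base)
    if not rest.startswith(prefix):
        return None
    rest = rest[len(prefix):]
    if tail:
        if not rest.endswith(tail):
            return None
        rest = rest[:-len(tail)]
    return rest


def build_url(base, content, tail):
    """Reassemble a full archive page URL from its three parts."""
    return base + content + tail
```

With these, `split_page("http://www.example.com/comic/2012-03-04.html", "http://example.com/comic/", ".html")` gives `"2012-03-04"`. It also shows the weakness mentioned above: after `normalize`, "www.mycomic.com" and "mycomic.com" are indistinguishable.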
My first version of bookmarking was as simple as (sorry for the raw SQL):
The next version of the code lived in a Perl module. The biggest update was that it used the domain in the given URL to match with the corresponding comic before trying to match the specific page. To deal with multiple comics residing under the same domain, I added special cases to the code, which would add a bit more to the initial part of the string used to match with a comic. I had to update that special case code by hand whenever new comics used the same domains, and it eventually grew to be tens of lines long. Another weakness of the code was that it was unable to handle comics that have identical initial parts up until the variable part. Try feeding, for example, "http://www.meetmyminion.com/?p=" as a bookmark to see how it currently handles that case.
I thought, at that point, that I'd need some persistent data structure to store the initial parts, automatically separated to tell apart the comics with similar domains and initial paths. I wrote a daemon to handle bookmark requests. I was never quite happy with that approach, and it had some bugs that I never got around to fixing. Worst of all, it had a habit of sometimes eating all memory until the OOM killer reaped it. I had the web server code restart it if it wasn't running when it was needed, but it was still a problem. And I had to manually tell it to refresh its index whenever I made a change to a comic's entry.
My latest change threw out the daemon code. No more RPC and socket handling, it's all done as PostgreSQL procedures now. No more code maintaining an index in its own process; instead, it just uses PostgreSQL's own indices. They can do matches based on the initial part of a string just fine, and a lookup takes only a few milliseconds. As usual, it's the approach I should have taken in the first place. I'm not sure if I would have come up with it if I had just read PostgreSQL's fine manuals a bit more and not just enough to get started. Perhaps, perhaps not. It's been seven years and I'm sure I have developed as a coder along the way. Pretty often the best design choice has been to not start coding, yet.
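To show why an ordered index is enough for this, here is a small sketch in which a sorted Python list stands in for the database index. It isn't the PostgreSQL procedure itself; the function names and the example bases are hypothetical, and the URL is assumed to already have its protocol and "www." stripped.

```python
import bisect


def keys_with_prefix(sorted_keys, prefix):
    """Return all keys starting with prefix, using one binary search.

    In sorted order, the keys sharing a prefix form one contiguous run
    that begins at the insertion point of the prefix itself.
    """
    start = bisect.bisect_left(sorted_keys, prefix)
    found = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        found.append(key)
    return found


def match_comic(sorted_bases, url):
    """Pick the longest stored base that is an initial part of url."""
    domain = url.split("/", 1)[0]
    candidates = [base for base in keys_with_prefix(sorted_bases, domain)
                  if url.startswith(base)]
    return max(candidates, key=len, default=None)
```

For example, with the bases `sorted(["example.com/alpha/", "example.com/beta/strip", "example.org/"])`, the URL `"example.com/beta/strip42.html"` matches `"example.com/beta/strip"`. Several comics under one domain are told apart by the longest matching base, with no hand-written special cases.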
One other thing which got removed with this change was the code that offered support for using Piperka without registering as a user. That never worked as well as with using accounts and I suspect that it has been broken ever since I added the CSRF protection. I don't know if anyone ever used that at all. I'm thinking of adding an option for logging in with OpenID, which would (hopefully) lower the barrier for anyone to try Piperka out. Not that I'd expect that to matter all that much, but it'd still be a nice feature. There'd be no need to come up with a password for Piperka if I added that.
I made something new. Until I think of some other name for it, I'll call it Piperka Reader. Not very original but it was an easy name to pick.
Piperka Reader is a page that embeds comic archives in an iframe, with controls on a bar on the top. There are the usual buttons for going to the first, previous, next and newest page, and a dialog window with a list of all the pages of a comic. The same navigation buttons work for any and all comics listed on Piperka. If you've logged in, it can automatically move your bookmark as you read, or you can set it yourself. When reading comics sequentially, it uses a second iframe to preload the following page in the background, meaning that the next page is already there, ready for viewing, by the time you've read the current page and click next.
I used parts of the comic's URLs as the page names in the archive dialog. Archive pages' titles would likely be a better choice for that role but as I haven't stored those on Piperka, this'll have to do.
I've labeled Reader as "beta" for now. I'll yet add more functionality to it and it could use some polish. It's suitable for reading longer stretches of archives but it'd take a bit more to let it browse daily updates easily, with one or a few unread pages at most. I've developed it using Chrome and I can hope that it'll work with other browsers too.
If you look closely, you'll find that this means that I've made Piperka's comic index easily downloadable. All the 1546619 pages in it. I don't mind if you access them independently of Reader, but I'd appreciate it if you'd credit the source and let me know if you use them for anything. No guarantee that they continue to be available or that they'd be useful for any particular purpose.
I hope that no comic author minds that I embed their content like this. I'm not trying to misrepresent whatever they host as mine or that they'd be associated with Piperka. Technically, Piperka itself doesn't access any more content than what it did before, it just allows a user to do so, in a bit different manner, but arguing that would be sophistry. Let me know what you think.