Piperka blog

Recommendations 2.0

Piperka has a new recommendations algorithm in use. The old one was a user-based collaborative filtering (UBCF) algorithm, the new one is based on alternating least squares (ALS). I still haven't tried to evaluate the algorithms' performance but I liked the new results well enough to stick with them.

In a sense, UBCF and ALS are of different generations of algorithms. The old school way is to try to understand a problem domain and to reason about it and to craft the algorithm accordingly. In contrast, ALS tosses all that into the wind and just siphons the data through an n-dimensional straw. Somehow, that builds generalizations that make for sensible recommendations.

Another internal difference is that the new algorithm lives within the backend process itself and it's no longer an external dependency. It's been implemented in pure Haskell and I've released it as a separate library. A native solution makes it easier to do more with the results. One new thing is that I've added recommendations as another sort option for the comic listings. I found it interesting to see the comics I'm already reading ranked within the results. You can think of the top picks as the algorithm's idea of what comics are the most characteristic for you to read. I'm not surprised that it would think that I would like the space elf girls comic the most. Also, it's easy to check what the algorithm thinks that you shouldn't read.

Looking at my personal recommendations, I can see that it lists a lot fewer generally popular comics. That's good since they're easy to find on the top list as is. On the flip side, it lists a lot of old and possibly abandoned or completed comics. That's fair as I do have, as the most senior Piperka user, a lot of those in my own selections. Still, I'd like it to prefer comics that are still updating or at least complete. The current implementation doesn't score the input data in any way but the algorithm could be extended to use such scores, and I could then give those comics a positive score.

I wanted to do more with the recommendations besides just list the results and I added a way to view them all in one view. I've added overlays to the Piperka Map. With them, you can try to see if there's any interesting hot spots within your results. Also, the recommender algorithm gives 10-dimensional coordinates for all the comics as a byproduct. Those represent comics on some feature axes. I have no a priori idea what those would represent and they might change along with the data over time. I was curious to see what they would show and added them to the overlays as well. For those, I've listed a few comics from the extreme ends to give some idea what they might be about. Mind you that this is not PCA and there's no implied order of significance with the dimensions.

I found it fascinating that the axes would match so well with the areas found by my graph layout algorithm. They operate wholly differently and I had no reason to expect to see them line up so well. It's there if you wanted to see what electric sheep are like. Not that even I would consider it practical knowledge.

The related comics algorithm has been overhauled as well. I've retired my homebrew graph thing and replaced it with ALS. As described above, it gives coordinates for all the comics, so how would you find comics close by to each other? By comparing their Euclidean distances, of course. Comics close by in that sense turn out to be close by semantically, as well. I haven't measured that in any way either, other than just eyeing through the new results and they make sense to me. I'm sure someone has an academic paper somewhere about why that works.
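
As a rough illustration of the distance lookup described above, here's a minimal Python sketch. It isn't Piperka's actual code (that lives in the Haskell backend); the function name, the comic names and the 3-dimensional coordinates are all made up for the example, whereas the real coordinates have 10 dimensions.

```python
import math

def nearest_comics(coords, target, k=3):
    """Return the k comics closest to `target` by Euclidean distance.

    coords: dict mapping a comic name to its coordinate tuple
            (hypothetical data standing in for the ALS output).
    """
    origin = coords[target]
    distances = [
        (math.dist(origin, point), name)
        for name, point in coords.items()
        if name != target
    ]
    # Smallest distance first
    return [name for _, name in sorted(distances)[:k]]

# Made-up coordinates, for illustration only
coords = {
    "space elves": (0.9, 0.1, 0.3),
    "star knights": (0.8, 0.2, 0.4),
    "office gags": (-0.5, 0.7, 0.0),
    "cat diary": (-0.6, 0.6, 0.1),
}

related = nearest_comics(coords, "space elves", k=1)
```

With this toy data the two space comics end up as each other's nearest neighbours, and so do the two slice-of-life ones.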

So go see the new recommendations or check on comics that you like to see if the new algorithm has found anything you'd like as well.

I'm still tempted to go creative with the recommendations. Linear approximations are so mundane, I'm sure I could use an ANN as the straw instead. There's bound to be an XOR of comics in there somewhere. Maybe some day.

Mon, 27 Jul 2020 06:27:31 UTC

Maintenance

I haven't much to tell about new site features at this time, but I thought it'd be a good time for an update, nonetheless. Most of the time I've used on Piperka during the last couple of months has gone on plain old maintenance. I've improved the crawler health page to reveal more issues from the crawl log. Consequently, I've reindexed and/or unstuck a bunch of comics (I didn't try to count how many) and removed 296 stale entries since the last time I wrote on the blog.

I've joined Discord. I'm not one to care much about social media, but seeing how actually meeting people has been a scarce occurrence this spring, I gave in to it. I set up a Piperka server if anyone'd like to talk about comics with other Piperka users. I've tried to offer IRC as a contact option before and I used to have a mailing list but those have never had much if any participation. Emailing me still works and I'll admit that if it's about something that'd need coding or otherwise accessing the database then nobody but me would be doing it so it's not wrong to do so in that sense. But it'd be an improvement if discussions about Piperka weren't always one to one. Also, Gitlab issues is another place to talk about bugs, features and ideas. If it's not something I'd start working on immediately I may file it away on Gitlab.

I've finally removed the blurb that told logged in users on a mobile device, who hadn't explicitly chosen to use the mobile site version, that it exists as a new feature. I meant to have the message up there for a couple of months at most but it slipped my mind and I only came across it almost a year later, when I logged in to Piperka with a new phone. I think you know it has one by now.

I've been thinking about how Piperka could do a better job about helping users find new comics to read. Ads work, seeing how running one for a few days on Piperka can easily bring one a dozen new readers. But I've been doing some occasional and unintentional comic promotion myself. Sometimes, I miss that a comic was already on Piperka and end up making duplicate entries. When I merge them I usually see that it has gained a number of readers. Like God-killer which was initially added in January and got reintroduced in May. It had 8 readers initially but it increased to 13 after spending a few days at the top of the new comics' list before I merged them. Foxy Flavored Cookie went from 5 to 8 readers after a similar treatment.

I suppose I could try to improve Piperka's recommendations. I would have liked it if it had already listed these two comics near the top for matching users, so that they would have picked them from there instead of running into them due to a mistake I made. Years ago, I used to show at random the banners that have been submitted to Piperka on the header ad slot and I suppose that had helped someone find a comic as well. In that vein, I've added a button to view a random comic to the recommend page. I know that it's not a terribly original feature. It selects comics that are either actively updating or have been flagged as complete. That's 1566 comics out of 5083, as of this writing.
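
The selection rule for the random comic button could be sketched like this in Python. It's only an illustration under assumptions: the `status` field, its values and the function name are hypothetical and don't reflect how Piperka stores this.

```python
import random

def pick_random_comic(comics, rng=random):
    """Pick one comic that is still updating or flagged as complete.

    comics: list of dicts with a hypothetical "status" field.
    Returns None when nothing is eligible.
    """
    eligible = [c for c in comics if c["status"] in ("updating", "complete")]
    return rng.choice(eligible) if eligible else None

# Made-up entries, for illustration only
comics = [
    {"title": "A", "status": "updating"},
    {"title": "B", "status": "abandoned"},
    {"title": "C", "status": "complete"},
]

choice = pick_random_comic(comics, random.Random(0))
```

With the figures quoted above, the eligible pool of 1566 out of 5083 is a bit under a third of the catalogue.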

One further thing I would like to do is to finally run some tests and evaluate different recommendation algorithms. The one in use currently is pretty much the default one from a library. It works reasonably well but I'm sure it has some alternatives and knobs I could turn. Working on recommendations always makes me hope that I'd have more users to turn into input data.

So, if you like, drop by and come say hi on Discord. I added a link to the footer a while ago and wrote about it on the about page but nobody's joined with those yet. All the server has currently is a bunch of Finns but (like all Finns) we'll switch to English as soon as anyone who doesn't speak Finnish comes forward.

Tue, 23 Jun 2020 12:16:06 UTC

Discover mode

I've added a new site feature that should make browsing new web comics easier. Discover mode is a way to use Piperka Reader to view comics, one by one. It's got the same archive controls as Reader does but also a button to move over to the next comic in the queue. It's more convenient than clicking the info pages open from the listing and then clicking to see the comic on them.

I haven't wanted to add links to the comic sites to the listings themselves, since I've felt that I should give some idea of what the content is like before exposing users to it. It's hard to add anything more to the listing views without cramping the layout. To avoid seeing too much of tab A into slot B action unwittingly, I've added the option (by default, enabled) to mask comics that have been tagged as NSFW in the discover mode.

Many sites, though, use headers like X-Frame-Options or Content-Security-Policy to block embedding their pages in iframes, which makes this feature less useful than it could be. In my opinion, those flags should be overrideable in the browser. They're just as much an anti-feature as auto-playing videos. I may end up writing a browser extension if I'd want to go further with doing Reader-like things.
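
To make the two headers a bit more concrete, here's a simplified Python sketch of how one might guess from a response's headers whether a page refuses to be framed by another site. The function name is made up and this is not how Piperka or any browser decides it. In particular, it treats any `frame-ancestors` source list other than a bare `*` as blocking, which is stricter than what browsers do with host or scheme sources.

```python
def blocks_embedding(headers):
    """Guess whether response headers forbid framing by other sites.

    headers: dict of header name -> value for a single response.
    """
    normalized = {name.lower(): value for name, value in headers.items()}

    # A frame-ancestors directive takes precedence over X-Frame-Options
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.split()
        if parts and parts[0].lower() == "frame-ancestors":
            sources = [p.lower() for p in parts[1:]]
            # Simplification: only a bare wildcard counts as "anyone may embed"
            return "*" not in sources

    # Without frame-ancestors, fall back to the older header
    xfo = normalized.get("x-frame-options", "").strip().lower()
    return xfo in ("deny", "sameorigin")

blocked = blocks_embedding({"X-Frame-Options": "SAMEORIGIN"})
```

A check along these lines could be used to decide up front whether a comic is worth queueing in a framed view at all.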

Discover mode is also available for viewing recommendations. Go check the recommend page to view that queue.

I've simplified tag usage, with regards to flagging comics as NSFW. I used to use a bunch of tags that would make them show up as NSFW, but I reduced that to only using advisory::nsfw for that. I've also looked through all the comics flagged with old::adult and added NSFW for all of them that seemed to warrant it.

Teksti's had a slow start. I had added a few initial entries that I follow myself, and a few based on requests. I suppose I could've inserted a few dozen based on some sites ranking them but I decided against it. I was thinking of putting Google's ads on it, but they aren't accepting new sites, citing the COVID pandemic as a reason. Go figure. As I laid out in my last post, I've added SSO so that you don't need to log in to both of them separately. I was thinking of merging the per user update lists of both sites to one page but I used a simple message that the other site has updates, instead. I could pretty well leverage the existing push feature for that.

Other updates last month include some work on the crawler. I've set it to try to audit long silent comics by running the parser on some existing page from the middle of their archive and trying to match the resulting next page with what's in the database. Another bug fixed just last evening was with redirects and accented characters. Tumblr has been a source of frustration for a couple of evenings. If you'd like to have my advice, don't use them for hosting comics. They have a wonky idea of what a permanent archive page is and to top it off, they seem to silently lose pages. They just vanish from their archive view and the page navigation links on the individual pages start leading to pages that result in 404 errors. Even Tapas.io is more pleasant to handle nowadays.
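
The audit idea can be sketched in a few lines of Python. This is only an illustration with made-up names: `archive` stands in for the ordered page list stored in the database and `parse_next` for whatever the crawler's parser does to find the next page link. The real crawler isn't written like this.

```python
def audit_comic(archive, parse_next):
    """Check that the parser still agrees with the stored archive.

    archive: ordered list of page URLs as stored in the database.
    parse_next: callable that takes a page URL and returns the
                next-page URL the parser finds on it (or None).
    Returns True when the parsed next page matches the stored one.
    """
    if len(archive) < 2:
        return True  # nothing to compare against
    middle = (len(archive) - 1) // 2
    expected = archive[middle + 1]
    return parse_next(archive[middle]) == expected

# A fake site: page -> next page, for illustration only
site = {"p1": "p2", "p2": "p3", "p3": "p4", "p4": None}
archive = ["p1", "p2", "p3", "p4"]

healthy = audit_comic(archive, site.get)

# The site drops p3 and relinks p2 to a page the database doesn't know
site["p2"] = "p3-moved"
broken = audit_comic(archive, site.get)
```

A failed audit only says that the stored archive and the live site disagree somewhere around the middle; working out which one is right would still be a manual job.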

If I go silent for multiple months, it might be because I got annoyed enough at nuisances like that to launch my own web comic hosting site, if only to get people to use Tumblr less. Or it might be because I'd've started with a new job. Which I may, in fact, be doing in the near future. Though I can't say that I've been maximally productive with Piperka's development even now. Even though Cataclysm: Dark Days Ahead may be a bit too realistic, currently.

Fri, 01 May 2020 07:41:47 UTC

Teksti

I'm announcing a new site: Teksti. The name's the Finnish word for "text" and it's pronounced like "texty". It's just like Piperka but this time it's for web fiction. In fact, they share the same user database and if you have an account on one then you have one on the other as well. It's got only a few web fictions listed on it so far but feel free to submit more.

People tend to try to submit web fictions to Piperka from time to time. While their format is generally quite suitable for Piperka, I've skipped them since they, well, aren't web comics. I thought that I could still do something about them and it felt more natural to have them on their own site instead of sectioning them somehow within Piperka. I'm not sure what I'm getting into but that didn't stop me from coming up with Piperka either. I'm not going to try to index just any random forums that people may use to write their novels on.

I plan to add SSO between the two sites later on. For now, you'll need to log in to both, separately. Another upcoming feature would be to unify their updates pages, to list bookmarks from both sites on it.

On the technical side, I used PostgreSQL's schemas to duplicate the comic tables for both sites, leaving the account specific bits to the public schema. The site logic was duplicated within a snaplet which embeds both of them and most of the templates are shared, with the one splicing in "comic" and the other "web fiction" where appropriate. Both sites even share the same backend process.

Not much else to tell about it. It's pretty much just a "brown Piperka".

Other things I've been doing since the last post include doing a time lapse animation of Piperka Map over the last 6 years. It may not quite be what'd move the site forward but I was curious to see what it would look like and I had a few bugs in my code that I wanted to fix and needed to do a fair amount of runs to narrow them down. I may have entertained the idea of having a fraction of the DataIsBeautiful community like my animation and perhaps having a fraction of them end up signing up for Piperka. But that didn't happen. Playing with data is fun, I wish I had more of it.

I was a bit worried about seeing my ad revenue decrease by switching over to Comic Ad Network, but it turns out that it's been on par with what I had previously. I'm not sure how they do that since PW was never that good. Though what I get from there pretty much relies on individual advertisers and it may drop from time to time, when campaigns expire. Piperka's been at the top of their earnings rankings for much of the time so it's not like I have anything to complain about. Thanks to all of you who've run ads on Piperka so far, hopefully you've got your money's worth out of it.

I hope that everyone's doing okay. Having a pandemic going on makes it feel like all the news is straight out of a science fiction novel. My area's not been hit that hard, at least not yet. Most everything's been canceled and anywhere I go there are just a few people around and they are staying well apart from each other. Even more so than what Finns are prone to, usually. So stay at home and read some web comics. Or web fiction, as well.

Wed, 01 Apr 2020 16:13:07 UTC

Claiming comics, ads and Piperka tour

You may have noticed a change in the ads since a couple of weeks ago. Piperka joined Comic Ad Network. They're pretty much a successor to the late Project Wonderful. The payout might be less than with Google's ads but I'd rather use ones that are actually relevant to the site and don't try to track and profile you. If the amount of clicks these get is any indication then it's safe to say that they're better received than what I had before. It's a delight to see a comic gain a dozen new readers on Piperka after running an ad for a few days.

The reason why I went looking for new ads right now was that Google hit me with a second soft suspend due to "invalid traffic concerns". I just don't care to deal with that anymore and I was without ads for a while. I tried a different general purpose ad network in between and I got feedback in just a couple of hours from a user that they redirected them to a malware site. In all likelihood they gave cancer as well. I'm not naming names but I removed those ads immediately afterwards. It's definitely not what I want from my site and I'm apologizing to anyone who got hit by that. I spotted CAN on a comic and asked to join them. I was a bit concerned whether they'd have me since they seem to strongly target comic authors as publishers but they accepted me nonetheless. Even though it's a bit odd that they call Piperka a comic and me an artist on their site.

I suppose I could offset the lower ad revenue somewhat by promoting Brave. It's a privacy oriented browser and I'd get a few euros for every install that's been minimally active for a month or so. I even had to go through a KYC circus to get validated. What I think of Piperka and the revenue I get from it is that I could choose to lean more heavily on ads and possibly gain tens of euros each month. Maybe a hundred, at maximum. But as neither a hundred a month nor zilch is enough to live on, I'd rather try to focus on encouraging Piperka's user base to grow, even if it means forgoing advertising opportunities. Piperka's large enough to get contact attempts once in a while from all sorts of ad networks, trying to entice me to use their ads. I think CAN's ads are useful enough that they rather feel like a service in themselves. I think I've found what I'll stick with, for now.

So if you have a comic to advertise then try it out. It's a bargain for what you get.

Speaking of growing the user base, I made an interactive tour of Piperka. I haven't yet linked to it from anywhere on the site. I have a steady amount of hits from search engines and it might not be apparent right away what Piperka even is about and I wanted to try to give them an idea about it, without expecting them to create an account first. I used to have an "example" page that demonstrated the interface but that had no interactive bits to it and was a bit of a wall of text so I retired it a couple of years ago along with the site backend rewrite. I'd welcome any feedback about the tour page, especially since I may be blind to what issues there might be with using Piperka and my perspective on it is irrevocably marred by the fact that I created it. I haven't yet tried to make the tour mobile friendly so it might be better to try it on a regular browser.

I'm considering advertising Piperka. I've almost never done so but I suppose there's no harm in trying. Especially since CAN seems to give me a perfect audience for it. But if all I gave them as a landing page was my current home page then they might not even figure out what they clicked on.

I made a short example comic for the tour. It's hilariously bad. If anyone would want to contribute anything better then I might replace it. Or perhaps it's just what Piperka needs, who knows.

One new feature I made last month is comic claiming. With it, I can attach a user account to comics and it'll give you a few controls over the comic. Like direct edits to their descriptions. It's pretty basic at the moment but I can add more features to it if there's interest.

Sun, 02 Feb 2020 20:44:43 UTC