Piperka blog

Recommendations 2.0

Piperka has a new recommendations algorithm in use. The old one used user-based collaborative filtering (UBCF); the new one is based on alternating least squares (ALS). I haven't tried to evaluate the algorithms' performance yet, but I liked the new results well enough to stick with them.

In a sense, UBCF and ALS belong to different generations of algorithms. The old-school way is to try to understand a problem domain, reason about it, and craft the algorithm accordingly. In contrast, ALS tosses all that to the wind and just siphons the data through an n-dimensional straw. Somehow, that builds generalizations that make for sensible recommendations.
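To give a rough idea of what the straw looks like, here is a minimal sketch of ALS in plain Python. It is not Piperka's Haskell implementation. It assumes the simplest possible setup: a dense 0/1 matrix where 1 means "this user reads this comic", every 0 treated as a real observation, and no weighting of any kind. The matrix `reads`, the function names and the parameters `k`, `lam` and `iters` are all made up for the illustration.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def update(fixed, ratings, lam):
    """Ridge least-squares solve for one side's factors while the other side is held fixed.

    ratings[j][i] is the entry that pairs row j (being solved) with fixed[i].
    """
    k = len(fixed[0])
    # F^T F + lam * I is the same for every row, since all entries count as observed.
    gram = [
        [sum(f[p] * f[q] for f in fixed) + (lam if p == q else 0.0) for q in range(k)]
        for p in range(k)
    ]
    out = []
    for row in ratings:
        rhs = [sum(r * f[p] for r, f in zip(row, fixed)) for p in range(k)]
        out.append(solve(gram, rhs))
    return out

def als(r, k=2, lam=0.1, iters=20, seed=0):
    """Factor r (users x comics) into k-dimensional user and comic vectors."""
    rng = random.Random(seed)
    comics = [[rng.uniform(-1, 1) for _ in range(k)] for _ in r[0]]
    r_t = [list(col) for col in zip(*r)]  # comics x users
    for _ in range(iters):
        users = update(comics, r, lam)    # fix comics, solve for users
        comics = update(users, r_t, lam)  # fix users, solve for comics
    return users, comics

# Hypothetical toy data: users 0-1 share one taste, users 2-3 another.
reads = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
user_factors, comic_factors = als(reads)
```

The predicted score for a user and a comic is the dot product of their two vectors. In this toy data, user 0 does not read comic 2, but the like-minded user 1 does, so comic 2 should come out scoring higher for user 0 than comics 3 and 4 do.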

Another internal difference is that the new algorithm lives within the backend process itself and is no longer an external dependency. It's been implemented in pure Haskell and I've released it as a separate library. A native solution makes it easier to do more with the results. One new thing is that I've added recommendations as another sort option for the comic listings. I found it interesting to see the comics I'm already reading ranked within the results. You can think of the top picks as the algorithm's idea of which comics are the most characteristic for you to read. I'm not surprised that it would think that I would like the space elf girls comic the most. Also, it's easy to check what the algorithm thinks you shouldn't read.
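As a sketch of how such a sort option could work, assume each user and each comic has a factor vector and that the score is their dot product, which is the usual way to read predictions out of a matrix factorization. Whether the backend does exactly this is an assumption. The function `rank_comics`, the vectors and the comic titles below are invented for the example.

```python
def rank_comics(user_vec, comic_vecs):
    """Return (title, score) pairs for every comic, best match first.

    Comics the user already reads are deliberately left in, so they
    show up ranked alongside everything else.
    """
    scored = [
        (title, sum(u * c for u, c in zip(user_vec, vec)))
        for title, vec in comic_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical 3-dimensional factors; real ones would come out of the factorization.
me = [0.9, 0.1, -0.2]
comics = {
    "Space Elves": [1.0, 0.0, 0.0],
    "Office Gags": [0.0, 1.0, 0.0],
    "Grim Noir": [-0.5, 0.0, 1.0],
}
ranking = rank_comics(me, comics)
top_title = ranking[0][0]      # most characteristic pick
bottom_title = ranking[-1][0]  # what the scores say to avoid
```

Reading the same list from the bottom up gives the comics the algorithm thinks are the worst fit.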

Looking at my personal recommendations, I can see that it lists a lot fewer generally popular comics. That's good, since they're easy to find on the top list as is. On the flip side, it lists a lot of old and possibly abandoned or completed comics. That's fair as I do have, as the most senior Piperka user, a lot of those in my own selections. Still, I'd like it to prefer comics that are still updating or at least complete. The current implementation doesn't score the input data in any way, but the algorithm could be extended to use scores, and I could then add positive scores for those comics.

I wanted to do more with the recommendations besides just listing the results, so I added a way to view them all in one view. I've added overlays to the Piperka Map. With them, you can try to see if there are any interesting hot spots within your results. Also, the recommender algorithm gives 10-dimensional coordinates for all the comics as a byproduct. Those represent comics on some feature axes. I have no a priori idea what those would represent and they might change along with the data over time. I was curious to see what they would show and added them to the overlays as well. For those, I've listed a few comics from the extreme ends to give some idea what they might be about. Mind you, this is not PCA and there's no implied order of significance among the dimensions.
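Picking comics from the extreme ends of an axis amounts to sorting by one coordinate and taking both tails. A small sketch, with a hypothetical helper `axis_extremes` and made-up two-dimensional coordinates standing in for the real ten:

```python
def axis_extremes(coords, axis, n=2):
    """Return the n lowest and the n highest comics along one latent axis."""
    ordered = sorted(coords, key=lambda title: coords[title][axis])
    return ordered[:n], ordered[-n:]

# Hypothetical coordinates; titles and values are placeholders.
coords = {
    "Comic A": [0.1, 0.9],
    "Comic B": [-0.8, 0.2],
    "Comic C": [0.7, -0.3],
    "Comic D": [-0.2, -0.6],
    "Comic E": [0.4, 0.5],
}
low0, high0 = axis_extremes(coords, axis=0)
low1, high1 = axis_extremes(coords, axis=1)
```

Since the axes carry no order of significance, axis 0 is no more important than axis 1. The index is only a label.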

I found it fascinating that the axes match so well with the areas found by my graph layout algorithm. They operate wholly differently and I had no reason to expect to see them line up so well. It's there if you want to see what electric sheep are like. Not that even I would consider it practical knowledge.

The related comics algorithm has been overhauled as well. I've retired my homebrew graph thing and replaced it with ALS. As described above, it gives coordinates for all the comics, so how would you find comics close to each other? By comparing their Euclidean distances, of course. Comics close by in that sense turn out to be close by semantically, as well. I haven't measured that in any way either, other than by eyeballing the new results, which make sense to me. I'm sure someone has an academic paper somewhere about why that works.
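The distance lookup can be sketched in a few lines. This assumes a plain linear scan over all comics, which may or may not be how the backend does it. The helper `related_comics`, the titles and the coordinates are placeholders.

```python
import math

def related_comics(title, coords, n=2):
    """Return the n comics closest to `title` by Euclidean distance."""
    origin = coords[title]
    others = [t for t in coords if t != title]
    return sorted(others, key=lambda t: math.dist(origin, coords[t]))[:n]

# Hypothetical 2-dimensional coordinates; the real ones have 10 dimensions.
coords = {
    "Comic A": [0.0, 0.0],
    "Comic B": [0.1, 0.0],
    "Comic C": [1.0, 1.0],
    "Comic D": [0.0, 0.3],
}
neighbours = related_comics("Comic A", coords)
```

`math.dist` needs Python 3.8 or later and works for any number of dimensions, as long as both points have the same length.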

So go see the new recommendations or check on comics that you like to see if the new algorithm has found anything you'd like as well.

I'm still tempted to go creative with the recommendations. Linear approximations are so mundane, I'm sure I could use an ANN as the straw instead. There's bound to be an XOR of comics in there somewhere. Maybe some day.

Mon, 27 Jul 2020 06:27:31 UTC