

Persine, the Persona Engine

Persine is an automated tool to study and reverse-engineer algorithmic recommendation systems. It has a simple interface and encourages reproducible results. You tell Persine to drive around YouTube and it gives back a spreadsheet of what else YouTube suggests you watch!

Persine => Pers[ona Eng]ine

For example!

People have suggested that if you watch a few lightly political videos, YouTube starts suggesting more and more extreme content – but does it really?

The theory is difficult to test since it involves a lot of boring clicking and YouTube already knows what you usually watch. Persine to the rescue!

  1. Persine starts a fresh-as-snow copy of Chrome
  2. You provide a list of videos to watch and buttons to click (like, dislike, "next up", etc.)
  3. As it watches and clicks, YouTube customizes its recommendations more and more
  4. When you're all done, Persine will save your winding path and the video/playlist/channel recommendations to nice neat CSV files.
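In practice, steps 2 and 3 come down to handing Persine a list of commands in order. The sketch below assumes only the `persona.run()` method shown in the Quickstart. The helper `run_commands` is hypothetical and not part of Persine. Because the command list is ordinary data, you can save it and replay the same experiment later.

```python
from typing import Iterable, Protocol


class SupportsRun(Protocol):
    """Anything with a .run(command) method, such as a Persine persona."""

    def run(self, command: str) -> object: ...


def run_commands(persona: SupportsRun, commands: Iterable[str]) -> int:
    """Run each command in order and return how many were run.

    Hypothetical helper for illustration; Persine does not ship it.
    """
    count = 0
    for command in commands:
        persona.run(command)
        count += 1
    return count


# The "script" for one experiment: a starting video, a like,
# then three "next up" clicks (expressed as one repeated command).
COMMANDS = [
    "https://www.youtube.com/watch?v=hZw23sWlyG0",
    "youtube:like",
    "youtube:next_up#3",
]
```

With a real persona, you would call `run_commands(persona, COMMANDS)` inside a `with engine.persona() as persona:` block.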

Beyond analysis, these files can be used to repeat the experiment later, to see whether recommendations change with time, location, user history, etc.
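When you repeat an experiment, you need some way to measure how much the recommendations changed between runs. One simple option is the Jaccard overlap of the recommended URLs: the number of URLs the two runs share, divided by the number of distinct URLs across both runs. This sketch assumes you have already pulled the recommended URLs out of each run's CSV into lists. The example URLs are made up.

```python
from __future__ import annotations


def jaccard_overlap(run_a: list[str], run_b: list[str]) -> float:
    """Share of distinct recommendations that appear in both runs (0.0 to 1.0)."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0  # two empty runs are trivially identical
    return len(a & b) / len(a | b)


monday = ["/watch?v=aaa", "/watch?v=bbb", "/watch?v=ccc"]
friday = ["/watch?v=bbb", "/watch?v=ccc", "/watch?v=ddd"]
print(jaccard_overlap(monday, friday))  # prints 0.5 (2 shared / 4 distinct)
```

A score of 1.0 means both runs recommended exactly the same set of items, and 0.0 means they had nothing in common. This measure ignores ordering and repeat counts, so you may want a rank-aware comparison if position on the page matters to your question.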

If you didn't quite get enough data, don't worry – you can resume your exploration later, picking up right where you left off. Since each "persona" is based on its own Chrome profile, all your cookies and history will be safely stored until your next run.

An actual example

See Persine in action on Google Colab. Includes a few examples for analysis, too.

Installation

pip install persine

Persine will automatically install Selenium and BeautifulSoup for browsing/scraping, pandas for data analysis, and pillow for processing screenshots.

You will need to install chromedriver to allow Selenium to control Chrome. Persine won't work without it!

  • Installing chromedriver on OS X: I hear you can install it using Homebrew, but I've never done it! You can also follow the link above and click the "latest stable release" link, then download chromedriver_mac64.zip. Unzip it, then move the chromedriver file into a directory on your PATH. I typically put it in /usr/local/bin.
  • Installing chromedriver on Windows: Follow the link above and click the "latest stable release" link. Download chromedriver_win32.zip, unzip it, and move chromedriver.exe into a directory on your PATH (in the spirit of anarchy I just put it in C:\Windows).
  • Installing chromedriver on Debian/Ubuntu: Just run apt install chromium-chromedriver and it'll work.
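Before your first run, it is worth confirming that chromedriver is actually on your PATH. Here is a small standard-library sketch for that. The check itself is not part of Persine.

```python
import shutil
from typing import Optional


def find_chromedriver() -> Optional[str]:
    """Return the full path to chromedriver if it's on PATH, else None."""
    # On Windows, shutil.which also tries the PATHEXT extensions,
    # so this finds chromedriver.exe too.
    return shutil.which("chromedriver")


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found on PATH; Persine won't be able to start Chrome.")
    else:
        print(f"Found chromedriver at {path}")
```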

Quickstart

In this example, we start a new session by visiting a YouTube video and clicking the "next up" video three times to see where it leads us. We then save the results for later analysis.

from persine import PersonaEngine

engine = PersonaEngine(headless=False)

with engine.persona() as persona:
    persona.run("https://www.youtube.com/watch?v=hZw23sWlyG0")
    persona.run("youtube:next_up#3")
    persona.history.to_csv("history.csv")
    persona.recommendations.to_csv("recs.csv")

We turn off headless mode because it's fun to watch!

Persine basics

Persine is built around an engine that stores all of your global settings, and personas that represent the individual users who browse the web.

Creating Personas

Personas are always generated by an engine.

from persine import PersonaEngine

engine = PersonaEngine()
persona = engine.persona()

By default, personas are single-use and their browsing history will be discarded after your script is run. If you give them a name, though, they'll save their browsing/recommendation history so you can resume them later.

persona = engine.persona('Mulberry')

This is useful in conjunction with signing in to YouTube (see below), allowing you to imitate a real user watching videos over multiple sessions.

Launching Chrome and visiting pages

You can use a with statement to start and stop Chrome automatically. It makes life easy.

with engine.persona() as persona:
    persona.run("https://www.youtube.com/watch?v=hZw23sWlyG0")
    persona.run("youtube:next_up#3")

If you prefer more control, or want to visit sites one by one, you can call .quit() manually when you're done.

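from persine import PersonaEngine

engine = PersonaEngine()
persona = engine.persona()

persona.run("https://www.youtube.com/watch?v=hZw23sWlyG0")
persona.run("youtube:next_up#3")

# Quit Chrome when you're done
persona.quit()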
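If you manage Chrome by hand, an exception partway through a run would skip the .quit() call and leave Chrome open. The usual fix is a try/finally block. This sketch assumes only the .run() and .quit() methods shown above. The helper name `run_then_quit` is hypothetical.

```python
def run_then_quit(persona, commands):
    """Run commands one by one, always quitting Chrome afterwards.

    Hypothetical helper; persona is anything with .run() and .quit().
    """
    try:
        for command in commands:
            persona.run(command)
    finally:
        # Runs even if a command raised, so Chrome is never left open.
        persona.quit()
```

This gives the same guarantee as the with block, so reach for it only when a with statement doesn't fit your script.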

We can turn headless mode off or on depending on whether we want to actually watch what Chrome is up to. When running in non-headless mode, Persine automatically installs uBlock Origin so you don't have to deal with ads.

engine = PersonaEngine(headless=False)

Headless mode doesn't support extensions, so by default our invisible Chrome is unfortunately watching ads. We should probably switch to Firefox, but it has its own problems.

Seeing and saving results

History is every command you've run and every page you've visited, while recommendations are everything you've been recommended. Recommendations include video sidebars, homepage listings, and search results.

Right now recommendations also include ads and unrelated promoted content. I'm on the fence about whether they should stay or go.

For convenience, you can use .to_df() to see history and recommendations as pandas DataFrames.

persona.recommendations.to_df()
persona.history.to_df()

If you'd prefer to do your analysis elsewhere, you can save them to CSV files.

persona.recommendations.to_csv('recs.csv')
persona.history.to_csv('hist.csv')
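Once the CSV files are saved, you don't need Persine (or even pandas) to start exploring them. This standard-library sketch counts the most frequently recommended items. The column name `title` is an assumption, so check the header row of your own recs.csv and adjust it. The sample data is made up and stands in for the real file.

```python
import csv
import io
from collections import Counter

# Stand-in for open("recs.csv", newline=""); the "title" column is assumed.
SAMPLE = """title,url
Video A,/watch?v=aaa
Video B,/watch?v=bbb
Video A,/watch?v=aaa
"""


def top_recommendations(csv_file, column="title", n=5):
    """Return the n most common values in one column of a recommendations CSV."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader).most_common(n)


print(top_recommendations(io.StringIO(SAMPLE)))  # [('Video A', 2), ('Video B', 1)]
```

With a real file, pass an open file handle instead: `with open("recs.csv", newline="") as f: print(top_recommendations(f))`.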

Bridges

Bridges are site-specific scrapers that tell Persine what to click, what to scrape, and other site-specific commands. Right now the only completed bridge we have is for YouTube, while an Amazon one is in the works.
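To make the idea concrete, here is a toy sketch of one part of a bridge's job: mapping the command names after the `youtube:` prefix to the actions that carry them out. Everything here is hypothetical and only illustrates the concept, including `ToyBridge`, its handler table, and returning strings instead of clicking. Persine's real YouTube bridge drives Chrome through Selenium and is structured differently.

```python
from typing import Callable, Dict, Optional


class ToyBridge:
    """Hypothetical bridge: routes 'youtube:command?arg' strings to handlers."""

    prefix = "youtube"

    def __init__(self) -> None:
        # Each handler receives the optional argument after "?".
        self.handlers: Dict[str, Callable[[Optional[str]], str]] = {
            "homepage": lambda arg: "visit https://www.youtube.com/",
            "search": lambda arg: f"search for {arg!r}",
            "like": lambda arg: "click the like button",
        }

    def handles(self, command: str) -> bool:
        """True if this bridge is responsible for the command."""
        return command.startswith(self.prefix + ":")

    def run(self, command: str) -> str:
        # "youtube:search?cats" -> name "search", argument "cats"
        body = command[len(self.prefix) + 1:]
        name, _, arg = body.partition("?")
        if name not in self.handlers:
            raise ValueError(f"unknown command: {command}")
        return self.handlers[name](arg or None)
```

Keeping the site-specific logic behind a small dispatch table like this is one way a new site (such as the Amazon bridge in the works) could slot in without changing how commands are written.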

YouTube commands

The YouTube bridge supports the following custom commands:

Command                    Action
youtube:homepage           Visits youtube.com
youtube:search?SEARCHTERM  Searches YouTube for the specified term
youtube:next_up            When on a video page, clicks the "next up" video
youtube:like               Clicks the like button
youtube:dislike            Clicks the dislike button
youtube:subscribe          Clicks the subscribe button
youtube:unsubscribe        Clicks the unsubscribe button
youtube:sign_in            Begins the sign-in process. You'll need to complete it manually, but Persine will resume as soon as it notices you're logged in.

Repeating commands

If you'd like to repeat a command multiple times, you can append #[NUMBER] to it. For example, youtube:next_up#50 will watch the next fifty "next up" videos.
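Here is a rough sketch of how the #[NUMBER] suffix can be expanded into individual commands. It is an illustration, not Persine's own parser. It assumes the count is always a plain integer at the very end of the command. Note that a URL that happens to end in something like "#3" would also be treated as a repeat.

```python
import re
from typing import List

# Matches "anything#123", capturing the base command and the repeat count.
REPEAT = re.compile(r"^(?P<base>.+)#(?P<count>\d+)$")


def expand_repeats(command: str) -> List[str]:
    """Turn 'youtube:next_up#3' into three 'youtube:next_up' commands."""
    match = REPEAT.match(command)
    if match is None:
        return [command]  # no suffix: run once
    return [match.group("base")] * int(match.group("count"))
```

For example, `expand_repeats("youtube:next_up#50")` returns a list of fifty `"youtube:next_up"` strings, while `expand_repeats("youtube:like")` returns `["youtube:like"]`.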
