What Data Does WordPress Send Back to the Mothership

In recent discussions of the undisclosed WordPress “phone home” data grab, on the wp-hackers mailing list and elsewhere, two camps are emerging. One group says they don't care what information WordPress and Automattic gather; the other insists that they have the right to choose what information they disclose about their own sites.

One thing the two groups appear to share: most people don't know what data is sent back to the WordPress mothership.

Phone home is the slang term used to describe a software installation function which will use an Internet connection, if one is found, to send information back to the developer, usually without the user's knowledge.
Webopedia

For technically minded WordPress users there are three ways to find out what information is being phoned home. You can view the source code and run the queries to extract the data, intercept the HTTP data and study it, or use Dion Hulse's excellent Core Control plugin and read the output from Administrator -> Tools -> Core Control after enabling external HTTP access logging.

I ran my own tests in an attempt to discover the extent of the data capture. The results may surprise those of you who blindly trust Automattic to “do the right thing” in the absence of any disclosure about what is captured, why, how it is used and how long it is stored for.

Data Capture

Core Update Check

This check is performed every 12 hours. It sends:

  • your server IP address (or local IP address if you are running WordPress on your computer);
  • your blog URL;
  • the version of WordPress you are using;
  • your locale;
  • your PHP version;
  • your MySQL version.
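
To make the list above concrete, here is a minimal Python sketch of how those fields could travel in a single request. It is an illustration only, not WordPress's actual code: the endpoint URL, the parameter names and the choice to carry the blog URL in the User-Agent header are all assumptions. The IP address needs no field of its own, since any server sees the address of the machine that connects to it.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration; not the real update server address.
HYPOTHETICAL_ENDPOINT = "https://updates.example.invalid/core-check"


def build_core_check_request(site: dict) -> dict:
    """Bundle the fields listed above into one request description.

    The parameter names and the header layout are assumptions.
    """
    query = urlencode({
        "version": site["wp_version"],
        "locale": site["locale"],
        "php": site["php_version"],
        "mysql": site["mysql_version"],
    })
    return {
        "url": HYPOTHETICAL_ENDPOINT + "?" + query,
        # Assumed placement: the blog URL rides along as an identifying header.
        "headers": {
            "User-Agent": "WordPress/%s; %s" % (site["wp_version"], site["blog_url"]),
        },
    }


if __name__ == "__main__":
    request = build_core_check_request({
        "wp_version": "2.8.6",
        "locale": "en_US",
        "php_version": "5.2.11",
        "mysql_version": "5.0.51",
        "blog_url": "http://blog.example.org",
    })
    print(request["url"])
    print(request["headers"]["User-Agent"])
```

Even in this reduced form, the version numbers are useful for an update check while the blog URL serves only to identify the site.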

Theme Update Check

Every 12 hours the following information is sent about the themes you have within the /wp-content/themes directory:

  • Your IP address;
  • Your blog URL;
  • The version of WordPress you are using;
  • Theme Name;
  • Theme Title;
  • Theme Version Number;
  • The full content of the theme description;
  • Author, including author URL;
  • Description;
  • Tags;
  • Full list of template file names for every template included in the theme;
  • Name of the stylesheet;
  • Name of the screenshot image file;
  • Identification of which theme is the currently active theme.

Plugin Update Check

Again, this is run every 12 hours automatically. It sends information about every plugin on your site, whether they are active or not. The information that is sent back to WordPress includes the following:

  • Your IP address;
  • Your blog URL;
  • The version of WordPress you are using;
  • Plugin Name;
  • Plugin Title;
  • Plugin URI;
  • Author(s) Names;
  • Author URI;
  • Directory name and name of the main plugin file (e.g. akismet/akismet.php);
  • Plugin Version Number;
  • Everything contained within the plugin description field.

An example of the type of information that is sent is below:

s:7:"plugins";a:10:{s:19:"akismet/akismet.php";
a:9:{s:4:"Name";s:7:"Akismet";
s:5:"Title";s:7:"Akismet";
s:9:"PluginURI";s:19:"http://akismet.com/";
s:11:"Description";s:429:"Akismet checks your comments against the Akismet web service to see if they look like spam or not. 
You need a WordPress.com API key to use it. 
You can review the spam it catches under "Comments." To show off your Akismet stats just put 
<?php akismet_counter(); ?> in your template. 
See also: WP Stats plugin.";
s:6:"Author";s:14:"Matt Mullenweg";
s:9:"AuthorURI";s:13:"http://ma.tt/";
s:7:"Version";s:5:"2.2.6";
s:10:"TextDomain";s:0:"";
s:10:"DomainPath";s:0:"";
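
The excerpt above is in PHP's serialize() format, where s:N:"..." is a string of N bytes and a:N:{...} is an array of N key/value pairs. If you capture a payload from your own site, a small decoder makes it easier to read. The Python sketch below handles only strings, integers and arrays, and it runs on a shortened, complete sample built from the same Akismet fields (the excerpt above is truncated, so it will not parse as it stands). The function name and the sample are illustrative.

```python
def parse_value(data: bytes, pos: int = 0):
    """Decode one PHP-serialized value starting at pos.

    Returns (value, next_pos). Only strings (s), integers (i) and
    arrays (a) are supported; anything else raises ValueError.
    """
    kind = data[pos:pos + 1]
    if kind == b"s":
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])   # length is counted in bytes
        start = colon + 2                   # skip the colon and opening quote
        end = start + length
        if data[end:end + 2] != b'";':
            raise ValueError("malformed string at offset %d" % pos)
        return data[start:end].decode("utf-8"), end + 2
    if kind == b"i":
        semi = data.index(b";", pos)
        return int(data[pos + 2:semi]), semi + 1
    if kind == b"a":
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                     # skip the colon and opening brace
        items = {}
        for _ in range(count):
            key, pos = parse_value(data, pos)
            value, pos = parse_value(data, pos)
            items[key] = value
        if data[pos:pos + 1] != b"}":
            raise ValueError("unterminated array at offset %d" % pos)
        return items, pos + 1
    raise ValueError("unsupported type %r at offset %d" % (kind, pos))


# Shortened sample using field values from the excerpt above.
SAMPLE = (
    b'a:1:{s:19:"akismet/akismet.php";a:4:{'
    b's:4:"Name";s:7:"Akismet";'
    b's:9:"PluginURI";s:19:"http://akismet.com/";'
    b's:6:"Author";s:14:"Matt Mullenweg";'
    b's:7:"Version";s:5:"2.2.6";'
    b'}}'
)

if __name__ == "__main__":
    plugins, _ = parse_value(SAMPLE)
    for path, headers in plugins.items():
        print(path)
        for field, value in headers.items():
            print("   ", field, "=", value)
```

Run against a real capture, this lists every header field that left your server for each plugin, which is a quick way to check for anything you would rather keep private.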

If your custom plugin contains staff names, email addresses or phone numbers, or information that is confidential to your site, it's too bad - WordPress gets all content from those fields, including that personal data.
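
One way to check your own exposure is to scan the header comments of your private plugins for contact details before they are sent anywhere. The Python sketch below is a rough audit aid, not a reimplementation of WordPress's header parsing: the label list covers the conventional plugin header fields, the email pattern is deliberately simple, and the sample plugin and its contents are invented for illustration.

```python
import re

# Conventional plugin header labels; extend as needed for your own plugins.
HEADER_LABELS = [
    "Plugin Name", "Plugin URI", "Description",
    "Author", "Author URI", "Version",
]

# Deliberately simple pattern; it will miss unusual address formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def read_plugin_headers(source: str) -> dict:
    """Pull 'Label: value' pairs out of a plugin file's header comment."""
    headers = {}
    for label in HEADER_LABELS:
        pattern = r"^[ \t/*#@]*" + re.escape(label) + r":(.*)$"
        match = re.search(pattern, source, re.MULTILINE | re.IGNORECASE)
        if match:
            headers[label] = match.group(1).strip()
    return headers


def find_emails(headers: dict) -> dict:
    """Return the header fields that contain something shaped like an email."""
    found = {}
    for label, value in headers.items():
        emails = EMAIL_RE.findall(value)
        if emails:
            found[label] = emails
    return found


# Invented example of a private, in-house plugin header.
SAMPLE_PLUGIN = """<?php
/*
Plugin Name: Acme Intranet Tools
Description: Internal reports. Support: jane.doe@example.com, ext. 1234.
Author: Acme IT
Version: 0.3
*/
"""

if __name__ == "__main__":
    for label, emails in find_emails(read_plugin_headers(SAMPLE_PLUGIN)).items():
        print(label, "contains", ", ".join(emails))
```

Anything this flags belongs somewhere other than the header, such as a readme file or a code comment further down.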

It's only an educated guess, but I assume most people are unaware of the extent of the data capture. It is reasonable to assume that if you use a plugin or theme from wordpress.org then update checks would only check for updates on those. Did you know that every theme and plugin is captured, even those you write in-house for your own private use?

What other information are you sending?

The data capture becomes even more dangerous when you use popular plugins provided by Automattic. Automattic does not disclose its data retention policies, nor does it give any undertaking that it does not build user profiles from the mass of data it controls. None of its plugins on the wordpress.org repository disclose what data is sent back or link to a privacy policy. Let's look at some of the most commonly used plugins.

Akismet is widely used. While it should be obvious to everyone that it collects email addresses and IP addresses when it scans comments, there is no privacy policy and no indication of whether it stores this data or what other purposes the data may be used for.

Since WordPress 2.5, Gravatars are built-in and require no additional plugins for basic usage and management. Most WordPress-based sites use them. Try finding a privacy policy for the names and email addresses stored by this Automattic service - there isn't one! Once your email address is added to the Gravatar service it's apparently theirs forever.

I’m sorry, we currently don’t delete accounts. If you remove all the images from your account, however, this simulates your account being deleted as far as the outside world is concerned.
http://en.gravatar.com/site/faq/

IntenseDebate is used as an alternative to the default commenting system. If you use IntenseDebate you need to be aware of the privacy implications. Read the terms of service at http://intensedebate.com/tos, particularly the section titled User Submissions.

WordPress.com Stats is another plugin of concern. People who download this through their WordPress backend or from wordpress.org are not given any information about how the data is retained, used, or whether it is matched with the other information Automattic collects.

Once it's running it'll begin collecting information about your pageviews, which posts and pages are the most popular, where your traffic is coming from, and what people click on when they leave. It'll also add a link to your dashboard which allows you to see all your stats on a single page.
http://wordpress.org/extend/plugins/stats/

I'm sure you can find many more examples of plugins that send data back to Automattic. If all this data is collected together under the identifying IP and blog URL combination the potential for misuse is enormous. However, the default install of WordPress alone rings alarm bells.

While some in the WordPress core team seem happy to dismiss privacy concerns, nobody has justified the use of the blog URL in updates, nor has anyone justified pulling in extensive data from all themes and plugins. There is no justification for harvesting people's names and URIs from descriptions, or indeed for gathering information on themes and plugins that do not originate on the wordpress.org repository.

When I download open source software it's because I want to use that software. I almost always contribute back. But I expect that files I add to my server, and the names and contact details included in these, are private. Just as I expect that when I write a plugin for private use by someone else, my plugin details are private to the person or company who commissioned the work.

The WordPress update check is a valuable tool for notifying users of available updates. It runs perfectly well without the blog URL (which many people have been removing since 2007) and should be gathering only the minimum data needed in order to perform the check. Themes and plugins that are not on the repository should not be included in the check.
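
The minimal-data proposal can be modelled in a few lines. The Python sketch below shows the idea only and is not how WordPress behaves: it reduces each plugin entry to its file path and version, and drops any plugin named in an explicit private list. The function name, the field names and the sample data are assumptions, and whether a path and version alone are enough for the repository to identify a plugin is left open.

```python
def minimal_update_payload(plugins: dict, private: frozenset = frozenset()) -> dict:
    """Keep only what a version comparison needs: file path and version.

    plugins maps 'dir/file.php' to that plugin's header fields.
    private lists plugin paths that should never be reported.
    """
    return {
        path: {"Version": headers.get("Version", "")}
        for path, headers in plugins.items()
        if path not in private
    }


# Invented sample: one repository plugin and one in-house plugin.
INSTALLED = {
    "akismet/akismet.php": {
        "Name": "Akismet",
        "Author": "Matt Mullenweg",
        "AuthorURI": "http://ma.tt/",
        "Version": "2.2.6",
    },
    "acme-intranet/acme-intranet.php": {
        "Name": "Acme Intranet Tools",
        "Description": "Internal reports. Support: jane.doe@example.com",
        "Version": "0.3",
    },
}

if __name__ == "__main__":
    payload = minimal_update_payload(
        INSTALLED, private=frozenset({"acme-intranet/acme-intranet.php"})
    )
    print(payload)
```

Names, descriptions and author details never enter the payload, and the in-house plugin is not mentioned at all.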

I hope, at least, that this post is useful to anyone considering using WordPress.

Topic: WordPress
Tagged as: blogging, privacy, WordPress

{ 9 comments }

  1. John Kolbert December 14th, 2009 at 7:05 pm

    Excellent write up. I've used and freelanced with WordPress for years and although I knew it sent information back periodically, I was unaware that it was so extensive. Just because nothing damaging has appeared to occur with the collected data thus far doesn't mean that we shouldn't be cautious with it. I'm all for an opt-out feature built into the core.

  2. Mark Jaquith December 14th, 2009 at 9:04 pm

    It is reasonable to assume that if you use a plugin or theme from wordpress.org then update checks would only check for updates on those. Did you know that every theme and plugin is captured - even those you write inhouse and which are private only to you?

    Your WordPress install can't determine which plugins or themes were downloaded from the WordPress.org repositories. It checks them all because the repository knows, given enough information, which plugins and themes are in the repositories. There is no unique, reliable identifier for a theme or a plugin, which is why the plugin/theme header information is sent — to assist in the identification process.

  3. Andreas Nurbo December 14th, 2009 at 11:14 pm

    Your WordPress install can't determine which plugins or themes were downloaded from the WordPress.org repositories.

    Not yet. Adding a simple tag to the plugin description stating that the plugin does NOT reside on wordpress.org is easy, and it solves the problem of sending wordpress.org data about plugins it does not host. In the future, adding a separate update check link would be awesome.

    There is no unique, reliable identifier for a theme or a plugin, which is why the plugin/theme header information is sent — to assist in the identification process.

    As far as I can tell there is: the folder name of the plugin. It seems to always correspond with the subversion folder. But I could be wrong, and if you can change this I sure would like to know how.

  4. Mark Jaquith December 15th, 2009 at 12:01 am

    I've been mentioning that this is feasible, but I hadn't seen anyone do it. As a gesture of good faith:

    If someone is writing a private plugin that they don't want to check WP.org for updates, this code will remove a plugin from the update array. Just namespace the function and the add_filter() call and you can just drop this in the private plugin.

  5. Mark Jaquith December 15th, 2009 at 12:45 am

    As far as I can tell there is. The folder name of the plugin. It seems to always correspond with the subversion folder. But I can be wrong and if you can change this I sure would like to know how.

    You can change it. People do. People also install plugins in the root /plugins/ directory and not in a subdirectory at all. This is not a problem with plugins installed via the built-in plugin installer, which always get the correct folder name, but with hand-installed plugins or plugins that were automatically installed and were then fiddled with.

    For backwards compatibility reasons, I don't see this going away. We have to adapt to how people have been using their plugins.

  6. Lynne Pope December 15th, 2009 at 3:37 am

    @Mark Jaquith: I thank you for providing the code. Would you mind also providing the code for excluding themes from the update array? If you post this on your blog I am happy to link to it. Otherwise, if you give your permission, I am happy to post both examples here in a follow-up post.

    Much as it's good to have plugins filtering out private data, this unfortunately doesn't help the average user.

    Removing plugins and themes from the update array is all well and good. However, this is essentially an opt-out and requires that people (many of whom are not PHP coders) know what data is collected and how to remove plugins and themes from the array. This is not practical - it would be far better for everything to be excluded and a new field added to the header for plugins and themes to opt-in.

  7. Mark Jaquith December 15th, 2009 at 12:27 pm

    Would you mind also providing the code for excluding themes from the update array?

    Here is a post about how to exclude a plugin or a theme from update checks. Naturally, consider that content/code to be as liberally licensed as you like.

    Much as it's good to have plugins filtering out private data, this unfortunately doesn't help the average user.

    The average user doesn't have custom plugins. The average user is running plugins from the repository with zero customizations.

    This is not practical - it would be far better for everything to be excluded and a new field added to the header for plugins and themes to opt-in.

    We've considered it, but dismissed it as impractical. Getting plugin authors to do anything in a standardized fashion has proven to be near-impossible. We've found that working with what they provide is the more pragmatic solution. Again, private plugins with private header data are the exception!

  8. dmtrs (dmtrs) December 15th, 2009 at 7:00 pm

    What Data Does WordPress Send Back to the Mothership: http://lynnepope.net/data-wordpress-sends

  9. hakre August 30th, 2010 at 5:51 am

    I was not aware of the scale of this. You're absolutely right with your educated guess.
