WordPress core sends a user-agent string that discloses identifying information about the site. The user-agent string reads "WordPress/2.8; http://www.example.com", where 2.8 is the WordPress version number and www.example.com is the blog URL. This means that every time your site queries wordpress.org for information about version updates, plugin updates and dashboard feeds, your site URL is identified to wordpress.org. And that's not all.
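To make the disclosure concrete, here is a minimal sketch of what any server receiving such a request can pull out of that one header. It assumes the `WordPress/<version>; <url>` format quoted above, and the function name `parse_wp_user_agent` is hypothetical.

```php
<?php
// Hypothetical helper: split a WordPress-style user-agent string
// ("WordPress/2.8; http://www.example.com") into the two parts it discloses.
// Returns null when the string does not match that assumed format.
function parse_wp_user_agent(string $ua): ?array
{
    if (!preg_match('#^WordPress/([^;\s]+);\s*(\S+)$#', $ua, $m)) {
        return null;
    }
    return ['version' => $m[1], 'url' => $m[2]];
}

$info = parse_wp_user_agent('WordPress/2.8; http://www.example.com');
// $info holds the version ('2.8') and the blog URL ('http://www.example.com').
```

No cookies, logins or extra lookups are needed: the version and the site address arrive together with every request.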
This is not new behaviour. Starting with WordPress 2.3 back in 2007, wordpress.org has collected your blog URL, IP address, PHP version, MySQL version, WordPress version, localisation information and full details of the plugins your site uses. This caused controversy at the time, not helped by Matt Mullenweg's responses about the possible uses for such data:
URLs are useful unique identifiers and in my opinion the best one to use on the web. You can normalize them, organize them by domains and subdomains, look for odd characters or paths, create stats by TLDs, map them to hosting providers, use them as a basis for a crawl, and associate them with WordPress.org profiles. MD5s are unique, but don't have a lot of value beyond that, and even a capitalization or trailing slash change will change the whole MD5. There are also things I think we haven't imagined yet that could make URLs useful. Maybe a .org toolbar that ties into your .org profile and makes it easy to manage multiple blogs and tie them together. If by the time 2.5 comes around we're still not doing anything useful with it then we can re-examine it.
WP-hackers
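The MD5 point above can be checked directly. This is a minimal sketch that uses a deliberately simple normalisation rule chosen only for illustration (lower-case the scheme and host, drop a trailing slash). The function names are hypothetical. Hashing raw URLs gives different digests for trivially different spellings, while normalising first makes them agree.

```php
<?php
// Hypothetical normalisation: lower-case scheme and host, drop a trailing "/".
// Ports, query strings and other URL parts are ignored; illustration only.
function normalize_blog_url(string $url): string
{
    $parts  = parse_url($url);
    $scheme = strtolower($parts['scheme'] ?? 'http');
    $host   = strtolower($parts['host'] ?? '');
    $path   = rtrim($parts['path'] ?? '', '/');
    return $scheme . '://' . $host . $path;
}

// Hash the normalised form so capitalisation or a trailing slash
// no longer changes the resulting identifier.
function blog_id_hash(string $url): string
{
    return md5(normalize_blog_url($url));
}

$a = 'http://www.Example.com/';
$b = 'http://www.example.com';
// md5($a) and md5($b) differ, but blog_id_hash($a) equals blog_id_hash($b).
```

A hash of a guessable value such as a URL can still be matched by anyone who hashes candidate URLs, so hashing reduces identifiability rather than removing it.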
Let's look at what is transmitted when your blog checks if plugin updates are available, which it does every 12 hours:
- Your IP Address
- Your blog URL
- WordPress version number
- PHP Version
- MySQL Version
- Locale setting (if there is one)
- List of all plugins, both active and inactive: Title, Description, Author, Author URL and Version, including any names and URLs contained in these fields.
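To show why inactive plugins and everything in their headers end up in the request, here is a hypothetical sketch of a payload built the way the list describes. It includes every installed plugin's header fields plus a separate list of the active ones, serialised with PHP's `serialize()`. The `build_update_payload` name, the field layout and the plugin data are assumptions for illustration, not WordPress's own code.

```php
<?php
// Hypothetical sketch of an update-check payload: every installed plugin's
// header fields (active or not) plus the list of active plugin files.
function build_update_payload(array $installed, array $active): string
{
    $to_send = (object) ['plugins' => $installed, 'active' => $active];
    return serialize($to_send);
}

// Made-up example data: one active plugin and one inactive, private plugin.
$installed = [
    'hello/hello.php' => [
        'Name' => 'Hello', 'Version' => '1.0', 'Author' => 'Jane Doe',
        'AuthorURI' => 'http://jane.example.net', 'Description' => 'Says hello.',
    ],
    'client-tool/tool.php' => [
        'Name' => 'Client Tool', 'Version' => '0.3', 'Author' => 'Acme Dev',
        'AuthorURI' => 'http://intranet.acme.example',
        'Description' => 'Call Bob on 555-0100.',
    ],
];
$payload = build_update_payload($installed, ['hello/hello.php']);
// The inactive plugin's author URL and description are in $payload as well.
```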
The data collection has since been extended to also collect all information about themes, both active and inactive. All three dashboard feeds transmit your blog URL, as do updates to Ping-O-Matic. This information is sent to wordpress.org servers from every WordPress install that has access to the Internet. Even on a local test or development site on your own computer, whenever you load the Plugins or Themes pages and WordPress has not checked for updates within the last 12 hours, your IP address is disclosed to wordpress.org.
This information is sent to wordpress.org regardless of whether your install is on your local PC or on a private intranet, and regardless of the privacy settings on the Privacy page within WordPress administration.
When any software collects information, there needs to be (a) user awareness that this is happening, of what is collected and of why, and (b) a statement about how this data is protected and used. Neither of these is available with WordPress. In many jurisdictions, IP addresses and site URLs are classified as personal information, and the law proscribes storing such information without explicit consent. WordPress does not provide a way to opt out of this data collection while still retaining update checks.
Three months after the 2007 requests for this data collection to be anonymised, Matt closed the ticket, saying, “let's just close it until there are compelling new arguments”. The issue was not reviewed as promised, and an attempt to get it revisited this month was met with comments about “paranoia” and “tin hats” and ultimately went nowhere, with Matt saying, “All in all though, not a high priority. I've never met anyone in person who disables update checks”. That makes it alright then!
How to Stop your WordPress Site from Sending Personally-Identifiable Information to WordPress
Configuration
Following concerns raised on the wp-hackers mailing list in February about the data that is sent as soon as WordPress is installed, before any plugins are activated, new constants were added for wp-config.php, the WordPress configuration file.
Add the following to wp-config.php to block all outgoing HTTP requests except those to localhost and to your own blog. Do this before you install WordPress; adding it at any time after WordPress is first installed will only block data transmission from that point on.
define('WP_HTTP_BLOCK_EXTERNAL', true); // block external requests
NOTE: Some plugins require external access. If you get errors due to blocking external requests, you will need to either remove the above constant from wp-config.php or add the external host to a whitelist using this:
define('WP_ACCESSIBLE_HOSTS', 'example.com'); // whitelist hosts, comma separated
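The decision these two constants drive can be sketched as a small pure function. This is a simplified illustration of the rule described above: block everything except localhost, your own blog's host and whitelisted hosts. It is not a copy of WordPress's own implementation. The function name and parameters are hypothetical, and only exact host matches are handled.

```php
<?php
// Hypothetical, simplified blocking rule. When external blocking is on,
// allow only localhost, the blog's own host, and hosts in the
// comma-separated whitelist (exact matches only, no wildcards).
function is_request_blocked(
    string $url,
    string $blog_url,
    bool $block_external,
    string $accessible_hosts = ''
): bool {
    if (!$block_external) {
        return false; // Nothing is blocked unless the constant is set.
    }
    $host      = strtolower((string) parse_url($url, PHP_URL_HOST));
    $blog_host = strtolower((string) parse_url($blog_url, PHP_URL_HOST));
    if ($host === 'localhost' || $host === $blog_host) {
        return false; // Local and same-site requests always go through.
    }
    // Split the whitelist on commas, trim spaces and drop empty entries.
    $whitelist = array_filter(array_map('trim', explode(',', strtolower($accessible_hosts))));
    return !in_array($host, $whitelist, true);
}
```

With external blocking on and no whitelist entry for it, a request to `api.wordpress.org` falls into the final branch and is blocked. This is why update checks stop when the constant is set.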
Plugins
There are three reliable plugins available that disable the update functions. If you use these, no data is sent, but you must then manually monitor updates to plugins and themes, and announcements of new WordPress releases.
For developers or high-level users there is also the Core Control plugin. This plugin can log outgoing HTTP requests (and more) as well as disabling any or all of core, theme and plugin update checks.
As far as I know, there are currently no plugins that anonymise the data or that let you choose which data you are prepared to send. Plugins written for WordPress 2.3, such as the “Tin Hat” plugin and the “Anonymous WordPress Plugin Updates” plugin, have not been updated for two years, and both rely on replacing core files. It's not a good idea to replace key core files with old, modified WP 2.3 versions, so steer clear of these plugins.
Filter
If you simply wish to filter out the blog URL from the data that is sent to wordpress.org, and you are not using the WP_HTTP_BLOCK_EXTERNAL constant in your wp-config.php, you can use this:
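// Replace the default user-agent ("WordPress/x.y; http://your-blog")
// with one that contains only the WordPress version number.
function privacy_remove_url($default)
{
    global $wp_version;
    return 'WordPress/' . $wp_version;
}
add_filter('http_headers_useragent', 'privacy_remove_url');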
This should really be in a plugin, activated through the admin interface, but you can put it directly into your theme's functions.php file. Note that this does not remove all instances of your blog URL from data sent back to WordPress.
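One likely reason the filter above is incomplete is that a request can set its own user-agent explicitly, and the update checks may do so. In that case, a filter on the default header never applies. Here is a minimal sketch of a broader approach. It assumes the `http_request_args` filter, which receives each request's argument array and destination URL, is available in your WordPress version, and it rewrites the `user-agent` argument of every outgoing request so that anything after the first `;` is dropped. The function names are hypothetical. This approach does not touch data sent in the request body, such as plugin lists.

```php
<?php
// Hypothetical helper: keep only the "WordPress/x.y" part of a user-agent,
// dropping a "; http://your-blog" suffix if one is present.
function privacy_strip_ua_url(string $ua): string
{
    $pos = strpos($ua, ';');
    return $pos === false ? $ua : rtrim(substr($ua, 0, $pos));
}

// Callback for the http_request_args filter: $args is the request's
// argument array, $url its destination. Only the user-agent is changed.
function privacy_filter_request_args(array $args, string $url): array
{
    if (isset($args['user-agent'])) {
        $args['user-agent'] = privacy_strip_ua_url($args['user-agent']);
    }
    return $args;
}

// Register the callback only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('http_request_args', 'privacy_filter_request_args', 10, 2);
}
```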
WordPress Privacy - Summary
WordPress collects personally-identifying information from users of its blogging software with no notice and no permission. Core team members have confirmed that much of this data is not necessary in order to provide information on whether updates are available. Matt Mullenweg indicated some ways in which this data may be used in the future, but nobody has explained how it is used today. Automattic, Inc., a private company headed by Matt Mullenweg, controls the servers and, ultimately, the data. Combine this with the data collected through Automattic's and WordPress's plugins and services, such as Akismet, IntenseDebate, PollDaddy, Gravatar and Ping-O-Matic, user data from the wordpress.org forums, Codex and Trac, and the WordPress.com Stats plugin that tracks all site traffic and referrers, and there is a huge concentration of personal information stored on US servers under the control of this one company.
WordPress.org, which ostensibly collects the data, is not a legal entity.
There is a privacy statement at http://wordpress.org/about/privacy/.
The key part of this statement, as it relates to users of the open source WordPress software, is:
Protection of Certain Personally-Identifying Information
WordPress.org discloses potentially personally-identifying and personally-identifying information only to those of its employees, contractors, and affiliated organizations that (i) need to know that information in order to process it on WordPress.org's behalf or to provide services available at WordPress.org's websites, and (ii) that have agreed not to disclose it to others. Some of those employees, contractors and affiliated organizations may be located outside of your home country; by using WordPress.org's websites, you consent to the transfer of such information to them.
Personal data is a valuable commodity. WordPress has a history of spying on its users and seems just as determined as ever to ignore privacy concerns.
Were you aware that this data was being gathered from your server? Do you block the update checks due to privacy concerns? Please share your thoughts in the comments below.
{ 18 comments }
Thanks for bringing all this information together. I really am baffled by the "Just get over it" crowd. I can't imagine more than a few percent of WordPress.org users understand just what they're exchanging for the automatic update notifications they receive.
I had a basic idea before this week, and understand how some of the reported data might be helpful, but am mostly concerned about the lack of respect for those that don't feel comfortable with the steady flow of detailed system information to parts unknown.
@Jeff: Matt's response to these concerns is here: http://j.mp/8PpRqL. It's not very reassuring.
@themelab http://lynnepope.net/wordpress-privacy
This is incredibly hyperbolic. It was a hotlinked image. One hotlinked image on a server with no logging does not equal "a history of spying."
My response to this whole issue: Firefox. Did you know that Firefox sends to Mozilla your IP address, a unique identifier for your Firefox install, and a list of all of your Add-ons? Did you know that there is no built-in way to get update notifications without sending this personally identifying information?
If you would like to know how WordPress treats the technical data your blog sends, please read the privacy policy on WordPress.org.
If you have an issue with sending this incredibly boring information to WordPress, the trio of plugins you mentioned should do the trick — or you could shut off external HTTP access altogether. If this is an important issue to you, I'd honestly like to know whether you've turned off update notifications in Firefox. If not, why? Why do you consider WordPress to be an unworthy recipient of IP/plugin data, but not Mozilla? That's a serious question. Is there something about WordPress that makes you distrustful?
@Mark Jaquith: With Firefox, everything is disclosed. See: http://www.mozilla.com/en-US/l.....ox-en.html This includes information about how to opt out. WordPress does not disclose details of all the data it collects. The WordPress privacy statement is both inadequate and misleading.
Firefox is downloaded from Mozilla's site which means its users visit its site. WordPress may be installed from any number of auto-install scripts and is provided bundled with several other apps. This means users may never visit the official site. It is not unreasonable to assume that most users of open source software would never consider whether the software captured and transmitted personal data without notice and without permission. And without ever seeing the privacy statement.
Mozilla is also a legal entity that is legally accountable for how it handles the data it collects. wordpress.org is not. Mozilla gathers its data through a secure server. WordPress transmits across HTTP, unsecured and vulnerable to snooping.
The issue I have is that WordPress collects far more data than is needed for the update check, does not inform users at the time of installation what data is transmitted back to servers under the control of Automattic, does not have a published policy on data collection and retention, and has no way to opt-out of this while still retaining the ability to perform update checks. Because this information is lacking and opt-in is not provided, WordPress breaches the law in many countries. This can potentially be very dangerous for anyone who recommends WordPress to others.
It's not a matter of whether WordPress.org can be trusted. It's a matter of having control of your own personal information and being able to make an informed decision about what parts of it you wish to divulge.
It's hard for a non-tech person to figure out if this is an issue or not. What exactly is the risk? Or is it a philosophical issue about transparency and personal boundaries? My sense is that Windows Update, McAfee (and probably all antivirus software), Dell Support and probably lots of others are doing at least this much data gathering.
Further, if you had your own hosted web site, wouldn't your host have all this information and more? So if we are willing to use WP servers, software, themes, etc., isn't it reasonable to expect them to know something about us? Or am I missing something? Thanks.
@Jim Hagen: Most desktop software applications come with a EULA and disclose their privacy practices. Those that don't are usually called spyware. Privacy isn't a philosophical issue. It's about having control over what data about yourself you are willing to disclose to others. The Electronic Frontier Foundation recently posted an article that may explain concerns a bit better: http://www.eff.org/deeplinks/2.....es-privacy.
In answer to your last question - no, web hosts do not have the information that is gathered by WordPress.
The worst-case scenario I can think of is that an evil third party gets hold of the data and uses their knowledge of which plugins you have installed to compromise one with security vulnerabilities. But in practice, this isn't how vulnerabilities are exploited. It's much easier to just write a dumb crawler that attempts the specific exploit without any knowledge of whether the site is running that plugin, or even if it is running WordPress. My server gets hits with exploit attempts for all kinds of software I'm not running.
Another scenario Lynne suggested is that you have a customized plugin with a description that contains private info (like a developer's phone number). If that info were stolen, that would be a disclosure of private, personal information. Plugins that contain such private info in their description should probably just remove themselves from the plugin update array (as WP.org won't know about it and won't be able to provide update information anyway). It's a moot issue for publicly available plugins, like the ones in the repository, as all their code deficiencies are visible to everyone through the public repository.
Sure they do. Many web hosts will unilaterally deactivate WordPress plugins that are causing performance or security concerns. All of your data is viewable to them.
@Mark Jaquith: Yes, you are right: some web hosts, primarily where your WordPress install is on a shared, managed server, may view the files on your server. They do not, however, gather data from these files and use it for their own purposes.
If your WordPress install is on a VPS or dedicated server, hosting companies don't necessarily have any access.
Apart from the fact that I don't believe anyone has any right to collect and store personal data without informed consent, it is the totality of the data capture that concerns me most. Automattic owns IntenseDebate, which includes this in their TOS:
They own Gravatar, which stores user images and email addresses but has no privacy policy or terms of service. (Why anyone just hands over email addresses is beyond me. I did though! LOL)
Then there's Akismet. Akismet doesn't say what it collects and stores. Not so long ago it was taking the full server environment details.
VideoPress requires a WordPress.com account so use of this is tied to names and email addresses. It's reasonable to assume it also collects IP addresses.
So, the worst-case scenario is that Automattic collates all the information it stores to build a profile of individual users. This profile could potentially contain more information than any other individual company has about that user. Automattic is a company that needs to satisfy the profit requirements of investors and stakeholders, and we are expected to simply trust that they will do the right thing? Personal data is a valuable commodity. The personal data of millions of users is potentially worth multiple millions of dollars.
On the other hand, the US Government could just ask for the whole lot to be handed over, as is their right under the Patriot Act.
Thanks to @elpie for pointing those out - @elpie - http://lynnepope.net/wordpress-privacy
That has no bearing on the fate of WordPress.org data. Additionally, the WordPress.org data collected via the API isn't personal so much as server- and install-specific. No user account data is transmitted. A subpoena for data about an individual couldn't "resolve" to the api.wordpress.org data because it is data specific to a WordPress install, not an individual person.
Thanks for the additional info. BTW, if you are worried about WordPress, what is your take on Facebook?
@Mark Jaquith: It's relevant as long as Automattic has access to the data. None of it would be a concern if the following three things happened - WordPress only collected the minimum data necessary for update checks, an undertaking was given about what it collected, why, how it was used, and for how long it was stored, and, most importantly, this was not collected unless users opted-in.
The data sent is personal and by no means anonymous. I might have to take that to another post to avoid cluttering these comments.
@Jim Hagen: "Worried about WordPress" isn't quite the term I'd use. I know what is collected and how to prevent sending anything I don't want to hand over.
I am worried that most users of WordPress don't understand what data is collected and the implications for privacy.
As a strong advocate for open source software I feel that what WordPress is doing can potentially harm all FLOSS projects. If people have to inspect code to see if it phones home then there can be no confidence in the safety of the FLOSS apps people download. One of the arguments for using FLOSS is that many proprietary apps gather data about their users. FLOSS is supposed to be free of this.
I can't compare Facebook's privacy to wordpress.com (both of these are services); however, standalone WordPress, which is what this discussion is about, is potentially worse than Facebook because it invades privacy without informing users that data is sent back to the mothership and without asking permission to do so.
Access is different than ownership or licensing.
Privacy Policy:
That line in the privacy policy precludes the building up of a site–user hybrid profile using data from Automattic properties and WordPress.org. There has to be (and is) a church–state separation of WordPress.org data and Automattic data.
We're pretty close to bare bones. Blog URL isn't necessary for update checks, but as I've elucidated elsewhere, it is preferable to a one-way hashed ID because it is verifiable, which is useful if we start to report on aggregate plugin/theme usage and want to avoid fraudulent popularity boosting by plugin/theme authors. The MySQL/PHP versions aren't necessary now, but may be later as WP changes its requirements and offers different upgrades for different server configurations (say, a PHP 4 compatible legacy branch, or a MySQL 4 compatible legacy branch).
The plugin/theme information is necessary for update notification for those components, as is your current WordPress version for core update notification. Plugins and themes that aren't in the repository and want to keep their existence a secret can remove themselves from the update array.
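Here is a minimal sketch of how a plugin could remove itself. It assumes the plugin-update request body carries a serialised object with `plugins` (keyed by plugin file) and `active` (a list of files), sent to a URL containing `api.wordpress.org/plugins/update-check`. It also assumes the `http_request_args` filter is available. Later WordPress versions may encode this differently, so treat the structure and the URL test as assumptions. The function names and the `my-plugin/my-plugin.php` path are hypothetical.

```php
<?php
// Hypothetical: remove one plugin file from a serialised update payload.
function remove_plugin_from_payload(string $serialized, string $plugin_file): string
{
    $data = unserialize($serialized, ['allowed_classes' => ['stdClass']]);
    if (!is_object($data)) {
        return $serialized; // Not the structure we assumed; leave it alone.
    }
    if (isset($data->plugins) && is_array($data->plugins)) {
        unset($data->plugins[$plugin_file]);
    }
    if (isset($data->active) && is_array($data->active)) {
        $data->active = array_values(array_diff($data->active, [$plugin_file]));
    }
    return serialize($data);
}

// Callback for http_request_args that only touches plugin update checks.
function my_plugin_hide_from_update_check(array $args, string $url): array
{
    $is_check = strpos($url, 'api.wordpress.org/plugins/update-check') !== false;
    if ($is_check && isset($args['body']['plugins']) && is_string($args['body']['plugins'])) {
        $args['body']['plugins'] = remove_plugin_from_payload(
            $args['body']['plugins'],
            'my-plugin/my-plugin.php'
        );
    }
    return $args;
}

// Register the callback only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('http_request_args', 'my_plugin_hide_from_update_check', 5, 2);
}
```

In a real plugin, the hard-coded path would normally come from `plugin_basename( __FILE__ )`.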
Honestly I think the objection based on the data that is sent is incredibly weak. A better objection is this:
That's a fair point. There is no link to the privacy policy from within the WordPress software, and there is no notification that update checks will take place. I'd support the addition of an install-time "I understand that WordPress periodically checks for updates and that the data that is sent as part of these update checks is subject to the WordPress.org privacy policy" form checkbox.
@Mark Jaquith: I guess we need to agree to disagree that the objection based on the data being sent is weak. Or that it is close to bare-bones. I've got a new post up that talks about this "bare bones" data.
The average user doesn't know about data being sent back, let alone how to remove plugins and themes from the update array.
Disclosure, plus options for opting in or for removing themes and plugins from the array, are trivial. With the greatest of respect to you personally, I can't understand why there is such vehement resistance to this from the core team.
I'm in favor of disclosure, as I stated in my last comment. To my knowledge I've not ever resisted that, vehemently or otherwise.
Opting in? As in, you have to opt in before you get update notifications? That would be a huge step backwards for WordPress. We're trying to make it easier to keep your blog secure and up-to-date. Requiring people to seek out the upgrade notifications feature would be tremendously harmful to that goal. Again, looking to Firefox, they don't require that you opt in.
Themes/plugins can do this. As I've explained elsewhere, WordPress the software doesn't know whether your theme or plugin was downloaded from the WP.org repositories. The repository, given the theme/plugin header meta information, can figure it out. YOU, as a plugin/theme creator have the information, and WordPress.org has the information. One of you has to tell WordPress the software. If you don't, then WordPress.org has to, and it can't do it without knowing the full list of plugins/themes you have installed.
@Mark Jaquith: Opting in to the sending of any personally-identifying information. Update checks should be mandatory; however, users should have control over what is sent with those checks.
I understand the desire to satisfy all users and plugin developers but this is currently at the expense of personal privacy. If WordPress were to set criteria for including plugins in the update check the onus would be on the plugin developers to comply. A notice could be sent through dashboard feeds, and returned with update checks to say that some plugins may not be displaying updates, with a link back to an announcement. If a decision was made quickly a notice could go out in 2.9.
{ 2 trackbacks }