Setting up and using memcached & memcache on Linux CentOS 5/Plesk 9

The second part of my tutorial on optimizing your server for hosting high traffic websites: installing and configuring memcached and the memcache php extension for your server. This is a little easier than the first step (setting up nginx as reverse proxy, see article below) and can be applied to any kind of dynamic website.


What's the point?
On highly dynamic websites such as forums, news sites or any site built on user content, the database server load is often very high. The more traffic you get, the more overloaded your database server becomes, sometimes rendering your website completely unavailable to visitors. Using a data caching daemon lets you keep some data in memory instead of fetching it from the database every time. You should know that memcached is used by major websites such as Wikipedia, SourceForge and Slashdot... need I say more?

What is memcached?
Memcached is the daemon running on your server. Its usage is extremely simple: there are no configuration files, all you do is start the daemon on a given port, and your websites connect to it to store data in memory. Yes, the data is stored in your RAM, so when starting memcached you'll have to decide how much RAM it is allowed to use. If you start memcached with 1GB of memory, it will store up to that much data; once the cache is full, the least recently used entries are evicted to make room for new ones. The data is also lost when the daemon restarts, so only cache what you can rebuild from the source.
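
To picture what "the oldest data disappears when the cache is full" means, here is a toy sketch in PHP. It is not how memcached is implemented (the real daemon limits bytes rather than entry counts and manages memory in its own way); the class name TinyLruCache and the two-slot capacity are made up purely for illustration.

```php
<?php
// Toy cache with a fixed number of slots (hypothetical; memcached limits bytes, not entries).
class TinyLruCache
{
    private $capacity;
    private $items = array(); // PHP arrays keep insertion order: first = least recently used

    public function __construct($capacity)
    {
        $this->capacity = $capacity;
    }

    public function set($key, $value)
    {
        unset($this->items[$key]);                  // re-inserting moves the key to the "recent" end
        if (count($this->items) >= $this->capacity) {
            reset($this->items);
            unset($this->items[key($this->items)]); // evict the least recently used entry
        }
        $this->items[$key] = $value;
    }

    public function get($key)
    {
        if (!array_key_exists($key, $this->items)) {
            return false;                           // same miss convention as Memcache::get()
        }
        $value = $this->items[$key];
        unset($this->items[$key]);                  // a read also counts as a recent use
        $this->items[$key] = $value;
        return $value;
    }
}

$cache = new TinyLruCache(2);
$cache->set("a", 1);
$cache->set("b", 2);
$cache->get("a");           // "a" is now more recently used than "b"
$cache->set("c", 3);        // the cache is full, so "b" is evicted
var_dump($cache->get("b")); // bool(false)
var_dump($cache->get("a")); // int(1)
```

The practical consequence is the same as with the real daemon: your code must always be ready for a miss, even on a key it stored a moment ago.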

What is memcache?
Memcache, in our case, is the PHP extension that will allow us to connect to and make use of Memcached. This PHP extension is not part of the default ones so you'll have to download and install it (see step 2). It provides classes as well as functions that I must admit are very easy to use and understandable. In this article, I provide a mysql+memcache wrapper class for anyone to use.

What is the difference between memcached and memcache?
Well if you've read the two points above, you should already know. In short, memcached is the daemon running on your machine; memcache is the PHP extension allowing you to make use of memcached.


1. Setting up memcached

I haven't found memcached in my repositories (you might as well try # yum install memcached first, just in case), so I'll download the source and compile it. First go get the latest version from the official website.
# wget http://memcached.googlecode.com/files/memcached-1.4.1.tar.gz
# tar zxvf memcached-1.4.1.tar.gz
# cd memcached-1.4.1
# ./configure
If, like me, you get an error saying that libevent is missing, you can run this command to install its development package:
# yum install libevent-devel
And then run configure again:
# ./configure
Install memcached:
# make install
That's it, you're set! That was pretty easy wasn't it? We'll now see the command line arguments for starting memcached:
# memcached -d -m 1024 -l 127.0.0.1 -p 11211 -u nobody
The arguments are:
-d : start as daemon, running in the background
-m 1024: allow memcached to use up to 1024 MB of RAM (1GB)
-l 127.0.0.1: listen on the loopback interface only, so the daemon is not reachable from other machines
-p 11211: listen on port 11211
-u nobody: run as user "nobody"

If you're not sure how much memory you should allocate to memcached, try running this command first:
# free
It will tell you how much free RAM you've got left (use free -m to get the figures in megabytes).
Note that upon starting memcached, if all is OK, you will see no output message. To see if memcached is correctly started, run this command:
# ps aux | grep memcached
You should see something like this (this sample was captured from a daemon started with -m 128):
nobody   13133  0.0  0.0  43580   732 ?        Ssl  07:11   0:00 memcached -d -m 128 -l 127.0.0.1 -p 11211 -u nobody
user     13143  0.0  0.0   4152   648 pts/0    R+   07:11   0:00 grep memcached
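
If you'd rather check from PHP that the daemon actually answers, here is a minimal sketch that sends the "version" command of memcached's plain-text protocol over a socket. The function names parse_version_reply and memcached_version are my own, and the host and port are assumed to match the command line used above.

```php
<?php
// Extract the version number from a "VERSION x.y.z" reply line; null for anything else.
function parse_version_reply($line)
{
    if (preg_match('/^VERSION (\S+)/', $line, $m)) {
        return $m[1];
    }
    return null;
}

// Ask the daemon for its version over the text protocol.
// Returns the version string, or null if nothing usable answered.
function memcached_version($host, $port, $timeout = 1.0)
{
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return null;                 // connection refused or timed out
    }
    stream_set_timeout($socket, 1);  // don't wait forever for the reply
    fwrite($socket, "version\r\n");
    $reply = fgets($socket);
    fclose($socket);
    return $reply === false ? null : parse_version_reply($reply);
}

$version = memcached_version("127.0.0.1", 11211);
echo $version === null ? "memcached is not answering\n" : "memcached $version is running\n";
```

This needs no PHP extension at all, so it also works before step 2 is done.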



2. Setting up memcache PHP extension
The memcache PHP extension should be available in the standard repositories, so try this command:
# yum install php-pecl-memcache
If you're lucky (why should you be unlucky anyway?) the install will work fine and you'll be seeing these messages:
Installed: php-pecl-memcache.i386 0:2.2.3-1.el5_2
Dependency Installed: php-pear.noarch 1:1.4.9-4.el5.1
Complete!
Just for reference, here's a link to the official memcache website, if you need to grab the sources.

Let's see if memcache was installed properly. First restart httpd:
# service httpd restart
Then place a simple PHP file on your website containing the following code:
<?php phpinfo(); ?>
Open the PHP file in your browser (e.g. http://mydomain.com/phpinfo.php ) and have a look at the output. If you can find a "memcache" section, the extension was successfully installed. Delete the file once you're done, since phpinfo() reveals a lot about your server.

We will now have a look at the memcache configuration file. First locate your php module configuration files folder, in my case /etc/php.d/ . You should find the newly installed "memcache.ini" configuration file. Open it up to see a list of configuration keys and their meaning.
The default options are just fine, but if you're interested, you should know that memcache can work with several servers and transparently fail over to another one when a server is unreachable; this is controlled by the "memcache.allow_failover" configuration key. I'm not going to make use of this feature so I will not be editing any of the settings.


3. Using memcache in your code
Unfortunately, installing both components isn't enough. You'll have to edit your code in order to make use of the caching features. Be reassured though, it couldn't be easier! There are a couple of functions you'll need to use, nothing complex.
If you want to find out the complete listing of the memcache php functions, visit the official website. Basically we'll be using 5 methods:
- Memcache::connect($host, $port, $timeout): connect to your daemon
- Memcache::get($key) : fetch data from your cache
- Memcache::set($key, $var, $flag, $expire): store data in your cache
- Memcache::delete($key): remove data from your cache
- Memcache::close(): disconnect.

You can cache any data that you want:
$mc = new Memcache;
$mc->connect("localhost", 11211);
$saved_data = $mc->get("saved_data");
if ($saved_data === false) { // Memcache::get() returns false on a cache miss
  $saved_data = file_get_contents("myfile.txt");
  $mc->set("saved_data", $saved_data, MEMCACHE_COMPRESSED, 60*60*24*7); // store for 7 days
}
echo $saved_data;
$mc->close();
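
Since this get / test / rebuild / set pattern comes back everywhere, it can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the memcache extension: cache_remember and ArrayCacheStub are hypothetical names, and the stub only exists so that the example runs without a daemon (it ignores the flag and the expiry time).

```php
<?php
// Stand-in for the Memcache object (hypothetical), so the sketch runs without a daemon.
// It only mimics the two calls the helper needs and ignores $flag and $expire.
class ArrayCacheStub
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->data[$key] = $var;
        return true;
    }
}

// Return the cached value for $key, or call $producer, cache its result and return it.
function cache_remember($mc, $key, $ttl, $producer)
{
    $value = $mc->get($key);
    if ($value !== false) {
        return $value;                // cache hit
    }
    $value = call_user_func($producer);
    $mc->set($key, $value, 0, $ttl);  // flag 0: no compression
    return $value;
}

$mc = new ArrayCacheStub();           // with the real extension: new Memcache, then connect()
$calls = 0;
$load = function () use (&$calls) {
    $calls++;                         // counts how often the slow path runs
    return "contents of myfile.txt";
};

echo cache_remember($mc, "saved_data", 3600, $load), "\n"; // miss: runs $load
echo cache_remember($mc, "saved_data", 3600, $load), "\n"; // hit: served from the cache
echo "producer ran $calls time(s)\n";                       // producer ran 1 time(s)
```

One limitation of this design: because false is the "miss" signal, a producer that legitimately returns false will never be served from the cache.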

Applied to MySQL queries:

$mc = new Memcache;
$mc->connect("localhost", 11211);
$news_articles = $mc->get("news_articles");
if ($news_articles === false) { // cache miss: rebuild the list from the database
  $news_articles = array();
  $query = "SELECT * FROM news_articles ORDER BY article_id DESC LIMIT 0,10";
  $result = mysql_query($query);
  while($row = mysql_fetch_assoc($result)) $news_articles[] = $row;
  $mc->set("news_articles", serialize($news_articles), MEMCACHE_COMPRESSED, 60*60*24*7); // store for 7 days, but don't forget to rebuild the cache when a new article is posted!
} else {
  $news_articles = unserialize($news_articles);
}
// Display articles..
$mc->close();
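As you can see in the example above, I use the "serialize" and "unserialize" PHP functions. Strictly speaking this is optional: the memcache extension stores strings and integers as they are and serializes other types (arrays, objects) automatically, so you can pass the array straight to Memcache::set() and get an array back from Memcache::get(). Serializing by hand is harmless, though, and makes it obvious what is stored; just remember to unserialize the value after reading it from the cache. Resources (file handles, database connections) cannot be cached at all.
If you know a better approach, please feel free to leave a comment.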


4. Wrapper class for memcache & mysql
I have just written a simple wrapper class for MySQL, making use of the powerful caching system offered by memcache. You can download the class here; I included a simple example for testing the class.
The principle is very simple: when executing a query, the script will check if the query result is already in the cache. If the data is in the cache, it is returned immediately (no query executed). If the data is not in the cache, the query is executed, and the results are then placed in the cache with the specified "time to live".

Here are the wrapper class functions:
function MySQLMemcache($mysql_info, $memcache_info, $autoconnect=true, $enable_logging=true);
function connect();
function disconnect();
function dataQuery($query, $usecache=true, $ttl=0);
function nonDataQuery($query);
function fieldDataQuery($query, $field, $usecache=true, $ttl=0);

More documentation is provided inside the actual php file.
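
For readers who just want to see the principle in a few lines, here is a miniature of the check-the-cache-first idea. It is a sketch only, not the downloadable class: CachedQueryRunner, QueryCacheStub and the md5-based key are assumptions of mine, and the database is replaced by a callable so the example runs on its own.

```php
<?php
// Stand-in for Memcache (hypothetical); ignores flags and expiry.
class QueryCacheStub
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->data[$key] = $var;
        return true;
    }
}

// Hypothetical miniature of a caching query wrapper.
class CachedQueryRunner
{
    private $cache;
    private $runQuery;     // callable that executes the SQL and returns an array of rows
    public $executed = 0;  // number of queries that really reached the database

    public function __construct($cache, $runQuery)
    {
        $this->cache = $cache;
        $this->runQuery = $runQuery;
    }

    public function dataQuery($query, $usecache = true, $ttl = 0)
    {
        // memcached keys are limited to 250 bytes and may not contain spaces,
        // so the SQL text is hashed instead of being used as the key directly.
        $key = "sql_" . md5($query);
        if ($usecache) {
            $rows = $this->cache->get($key);
            if ($rows !== false) {
                return $rows;                       // served from the cache, no query executed
            }
        }
        $this->executed++;
        $rows = call_user_func($this->runQuery, $query);
        if ($usecache) {
            $this->cache->set($key, $rows, 0, $ttl); // keep the result for $ttl seconds
        }
        return $rows;
    }
}

// Fake database: always returns the same row (made-up data for the demo).
$fakeDb = function ($query) {
    return array(array("article_id" => 1, "title" => "Hello"));
};

$db = new CachedQueryRunner(new QueryCacheStub(), $fakeDb);
$sql = "SELECT * FROM news_articles ORDER BY article_id DESC LIMIT 0,10";
$db->dataQuery($sql, true, 600);  // miss: the query is executed
$db->dataQuery($sql, true, 600);  // hit: nothing reaches the database
$db->dataQuery($sql, false);      // cache bypassed on request
echo "queries executed: {$db->executed}\n"; // queries executed: 2
```

Note that two queries differing only by whitespace or letter case get different keys here, which wastes a little cache space but is never incorrect.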

Thank you for reading, feel free to leave a comment if this article has been helpful to you.
That wraps up my server optimization series.
TTFN!

Comments

Anonymous said…
Hello man,

Nice article. But you don't care about the cache invalidation. What's your opinion on this subject? How do you invalidate a cached query when an update is done on the table concerned in DB ?

Sorry if my english is very bad, i'm not an american guy.

Thanks!!
Hello,
cache invalidation is handled in the source code of your website anyway:

1) you can set an expiry time, for example for statistics that don't need to be refreshed too often

2) when you make an edit on the data, simply delete the cache reference.
If the data is no longer present in the cache, it will be fetched from the source (and then placed in the cache again).

See what I mean?
Anonymous said…
I see what you mean but let me give you a little example. I cache these queries: "SELECT price FROM article WHERE article_id=9", "SELECT name FROM article WHERE article_id=9". So I put them in the cache, and a few hours later I do "UPDATE article SET price=5400". How can I invalidate the first SELECT query? There are too many cached queries about article (for each id, with JOIN, etc...) and I don't know how to find this query in particular in the cache.

I hope i explain as well as possible the problem.

Thanks!
My answer is the same: when you run the update, delete the cache entry.

Retrieving the data:
--------------------

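$id = 59;
$article = $memcache->get("art$id");
if ($article === false) {
  // cache miss: read the row from the database, then store it for next time
  $resource = mysql_query("SELECT * FROM articles WHERE id=$id");
  $article = mysql_fetch_array($resource);
  $memcache->set("art$id", serialize($article), 0, 3600); // one hour, pick your own expiry
} else {
  $article = unserialize($article);
}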


Updating the data:
------------------
$id = 59;
mysql_query("UPDATE articles SET price=500 WHERE id=$id");
$memcache->delete("art$id");


The cache entry is deleted. Read the "retrieving the data" section again to see that it actually works. I use this mechanism on 2 of my websites now.
Anonymous said…
I'm sorry i don't explain the problem as well as I was thinking.

Let's go:

q1 = SELECT price FROM article where article_id=9;

q2 = SELECT price FROM article;

q3 = SELECT price FROM article a JOIN references r ON r.article_id=a.article_id

I put them into the cache.

After I do:
UPDATE article SET price=5400 WHERE article_id=9;

So this update changes the result of the three queries and not only q1!

We can have so many queries that have to be invalidated after only one update.

You see what I mean?
Perhaps you do not fully understand how caching works?
You do not cache queries, you cache data extracted from the queries.

In your example, simply cache all 3 query results with different keys and delete all 3 of them when you update the data.
Anonymous said…
3 queries is only an example! In practice you can have 100 or 200 queries cached to be invalidated after an UPDATE or DELETE in the DB. How can we record the list of queries that should be invalidated after an UPDATE?

For example, I am working on a website with 400 tables and about 15000 unique queries...

You see where is the problem ?
The problem I understand is that you need to rewrite a lot of code if you want to fully make use of the caching possibilities, hehe.
What you can do is make a data access layer that directly uses caching, find some sort of way to record the cached queries and delete part of the cache when the data is updated. It all comes down to your code.
Anonymous said…
Thank you for your answers.

Over lunch, I implemented a PHP class that represents a journal log. Each of my tables has its own journal log, and I put the md5(query) of every query on that table in it. I serialize each journal log and keep it in the cache, and I update it each time I put a query in the cache. Then when I do an update on a table, I get the journal log of this table from the cache, extract the md5(queries) and delete them from memcache.

What is your feeling?
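
The journal idea described in this comment can be sketched roughly as follows. This is an interpretation, not the commenter's actual class: the function names, the "journal_" key prefix and JournalCacheStub are hypothetical, and the stub stands in for Memcache so that the example runs without a daemon.

```php
<?php
// Stand-in for Memcache (hypothetical); ignores flags and expiry.
class JournalCacheStub
{
    private $data = array();

    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $var, $flag = 0, $expire = 0)
    {
        $this->data[$key] = $var;
        return true;
    }

    public function delete($key)
    {
        unset($this->data[$key]);
        return true;
    }
}

// Cache a query result and note its key in the journal of every table the query reads.
function cache_query_result($mc, $query, $rows, $tables)
{
    $key = md5($query);
    $mc->set($key, $rows);
    foreach ($tables as $table) {
        $journal = $mc->get("journal_$table");
        if ($journal === false) {
            $journal = array();
        }
        $journal[$key] = true;               // array keys avoid duplicate entries
        $mc->set("journal_$table", $journal);
    }
    return $key;
}

// Delete every cached result recorded for $table; returns how many were dropped.
function invalidate_table($mc, $table)
{
    $journal = $mc->get("journal_$table");
    if ($journal === false) {
        return 0;
    }
    foreach (array_keys($journal) as $key) {
        $mc->delete((string) $key);
    }
    $mc->delete("journal_$table");
    return count($journal);
}

$mc = new JournalCacheStub();
$q1 = "SELECT price FROM article WHERE article_id=9";
$q2 = "SELECT price FROM article";
$q3 = "SELECT price FROM article a JOIN references r ON r.article_id=a.article_id";
cache_query_result($mc, $q1, array(array("price" => 100)), array("article"));
cache_query_result($mc, $q2, array(array("price" => 100)), array("article"));
cache_query_result($mc, $q3, array(array("price" => 100)), array("article", "references"));

// After: UPDATE article SET price=5400 WHERE article_id=9
echo invalidate_table($mc, "article"), " cached results dropped\n"; // 3 cached results dropped
var_dump($mc->get(md5($q1)));                                       // bool(false)
```

Two caveats if you try this against a real daemon: updating a journal is a read-modify-write that is not atomic, so two concurrent requests can lose an entry, and the journal itself can be evicted like any other key. Giving the cached results a reasonable expiry time limits the damage in both cases.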
