Security in PHP-MySQL systems
This article tries to explain at a basic level the many vulnerabilities that a PHP-MySQL application - or any server-side application in general - faces when the programmer/s do not dedicate enough care to or lack enough knowledge about good security practices. My career with PHP is really short but still so I learned some good points that I want to share in this article. I have applied all of these techniques to my latest release - as of September 2015 - which is the mobile social website VorKuta. Many security holes exist in applications of such complexity and the programmer should be attentive to every detail regarding this often neglected aspect.
Storing passwords as plain text in a database is not only unsafe, but unethical. I say unethical because not only it exists the possibility that some hacker could access by any means this particularly sensitive information, but also the very administrator/s could be tempted to login into the accounts of their users (this would be like having the fox guarding the henhouse). This is why passwords must be always encrypted with one-way encryption codes, which should ensure that in the event of a database intrusion the passwords could not be decrypted. The PHP function crypt() is useful for achieving this. I also used this method to encrypt the numbers of my CAPTCHA system (I will tell about this further on).
SQL statements are very vulnerable. Malicious - and cunning - users know that if an adequate filter is not placed in the data input, they can pass to the application pieces of SQL code that will be treated ad executable code instead of inert information. This means that the attacker could perform operations like deleting tables from the database or gaining access to a certain user account without the need of knowing the corresponding password. The best method for achieving security when the data input will be included in a SQL query is the one called of "prepared statements". I myself never used it as I learnt PHP from a basic book. I instead went on with the PHP function mysqli_real_escape_string(), which acts as a filter that escapes any character that could be dangerous on a SQL statement, interpreting it as harmless text and not as executable code.
A typical example of how bad is to leave open doors to SQL injection is the one of simplistic, unprofessional login forms. These forms rely in a simple SQL query to check a username and its corresponding password, retrieving a positive result only if both match. The problem is that by altering the SQL query by means of injected code the result can be retrieved without a password match. Of course we should use a filter here to avoid this, but the most proper solution is to not be so simplistic and change the way we check the passwords. Instead of checking both username and password with a simple SQL query, we would use the SQL query only to check if the username requested exists in the database. If a positive result is returned, then we would pass the relevant information of that username (the password in this case) to a PHP variable towards which we would perform the password check. Since we are using PHP statements instead of vulnerable SQL statements to check the passwords, the attacker has no means to deceive the program by altering its execution code.
Generally the information is sanitized when it is printed or displayed into the webpages, but the filtering could be applied as well during input, albeit depending on the requirements of the data to be stored this might be inadequate. Sometimes we would like to print HTML code generated by the application along with the text submitted by the users; for such cases we would have to apply a strict validation (concept which comes in the next paragraphs) to prevent the users from submitting their own HTML codes.
There is a world beyond prefabricated filtering functions to control what a user can introduce into our system. As for now, I have briefly exposed the havoc that a minority of skilled users could deal to our system. But in the other side are the masses of average users, whom even if technically unskilled, are often problematic because of their large number. Security is not only about "they can break my system" but also about "they can fill my system with garbage". We should think about our application as a castle under siege; it simply should resist as many attacks or offenses as possible. Data validation is different from the previously mentioned filters in that it is "manually" designed by the programmer/s; it is he/she who has to decide which validation routines the application should have. In a PHP application validation is effectuated when the form is submitted by the user. A good validation system can be tiresome to program, but it will save many headaches to the future administrator/s. It has multiple benefits such as saving space in the database or file system, avoiding large amounts of spam or preventing the submission of abusive content.
For another example, imagine that a certain programmer created a submission form for a certain website, but he/she neglected to add validation because it is so boring to write. Now imagine that some jerk access the form and fill all the fields with garbage like letters repeated many, many times... without any validation, this "data" would enter the system. Now imagine another jerk - or the same one - who with the sole purpose of being and annoying cretin submits immensely large sections of text into the system, occupying valuable space in the database. And imagine now a spam bot, which can automatically load multiple times a form sending spam multiple times to the system. Without any validation, all of this would go inside, polluting the system. Many data fields - if not most - have some kind of format that differentiate them from the "free text" that we would write in a textarea. A ZIP code, a phone number, an e-mail address or a Youtube link, for example, have clear format patterns. By using PHP functions like preg_match() or strlen(), we can check if the user has submitted something that looks like what we allow into our application.
Of course, the data could be false, but we have achieved several benefits: the jerk who likes to input garbage will have to adapt to the imposed formats, so the easiness and "fun" is gone; the jerk who tries to take up our valuable space in the database will find strict lengths accepted for the data; the spam bot will be unable to flood our system because it is not likely programmed to recognize the complex validation patterns that we can add to our system; and that user who tries to publish in our system a video from a pornographic website will see that it is rejected because it does not match with the format of a Youtube link. And all of this means peace for the administrator/s. Regex filters can also be used to strictly filter the range of characters than a user can submit to the system, hence preventing code injection; however very secure, this is a very strict approach that is likely to piss those users who like to express themselves via fancy characters. It is time to learn that security and user experience are often proportionally opposite.
This one is deemed as the most annoying security instrument in a website, but precisely because of this it is incredibly effective stopping spam. Bots are largely unable to bypass a good CAPTCHA implementation, hence automated spam will unlikely bother us. Certainly, many spammers use their own fingers to send spam, but even in this scenario the CAPTCHA is useful, because the annoyance that it represents will put their patience into test. We have to understand that spam is not only those unsolicited commercial messages; spam are also all of those messages whose value is meaningless, which can be even easier to type than the CAPTCHA field itself. In this case, CAPTCHA helps as well to reduce the amount of this garbage. As you can see, security in a social website can face moments of insanity. CAPTCHA systems are criticized due to the poor user experience that they create and many designers have opted to replace them by the so called "honeypots", hidden input fields that humans cannot see and hence will not fill, but which spam bots would likely fill (for they are deemed as "ambitious" and willing to fill every text field), triggering so the "alarm". The problem with "honeypots" is that while they do not disturb the user, their security is not so good, since a skilled spammer would just have to look at the source code of the webpage to check if a "honeypot" exists in the form, and in such case it would be easy for him/her to instruct the spam bot to ignore the trap.
This aspect probably looks irrelevant, but I like to digress about it. As you have probably noticed, large amounts of websites use prefabricated solutions for their websites. Not many love to program and nearly nobody would bother coding his/her own CAPTCHA system; they prefer to install in their websites implementations like ReCAPTCHA, which is for sure great. The problem is that famous, widely distributed systems are the prime target for hackers; if these manage to break them, a large amount of websites will be exposed. I programmed my own CAPTCHA because I love to create my own things and to fit them seamlessly with my designs; I did not want a prefabricated element that would look like a sticker in my pages. But of course, beyond that, I had to achieve a functional, secure system and this was not easy; furthermore I was discouraged by others from even attempting it and use ReCAPTCHA instead. I finally came to my version in which numbers are stored as files with randomly encrypted filenames; since the encryptation works only in one way, a bot should be unable to guess them. Also, since only my website uses this system, it is unlikely that anyone would bother to program a bot specially made against it.
In the design of an effective login form we can see again the antagonism between security and user-friendliness. The only treath left for a well designed login form could be brute force attacks. These automatized attacks are managed by a program which attempts countless passwords in the password field of the form, reading the error message after each attempt, pretty much the same that a human would do, but this time at an astonishing speed, of course. To start the attack, the hacker will feed some data into the program, such as the names of the input fields and the error message reported. If the hacker is lucky enough and the program introduces the right password, the absence of an error message will indicate that the program introduced the correct password, so the attack will be stopped and the password shown to the hacker. Beating this kind of attack is simple enough, by adding a CAPTCHA in the form and unifying the error message for the three input fields: username, password and CAPTCHA. If the same error message is displayed regardless of which one of the input fields has an incorrect value, the program is unable to detect whether it sent a correct password or not, because it is unable to correctly fill the CAPTCHA field.
We can consider this security system as practically perfect, but it may bring a bad user experience, since the user is not informed about which fields have the error. I designed my login forms like this and kept this method until arriving to the VorKuta project, in whose login form I separated the CAPTCHA error from the other errors. This improves user experience but compromises security to a certain level. However security is still extremely high, because I have always enforced in my login forms passwords which use letters and numbers. Brute force programs use a dictionary as source for the passwords that they try (hence brute force attacks are also referred as "dictionary attacks"); this may have certain effectiveness against passwords composed only by letters, but they become practically useless with the adoption of passwords that use a mixture of letters and numbers, for the number of possible combinations is practically infinite.
From the standpoint of an administrator, it is a problem the widespread existence of proxies and variable IPs in general, but still so they are many those who are attached to a static IP or a reduced number of them. Hence, a simple check of the IP address of the registrant user will always be a good thing, helping to reduce the number of users registered with multiple accounts. Duplicated accounts create stress in the system and also in the administrator/s if they are used to abuse the policies of the website. Only one account per IP address is always a good thing for whom has to deal with "the defense of the castle".
Easy to overlook, this one is actually a serious treath for the system and the administrator/s. The average user will use impersonation as a joke, being able to create some social havoc in the website. Specially dangerous is when a malicious user impersonates the administrator itself (if you have read so far as here, you are probably destined to be an administrator one day...), exploiting this position to deceive the users, gathering sensitive data from them or ruining the reputation of the administrator. But the danger does not end here, for a hole in the system could grant administrative rights - at least partially - to the intruder. Imagine that in a certain system the administrator is registered as any other user and his/her username is known by many - or all-. If the username were for example "Sakhal", the impersonator could register an account with the username "sakhal".
When I realized about this danger, I checked whether the system would distinguish between "Sakhal" and "sakhal" or not; fortunately it did so, but I did not want to rely solely on that for security. I could not be sure whether under different circumstances this interpretation could change, and there is also the risk of making a mistake while writing the code: if I wrote "sakhal" instead of "Sakhal" in a certain page, then the impersonator would gain administrator rights on that page. This would be disastrous if that page were the administration panel... So I decided that once "Sakhal" was registered, no other variation of the same username ("sakhal", "SAKHAL", "SaKHaL", etc...) should be allowed, and the same would apply for any other username possible. To achieve this, instead of comparing the registrant username with the usernames already registered in the database by using their verbatim form, we have to lowercase the usernames before performing the comparison by using the PHP function strtolower().
Reporting and monitoring
Users are not always the problem regarding security; with a well elaborated reporting system, the administrator can have some relief in his/her work thanks to the reports sent by the users. For achieving such state of things, the users should be provided with an easy to use reporting system to encourage its utilization - but always making clear that the abuse of this feature will be penalized -. Certain informations should be easy to see for the administrator by just navigating on the website itself (particularly the IP addresses and the number of warnings accumulated by the users). An identification number for users, messages and other entities should be also visible, not only for the administrator, but for every user, as a mean to ease reporting. I display these numbers even if the reporting system automatically add them into the reports.
To finish with this, I will stress the necessity of a tool specifically programmed to easily perform the most common moderation and cleaning operations, allowing to perform them from any device. In fact, being able to administer important aspects of our database from a phone is a true luxury. Without this system, we would have to login into phpMyAdmin (a desktop application) every time we needed to perform any operation, which is not specially comfortable given the complex interface. Even worse, some technical error - not unusual in such a complex application - could prevent us from accessing the control panel and then we would have no means to administer the database.