Behind virtually all websites and web applications, hides an SQL database. And when designing it, it is necessary to choose with accuracy the different types of columns. Something that sometimes proves more difficult than expected … What is the difference between one
VARCHAR? What are they
BLOB? This small guide will avoid some errors that could well, ultimately cost us a few extra gigabytes and unnecessary slowdowns.
We will concentrate on MySQL because it is the DBopen source the most widespread, but either MySQL, PostgreSQL or another does not change much, given that they are all expected to respect the SQL standard – although ‘In practice, they each have some specificities – one will easily find in another DBMS.
Strings of characters
CHAR & VARCHAR
Many are unaware of the difference between the two as they allow both to store strings up to 255 bytes. The difference lies in the size needed to store the chain. It will
CHAR(x)store on x bytes, so if you choose one
CHAR(255)for your column, each field will take 255 bytes, even if you store one word. Conversely,
VARCHARas the name implies, it has a variable size. It
VARCHAR(x)stores it on x bytes maximum, so one
VARCHAR(255)can weigh only 64 bytes.
In this case, why not always opt for the
VARCHAR? It is a bit heavier for the same amount of data stored, since in addition to your string, it stores the size of it (since it is variable).
So we will use it
CHARfor a string that is always of the same length, for example a client reference that always makes the same number of characters (sorry, I have no more example that comes to mind: P ) While one
VARCHARwill be perfect for a pseudo.
Note: From MySQL 5.5,
VARCHARcan be up to 65,535 bytes, the same size as
TEXT & BLOB
TEXT, which range from
LONGTEXT, are used to store character strings from 256 bytes and up to 2 ^ 32, or 4,294,967,296 bytes. For example, in WordPress, the content of each article is stored in one
LONGTEXT, with that, one can therefore write novels!
BLOBare similar to
TEXT, however, they store the data in binary and not in string. Two important points differentiate them from their counterparts
- As they store the strings in binary, it is possible to use several sets of characters indifferently, which is not possible with
TEXTsince one defines a set for the column and one holds there.
- Moreover, the type
TEXTdoes not take into account case, so it is possible to compare a string uppercase to a lowercase, with
BLOBthis is not possible since it will treat the code representing the signs of the chain, not the signs themselves.
L Corresponds to the size of the stored string, and the + x is the size in bytes necessary to store the string size.
|Type||Maximum size||Space Required|
||(256)||L + 1|
||(65,536)||L + 2|
||2 24 (16 777 216)||L + 3|
||2 ^ 32 (4 294 967 296)||L + 4|
Integer numbers are used with type columns
INT, which range from
|Column type||Maximum size||Space Required|
||From -128 to 127||1 byte|
||From -32768 to 32767||2 bytes|
||From -8388608 to 8388607||3 bytes|
||From -2147483648 to 2147483647||4 bytes|
||From -9223372036854775808 to 9223372036854775807||8 bytes|
Two attributes are particularly interesting with integers. It is about
UNSIGNED. The first one adds 0 to the left of your number so that it reaches a predefined number of characters. The second allows “doubling” the capacity of a column.
- It is possible to specify a minimum character size with
INT. If the number entered does not reach the minimum number of digits specified, the attribute
ZEROFILLwill add a 0 to the left. For example, to have at least 5 characters you can do
ZEROFILL, if you enter 8, then it will
ZEROFILLcomplete it to save 00008.
- The attribute
UNSIGNED, on the other hand, allows to ignore the sign. We therefore keep the same storage interval, but by allowing only the positive, we double the maximum allowed number. For
TINYINTexample, instead of having an interval between -128 and 127, the interval is between 0 and 255, which is very convenient in some cases!
They are equivalent and can take two parameters (but this is not required), precision and scale.
- The precision represents the number of significant digits (not the 0 on the left)
- The scale gives the number of digits after the decimal point
DECIMAL(6,3)may store: 5,11; 005.108; 123,108 … Attention, the precision includes the digits before AND after the comma. So with an accuracy of 6 and a scale of 3, you can store numbers with three digits before the decimal point and three digits after.
FLOATCan be used without parameters, however in this case four additional bytes will be used for the value of the column. It is also possible to use it with precision and scale, in the same way as
REALused without parameters.
DOUBLEIs normally double
REALin the SQL standard, ie there is an accuracy of 4 bytes for
REALand 8 for
DOUBLE. Nevertheless, MySQL addresses an accuracy of 8 in both cases, so much use
DOUBLE. In this way, if we change DBMS, we will avoid surprises.
What is the difference between
DECIMALstore the values as a character string, while
FLOAT, but also
REAL, store them as a number. It’s cool to know, but not great interest you will tell. Well if, since a comma number is never quite exact, so as a number, the values are approximated.
That is to say that your 2,045 can be stored at its approximate value 2,045,0001. Which is not quite the same thing … and that may pose a problem if we have to make comparisons on it. On the other hand, in the form of a string, your numbers do not move. It is always good to know, depending on the applications, the difference can be important.
Hours and dates
As their respective names indicate, these four types are used to store the date, time, time / date and year. They are stored in the following formats:
- YYYY-MM-DD (but YYYYMMDD or YYMMDD also work)
- HH: MM: SS (but also HHMMSS)
- YYYY-MM-DD HH: MM: SS (but also YYYYMMDDHHMMSS)
Warning, in the format where you do not specify the century, MYSQL will take the 20th century by default for dates after 70 while it will infer that you are talking about XXI if you have a date between 00 and 69 inclusive. Both provide an AAAA format to avoid errors;)
MySQL supports dates between 1001-01-01 and 9999-12-31, except for those
YEARthat support an interval between 1901 and 2155, included (but which has the advantage of being very light because encoded on a single byte) .
It is rarely used because it does not correspond to the TIMESTAMP UNIX (which is the one of the majority of programming languages), in which the TIMESTAMP corresponds to the number of seconds elapsed between the 1st January 1970 to OOh00m00s and the date in question, logically Expressed in seconds. MySQL actually stores them in the same format as the
DATETIME, so impractical, for him to pass a TIMESTAMP generated by php, ruby or other.
Multiple choice columns
ENUM, As the name indicates, allows to enumerate choices. It operates a little differently from the other columns because it must first inform the different values that it can take. It is therefore assigned several choices by default. For example, let’s take a genre column in which we want to list several genres of cinematography:
Genre ENUM ( 'horror' , 'comedy' , 'thriller' , 'reporting' )
Here is an example of a column
This column may not contain values other than those specified in the definition of
NULL(If allowed when saving the column)
- An empty string if one attempts to insert an invalid value.
The index of the empty string is 0. To insert a value, one can either directly submit the value in question, or provide its index as an argument. The type
ENUMmay have 65535 different values. Its strength is that it really does not take place since it is coded on 1 byte!
SETIs similar to the fact
ENUMthat a single line can take several values. For example, for the column below:
SET genre ( 'horror' , 'comedy' , 'thriller' , 'reportage' )
An entry can take as ‘horror’ or ‘thriller’ value, but also ‘horror, thriller’. To record the values, the operation is again very similar to that of
ENUM, in the case where the value of this one is directly submitted. To give several values, you have to separate them by commas, for example, to have thriller and comedy in the same line we will do ‘thriller, comedy’.
It is also possible to use indexes to record values. We have in order horror, comedy, thriller and reporting. So a four digit number (since four values). If the number corresponding to the value is 1, it is recorded, if it is at 0, it is not recorded. A small picture would perhaps be welcome:
So 0000 will give an empty line, 0001 will give report, 1010 will give horror, thriller etc …
SETcan only take 64 values.
Finally, be careful,
ENUMare specific to MySQL, if you use them, it will subsequently be difficult to change DBMS.