Intro to Bitmap Index: what is it and when to use
There are bunch of different indices in the database world, you are probably most familiar with B-tree index. Different kind of index will have great impact on performance, bitmap index can be very helpful in some use case.
Case
Suppose Columbia university has a following table for the database course:
And suppose the course is so popular that millions of people take it, so the table is going to be huge.
Now we want to run the following SQL statement:
SELECT * FROM table WHERE Gender='male' AND Pass='false';
No index on ‘Pass’ and ‘Gender’ column
If we don’t use any index, the process will be scanning the whole table record by record million times and return the result set. But I’m sure we can do better than this.
If we don’t use any index, the process will be scanning the whole table record by record million times and return the result set. But I’m sure we can do better than this.
B-tree index on ‘Pass’ and ‘Gender’ column
So everyone knows index is going to help searching in database by sorting the entries. Is it going to faster if we put index on those columns we care about?
Not necessarily.
Index is not magic. If 50% of the records in table is ‘Gender=male’ and 50% of the records is ‘Pass=true’. Then the sql statement is going to return a very large dataset, maybe 30% of the whole data. In this case, the percentage of result set is considerable large compares to the original table, than the database optimizer will choose not to use the index even you have the index built.
Most probably, b-tree index is going to end up the same path as no index situation, scan the whole table record by record.
B-tree index on ‘Pass’ and ‘Gender’ column
So we know B-tree index is not going to help much on low cardinality situations. Here comes the use case of bitmap index.
We create one set of vectors for each of the column we care, in this case, ‘Gender’ and ‘Pass’. So now we have the 2 following set of vectors:
When we try to execute SELECT * FROM table WHERE Gender='male' AND Pass='false';
, it will do a vector AND operation:
Voila.
When to use bitmap index
* One thing to notice is MySQL is not supporting bitmap index for now (the only database that is supporting bitmap index I know about is Oracle database) , it’s in there work log for future feature though: https://dev.mysql.com/worklog/task/?id=1524