High Performance Sqlite3
You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
Drew Bednar 5258f41475 Saving sqlite3 notes 2 weeks ago
.gitignore Saving sqlite3 notes 2 weeks ago
README.md Saving sqlite3 notes 2 weeks ago

README.md

High Performance SQLite3

https://highperformancesqlite.com

brew install sqlite3

Dot Commands

Be sure to learn the .commands for sqlite. In a session you can see theses with

.help

You can also execute these commands from the commandline like so

sqlite3 your_database.db -cmd ".tables"

PRAGMA Statements

https://www.sqlite.org/pragma.html

The PRAGMA statements are sqlite3 specific and used to modify the behavior of the library.

To see all the pragmas supported:

sqlite> pragma pragma_list;
sqlite> pragma page_size;
4096
sqlite> pragma page_count;
5

Sure enough 4096 * 5 = 20480 bytes as we can see with ls

-rw-r--r-- 1 toor toor 20480 Jul  6 21:28 hellosql.db

Some of these are connection only, other's are persistent. Example busy_timeout will only be non-persistent.

An example of an important and persistent setting is the journal_mode. This is rollback mode where it will delete the journal when it's done. WAL mode is an alternative we will look into.

sqlite> pragma journal_mode;
delete

A VERY important one to consider is the pragma compile_options; Where you can see how this database was compiled.

Example, this shows that the brew version was compiled with Full Text Search:

ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5

If you look at the compile options for the MacBook version you will see an OMIT_LOAD_EXTENSIONS flag that means you can't use extensions.

Another useful dot command to consider is the .dbconfig but you get less information.

sqlite> .dbconfig
          defensive on
            dqs_ddl off
            dqs_dml off
        enable_fkey off
        enable_qpsg off
     enable_trigger on
        enable_view on
     fts3_tokenizer off
 legacy_alter_table off
 legacy_file_format off
     load_extension on
   no_ckpt_on_close off
     reset_database off
  reverse_scanorder off
    stmt_scanstatus off
        trigger_eqp off
     trusted_schema off
    writable_schema off

Another important one is foreign key constraints. Remember they prioritized backwards support over new features, so you need to turn this on.

sqlite> pragma foreign_keys;
0

We can simply turn that on with

pragma foreign_keys=1;

This is a session level setting. You will need set this at the beginning off every connection.

Chat Gipity actually can give a good starting point for pragma's to set with reasons for why it would be beneficial. This is a starting point though. You should do your own load testing, and figure out what settings work best for your specific appliction.

Example "Give me the set of pragma's for sqlite that will benefit performance in service of a web application backend database.

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -2000;  -- 2MB
PRAGMA temp_store = MEMORY;
PRAGMA foreign_keys = ON;
PRAGMA mmap_size = 268435456;  -- 256MB
PRAGMA optimize;
PRAGMA busy_timeout = 30000;  -- 30 seconds
PRAGMA automatic_index = ON;
PRAGMA locking_mode = EXCLUSIVE;

Remember to look these up.

Loadable extensions

https://www.sqlite.org/loadext.html Additional Extensions: https://github.com/nalgeon/sqlean

You saw above that the Full Text Search extensions were part of the compile options output, thosw are extensions that where compiled into the binary. We can also load extensions if we take on the compilations ourselves Let's follow the instructions to compile and load the csv extension that will let us treat a csv file as a virtual table.

Download the extension source.

A loadable extension obviously can be user code too, but for now we will just stick with the csv extension created by the sqlite team https://www.sqlite.org/src/tree?name=ext.

git clone https://github.com/sqlite/sqlite && cd sqlite

Since I am using version 3.46.0 I will checkout that tag and navigate to the source code of the extension.

git checkout version-3.46.0
cd ext/misc/

Using the example command we see from compilation example

gcc -g -fPIC -shared YourCode.c -o YourCode.so

We will compile the csv.so object.

gcc -g -fPIC -shared csv.c -o csv.so

Now create a new directory to share this with all users and relocate the object.

mkdir /usr/local/lib/sqlite3-ext/
sudo mv csv.so /usr/local/lib/sqlite3-ext/

Be sure to add this new directory to your LD_LIBRARY_PATH in your .bashrc file.

export LD_LIBRARY_PATH=/usr/local/lib/sqlite3-ext:$LD_LIBRARY_PATH

source ~/.bashrc and now you should be able to load this extenstion from the sqlite3 cli.

sqlite> .load csv
sqlite> 

There are other extensions that add user authentication to a sqlite database, there is also an extension that can encrypt the file at rest.

Virtual Tables

Allow you to query a table that is not actually in your database.

Let's take advantage of the csv extension we compiled in the previous step and use that to query a csv file that is actually not part of the database.

Just create some data:

cat <<EOF > ./data/mydata.csv
id,name,level,speed,power
1,Morbinian,20,44,65
2,Tomoro,15,23,23
3,Wonku,99,87,93
EOF

Now we can create a virtual table using the contents, and query them just like you would expect.

sqlite> .load csv
sqlite> CREATE VIRTUAL TABLE npcs USING csv(filename="./data/mydata.csv", header=true);
sqlite> .mode box
sqlite> SELECT * FROM npcs;
┌────┬───────────┬───────┬───────┬───────┐
│ id │   name    │ level │ speed │ power │
├────┼───────────┼───────┼───────┼───────┤
│ 1  │ Morbinian │ 20    │ 44    │ 65    │
│ 2  │ Tomoro    │ 15    │ 23    │ 23    │
│ 3  │ Wonku     │ 99    │ 87    │ 93    │
└────┴───────────┴───────┴───────┴───────┘

There are other virtual tricks you can use too like

SELECT * FROM generate_series(1,100);

Which you could use in a join, or filter statement.

Look at json_tree and json_each functions also. Another virtual table backed feature is Full Text Search 5.

Floating Point

There is no Decimal type for SQLite. There is only the REAL data type that represents a floating point number. The real datatype gaurentees 15 digits of precision.

Let's look at the example of floating point precision.

CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3sqlite> CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3), (-10.52), (-15.78);
sqlite> .mode box
sqlite> SELECT val FROM floats;
┌────────┐
│  val   │
├────────┤
│ 26.3   │
│ -10.52 │
│ -15.78 │
└────────┘

sqlite> SELECT sum(val) FROM floats;
┌──────────────────────┐
│       sum(val)       │
├──────────────────────┤
│ 1.77635683940025e-15 │
└──────────────────────┘

If you do this on paper 26.3 + -10.52 + -15.78 = 0 it should have come out to zero.

There is a function that could have helped us with this. This decimal_sum comes from a decimal extension. NOTE this extension ships in the cli, but not built into the actual sqlite binary.

sqlite> SELECT decimal_sum(val) FROM floats;
┌──────────────────┐
│ decimal_sum(val) │
├──────────────────┤
│ 0.00             │
└──────────────────┘

Optimizing SQLite

Locking

SQLite can only support one writer process at a time, but it can support many readers at any given time.

Locking States:

  • Unlocked: No reading / writing
  • Shared Lock: Process will acquire a shared lock and all other processes can read the DB too, but no process can write
  • Reserved Lock: Process announces it is going to write. In the reserved lock state new shared locks can be acquired.
  • Pending: No new readers can start. We are waiting for all the open shared locks to close out.
  • ** Exclusive Lock**: Process writes as fast as it can to the DB then gives up the lock allowing for other processes to acquire shared locks again.

Rollback mode

Before the db goes to write something it takes the page being modified, sets aside, then when the new page is successfully written to disk it will disregard the previous page. The act of throughing that data away is what completes the commit process.

WAL mode (Write Ahead Log)

The most important thing for SQLite Performance

pragma journal_mode;

To view the current mode. By default it's delete. This IS a persistent setting across sessions. So set once and all subsequent connections will also have this setting.

In wall mode a new file contains the new writes. At some point a checkpoint will occur, and the WAL log is taken and merged with the db, and the process happens again. Checkpointing is configurable. It's a whole lot faster. We still have a single writer but can have many concurrent readers still. The difference is that the reader will have knowledge of WAL transactions up to a specific point. In this mode writer doesn't block the reader anymore.# High Performance SQLite

https://highperformancesqlite.com

brew install sqlite3

Dot Commands

Be sure to learn the .commands for sqlite. In a session you can see theses with

.help

You can also execute these commands from the commandline like so

sqlite3 your_database.db -cmd ".tables"

PRAGMA Statements

https://www.sqlite.org/pragma.html

The PRAGMA statements are sqlite3 specific and used to modify the behavior of the library.

To see all the pragmas supported:

sqlite> pragma pragma_list;
sqlite> pragma page_size;
4096
sqlite> pragma page_count;
5

Sure enough 4096 * 5 = 20480 bytes as we can see with ls

-rw-r--r-- 1 toor toor 20480 Jul  6 21:28 hellosql.db

Some of these are connection only, other's are persistent. Example busy_timeout will only be non-persistent.

An example of an important and persistent setting is the journal_mode. This is rollback mode where it will delete the journal when it's done. WAL mode is an alternative we will look into.

sqlite> pragma journal_mode;
delete

A VERY important one to consider is the pragma compile_options; Where you can see how this database was compiled.

Example, this shows that the brew version was compiled with Full Text Search:

ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5

If you look at the compile options for the MacBook version you will see an OMIT_LOAD_EXTENSIONS flag that means you can't use extensions.

Another useful dot command to consider is the .dbconfig but you get less information.

sqlite> .dbconfig
          defensive on
            dqs_ddl off
            dqs_dml off
        enable_fkey off
        enable_qpsg off
     enable_trigger on
        enable_view on
     fts3_tokenizer off
 legacy_alter_table off
 legacy_file_format off
     load_extension on
   no_ckpt_on_close off
     reset_database off
  reverse_scanorder off
    stmt_scanstatus off
        trigger_eqp off
     trusted_schema off
    writable_schema off

Another important one is foreign key constraints. Remember they prioritized backwards support over new features, so you need to turn this on.

sqlite> pragma foreign_keys;
0

We can simply turn that on with

pragma foreign_keys=1;

This is a session level setting. You will need set this at the beginning off every connection.

Chat Gipity actually can give a good starting point for pragma's to set with reasons for why it would be beneficial. This is a starting point though. You should do your own load testing, and figure out what settings work best for your specific appliction.

Example "Give me the set of pragma's for sqlite that will benefit performance in service of a web application backend database.

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -2000;  -- 2MB
PRAGMA temp_store = MEMORY;
PRAGMA foreign_keys = ON;
PRAGMA mmap_size = 268435456;  -- 256MB
PRAGMA optimize;
PRAGMA busy_timeout = 30000;  -- 30 seconds
PRAGMA automatic_index = ON;
PRAGMA locking_mode = EXCLUSIVE;

Remember to look these up.

Loadable extensions

https://www.sqlite.org/loadext.html Additional Extensions: https://github.com/nalgeon/sqlean

You saw above that the Full Text Search extensions were part of the compile options output, thosw are extensions that where compiled into the binary. We can also load extensions if we take on the compilations ourselves Let's follow the instructions to compile and load the csv extension that will let us treat a csv file as a virtual table.

Download the extension source.

A loadable extension obviously can be user code too, but for now we will just stick with the csv extension created by the sqlite team https://www.sqlite.org/src/tree?name=ext.

git clone https://github.com/sqlite/sqlite && cd sqlite

Since I am using version 3.46.0 I will checkout that tag and navigate to the source code of the extension.

git checkout version-3.46.0
cd ext/misc/

Using the example command we see from compilation example

gcc -g -fPIC -shared YourCode.c -o YourCode.so

We will compile the csv.so object.

gcc -g -fPIC -shared csv.c -o csv.so

Now create a new directory to share this with all users and relocate the object.

mkdir /usr/local/lib/sqlite3-ext/
sudo mv csv.so /usr/local/lib/sqlite3-ext/

Be sure to add this new directory to your LD_LIBRARY_PATH in your .bashrc file.

export LD_LIBRARY_PATH=/usr/local/lib/sqlite3-ext:$LD_LIBRARY_PATH

source ~/.bashrc and now you should be able to load this extenstion from the sqlite3 cli.

sqlite> .load csv
sqlite> 

There are other extensions that add user authentication to a sqlite database, there is also an extension that can encrypt the file at rest.

Virtual Tables

Allow you to query a table that is not actually in your database.

Let's take advantage of the csv extension we compiled in the previous step and use that to query a csv file that is actually not part of the database.

Just create some data:

cat <<EOF > ./data/mydata.csv
id,name,level,speed,power
1,Morbinian,20,44,65
2,Tomoro,15,23,23
3,Wonku,99,87,93
EOF

Now we can create a virtual table using the contents, and query them just like you would expect.

sqlite> .load csv
sqlite> CREATE VIRTUAL TABLE npcs USING csv(filename="./data/mydata.csv", header=true);
sqlite> .mode box
sqlite> SELECT * FROM npcs;
┌────┬───────────┬───────┬───────┬───────┐
│ id │   name    │ level │ speed │ power │
├────┼───────────┼───────┼───────┼───────┤
│ 1  │ Morbinian │ 20    │ 44    │ 65    │
│ 2  │ Tomoro    │ 15    │ 23    │ 23    │
│ 3  │ Wonku     │ 99    │ 87    │ 93    │
└────┴───────────┴───────┴───────┴───────┘

There are other virtual tricks you can use too like

SELECT * FROM generate_series(1,100);

Which you could use in a join, or filter statement.

Look at json_tree and json_each functions also. Another virtual table backed feature is Full Text Search 5.

Floating Point

There is no Decimal type for SQLite. There is only the REAL data type that represents a floating point number. The real datatype gaurentees 15 digits of precision.

Let's look at the example of floating point precision.

CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3sqlite> CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3), (-10.52), (-15.78);
sqlite> .mode box
sqlite> SELECT val FROM floats;
┌────────┐
│  val   │
├────────┤
│ 26.3   │
│ -10.52 │
│ -15.78 │
└────────┘

sqlite> SELECT sum(val) FROM floats;
┌──────────────────────┐
│       sum(val)       │
├──────────────────────┤
│ 1.77635683940025e-15 │
└──────────────────────┘

If you do this on paper 26.3 + -10.52 + -15.78 = 0 it should have come out to zero.

There is a function that could have helped us with this. This decimal_sum comes from a decimal extension. NOTE this extension ships in the cli, but not built into the actual sqlite binary.

sqlite> SELECT decimal_sum(val) FROM floats;
┌──────────────────┐
│ decimal_sum(val) │
├──────────────────┤
│ 0.00             │
└──────────────────┘

Optimizing SQLite

Locking

SQLite can only support one writer process at a time, but it can support many readers at any given time.

Locking States:

  • Unlocked: No reading / writing
  • Shared Lock: Process will acquire a shared lock and all other processes can read the DB too, but no process can write
  • Reserved Lock: Process announces it is going to write. In the reserved lock state new shared locks can be acquired.
  • Pending: No new readers can start. We are waiting for all the open shared locks to close out.
  • ** Exclusive Lock**: Process writes as fast as it can to the DB then gives up the lock allowing for other processes to acquire shared locks again.

Rollback mode

Before the db goes to write something it takes the page being modified, sets aside, then when the new page is successfully written to disk it will disregard the previous page. The act of throughing that data away is what completes the commit process.

WAL mode (Write Ahead Log)

The most important thing for SQLite Performance

pragma journal_mode;

To view the current mode. By default it's delete. This IS a persistent setting across sessions. So set once and all subsequent connections will also have this setting.

In wall mode a new file contains the new writes. At some point a checkpoint will occur, and the WAL log is taken and merged with the db, and the process happens again. Checkpointing is configurable. It's a whole lot faster. We still have a single writer but can have many concurrent readers still. The difference is that the reader will have knowledge of WAL transactions up to a specific point. In this mode writer doesn't block the reader anymore.# High Performance SQLite

https://highperformancesqlite.com

brew install sqlite3

Dot Commands

Be sure to learn the .commands for sqlite. In a session you can see theses with

.help

You can also execute these commands from the commandline like so

sqlite3 your_database.db -cmd ".tables"

PRAGMA Statements

https://www.sqlite.org/pragma.html

The PRAGMA statements are sqlite3 specific and used to modify the behavior of the library.

To see all the pragmas supported:

sqlite> pragma pragma_list;
sqlite> pragma page_size;
4096
sqlite> pragma page_count;
5

Sure enough 4096 * 5 = 20480 bytes as we can see with ls

-rw-r--r-- 1 toor toor 20480 Jul  6 21:28 hellosql.db

Some of these are connection only, other's are persistent. Example busy_timeout will only be non-persistent.

An example of an important and persistent setting is the journal_mode. This is rollback mode where it will delete the journal when it's done. WAL mode is an alternative we will look into.

sqlite> pragma journal_mode;
delete

A VERY important one to consider is the pragma compile_options; Where you can see how this database was compiled.

Example, this shows that the brew version was compiled with Full Text Search:

ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5

If you look at the compile options for the MacBook version you will see an OMIT_LOAD_EXTENSIONS flag that means you can't use extensions.

Another useful dot command to consider is the .dbconfig but you get less information.

sqlite> .dbconfig
          defensive on
            dqs_ddl off
            dqs_dml off
        enable_fkey off
        enable_qpsg off
     enable_trigger on
        enable_view on
     fts3_tokenizer off
 legacy_alter_table off
 legacy_file_format off
     load_extension on
   no_ckpt_on_close off
     reset_database off
  reverse_scanorder off
    stmt_scanstatus off
        trigger_eqp off
     trusted_schema off
    writable_schema off

Another important one is foreign key constraints. Remember they prioritized backwards support over new features, so you need to turn this on.

sqlite> pragma foreign_keys;
0

We can simply turn that on with

pragma foreign_keys=1;

This is a session level setting. You will need set this at the beginning off every connection.

Chat Gipity actually can give a good starting point for pragma's to set with reasons for why it would be beneficial. This is a starting point though. You should do your own load testing, and figure out what settings work best for your specific appliction.

Example "Give me the set of pragma's for sqlite that will benefit performance in service of a web application backend database.

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -2000;  -- 2MB
PRAGMA temp_store = MEMORY;
PRAGMA foreign_keys = ON;
PRAGMA mmap_size = 268435456;  -- 256MB
PRAGMA optimize;
PRAGMA busy_timeout = 30000;  -- 30 seconds
PRAGMA automatic_index = ON;
PRAGMA locking_mode = EXCLUSIVE;

Remember to look these up.

Loadable extensions

https://www.sqlite.org/loadext.html Additional Extensions: https://github.com/nalgeon/sqlean

You saw above that the Full Text Search extensions were part of the compile options output, thosw are extensions that where compiled into the binary. We can also load extensions if we take on the compilations ourselves Let's follow the instructions to compile and load the csv extension that will let us treat a csv file as a virtual table.

Download the extension source.

A loadable extension obviously can be user code too, but for now we will just stick with the csv extension created by the sqlite team https://www.sqlite.org/src/tree?name=ext.

git clone https://github.com/sqlite/sqlite && cd sqlite

Since I am using version 3.46.0 I will checkout that tag and navigate to the source code of the extension.

git checkout version-3.46.0
cd ext/misc/

Using the example command we see from compilation example

gcc -g -fPIC -shared YourCode.c -o YourCode.so

We will compile the csv.so object.

gcc -g -fPIC -shared csv.c -o csv.so

Now create a new directory to share this with all users and relocate the object.

mkdir /usr/local/lib/sqlite3-ext/
sudo mv csv.so /usr/local/lib/sqlite3-ext/

Be sure to add this new directory to your LD_LIBRARY_PATH in your .bashrc file.

export LD_LIBRARY_PATH=/usr/local/lib/sqlite3-ext:$LD_LIBRARY_PATH

source ~/.bashrc and now you should be able to load this extenstion from the sqlite3 cli.

sqlite> .load csv
sqlite> 

There are other extensions that add user authentication to a sqlite database, there is also an extension that can encrypt the file at rest.

Virtual Tables

Allow you to query a table that is not actually in your database.

Let's take advantage of the csv extension we compiled in the previous step and use that to query a csv file that is actually not part of the database.

Just create some data:

cat <<EOF > ./data/mydata.csv
id,name,level,speed,power
1,Morbinian,20,44,65
2,Tomoro,15,23,23
3,Wonku,99,87,93
EOF

Now we can create a virtual table using the contents, and query them just like you would expect.

sqlite> .load csv
sqlite> CREATE VIRTUAL TABLE npcs USING csv(filename="./data/mydata.csv", header=true);
sqlite> .mode box
sqlite> SELECT * FROM npcs;
┌────┬───────────┬───────┬───────┬───────┐
│ id │   name    │ level │ speed │ power │
├────┼───────────┼───────┼───────┼───────┤
│ 1  │ Morbinian │ 20    │ 44    │ 65    │
│ 2  │ Tomoro    │ 15    │ 23    │ 23    │
│ 3  │ Wonku     │ 99    │ 87    │ 93    │
└────┴───────────┴───────┴───────┴───────┘

There are other virtual tricks you can use too like

SELECT * FROM generate_series(1,100);

Which you could use in a join, or filter statement.

Look at json_tree and json_each functions also. Another virtual table backed feature is Full Text Search 5.

Floating Point

There is no Decimal type for SQLite. There is only the REAL data type that represents a floating point number. The real datatype gaurentees 15 digits of precision.

Let's look at the example of floating point precision.

CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3sqlite> CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3), (-10.52), (-15.78);
sqlite> .mode box
sqlite> SELECT val FROM floats;
┌────────┐
│  val   │
├────────┤
│ 26.3   │
│ -10.52 │
│ -15.78 │
└────────┘

sqlite> SELECT sum(val) FROM floats;
┌──────────────────────┐
│       sum(val)       │
├──────────────────────┤
│ 1.77635683940025e-15 │
└──────────────────────┘

If you do this on paper 26.3 + -10.52 + -15.78 = 0 it should have come out to zero.

There is a function that could have helped us with this. This decimal_sum comes from a decimal extension. NOTE this extension ships in the cli, but not built into the actual sqlite binary.

sqlite> SELECT decimal_sum(val) FROM floats;
┌──────────────────┐
│ decimal_sum(val) │
├──────────────────┤
│ 0.00             │
└──────────────────┘

Optimizing SQLite

Locking

SQLite can only support one writer process at a time, but it can support many readers at any given time.

Locking States:

  • Unlocked: No reading / writing
  • Shared Lock: Process will acquire a shared lock and all other processes can read the DB too, but no process can write
  • Reserved Lock: Process announces it is going to write. In the reserved lock state new shared locks can be acquired.
  • Pending: No new readers can start. We are waiting for all the open shared locks to close out.
  • ** Exclusive Lock**: Process writes as fast as it can to the DB then gives up the lock allowing for other processes to acquire shared locks again.

Rollback mode

Before the db goes to write something it takes the page being modified, sets aside, then when the new page is successfully written to disk it will disregard the previous page. The act of throughing that data away is what completes the commit process.

WAL mode (Write Ahead Log)

The most important thing for SQLite Performance

pragma journal_mode;

To view the current mode. By default it's delete. This IS a persistent setting across sessions. So set once and all subsequent connections will also have this setting.

In wall mode a new file contains the new writes. At some point a checkpoint will occur, and the WAL log is taken and merged with the db, and the process happens again. Checkpointing is configurable. It's a whole lot faster. We still have a single writer but can have many concurrent readers still. The difference is that the reader will have knowledge of WAL transactions up to a specific point. In this mode writer doesn't block the reader anymore.# High Performance SQLite

https://highperformancesqlite.com

brew install sqlite3

Dot Commands

Be sure to learn the .commands for sqlite. In a session you can see theses with

.help

You can also execute these commands from the commandline like so

sqlite3 your_database.db -cmd ".tables"

PRAGMA Statements

https://www.sqlite.org/pragma.html

The PRAGMA statements are sqlite3 specific and used to modify the behavior of the library.

To see all the pragmas supported:

sqlite> pragma pragma_list;
sqlite> pragma page_size;
4096
sqlite> pragma page_count;
5

Sure enough 4096 * 5 = 20480 bytes as we can see with ls

-rw-r--r-- 1 toor toor 20480 Jul  6 21:28 hellosql.db

Some of these are connection only, other's are persistent. Example busy_timeout will only be non-persistent.

An example of an important and persistent setting is the journal_mode. This is rollback mode where it will delete the journal when it's done. WAL mode is an alternative we will look into.

sqlite> pragma journal_mode;
delete

A VERY important one to consider is the pragma compile_options; Where you can see how this database was compiled.

Example, this shows that the brew version was compiled with Full Text Search:

ENABLE_FTS3
ENABLE_FTS3_PARENTHESIS
ENABLE_FTS4
ENABLE_FTS5

If you look at the compile options for the MacBook version you will see an OMIT_LOAD_EXTENSIONS flag that means you can't use extensions.

Another useful dot command to consider is the .dbconfig but you get less information.

sqlite> .dbconfig
          defensive on
            dqs_ddl off
            dqs_dml off
        enable_fkey off
        enable_qpsg off
     enable_trigger on
        enable_view on
     fts3_tokenizer off
 legacy_alter_table off
 legacy_file_format off
     load_extension on
   no_ckpt_on_close off
     reset_database off
  reverse_scanorder off
    stmt_scanstatus off
        trigger_eqp off
     trusted_schema off
    writable_schema off

Another important one is foreign key constraints. Remember they prioritized backwards support over new features, so you need to turn this on.

sqlite> pragma foreign_keys;
0

We can simply turn that on with

pragma foreign_keys=1;

This is a session level setting. You will need set this at the beginning off every connection.

Chat Gipity actually can give a good starting point for pragma's to set with reasons for why it would be beneficial. This is a starting point though. You should do your own load testing, and figure out what settings work best for your specific appliction.

Example "Give me the set of pragma's for sqlite that will benefit performance in service of a web application backend database.

PRAGMA journal_mode = WAL;
PRAGMA synchronous = NORMAL;
PRAGMA cache_size = -2000;  -- 2MB
PRAGMA temp_store = MEMORY;
PRAGMA foreign_keys = ON;
PRAGMA mmap_size = 268435456;  -- 256MB
PRAGMA optimize;
PRAGMA busy_timeout = 30000;  -- 30 seconds
PRAGMA automatic_index = ON;
PRAGMA locking_mode = EXCLUSIVE;

Remember to look these up.

Loadable extensions

https://www.sqlite.org/loadext.html Additional Extensions: https://github.com/nalgeon/sqlean

You saw above that the Full Text Search extensions were part of the compile options output, thosw are extensions that where compiled into the binary. We can also load extensions if we take on the compilations ourselves Let's follow the instructions to compile and load the csv extension that will let us treat a csv file as a virtual table.

Download the extension source.

A loadable extension obviously can be user code too, but for now we will just stick with the csv extension created by the sqlite team https://www.sqlite.org/src/tree?name=ext.

git clone https://github.com/sqlite/sqlite && cd sqlite

Since I am using version 3.46.0 I will checkout that tag and navigate to the source code of the extension.

git checkout version-3.46.0
cd ext/misc/

Using the example command we see from compilation example

gcc -g -fPIC -shared YourCode.c -o YourCode.so

We will compile the csv.so object.

gcc -g -fPIC -shared csv.c -o csv.so

Now create a new directory to share this with all users and relocate the object.

mkdir /usr/local/lib/sqlite3-ext/
sudo mv csv.so /usr/local/lib/sqlite3-ext/

Be sure to add this new directory to your LD_LIBRARY_PATH in your .bashrc file.

export LD_LIBRARY_PATH=/usr/local/lib/sqlite3-ext:$LD_LIBRARY_PATH

source ~/.bashrc and now you should be able to load this extenstion from the sqlite3 cli.

sqlite> .load csv
sqlite> 

There are other extensions that add user authentication to a sqlite database, there is also an extension that can encrypt the file at rest.

Virtual Tables

Allow you to query a table that is not actually in your database.

Let's take advantage of the csv extension we compiled in the previous step and use that to query a csv file that is actually not part of the database.

Just create some data:

cat <<EOF > ./data/mydata.csv
id,name,level,speed,power
1,Morbinian,20,44,65
2,Tomoro,15,23,23
3,Wonku,99,87,93
EOF

Now we can create a virtual table using the contents, and query them just like you would expect.

sqlite> .load csv
sqlite> CREATE VIRTUAL TABLE npcs USING csv(filename="./data/mydata.csv", header=true);
sqlite> .mode box
sqlite> SELECT * FROM npcs;
┌────┬───────────┬───────┬───────┬───────┐
│ id │   name    │ level │ speed │ power │
├────┼───────────┼───────┼───────┼───────┤
│ 1  │ Morbinian │ 20    │ 44    │ 65    │
│ 2  │ Tomoro    │ 15    │ 23    │ 23    │
│ 3  │ Wonku     │ 99    │ 87    │ 93    │
└────┴───────────┴───────┴───────┴───────┘

There are other virtual tricks you can use too like

SELECT * FROM generate_series(1,100);

Which you could use in a join, or filter statement.

Look at json_tree and json_each functions also. Another virtual table backed feature is Full Text Search 5.

Floating Point

There is no Decimal type for SQLite. There is only the REAL data type that represents a floating point number. The real datatype gaurentees 15 digits of precision.

Let's look at the example of floating point precision.

CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3sqlite> CREATE TABLE floats (val REAL);
INSERT INTO floats VALUES (26.3), (-10.52), (-15.78);
sqlite> .mode box
sqlite> SELECT val FROM floats;
┌────────┐
│  val   │
├────────┤
│ 26.3   │
│ -10.52 │
│ -15.78 │
└────────┘

sqlite> SELECT sum(val) FROM floats;
┌──────────────────────┐
│       sum(val)       │
├──────────────────────┤
│ 1.77635683940025e-15 │
└──────────────────────┘

If you do this on paper 26.3 + -10.52 + -15.78 = 0 it should have come out to zero.

There is a function that could have helped us with this. This decimal_sum comes from a decimal extension. NOTE this extension ships in the cli, but not built into the actual sqlite binary.

sqlite> SELECT decimal_sum(val) FROM floats;
┌──────────────────┐
│ decimal_sum(val) │
├──────────────────┤
│ 0.00             │
└──────────────────┘

Optimizing SQLite

Locking

SQLite can only support one writer process at a time, but it can support many readers at any given time.

Locking States:

  • Unlocked: No reading / writing
  • Shared Lock: Process will acquire a shared lock and all other processes can read the DB too, but no process can write
  • Reserved Lock: Process announces it is going to write. In the reserved lock state new shared locks can be acquired.
  • Pending: No new readers can start. We are waiting for all the open shared locks to close out.
  • ** Exclusive Lock**: Process writes as fast as it can to the DB then gives up the lock allowing for other processes to acquire shared locks again.

Rollback mode

Before the db goes to write something it takes the page being modified, sets aside, then when the new page is successfully written to disk it will disregard the previous page. The act of throughing that data away is what completes the commit process.

WAL mode (Write Ahead Log)

The most important thing for SQLite Performance

pragma journal_mode;

To view the current mode. By default it's delete. This IS a persistent setting across sessions. So set once and all subsequent connections will also have this setting.

In wall mode a new file contains the new writes. At some point a checkpoint will occur, and the WAL log is taken and merged with the db, and the process happens again. Checkpointing is configurable. It's a whole lot faster. We still have a single writer but can have many concurrent readers still. The difference is that the reader will have knowledge of WAL transactions up to a specific point. In this mode writer doesn't block the reader anymore.