MySQL and Postgres as an External Data Source
- Setting up your database
- Updating your firewall rules
- Creating the connection
- Creating queries
- Examples
You can connect Kissmetrics to your MySQL or Postgres database. This will import all existing data and then continuously monitor your database for any new data and pass that into Kissmetrics. This is a powerful and easy way to integrate Kissmetrics with your app.
If you have a different SQL database (Oracle, SQLite, Mongo), we have not yet built support to directly read from these database types. Let us know that you are interested in seeing such an integration! In the meantime, please consider converting your data into .csv
files and using our recurring CSV import method.
It’s recommended you read People, Events, and Properties before getting started here.
Setting up your database
The Kissmetrics SQL integration will only work with a read-only database. We enforce this for all users for our users peace of mind. Additionally we recommend that you create a separate MySQL user for Kissmetrics to use. The following SQL will accomplish this:
Particularly for Postgres, we check for read-only status by ensuring the user you created does not have table privileges to INSERT
, UPDATE
, DELETE
, TRUNCATE
.
Just make sure you note the username (kissmetrics
in this example) and password (some_password
in this example) because you’ll need those when you create the connection.
This SQL provides access to all tables in a database. You can also grant access to the specific tables you will be querying with Kissmetrics.
Updating your firewall rules
Kissmetrics will connect to your MySQL server using one of the following IP addresses:
75.101.132.88
75.101.161.87
75.101.165.228
184.73.221.24
174.129.16.231
You’ll want to make sure that your firewall allows connections to the port you run MySQL on (which by default is 3306
) from these IP addresses. If you have other security needs please contact support so that we can help you figure out the best solution for your situation.
If your database is on AWS
If your database is hosted on Amazon Web Services in the US East (Northern Virginia) Region (us-east-1), then we would be locating your databases using the internal Amazon network. In that case, Kissmetrics may connect to your MySQL server using one of the following internal Amazon IP addresses:
10.187.169.236
10.180.220.215
10.154.138.252
Creating the connection
To create the connection:
- Go to Settings (the small gear tab)
- Click Data Integrations
- Click Add a MySQL Data Source
- Fill out the connection details you used in the previous step. You’ll need to specify the remote address of the MySQL server. If your server is not accessible over the internet please contact us so we can help you determine your best option.
Creating queries
You will need to create one query for each event.
- Name: you can describe what the query is importing.
- Record Kissmetrics Event as: name the Kissmetrics event that is recorded by this query.
When creating your SQL query you’ll need to make sure that the results contain the following fields:
KM_PERSON
- this field will be used as the identity of the person. Typically you’ll want to select the email or username field from your database.KM_TIME
- this field will be used as the timestamp of the event. Typically you’ll want to select a date field from your database.KM_LAST_RAN
- this field is used so that Kissmetrics knows which rows have already been synced. Typically you’ll want to select the id or timestamp field from your database.
See below for more details about each.
KM_PERSON
The value you select for KM_PERSON
will be used as the identity for the person performing the event. As mentioned the best choice, when available, is the e-mail field form your table. To rename email
as KM_PERSON
you can use the SELECT...AS...
syntax to name the field in the results (e.g. SELECT email AS KM_PERSON
).
It’s possible that the identity field you want to use is in a different table. In this case you might need to use a SQL JOIN to join the two tables. For example if I have a payments
table that I’m creating a query for I might do:
KM_TIME
The value you select for KM_TIME
will be used as the date/time for the event you syncing. Kissmetrics will not sync without this field. The column type can be any valid date field, including just a plain INT
field with a UNIX epoch-based timestamp. You can use the SELECT...AS...
syntax to name the field in the results (e.g. SELECT created_at AS KM_TIME
).
Please note that Kissmetrics will set the timezone to UTC/GMT (SET time_zone = "+0:00"
) for the connection before making any queries.
KM_LAST_RAN
When Kissmetrics syncs it will sync at most 50,000 rows every hour per query. When it runs the next hour it will need to make sure that it selects the next 50,000 rows - making sure not to duplicate or miss any rows.
This is where the KM_LAST_RAN
field comes in. There are a few key things to keep in mind:
- You need to select a column that only increments when new data is inserted. An ID column is a perfect candidate, because it only goes up (after Kissmetrics has processed the row with ID 500 it knows that no new row with a lower ID than 500 is going to be inserted). A timestamp or date/time column can work also as long as new rows are going to have later timestamps than the existing rows in the database. If users can insert rows with timestamps in the past than a timestamp is not a good choice.
- You need to tell Kissmetrics the last value that was used. You can do this by returning a field with the name
KM_LAST_RAN
. If you were using theid
column for this purpose you can doSELECT id AS KM_LAST_RAN
to accomplish this. Kissmetrics will store the last value used here. - You need to add a condition to your SQL query that will ensure that when Kissmetrics runs your query it only selects rows that have not been synced. The value that you last returned to Kissmetrics will be available in the variable
$KM_LAST_RAN_FOR_THIS_QUERY
. If you are using the ID column you can useWHERE id > $KM_LAST_RAN_FOR_THIS_QUERY
to accomplish this. You must be very careful when using timestamps for the this purpose. To ensure that Kissmetrics does not miss any data it is recommended that you use a>=
condition (e.g.WHERE created_at >= $KM_LAST_RAN_FOR_THIS_QUERY
). This does mean that Kissmetrics will get some duplicate rows, but Kissmetrics will ignore the duplicates. Not doing this means that Kissmetrics might miss some data depending on the granularity of your timestamps and the frequency that you get new data. - Your query will automatically be ordered by
KM_LAST_RAN
. You should not add ORDER BY or LIMIT clauses to your query. - NOTE: You should use the same field for KM_LAST_RAN and $KM_LAST_RAN_FOR_THIS_QUERY.
A full query might look like:
Additional properties
Any additional fields you return in your SQL will be recorded as Kissmetrics properties for the person/event. So let’s say you have gender
and country
columns in your table that you want to pass along when a user signs up. You could do:
Whatever the name of the field is is what the property will be recorded as. If you want to use a different name you can use SELECT ... AS ...
. So if you had a field cpn
that you wanted to record as Coupon Code
you could use: SELECT cpn AS "Coupon Code"
.
Examples
Tracking when a user signs up
It’s quite easy to track all your signups:
Imagine you also have a payment_plans
table and you want to include the plan name when a user signs up, you can do: