From the Postgres documentation we define an upsert as:
a (…) feature that allows (…) to atomically either insert a row, or on the basis of the row already existing, UPDATE that existing row instead, while safely giving little to no further thought to concurrency.
Obviously this logic can be implemented at the application level, although you then need to worry about the concurrency part. However, this sort of operation is exactly what we should enlist our database to help us with, as they are exceptionally good at it.
Upsert is a commonly used term to refer to adding an ON CONFLICT clause to an INSERT statement. As we'll see there are different ways to define the target of the conflict but it always boils down to some constraint violation.
Ecto
Now that we know what Upserts are, let's see them in action with Ecto. As an example, let's say that we have bank accounts associated with a customer email, which should be unique, a balance and a boolean to lock accounts for security reasons.
defmodule CreateAccounts do
use Ecto.Migration
def change do
create table(:accounts) do
add :email, :string, null: false
add :balance, :decimal, null: false
add :locked, :boolean, null: false, default: false
timestamps()
end
create unique_index(:accounts, :email)
end
end
defmodule Account do
use Ecto.Schema
import Ecto.Changeset
@fields [:email, :balance, :locked]
schema "accounts" do
field :email
field :balance, :decimal, default: Decimal.new(0)
field :locked, :boolean, default: false
timestamps()
end
def changeset(account, attrs) do
account
|> cast(attrs, @fields)
|> validate_required(@fields)
|> unique_constraint(:email)
end
end
With our database set up and schema in place we can go ahead and insert our first account!
%Account{}
|> Account.changeset(%{email: "mike@subvisual.com"})
|> Repo.insert()
Nothing fancy so far, however if we try to run the above statement again we'll
get an error from the uniqueness constraint on the email. If we know that the
account is already there, we can use Repo.update/2
. There is also
Repo.insert_or_update/2
but it relies on us having an existing struct that might
have been loaded from the DB or not, thus it's not an atomic operation.
Upserting
Let's imagine a feature on a system used by a bank teller to create an account, they don't know for sure if the customer is already in the system or not, but they just got handed $50 to deposit on that account.
We could do it in multiple steps in a non atomic way:
- Check if the account exists
- Create one with the balance if not
- Update the existing one adding to the previous balance if yes
That's too many steps though. Let's do it all in a single step!
attrs = %{
email: "mike@subvisual.com",
balance: Decimal.new(50)
}
%Account{}
|> Account.changeset(attrs)
|> Repo.insert(
on_conflict: [inc: [balance: attrs.balance]],
conflict_target: :email,
returning: true
)
There's a few things of note here, most importantly, the on_conflict
option
which takes different formats of input. In this case we are passing it an update
like set of operation, namely the inc
operation, which will do exactly what we
wanted, by incrementing the previous value, instead of replacing.
Secondly, we use the conflict_target
option, which goes in tandem with
on_conflict
. We simply state the conflict we care about will happen on the email
field if at all, however, we can also specify a constraint by name if we want to
be more specific.
Last but not least, we have the returning
option. In this specific scenario it
is not needed, as we are not using the return value of the call. In the case we
did use it, we will usually want returning
to be true
as it will ensure we got
the value that was persisted and not the struct we were trying to persist but
couldn't. More details on this can be found in the Ecto.Repo
documentation.
Upserts with conditions
So far we've looked at how we can perform an upsert on every conflict, regardless of the row that conflicted, however, it's a totally valid for us to want to do things conditionally.
Let's take the previous example, of incrementing the balance, and add another constraint. We only actually perform the operation if the account is not locked, otherwise we return an error.
Remember how I said that on_conflict
takes different types of inputs? Well, one
of them (and the one that gives us more control) is a generic Ecto.Query
struct,
which we already know how to add conditions to, by using the where
clause. All
put together it might look something like this:
attrs = %{
email: "mike@subvisual.com",
balance: Decimal.new(50)
}
balance = attrs.balance
on_conflict = from(
a in Account,
where: not a.locked,
update: [inc: [balance: ^balance]]
)
%Account{}
|> Account.changeset(attrs)
|> Repo.insert(
on_conflict: on_conflict,
conflict_target: :email,
stale_error_field: :email,
returning: true
)
As you can see the on_conflict
is now a standard Ecto.Query.
We reuse the
operation keyword list and pass it to update
and we add the condition we wanted.
There a few important things to have in mind as to how this query will be run,
though.
- The records from which it will select are the subset that triggered the
conflict, not the entire dataset
2. The updates will be ran via an update_all
, which do not update autogenerated
fields, for example
Other than that, it should work pretty much as expected. In the example above, it will successfully increment the balance of the account if it is not locked, returning the updated value.
If the account is locked, however, it will return an error via an error tuple,
the error will be stale_entry
and will be applied to the changeset, on the field
specified by stale_error_field
. The reason this happens is because the insert
didn't actually insert anything, since the on_conflict
query will return an
empty set. Ecto
treats these scenarios as a StaleEntryError
and would generally
raise, but in the upsert scenario we know that it might happen and
stale_error_field
is how we tell Ecto
that we are expecting it, so there is no
need to raise.
Conclusion
Upserts are a powerful tool to have in our database querying tool belt and can
be useful in many situations. We do need to keep in mind that it is not without
it's drawbacks, though. For instance its use is not recommend while also working
with associations, the insert
call becomes more verbose or if we want to handle
conflicts in more complex ways that involve system parts other than the database
(by losing the atomicity), it might not be the thing for us.
We did not cover the on_conflict
option extensively and I do recommend reading
the documentation where you'll also see :nothing
and {:replace, ...}
.
All in all it's relatively simple to use and it's a good way to handle scenarios such as the ones described above.