Let’s say I need to set a boolean field on 100,000 records at the same time. And I’m using Rails 4.x, just to clarify.
The simple way to do this is
unprocessed_records.each do |unprocessed_record| unprocessed_record.processed = true unprocessed_record.save! end
But this results in 100,000 ActiveRecord objects instantiated and held in memory at the same time. Since all I’m doing is updating an attribute, this is not efficient in memory or time.
A more efficient way to update all these records at once is
The benefit is that it doesn’t create an ActiveRecord instance per database record. It also skips validations and callbacks from the model, so the database update is faster.
One thing ActiveRecord does give us is the timestamps. Each time we
updated_at is modified. Using
#update_all, we lose that. In the
case of a database migration, we might not want the timestamps updated, but, in
this use case, we certainly do.
into the database without passing them through ActiveRecord’s typecasting. So
it’s important to use the correct value, since ActiveRecord won’t do anything
magical for us.
So how can we use
#update_all and still ensure the timestamps are kept
In searching, I came across this StackOverflow
question from 2013. The accepted
answer mentions that you can pass
#update_all a value of
updated_at equal to
DateTime.now. But I wanted to be sure this solution would still work today.
In the Rails console, I found the value which would be put into the database.
> DateTime.now.to_s => "2015-04-27T18:20:28-06:00"
Next, I had to check whether this value could be used for our column type, since ActiveRecord won’t be typecasting it.
My database is Postgres, and for
updated_at we use the column
timestamp [without time zone].
The output in the Rails console above clearly has the timezone information, so I needed to check whether that’s acceptable for Postgres or I’d have to munge the value.
Postgres’ documentation on date/time types has the following:
Note: The SQL standard requires that writing just timestamp be equivalent to timestamp without time zone, and PostgreSQL honors that behavior. timestamptz is accepted as an abbreviation for timestamp with time zone; this is a PostgreSQL extension.
This lead me to think I could use the value of
DateTime.now.to_s and be fine.
I tested it out from the command line, and verified it worked. The timestamp had
the correct value.
So, when you need to manually update timestamps in Postgres using Rails, use
Keep this in mind when you need to update many records along with their timestamps. We don’t get all of Rails’ magic, but it will save the server a lot of time and memory.
Update: My co-worker Zac McCormick asked a good question. If we need to store the time as UTC, does Postgres convert non-UTC times to UTC time before it strips the timezone?
The answer appears to be no. This SO
answer tells us
time zone ignores a timezone modifier if we should add one. The time zone is
assumed to be that of the time zone setting.
If your server’s timezome is not set to UTC, and you want to store UTC timestamps, you’ll have to convert the time yourself. This other SO question has a good answer.
> DateTime.now.to_s => "2015-04-27T19:07:55-06:00" > DateTime.now.new_offset(Rational(0,24)).to_s => "2015-04-28T01:07:56+00:00"