Keeping your database alive and available under extremely large volumes of inserts and updates isn't easy. But by batching your updates and inserts into a single query, you reduce both the number of round trips and the volume of SQL statements sent over the wire.
INSERT INTO table_name ( unique_keys, extra_values ) VALUES ( u1, e1 ), ( u2, e2 ), ... , ( uN, eN ) ON DUPLICATE KEY UPDATE extra_values = VALUES(extra_values)
… will attempt to insert N rows; wherever the unique key already exists, it updates the given values instead. MySQL's affected-row count for this statement is 1 for each inserted row and 2 for each updated row — that is, the number of attempted inserts plus the number of updates. From that you can recover both counts:
updates = row_count - N; inserts = row_count - 2 * updates;
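The arithmetic above can be sketched as a small helper (a sketch, not tied to any particular MySQL driver; the function name is mine):

```python
# Recover insert/update counts from MySQL's affected-row count for a
# batched INSERT ... ON DUPLICATE KEY UPDATE: each fresh insert counts
# as 1 affected row, each duplicate-key update counts as 2.
def split_row_count(row_count, n_rows):
    """n_rows is the number of VALUES tuples sent in the batch."""
    updates = row_count - n_rows
    inserts = row_count - 2 * updates
    return inserts, updates

# Example: sending 100 rows and getting an affected-row count of 130
# means 30 rows hit a duplicate key (updates) and 70 were inserts.
inserts, updates = split_row_count(130, 100)
```

Note that this only holds with the default client flags; if the connection sets CLIENT_FOUND_ROWS, MySQL reports matched rows instead and the arithmetic changes.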
This approach can be used for pure updates too, but you cannot add a WHERE clause to limit which rows are updated.
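To make the pattern concrete, here is a minimal sketch of building the batched statement with driver placeholders (e.g. for PyMySQL or mysql-connector, which both use %s-style parameters). The table and column names ("scores", "player", "points") and the helper name are hypothetical:

```python
# Build one multi-row upsert statement plus a flat parameter list,
# suitable for cursor.execute(sql, params) with a %s-style MySQL driver.
def build_upsert(table, key_col, val_col, rows):
    # One "(%s, %s)" tuple per row, joined into a single VALUES clause.
    placeholders = ", ".join(["(%s, %s)"] * len(rows))
    sql = (
        f"INSERT INTO {table} ({key_col}, {val_col}) "
        f"VALUES {placeholders} "
        f"ON DUPLICATE KEY UPDATE {val_col} = VALUES({val_col})"
    )
    # Flatten [(k1, v1), (k2, v2), ...] into [k1, v1, k2, v2, ...].
    params = [value for row in rows for value in row]
    return sql, params

sql, params = build_upsert("scores", "player", "points",
                           [("alice", 10), ("bob", 7)])
```

Letting the driver substitute the placeholders keeps the batch safe against SQL injection, which naive string concatenation of the values would not be.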
In my experience this approach reduces network and database load when updating a large number of rows with row-specific values. I am developing a system where a large amount of data is moved from a database (MySQL) store, checked for various updates, name hits, compared to other parts of the database etc., then pushed back to the database. The number of inserts and updates during a 'low activity' period easily reaches 10,000+ rows per table across ~15 tables, may grow 4-5 (10?) times larger in 'high activity' periods, and is repeated regularly (every 3 minutes) during working hours.
This approach has kept the database load at a level not noticeable to the system admins, though I have no hard numbers on this. I would like to get real benchmark numbers, but I don't have the time to produce them myself.