Issue 7379053: Locking for jobs

Can't Edit
Can't Publish+Mail
Start Review

Created:
11 years, 2 months ago by abel.deuring

Modified:
11 years, 2 months ago

Reviewers:
rharding, mp+150853

Visibility:
Public.

Description

Locking for jobs Charmworld runs a few cron jobs to keep the list of avaliable charms up to date. If charmworld is installed on more than one machines, running these jobs on all machines would cause unnecessary duplicate work, like queueing the charms for further processing. This branch adds a simple locking mechansim for these jobs. The core idea: A lock is a record in the collection "locks" of the MongoDB with a given ID. MongoDB can raise a DuplicateKey exception when a record with the ID 'some_name' already exists during a call of collection.insert({'_id': 'some_name', ...}) ("can" means: When I first attempted to use the lock for the jobs review and core_review, locking simply did not work, while it worked fine in tests and for the queueing job. It turned out that the parameter "fsync" msut be specified when pymongo.Connection is instantiated. As the test test_bad_connection_setup shows, the actual value of this parameter is not important -- but it must be present...) Hence lock() ensures that fsync was specified for the given DB connection. Locks have a lease time; if the lease expires, a now stale lock is deleted when another process wants to acquire the lock. The related call of locks.remove(existing_lock) is a possible race condition: Two processes might try concurrently to remove the stale lock. A simple tests shows that duplicate remove(existing_lock) calls succeed, even if the second call just does nothing. https://code.launchpad.net/~adeuring/charmworld/queue-review-mutex/+merge/150853 (do not edit description out of merge proposal)

Patch Set 1 #

Total comments: 5

Created: 11 years, 2 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Stats (+254 lines, -16 lines)			Patch
A	[revision details]	View	1 chunk	+2 lines, -0 lines	0 comments	Download
M	charmworld/jobs/core_review.py	View	2 chunks	+11 lines, -2 lines	1 comment	Download
M	charmworld/jobs/lp.py	View	2 chunks	+16 lines, -2 lines	1 comment	Download
M	charmworld/jobs/review.py	View	2 chunks	+11 lines, -2 lines	1 comment	Download
A	charmworld/jobs/tests/test_utils.py	View	1 chunk	+145 lines, -0 lines	1 comment	Download
M	charmworld/jobs/utils.py	View	2 chunks	+54 lines, -0 lines	0 comments	Download
M	charmworld/testing/__init__.py	View	2 chunks	+10 lines, -10 lines	0 comments	Download
M	default.ini	View	1 chunk	+5 lines, -0 lines	1 comment	Download

Messages

Total messages: 3

Expand All Messages | Collapse All Messages

rharding

lgtm with a check on using safe vs fsync. If safe works then it should ...

11 years, 2 months ago (2013-02-27 17:30:27 UTC) #2

abel.deuring

11 years, 2 months ago (2013-02-28 13:00:23 UTC) #3

On 27.02.2013 18:30, rharding@mitechie.com wrote:
> lgtm with a check on using safe vs fsync. If safe works then it should
> be a bit more performance than an fsync.

Right, using safe=True works also. Checking if this parameter has been
specififed, is a bit more tricky, so I changed the function  lock() to
just try to insert a test record twice.

> 
> I also would prefer to see something that catches locks that were not
> released. I know they timeout, but we should have some notice that they
> were hit so we can look into the issue. I'm worried about a charm
> instance dying without us realizing there's an issue. It might take some
> coordination with the charm though to get an error/notice out and up to
> nagios or email or something.

If a job "just dies" without releasing the job, we have a problem that
is likely much more serious that just a stale lock:

    try:
        yield
    finally:
        locks.remove(my_lock)

ensures that the lock is released for "regular Python errors". If the
job dies due to a segfault, this segfault would be my main concern, not
the "orphaned" lock data.


> 
> 
> https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/core_review.py
> 
> File charmworld/jobs/core_review.py (right):
> 
>
https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/core_review.py#...
> 
> charmworld/jobs/core_review.py:107: c = pymongo.Connection(MONGO_URL,
> fsync=True)
> can we get away with just safe=True vs fsync? It's a bit lighter than
> fsync and we changed to this in test runs to help fix issues from mongo
> syncing.

Done. Remember though that we can use fsync=False -- the odd thing is
that specifying fsync=True as well as fsync=False ensure that inserting
a doc with an existing key raises a DuplicateKeyError, while _not_
specifying it suppresses the DuplicateKeyError.

> 
> https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/lp.py
> File charmworld/jobs/lp.py (right):
> 
> https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/lp.py#newcode73
> 
> charmworld/jobs/lp.py:73: connection = getconnection(settings)
> does this need to have the safe= flag set to get a safe connection?

No, the trick here are the default parameters of getconnection():

def getconnection(settings, fsync=False, safe=False):
    [...]
    connection = pymongo.Connection(
        url_or_host, port,
        fsync=fsync,
        safe=safe,
    )

So, fsync _is_ specified and everyting is fine.

> 
> https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/review.py
> File charmworld/jobs/review.py (right):
> 
>
https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/review.py#newco...
> 
> charmworld/jobs/review.py:117: c = pymongo.Connection(MONGO_URL,
> fsync=True)
> again, can safe= work?

changed.

> 
>
https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/tests/test_util...
> 
> File charmworld/jobs/tests/test_utils.py (right):
> 
>
https://codereview.appspot.com/7379053/diff/1/charmworld/jobs/tests/test_util...
> 
> charmworld/jobs/tests/test_utils.py:38: self.assertIs(None,
> self.db['locks'].find_one({'_id': LOCK_NAME}))
> should this be wrapped into a helper? utils.check_lock(db, LOCK_NAME) or
> something?

A matter of taste, I think ;) Personally, Id don't see much benefit in
changing a one-line call with another one-line call. And I prefer to see
the details where they are needed, in this case, that find_one() looks
for any record with the right ID.

> 
> https://codereview.appspot.com/7379053/diff/1/default.ini
> File default.ini (right):
> 
> https://codereview.appspot.com/7379053/diff/1/default.ini#newcode42
> default.ini:42: script_lease_time = 30
> we need the bug to note that the charm should be updated to create this
> time based on the config time used to cron the importer.
> 
> https://codereview.appspot.com/7379053/
>

Expand All Messages | Collapse All Messages