|
|
Created:
13 years, 7 months ago by Michael Mayorov Modified:
13 years, 7 months ago Visibility:
Public. |
DescriptionSearch and indexation for MoinMoin 2
Patch Set 1 #
Total comments: 13
Patch Set 2 : Creating tmp files with tempfile.mkdtemp, fixing some name issues #
Total comments: 1
Patch Set 3 : Changes to indexing schema test #
Total comments: 13
Patch Set 4 : Fixing Thomas Waldmann's and Reimar Bauer's remarks #Patch Set 5 : add add_document,remove_document, update_document methods #
Total comments: 15
Patch Set 6 : Fixed some stupid mistakes, add (may be) correct names. test_indexing has no fails anymore. #
Total comments: 1
Patch Set 7 : changed testing alghorithm in test_indexing.py #
Total comments: 10
Patch Set 8 : small fixes for test_indexing.py #
Total comments: 2
Patch Set 9 : Rebuilding indexes script #
Total comments: 12
Patch Set 10 : Make some changes in algorithm and value names #
Total comments: 5
Patch Set 11 : Fixes ugly code and some value names #Patch Set 12 : Using MultiSegmentWriter instead of standard writer #
Total comments: 8
Patch Set 13 : Now we can build partial indexes #
Total comments: 12
Patch Set 14 : Some fixes in building indexes. #
Total comments: 7
Patch Set 15 : minor fixes in building indexes, now working with python2.6 #
Total comments: 1
Patch Set 16 : Add clean option #
Total comments: 7
Patch Set 17 : Fixes in cleaning #
Total comments: 5
Patch Set 18 : WhooshIndex.clean_index() for clearing indexes, changes in rebuild_indexes script #
Total comments: 2
Patch Set 19 : Fixing WhooshIndex class, add test for building indexes #
Total comments: 6
Patch Set 20 : Rename build_index() to create_index(), remove creating-test, some fixes in WhooshIndex #
Total comments: 1
Patch Set 21 : Now if create=False and indexes can not be open, raise FatalError #Patch Set 22 : Add script for updating indexes #
Total comments: 8
Patch Set 23 : Change algorithm in update_indexes.py #
Total comments: 5
Patch Set 24 : New algo for updating indexes,but for now isnt working,traceback http://paste.pocoo.org/show/416926/ #
Total comments: 7
Patch Set 25 : Fixes in update_script #
Total comments: 1
Patch Set 26 : Fix mistakes in update_indexes #Patch Set 27 : Update indexes script #Patch Set 28 : Update indexes, changes in algo #
Total comments: 16
Patch Set 29 : Fix in update_indexes, now working for latest_revisions. #Patch Set 30 : now using diff for storage and index revisions,handling exceptions for thrashed items #
Total comments: 7
Patch Set 31 : Fixes from previous review #
Total comments: 8
Patch Set 32 : Fixed redundancy #
Total comments: 2
Patch Set 33 : minor changes #MessagesTotal messages: 33
some comments http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_acl.py File MoinMoin/search/_tests/test_acl.py (right): http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_acl.... MoinMoin/search/_tests/test_acl.py:17: assert False OK, that is not a test yet, but I guess it is just not finished... Don't forget to add a real test and the usual file header. http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:5: MoinMoin - MoinMoin.search.indexing Tests ^ that blank is not in our header template. http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:17: index_object = WhooshIndex(index_dir="MoinMoin/search/_tests/index/") do not put test data into the source code directories see other test code, create a temp dir for that. http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:20: maybe we need to rename these 2, because in fact BOTH are about revisions (one about all revisions, one about latest revisions). you do not have an items index yet at all (you do not index item metadata yet at all). So, in a separate changeset, please rename those 2. maybe "all_revs" and "latest_rev"? http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:25: content=u"This is a content", This is content. http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:26: mimetype=u"text/html;charset=utf-8", text/plain;... http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:27: tags=[u"nothing", u"intresting"], interesting http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:30: acl=[u'JoeDoe:read,write'], Nope. It is unicode, not a list of unicode. http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:31: language=u"ru", en :) (and please no ru in tests, that stuff is non-ascii) http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:32: metadata=u"Other related stuff", I guess that metadata field is not really specified yet. http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/_tests/test_inde... MoinMoin/search/_tests/test_indexing.py:34: assert False add a real test http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/analyzers.py File MoinMoin/search/analyzers.py (right): http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/analyzers.py#new... MoinMoin/search/analyzers.py:10: from flask import g as flaskg where is it used? http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/upload.py File MoinMoin/search/upload.py (right): http://codereview.appspot.com/4539114/diff/1/MoinMoin/search/upload.py#newcode1 MoinMoin/search/upload.py:1: #!/usr/bin/env python you don't add upload.py to the repo. hg rm --force upload.py
Sign in to reply to this message.
clarify http://codereview.appspot.com/4539114/diff/3002/MoinMoin/search/_tests/test_i... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/3002/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:23: what I meant with my previous comment about naming was changing the names in the WhooshIndex class, so it is index_object.<somethingelse>.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:26: schema = getattr(index_object, self.index_name) that is NOT a schema, it is an index! http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:38: assert result.estimated_length() > 0 why not just assert len(result) == expected_length? http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:40: shutil.rmtree(index_dir) well, i did it like that in my minimal code where i tried to reproduce the problem you were seeing. but for tests, it is better to have the "test preparations" in the setup_method() (or setup_class, depends) and the "test cleanup" in the teardown_method() (or teardown_class). http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:45: index_name = "items" that is not really an "items" index, it is a index for the latest revision's data. i already requested renaming it. http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:48: "item_name" : u"Document One", pep8 does not want a space left of the colon. http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:63: (u"item_name", u"Document"), as I already said, do not format like that, it is a pain to maintain and creates more work than needed. http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:82: "item_name" : u"Document One", see above http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/analyzers.py File MoinMoin/search/analyzers.py (right): http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/analyzers.py#... MoinMoin/search/analyzers.py:48: assert isinstance(acl_right, unicode), "%r is not unicode" % acl_right referring to above 3 lines: nope :) although that worked as a quick fix, you can't leave it like that. write SANE code, please. http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/analyzers.py#... MoinMoin/search/analyzers.py:51: acl = ContentACL(app.cfg, value) as you need to give a list there, use [value] http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/indexing.py File MoinMoin/search/indexing.py (right): http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/indexing.py#n... MoinMoin/search/indexing.py:50: indexes = [('items', 'latest_revisions'), the first tuple elements also need fixing, of course. as a general rule: avoid naming things badly or wrongly, that'll just confuse YOU and also all other people who read your code.
Sign in to reply to this message.
tempfile.mkdtemp should get a dir some other minor findings http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:23: index_dir = tempfile.mkdtemp('', 'moin-') shouldn't that have a dir keyword? I would prefer a given file system location The user of mkdtemp() is responsible for deleting the temporary directory and its contents when done with it. A interpretation of that is that noone else should get the inspiration to delete this dir http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:37: print query print ???? http://codereview.appspot.com/4539114/diff/8001/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:45: index_name = "items" the "items" name is may be misleading. items have a different meaning
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:27: much better now http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:29: index_object = WhooshIndex(index_dir=self.index_dir) that is an instance of your WhooshIndex class http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:32: index_object.add_document(self.index_name, **document) please call whoosh directly and after the loop, do the commit http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:34: index = getattr(index_object, self.index_name) you confuse yourself with unclear names. index, index_object? http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:39: assert len(result) > 0 i repeat: do precise assertions if you expect one, then == 1 (not > 0) http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:47: index_name = "items" see my and reimars comment in previous review http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:59: "acl": [u"JoeDoe:read,write"], i give up http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:73: (u"acl", u"'JoeDoe:read'"), compare with above http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:88: "content": u"SRANIY Wi-Fi", english please http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/analyzers.py File MoinMoin/search/analyzers.py (right): http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/analyzers.py#... MoinMoin/search/analyzers.py:48: assert isinstance(acl_right, unicode), "%r is not unicode" % acl_right see my comment in last review how about just using assert isinstance(value, unicode)? http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/analyzers.py#... MoinMoin/search/analyzers.py:51: acl = ContentACL(app.cfg, value) see my comment from last review http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/indexing.py File MoinMoin/search/indexing.py (right): http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/indexing.py#n... MoinMoin/search/indexing.py:50: indexes = [('items', 'latest_revisions'), see my comment from last review http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/indexing.py#n... MoinMoin/search/indexing.py:96: no, never, not at all. you're not going to duplicate whoosh api again in your object. that's not necessary and it would have catastrophic performance impact if you do bulk indexing of many documents (as you do a commit after each add_document). http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/indexing.py#n... MoinMoin/search/indexing.py:103: see above. http://codereview.appspot.com/4539114/diff/3004/MoinMoin/search/indexing.py#n... MoinMoin/search/indexing.py:110: see above.
Sign in to reply to this message.
looks fine now, just one minor comment if test works, please commit / push. http://codereview.appspot.com/4539114/diff/18001/MoinMoin/search/analyzers.py File MoinMoin/search/analyzers.py (right): http://codereview.appspot.com/4539114/diff/18001/MoinMoin/search/analyzers.py... MoinMoin/search/analyzers.py:45: assert isinstance(value, unicode) # so you'll notice if it blows up you can remove the blow up comment now. the reason why I originally said that is was because you (back then) were expecting a list of unicode. the problem is that a unicode object is also a iterable ("list") of unicode chars, which can result in unexpected "s" "t" "u" "f" "f" happening.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:24: "rev_no": 1, first revno is 0 http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:25: "datetime": datetime.utcnow(), we already talked about why this is problematic http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:40: "content": u"Oh moin gott", this is not "en", we also already talked about that. http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:61: "acl": u"Anonymous:write", don't confuse people by using a name like "Anonymous" and just giving write rights doesn't make sense either. http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:75: "acl": u"Anonymous:admin", just giving admin rights only also does not make sense. http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:83: (u"item_name", u"Document", 4, 4), obviously you only get 4 results when searching in all revs, in latest revs, you should only get 2 results. this similarly applies for all queries here. http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:121: all_revs_doc[schema_field] = document[schema_field] all_revs_names = all_revs_index.schema.names() all_revs_doc = dict([(key, value) for key, value in document.items() if key in all_revs_names]) http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:127: for field_name, query, latst_res_len, all_res_len in queries: maybe using 2 separate tests for latest and all is easier. http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:129: assert len(latest_revs_searcher.search(query)) == latst_res_len you only check whether result count is as expected, not whether it really found the right document http://codereview.appspot.com/4539114/diff/9005/MoinMoin/search/_tests/test_i... MoinMoin/search/_tests/test_indexing.py:132: assert len(all_revs_searcher.search(query)) == all_res_len if ... in ...: assert ...
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/18002/MoinMoin/search/_tests/test_... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/18002/MoinMoin/search/_tests/test_... MoinMoin/search/_tests/test_indexing.py:89: (u"itemlinks", u"Home", 2, 1), how can latest be more than all? http://codereview.appspot.com/4539114/diff/18002/MoinMoin/search/_tests/test_... MoinMoin/search/_tests/test_indexing.py:134: print latest_revs_searcher.search(query) no print
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:5: MoinMoin - Rebuild indexes filename is "build indexes", here you say "rebuild indexes". consistency? http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:14: #import LocalConfig remove this, it doesn't work anyway http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:17: description = 'Rebuild indexes using flaskg.storage.' a user/admin does not know what flaskg.storage is http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:22: backend = flaskg.unprotected_storage x = y = z http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:24: all_revs_index = getattr(index_object, index_object.indexes[1][0]) above 2 lines are pretty ugly. as you know how they are called, you can do that easier. http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:25: all_revs_writer = all_revs_index.writer() with ... as ...: ... (commits automatically) http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:26: for backend_item in backend.iter_items_noindex(): just "item" is enough http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:28: revision = backend_item.get_revision(rev_no) just "rev"? good that you used rev_no for the number, so you have rev free for the revision object. http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:32: if key in latest_revs_index.schema.names()} you call latest_revs_index.schema.names() for each key. and it always returns the same stuff. inefficient. http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:34: latest_revs_writer.update_document(**metadata) as the scope where you have "latest_index_writer" is just 1 line, you could also just call it "writer" or "lw" or.... but that just as a side note. the MAIN issue with this code is that you call N times (N == revision count, can be thousands) update_document() without good reason. You know what the latest revision is, so you just need to add_document() with its data - ONCE - instead of thousands of calls to update_document(). http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:38: if key in all_revs_index.schema.names()} see above http://codereview.appspot.com/4539114/diff/27001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:41: all_revs_writer.commit() if you use with ... no need for this, done automatically.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/13003/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/13003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:24: latest_revs_fields_name = latest_revs_index.schema.names() latest_rev_field_names http://codereview.appspot.com/4539114/diff/13003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:26: all_revs_fields_name = all_revs_index.schema.names() see above http://codereview.appspot.com/4539114/diff/13003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:33: if key in all_revs_fields_name} key: value for key, value in revision.items() ... http://codereview.appspot.com/4539114/diff/13003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:34: all_revs_writer.add_document(**metadata) add a comment below this line like # revision is now the latest revision of this item http://codereview.appspot.com/4539114/diff/13003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:37: if key in latest_revs_fields_name} see above
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:23: help='Number of processors. Default: 1'), I know it is described like that. But I think we should find out what it really does (and then also fix the whoosh docs). I guess it gives the number of indexing processes. One processor (as in CPU) can have multiple cores, cores can have hyperthreading, so this description might be misleading. http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:24: Option('--limitmb', '-l', required=False, dest='limitmb', type=int, default=0, Maybe find out what the default is when you do not give that param to whoosh. http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:25: help='Limit(in mbytes) of memory. Default: 0(unlimit)') Limit(in mbytes) of memory. Default: 0(unlimit) Ehrm, unlimited is the default? Is that true? Also, you need to be more precise. It is PER PROCESS. Also, mind your spacing. http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:36: MultiSegmentWriter(latest_rev_index, procs, limitmb) as latest_rev_writer: that is 2.7 only and does not work with 2.6 I'll tell you the last time now that you have to work with python 2.6 and I won't check this for you from now on. If we find out late (like in midterm or final review) that your code does not work on 2.6, but crashes all over the place due to 2.7-only features, you might be in trouble. For python 2.6, you need 2 nested "with" statements. I know it is not as pretty, but if it doesn't work, we have no choice. In general, it was a good idea to move the latest rev "with" to the outer scope (at least for the code as it is now), because it is more efficient like that. http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:48: latest_rev_writer.add_document(**metadata) As a general comment (as already pointed out on IRC/EP), you may want to split the indexing code into 2 separate parts: indexing latest revs (rather quick) indexing all revs (can take looooong) For the user being able to call these 2 parts separately, you need 2 different commandlines.
Sign in to reply to this message.
add a blank before bracket ( if scripts never become imported to something else initialisation is not needed. On the other hand I looked at some scripts we don't use anything from self in run. Because for avoiding code duplications I would prefer we have the code of run in util or a better place as a function with a good name and imported to the script. http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:14: empty line why? http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:25: help='Limit(in mbytes) of memory. Default: 0(unlimit)') blanks missing http://codereview.appspot.com/4539114/diff/17002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:28: def run(self, procs, limitmb): procs and limitmb not intialized?
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:22: help='Number of processors the writer. Default: 1'), Number of index-writing processes. Default: 1 Please verify if that is true. http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:24: help='Maximum memory (in megabytes) the writer will use for the indexing pool. Default: 10'), ... each index-writer will use ... So, if you have 4, it is 4*10MB. http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:33: def build_all(): def build_both() be consistent. http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:48: ok, you duplicated that to keep that small optimization that the latest rev is only fetched once from the backend. maybe add a comment about this, otherwise one could think this is just duplicated code that could be replaced by 2 calls to the other methods. http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:71: if method not in [u"both", u"all-revs", u"latest-revs"]: usually, it is good style to format lists like that: [1, 2, 3, ] or [1, 2, 3, ] So, if you add elements, you can always add "element,". http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:74: sys.exit(1) maybe check if the argument parser can do this "valid option" check for you somehow. not sure if it can. http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:83: import sys if you need "sys" at multiple places, you do a global import for it. http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:86: sys.exit(0) what does it do if you do not sys.exit(0) here? http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:87: if method == u"all-revs": elif ... http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:89: sys.exit(0) see above http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:90: if method == u"latest-revs": elif ... http://codereview.appspot.com/4539114/diff/33001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:91: build_latest_revs() instead of doing the check for valid methods first, you could also continue here with: else: # handle invalid methods ... As having wrong methods is rare and not performance critical, that would be cleaner.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:22: Option('--procs', '-p', required=False, dest='procs', type=int, default=None, Is None a valid default if type is int? Maybe check if that really works, not sure. http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:27: choices=("all-revs", "latest-revs", "both"), good! :) http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:34: # Build indexes for all and latest revisions that could be rather a docstring http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:51: # Build indexes for all revisions. that could be rather a docstring http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:60: all_rev_writer.add_document(**metadata) please 1 empty line between methods http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:61: # Build indexes for latest revisions. that could be rather a docstring http://codereview.appspot.com/4539114/diff/38001/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:79: if method == u"both": build_both() nope, do not put multiple statements in one line
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/42002/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/42002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:80: if method == u"both": in line 27, you give str values as choices, here you use unicode otoh, in line 27, you also tell it is unicode decide :) for commandline options, maybe str makes more sense.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:29: or just nothing to lose'), you are duplicating stuff, that's not good. how about just having a boolean for --clean and just give it as a flag to the existing build_* methods, so they can kill the index before building it, if wanted? http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:36: def build_both(): add a clean param here http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:37: """Build indexes for all and latest revisions""" check clean param, call create_in if wanted http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:58: def build_all_revs(): see above http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:69: def build_latest_revs(): see above http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:87: btw, you are duplicating some functionality of your WhooshIndex class here, which also has code to create an index. and that getattr/setattr way of patching things is a bit ugly. http://codereview.appspot.com/4539114/diff/39002/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:103: import sys didn't you import that globally?
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/41003/MoinMoin/script/maint/build_... File MoinMoin/script/maint/build_indexes.py (right): http://codereview.appspot.com/4539114/diff/41003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:27: Option('--clean', required=False, dest='clean', type=bool, choices=(True,), are you sure you need choices for a boolean? what is the default? does it need to be given? http://codereview.appspot.com/4539114/diff/41003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:28: help='Clear index files of given index-name. ATTENITON: use it if your indexes broke, you had backup\ 'Create new index from scratch.' http://codereview.appspot.com/4539114/diff/41003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:47: ) can't you just call your WhooshIndex method to do this? http://codereview.appspot.com/4539114/diff/41003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:75: ) see above http://codereview.appspot.com/4539114/diff/41003/MoinMoin/script/maint/build_... MoinMoin/script/maint/build_indexes.py:92: ) see above
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/44002/MoinMoin/search/indexing.py File MoinMoin/search/indexing.py (right): http://codereview.appspot.com/4539114/diff/44002/MoinMoin/search/indexing.py#... MoinMoin/search/indexing.py:100: if given_indexname in index_schema that works, but isn't the right way to do that you KNOW where the index name and the schema name is located in the tuple, so you do not need to "search" the tuple using "in". BTW, if you have to do such stuff, it is an indication that maybe the datastructure used for self.indexes is not appropriate. How about, instead of using a list of 2-tuples, to just use a dict, that maps index name -> schema name? Then you don't need to "search", but just "access". http://codereview.appspot.com/4539114/diff/44002/MoinMoin/search/indexing.py#... MoinMoin/search/indexing.py:110: setattr(self, indexname, new_index) you are thinking much too complicated and this results in hard to understand, duplicated and ugly code. Why don't you just build a method that is similar to open_index(), using the part of the code of open_index() that deals with creating an index, and put that stuff into a method create_index()? Then, call that create_index() method from whereever you need to create an index, including (if possible) from open_index().
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/_tests/test_... File MoinMoin/search/_tests/test_indexing.py (right): http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/_tests/test_... MoinMoin/search/_tests/test_indexing.py:138: for field_name, query, latst_res_len, all_res_len in queries: is there a special reason why you are saving 1 char in "latest" here? http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/_tests/test_... MoinMoin/search/_tests/test_indexing.py:152: btw, you are mixing many tests into a single test method. this is ok, if all the tests are testing basically the same thing, but it seems that the tests are rather for 2 different things, so they should be in 2 test methods and each of the methods named appropriately. http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/_tests/test_... MoinMoin/search/_tests/test_indexing.py:154: assert index_object.all_revisions_index.is_empty() == True you don't need the "== True" as assert gets a boolean anyway. and in case you want to be rather explicit, it would be "is True", never "== True". this is because there is only a single object for True, False and None. "is" is checking object identity, while "==" is comparing the values of objects. http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/indexing.py File MoinMoin/search/indexing.py (right): http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/indexing.py#... MoinMoin/search/indexing.py:80: return structurally, that is ALMOST right, but it won't do the right thing in case build_index fails. in the old code, it was running into FatalError then and terminated moin. with new code, you just log it and continue (and you have to expect more errors then, because moin won't be able to work without index). So, this needs more change. I suggest you really ONLY create an index in build_index/create_index method and you keep the open_dir() at this place HERE and make sure errors get handled in same way no matter which one of the open_dir calls is failing. http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/indexing.py#... MoinMoin/search/indexing.py:86: def build_index(self, indexdir, indexname, schema): maybe rather create_index(...)? "build" sounds a bit like there would be something in it afterwards, but you just create an (empty) index. compare to if you say "build a house", there is a house expected afterwards, not just an empty building place. http://codereview.appspot.com/4539114/diff/33005/MoinMoin/search/indexing.py#... MoinMoin/search/indexing.py:96: setattr(self, indexname, index) see other comment, better move open_dir/setattr to open_index method
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/52001/MoinMoin/search/indexing.py File MoinMoin/search/indexing.py (right): http://codereview.appspot.com/4539114/diff/52001/MoinMoin/search/indexing.py#... MoinMoin/search/indexing.py:85: raise FatalError("can't open whoosh index") wrong place if create is False and the first open_dir() fails, it is as fatal as if the 2nd open_dir fails.
Sign in to reply to this message.
str(key) ? and a unspecific != http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:42: metadata = dict([(str(key), value) just a question is that str(key) correct here? Can't it be nonascii? http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:63: if backend_mtime != index_mtime: # document has changed since last index update is != to unspecific. should it be > because now < also makes it changing, http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:65: uuid=metadata["uuid"]) indenting
Sign in to reply to this message.
Comments http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:42: metadata = dict([(str(key), value) On 2011/06/20 15:36:44, Reimar Bauer wrote: > just a question is that str(key) correct here? Can't it be nonascii? Yes, it is keys from backends http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:63: if backend_mtime != index_mtime: # document has changed since last index update On 2011/06/20 15:36:44, Reimar Bauer wrote: > is != to unspecific. should it be > because now < also makes it changing, I'm not sure. But may be user delete some stuff and this revision was created earlier? http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:65: uuid=metadata["uuid"]) On 2011/06/20 15:36:44, Reimar Bauer wrote: > indenting Ok.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:40: for rev_no in item.list_revisions(): why are you checking all revisions?
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/54001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:40: for rev_no in item.list_revisions(): On 2011/06/20 23:39:06, ThomasWaldmann wrote: > why are you checking all revisions? Because we compare mtime each revision from backend to each revision in indexes
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/34002/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/34002/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:22: # yeild dict with fields of each revision by given uuid typo http://codereview.appspot.com/4539114/diff/34002/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:64: uuid=latest_index_rev["uuid"]) maybe you can get that directly from latest_index_rev? http://codereview.appspot.com/4539114/diff/34002/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:67: while(True): this is not C! http://codereview.appspot.com/4539114/diff/34002/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:68: try: somehow code below seems to assume that you get indexed stuff in some specific order? http://codereview.appspot.com/4539114/diff/34002/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:69: backend_datetime = datetime.fromtimestamp(backend_rev["mtime"]) used for?
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:29: yield revision that doesn't read sane if this function is supposed to yield revisions, why should it yield a value None rather than nothing at all? http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:53: uuid = item.get_revision(0)[u"uuid"] MoinMoin.config has constants for all this stuff, like UUID. http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:60: latest_index_rev = latest_index_rev_iter.next() as there is only 1 (or nothing), maybe an iterator isn't the right thing anyway http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:65: if latest_index_rev == None: is None http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:68: elif backend_datetime != latest_index_rev[u"mtime"]: see above, MTIME http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:75: while(True): see last review
Sign in to reply to this message.
found bug http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/46003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:29: yield revision On 2011/06/22 14:29:20, ThomasWaldmann wrote: > that doesn't read sane > > if this function is supposed to yield revisions, why should it yield a value > None rather than nothing at all? uuid value given from backend. If wiki was interrupted for some reason and document wasn't saved(I mean all revisions) then StopIteration error raises. http://codereview.appspot.com/4539114/diff/35005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:27: yield None Bug, it should be iterate from last to first
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:35: def yield_backend_rev(item): maybe don't but "yield" in the function name how about "item_storage_revs"? same thing for the other function name... http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:36: for revision in reversed(item.list_revisions()): why reversed? http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:39: # Convert fields from backend fortat to whoosh schema typo as a general comment: can you PLEASE just read your stuff yourself BEFORE asking for review? http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:45: metadata["rev_no"] = rev_no[0] rev_no sounds like 1 if it is a list, it should be called rev_nos http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:64: index_rev_list = [revision for revision in yield_index_rev(all_rev_searcher, name)] to convert an iterator into a list, you just use list(iterator) http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:73: if revision not in index_rev_list] didn't i tell you to use set operations? use a more meaningful name than tmp. http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:76: last_revisions.append(tmp[0]) # Add last revision if you do not use reversed() above, the last one is at its natural index of [-1]. http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:78: if revision not in backend_rev_list] set ops? http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:84: with latest_rev_index.writer() as all_rev_writer: wtf? names!!! http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:103: continue why? explain the semantics of above 2 lines please.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:45: metadata["rev_no"] = rev_no[0] On 2011/06/24 11:02:52, ThomasWaldmann wrote: > rev_no sounds like 1 > > if it is a list, it should be called rev_nos It's list of one value. rev_no=revision.keys() http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:64: index_rev_list = [revision for revision in yield_index_rev(all_rev_searcher, name)] On 2011/06/24 11:02:52, ThomasWaldmann wrote: > to convert an iterator into a list, you just use list(iterator) ok http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:73: if revision not in index_rev_list] On 2011/06/24 11:02:52, ThomasWaldmann wrote: > didn't i tell you to use set operations? You mean set()? but revision is dict and it's unhashable > > use a more meaningful name than tmp. ok http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:78: if revision not in backend_rev_list] On 2011/06/24 11:02:52, ThomasWaldmann wrote: > set ops? see above http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:84: with latest_rev_index.writer() as all_rev_writer: On 2011/06/24 11:02:52, ThomasWaldmann wrote: > wtf? names!!! oops, my inattention. http://codereview.appspot.com/4539114/diff/55003/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:103: continue On 2011/06/24 11:02:52, ThomasWaldmann wrote: > why? explain the semantics of above 2 lines please. Whoosh returns "None" if document was not found in indexes. But "continue" should replaced by "break"
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:55: last_documents = [] maybe rather use "latest" instead of "last", I think it expresses better what you mean. http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:58: index_rev_list = list(item_index_revs(all_rev_searcher, name)) btw, why do you use a generator if you just wanted the list? you could also just return the list from the function and not use the for loop there and the yield. http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:62: create_documents.append({name: add_rev_nos}) if you just want to add one name -> revnos thing, maybe a 2-tuple is more adequate. (same thing applies to some lines below) http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:63: last_documents.append({name: list(add_rev_nos)[-1]}) # Add last revision sets are unsorted, btw http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:71: for name, rev_no in last_revision.items(): if you can implement the 2-tuple approach, these 2 nested "fors" might get just 1 "for name, revno in last_documents:". http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:93: for name, rev_nos in revision.items(): see above
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/39007/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:63: last_documents.append({name: list(add_rev_nos)[-1]}) # Add last revision On 2011/06/26 08:42:15, ThomasWaldmann wrote: > sets are unsorted, btw But seems like whoosh searcher returns latest revision at the end.
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:24: # yield dict with fields of each revision by given name fix http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:29: return [None] needed? http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:30: return [revision["rev_no"] for revision in revs_found] rev http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:58: backend_rev_list = list(revision for revision in item.list_revisions()) “Thank you for calling the Department of Redundancy Department.” http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:71: name, rev_no = latest_revision you can unpack the tuple right in the "for" statement 2 lines above. for e1, e2 in list_of_2_tuples: ... http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:89: latest_rev_writer.delete_document(doc_number) maybe rather: if doc_number: ...delete... ? http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:93: name, rev_nos = revision see above http://codereview.appspot.com/4539114/diff/51005/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:105: name, rev_nos = revision see above
Sign in to reply to this message.
http://codereview.appspot.com/4539114/diff/61001/MoinMoin/script/maint/update... File MoinMoin/script/maint/update_indexes.py (right): http://codereview.appspot.com/4539114/diff/61001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:28: if revs_found: # if doc with this name doesn't exist if it is [] if it did not find anything, you can leave away that "if ...". http://codereview.appspot.com/4539114/diff/61001/MoinMoin/script/maint/update... MoinMoin/script/maint/update_indexes.py:61: latest_documents.append((name, list(add_rev_nos)[-1])) # Add latest revision don't assume that sets have order, they do not. it can happen that you are lucky and the [-1] is really the latest one, but this is not the case in general, so assume sets have random order.
Sign in to reply to this message.
|