Wiki Search-and-Replace Using the XML-RPC Web Service

Remember WikiSpiderREST.py, the script that crawled BittyWiki pages through its REST API to perform search-and-replace operations? You had to write a custom class (BittyWikiRESTAPI) to construct the right URLs for the REST interface, and a custom XML parser to process the response documents it returned. Of course, once written, that code can be reused by any application that talks to BittyWiki's REST API. The main selling point of XML-RPC, though, is that such classes aren't necessary at all: xmlrpclib handles everything. Let's put that to the test by rewriting WikiSpiderREST.py as WikiSpiderXMLRPC.py:

#!/usr/bin/python
import re
import xmlrpclib

class WikiReplaceSpider:
    "A class for running search-and-replace against a web of wiki pages."

    WIKI_WORD = re.compile('(([A-Z][a-z0-9]*){2,})')

    def __init__(self, rpcURL):
        "Accepts a URL to a BittyWiki XML-RPC API."
        server = xmlrpclib.ServerProxy(rpcURL)
        #Calls on self.api become XML-RPC calls named bittywiki.<method>.
        self.api = server.bittywiki

    def replace(self, find, replace):
        """Spider wiki pages starting at the front page, accessing them
        and changing them via the XML-RPC API."""
        processed = {} #Keep track of the pages already processed.
        todo = ['HomePage'] #Start at the front page of the wiki.
        while todo:
            #Iterate over a copy, since pages are added to and removed
            #from todo inside the loop.
            for pageName in list(todo):
                print 'Checking "%s"' % pageName
                try:
                    pageText = self.api.getPage(pageName)
                except xmlrpclib.Fault, fault:
                    if fault.faultString.startswith("No such page"):
                        #We tried to access a page that doesn't exist;
                        #not a big deal.
                        pass
                    else:
                        #Some other problem; pass it on up.
                        raise
                else:
                    #This page actually exists; process it.

                    #First, find any WikiWords in this page: they may
                    #reference other pages.
                    for wikiWord in self.WIKI_WORD.findall(pageText):
                        linkPage = wikiWord[0]
                        if not processed.get(linkPage) and linkPage not in todo:
                            #We haven't processed this page yet: put it on
                            #the to-do list.
                            todo.append(linkPage)

                    #Run the search-and-replace on the page text to get the
                    #new text of the page.
                    newText = pageText.replace(find, replace)

                    #Check to see if this page name matches the search
                    #string. If it does, delete it and recreate it
                    #with the new text; otherwise, just save the new
                    #text in the existing page.
                    newPageName = pageName.replace(find, replace)
                    if newPageName != pageName:
                        print ' Deleting "%s", will recreate as "%s"' \
                              % (pageName, newPageName)
                        self.api.delete(pageName)
                    if newPageName != pageName or newText != pageText:
                        print ' Saving "%s"' % newPageName
                        saveResponse = self.api.save(newPageName, newText)

                    #Mark the new page as processed so we don't go through
                    #it a second time.
                    if newPageName != pageName:
                        processed[newPageName] = True
                processed[pageName] = True
                todo.remove(pageName)
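Why does the spider take wikiWord[0] rather than the match itself? Because WIKI_WORD contains two groups, findall() returns one tuple per match: the whole WikiWord, plus the last capitalized chunk captured by the inner, repeated group. A quick check in Python 3 (the sample text is made up for illustration) also shows that an all-caps word such as XMLRPC counts as a WikiWord, since each capital letter forms a chunk on its own.

```python
import re

# Same pattern as WikiReplaceSpider.WIKI_WORD: two or more chunks,
# each a capital letter followed by lowercase letters or digits.
WIKI_WORD = re.compile('(([A-Z][a-z0-9]*){2,})')

text = "See FooCaseStudies and HomePage; Foo alone is not a link, but XMLRPC is."

# With two groups, findall() yields (whole_match, last_inner_chunk) tuples.
matches = WIKI_WORD.findall(text)
print(matches)

# The spider keeps only element [0] of each tuple: the full WikiWord.
links = [m[0] for m in matches]
print(links)
```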

The WikiReplaceSpider class looks almost exactly the same as before. The only big difference is that a method call like api.getPage used to go into custom REST code you had to write; now it goes into pre-existing xmlrpclib code. With no API-specific classes to implement, the WikiReplaceSpider class is nearly all the code. The rest is a short command-line driver:

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 4:
        rpcURL, find, replace = sys.argv[1:]
    else:
        print 'Usage: %s [URL to BittyWiki XML-RPC API] [find] [replace]' \
              % sys.argv[0]
        sys.exit(1)
    WikiReplaceSpider(rpcURL).replace(find, replace)
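The work that the REST version needed custom code for, building a request document and parsing the reply, is exactly what xmlrpclib does internally. The following minimal Python 3 sketch performs that marshalling by hand with the dumps() and loads() helpers of xmlrpc.client (xmlrpclib's name in Python 3), with no network involved; the page text is invented for illustration.

```python
import xmlrpc.client  # Python 3 name for the xmlrpclib module

# The request for api.getPage("HomePage"): the dotted attribute path
# becomes <methodName>, and each argument becomes a <param>.
request = xmlrpc.client.dumps(("HomePage",), methodname="bittywiki.getPage")
print(request)

# A server unmarshals the request back into (params, methodName).
params, method = xmlrpc.client.loads(request)
print(method, params)

# The reply is a <methodResponse> holding a single value; loads() returns it
# as a one-element params tuple, with None in place of a method name.
reply = xmlrpc.client.dumps(("Welcome to the wiki!",), methodresponse=True)
(page_text,), no_method = xmlrpc.client.loads(reply)
print(page_text)
```

ServerProxy makes the same dumps() call when the spider invokes self.api.getPage, POSTs the resulting document, and parses the reply with the same unmarshalling machinery that loads() uses.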

That's the whole script. This spider works just like the REST version, but it takes less code because there's no one-off code to deal with the specifics of the REST API. It is run the same way as the REST version, except that the URL passed in points to the XML-RPC interface instead of the REST interface:

$ python WikiSpiderXMLRPC.py http://localhost:8000/cgi-bin/bittywiki-xmlrpc.cgi Foo Bar

Checking "HomePage"
 Saving "HomePage"
Checking "FooCaseStudies"
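To try the spider without a full BittyWiki installation, one option is a throwaway XML-RPC server on localhost that registers functions under the bittywiki.* names the spider calls. This is a Python 3 sketch under loud assumptions: the in-memory PAGES store, the "No such page: ..." fault wording, and the save/delete signatures are hypothetical stand-ins inferred from how WikiReplaceSpider calls them, not BittyWiki's actual code.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory page store; NOT BittyWiki's real storage.
PAGES = {"HomePage": "Welcome! Read the FooCaseStudies."}

def get_page(name):
    if name not in PAGES:
        # A Fault raised here is sent back unchanged, so the client sees
        # this faultString (the exact wording is an assumption).
        raise xmlrpc.client.Fault(1, "No such page: %s" % name)
    return PAGES[name]

def save(name, text):  # hypothetical signature: (page name, new text)
    PAGES[name] = text
    return True  # XML-RPC cannot return None unless allow_none is enabled

def delete(name):  # hypothetical signature: (page name)
    PAGES.pop(name, None)
    return True

# Port 0 lets the OS pick a free port; logRequests=False keeps output quiet.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
# Dotted names match the spider's server.bittywiki.<method> calls.
server.register_function(get_page, "bittywiki.getPage")
server.register_function(save, "bittywiki.save")
server.register_function(delete, "bittywiki.delete")
threading.Thread(target=server.serve_forever, daemon=True).start()

url = "http://127.0.0.1:%d/" % server.server_address[1]
api = xmlrpc.client.ServerProxy(url).bittywiki

# One search-and-replace step, as the spider would perform it.
page_text = api.getPage("HomePage")
api.save("HomePage", page_text.replace("Foo", "Bar"))
print(api.getPage("HomePage"))

missing = None
try:
    api.getPage("FooCaseStudies")
except xmlrpc.client.Fault as fault:
    # Same prefix check the spider uses to skip links to missing pages.
    missing = fault.faultString
    print("Skipped:", missing)

server.shutdown()
server.server_close()
```

Because a Fault raised inside a registered function reaches the caller intact, the spider can tell a dangling WikiWord apart from a real error. In a standalone script, calling server.serve_forever() directly instead of in a background thread would keep the server up, so WikiSpiderXMLRPC.py could be pointed at its URL.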
