=Introduction=
 
'''After the [http://www.ntg.nl/EuroTeX2009 EuroTeX 2009 meeting],
I'm going to fix some typos here and there and
maybe expand some examples too. I estimate that around 20 September the article and the site will be in sync.'''
 
 
----
'''!! W A R N I N G !! '''
=Python packages=
These are Python packages that are not in the standard library; "✔" means that I have made only a small Lua wrapper.<br/>
Only for the Python Imaging Library (PIL) have I made a <context>{\x \Context}</context> example.
* numpy, scipy, matplotlib ✔ [[#Scipy|here]]
* odfpy ✔
* PIL, python imaging library ✔[[#Python_Imaging_Library_.28PIL.29| here]]
For these, I still have to decide what to do.
* TO FIX: pygegl (I like its syntax, but it needs too much, and it's easy only in Python)
* ghostscript 8.64 ✔ [[#Ghostscript|here]]
* graphviz 2.24.0 ✔ [[#Graphviz|here]]
* ImageMagick-6.4.9 with pythonmagickwand ✔[[#ImageMagick|here]]
* fontforge 20090224 ✔ [[#Fontforge|here]]. Useful to check for symbol collisions, and if one wants to play with the latest fontforge, e.g. to draw the outline of a glyph.
* R-2.8.1 with rpy2-2.0.3 (for Maurizio "Mau" Himmelman, GUIT) ✔ [[#R|here]] (see also what someone says [http://micahelliott.com/2009/03/considering-r-as-python-supplement here]).
* quantlib 0.9.7 ✔ (needs an example with output in pdf)
* dbxml-2.4.16 (and sqlite) [[#dbxml|here]]
= Dedicated systems =
$HOME_LUN/sage/local/bin/python setup.py build
cd $HOME_LUN
mkdir tests-SAGEMATH && cd tests-SAGEMATH
##
## I have already installed prev. python.so, I don't want mess things
[[Image:Test-ode.png|900px]]
 
 
(other examples follow...)
==ROOT (CERN) ==
For more information, see [http://root.cern.ch here] ([http://root.cern.ch/cgi-bin/print_hit_bold.pl/root/HowtoPyROOT.html?python#first_hit here] for the Python side).
Under Linux, installation is not difficult at all, but in this case I chose not to create a separate luatex-lunatic, as was done above for sagemath.<br/>See an example [[#ROOT| here]].
This example shows how to literally embed the original Python source code.
<table><tr><td><texcode>
\startluacode
function test_ROOT(filename)
require("python")
pg = python.globals()
python.execute([[
def run(filename):
    from ROOT import TCanvas, TGraph
    from ROOT import gROOT
    from math import sin
    from array import array

    gROOT.Reset()

    c1 = TCanvas( 'c1', 'A Simple Graph Example', 200, 10, 700, 500 )
    c1.SetFillColor( 42 )
    c1.SetGrid()

    n = 20
    x, y = array( 'd' ), array( 'd' )

    for i in range( n ):
        x.append( 0.1*i )
        y.append( 10*sin( x[i]+0.2 ) )

    gr = TGraph( n, x, y )
    gr.SetLineColor( 2 )
    gr.SetLineWidth( 4 )
    gr.SetMarkerColor( 4 )
    gr.SetMarkerStyle( 21 )
    gr.SetTitle( 'a simple graph' )
    gr.GetXaxis().SetTitle( 'X title' )
    gr.GetYaxis().SetTitle( 'Y title' )
    gr.Draw( 'ACP' )
    c1.Update()
    c1.Print(filename)
]])
run = pg.run
run(filename)
end
\stopluacode
\starttext
\startTEXpage
\ctxlua{test_ROOT("testsin.pdf")}
\rotate[rotation=90]{\externalfigure[testsin.pdf][width=5cm]}
\stopTEXpage
\stoptext
</texcode> </td><td> [[Image:Testsin.jpg|512px]] </td> </tr></table>
We can do a bit better: separate the Python code from the Lua code.<br/>
Save this in <tt>test-ROOT1.py</tt> (so it's also easy to test):
<pre>
from ROOT import TCanvas, TGraph ,TGraphErrors,TMultiGraph
from ROOT import gROOT
from math import sin
from array import array
def run(filename):
    c1 = TCanvas("c1","multigraph",200,10,700,500)
    c1.SetGrid()
    # draw a frame to define the range
    mg = TMultiGraph()
    # create first graph
    n = 24; x = array('d',range(24))
    data = file('data').readlines()
    for line in data:
        line = line.strip()
        y = array('d',[float(d) for d in line.split()])
        gr = TGraph(n,x,y)
        gr.Fit("pol6","q")
        mg.Add(gr)

    mg.Draw("ap")

    #force drawing of canvas to generate the fit TPaveStats
    c1.Update()
    c1.Print(filename)
</pre>
Here the file 'data' is a 110-line file with 24 space-separated float values per line, i.e. <br/><tt> 20.6000 19.4000 19.4000 18.3000 17.8000 16.1000 16.7000 21.1000 23.3000 26.1000 26.1000 27.2000 27.8000 28.3000 28.3000 27.2000 25.6000 22.8000 21.7000 21.7000 21.7000 21.7000 21.7000 21.7000 </tt>.<br/>
Now a tex file, with a simple layer in Lua as the interface to Python:
{|
|-
|<texcode>
\startluacode
function test_ROOT(filename)
require("python")
test = python.import('test-ROOT1')
test.run(filename)
end
\stopluacode
\starttext
\startTEXpage
\ctxlua{test_ROOT("data.pdf")}
\rotate[rotation=90]{\externalfigure[data.pdf]}
\stopTEXpage
\stoptext
</texcode>
| [[Image:Test-ROOT1.jpg|300px]]
|}
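The layout of the 'data' file described above (24 whitespace-separated floats per line) can be checked before it is handed to ROOT. The following is only a sketch in Python 3 (the scripts on this page are Python 2); <tt>read_rows</tt> and its <tt>expected_columns</tt> parameter are hypothetical names introduced here, not part of <tt>test-ROOT1.py</tt>.

```python
def read_rows(lines, expected_columns=24):
    """Parse whitespace-separated floats, one row of values per input line.

    Hypothetical helper: it mirrors the float(d) for d in line.split()
    step of the script above and adds a column-count check.
    """
    rows = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != expected_columns:
            raise ValueError("line %d has %d values, expected %d"
                             % (number, len(fields), expected_columns))
        rows.append([float(field) for field in fields])
    return rows


# The sample line quoted in the text above.
sample = ("20.6000 19.4000 19.4000 18.3000 17.8000 16.1000 16.7000 21.1000 "
          "23.3000 26.1000 26.1000 27.2000 27.8000 28.3000 28.3000 27.2000 "
          "25.6000 22.8000 21.7000 21.7000 21.7000 21.7000 21.7000 21.7000")
rows = read_rows([sample])
```

With a real file one would pass the open file object, e.g. <tt>read_rows(open('data'))</tt>, since iterating over a file yields its lines.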
= ConTeXt mkIV examples =
Here I will collect some tex snippets, just to show some ideas.
== Scipy ==
Watch how the Python code <tt> z = x*np.exp(-x**2-y**2)</tt> is translated into the Lua code <tt> z = x.__mul__( np.exp( (x.__pow__(2).__add__(y.__pow__(2))).__neg__() ) )</tt>.
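The Scipy example spells out Python's special methods, presumably because Lua's own arithmetic operators are not mapped onto the wrapped Python objects. The sketch below (Python 3, plain floats instead of numpy arrays, which is an assumption made here to keep it dependency-free) only shows that the operator form and the method-call form compute the same value.

```python
import math

x, y = 1.5, -0.25

# Operator form, as one would write it in Python.
z_operators = x * math.exp(-x**2 - y**2)

# Special-method form, mirroring the chain used in the Lua code:
# pow, add, neg, then mul.
z_methods = x.__mul__(math.exp((x.__pow__(2).__add__(y.__pow__(2))).__neg__()))
```

Note that <tt>-x**2-y**2</tt> parses as <tt>-(x**2) - (y**2)</tt>, which is why the method chain negates the sum of the two squares.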
{| class="wikitable"
|-
|<texcode>
\startluacode
function testSCIPY(figname,dpi)
require("python")
pg = python.globals()
python.apply = python.eval('apply') or {}
np = python.import("numpy")
mlab = python.import("matplotlib.mlab")
griddata = mlab.griddata
plt = python.import("matplotlib.pyplot")
ma = np.ma
random = python.import("numpy.random")
uniform = random.uniform
 
-- make up some randomly distributed data
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
-- z = x*np.exp(-x**2-y**2)
z = x.__mul__( np.exp( (x.__pow__(2).__add__(y.__pow__(2))).__neg__() ) )
-- define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,100)
-- grid the data.
zi = griddata(x,y,z,xi,yi)
-- contour the gridded data, plotting dots
-- at the randomly spaced data points.
-- we put this in python globals space
-- CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
pg.xi = xi ; pg.yi = yi ; pg.zi = zi
args = python.eval("[xi,yi,zi,15]")
kv = python.eval("{'linewidth': 0.5 ,'colors' :'k'}")
CS = python.apply(plt.contour, args,kv)
--
pg.jet = plt.cm.jet
args = python.eval("[xi,yi,zi,15]")
kv = python.eval("{'cmap': jet}")
CS = python.apply(plt.contourf, args,kv)
-- draw colorbar
plt.colorbar()
-- plot data points.
pg.x = x; pg.y = y
args = python.eval("[x,y]")
kv = python.eval("{'marker': 'o', 'c':'b','s':5}")
CS = python.apply(plt.scatter, args,kv)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title(string.format('griddata test (%i points)',npts))
--plt.savefig(figname, dpi, 'white')
--
pg.figname = figname ; pg.dpi = dpi
args = python.eval("[figname]")
kv = python.eval("{'dpi': dpi ,'facecolor' :'white'}")
CS = python.apply(plt.savefig, args,kv)
end
\stopluacode

\def\testSCIPY[#1]{%
\getparameters[scipy][#1]%
\ctxlua{testSCIPY("\csname scipyfigname\endcsname","\csname scipydpi\endcsname")}%
\externalfigure[\csname scipyfigname\endcsname]%
}

\starttext
\startTEXpage
\testSCIPY[figname={test-scipy-1.pdf},dpi={150}]
\stopTEXpage
\stoptext
</texcode> || [[Image:Test-scipy.png|600px]]
|}
== Python Imaging Library (PIL) ==
{| class="wikitable"
|-
|<texcode>
\startluacode
function testPIL(imageorig,imagesepia)
require("python")
PIL_Image = python.import("PIL.Image")
PIL_ImageOps = python.import("PIL.ImageOps")
python.execute([[
def make_linear_ramp(white):
    ramp = []
    r, g, b = white
    for i in range(255):
        ramp.extend((r*i/255, g*i/255, b*i/255))
    return ramp
]])
-- make sepia ramp (tweak color as necessary)
sepia = python.eval("make_linear_ramp((255, 240, 192))")
im = PIL_Image.open(imageorig)
-- convert to grayscale
if not(im.mode == "L") then
im = im.convert("L")
end
-- optional: apply contrast enhancement here, e.g.
im = PIL_ImageOps.autocontrast(im)

-- apply sepia palette
im.putpalette(sepia)

-- convert back to RGB so we can save it as JPEG
-- (alternatively, save it in PNG or similar)
im = im.convert("RGB")

im.save(imagesepia)
end
\stopluacode

\def\SepiaImage#1#2{%
\ctxlua{testPIL("#1","#2")}%
\startcombination[2*1]
{\externalfigure[#1]}{\ss Orig.}
{\externalfigure[#2]}{\ss Sepia}
\stopcombination
}

\starttext
\startTEXpage
\SepiaImage{lena.jpg}{lena-sepia.jpg}
\stopTEXpage
\stoptext
</texcode> || [[Image:Test-PIL.png|330px]]
|}
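The sepia palette used in the PIL example is just a linear ramp from black towards a chosen "white" colour. As a sketch only, here is the same ramp in Python 3 without PIL; the one deliberate change from the embedded Python 2 code is <tt>//</tt> in place of <tt>/</tt>, since in Python 3 <tt>/</tt> would produce floats.

```python
def make_linear_ramp(white, steps=255):
    """Return a flat [r0, g0, b0, r1, g1, b1, ...] ramp from black towards `white`.

    `steps` is a parameter added here for illustration; the code above
    hard-codes range(255).
    """
    r, g, b = white
    ramp = []
    for i in range(steps):
        # Integer division keeps the palette entries as ints (Python 3).
        ramp.extend((r * i // 255, g * i // 255, b * i // 255))
    return ramp


sepia = make_linear_ramp((255, 240, 192))
```

With <tt>range(255)</tt> the ramp has 255 entries (765 values), so the brightest entry is slightly darker than the chosen white.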
== ImageMagick ==
ImageMagick® ([http://www.imagemagick.org/script/index.php here]) ''is a software suite to create, edit, and compose bitmap images. It can read, convert and write images in a variety of formats (over 100) including DPX, EXR, GIF, JPEG, JPEG-2000, PDF, PhotoCD, PNG, Postscript, SVG, and TIFF. Use ImageMagick to translate, flip, mirror, rotate, scale, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.''

There are at least two Python bindings, and this time I consider [http://www.procoders.net/?p=39 PythonMagickWand], which is a binding in the ctypes style.

The code is simple:
<texcode>
\usetypescriptfile[type-gentium]
\usetypescript[gentium]
\setupbodyfont[gentium,10pt]
\setuppapersize[A5][A5]
\setuplayout[height=middle,topspace=1cm,header={2\lineheight},footer=0pt,backspace=1cm,margin=1cm, width=middle]

\startluacode
function testimagemagick(box,t)
local w
local h
local d
local f
local res = 118.11023622047244094488 -- 300 dpi
local opacity = 25
local sigma = 15
local x = 10
local y = 10

require("python")
pg = python.globals()
PythonMagickWand = python.import("PythonMagickWand")
w = math.floor((tex.wd[box] / 65536 ) / 72.27 * 2.54 * res )
h = math.floor(((tex.ht[box] / 65536) + (tex.dp[box] / 65536)) / 72.27 *2.54 *res )
f = string.format("%s.png",t)

wand = PythonMagickWand.NewMagickWand()
background = PythonMagickWand.NewPixelWand(0)
--PythonMagickWand.MagickNewImage(wand,w,h,background)
PythonMagickWand.MagickNewImage(wand,w,h,background)

PythonMagickWand.MagickSetImageResolution(wand,res,res)
PythonMagickWand.MagickSetImageUnits(wand,PythonMagickWand.PixelsPerCentimeterResolution)
PythonMagickWand.MagickShadowImage(wand,opacity,sigma,x,y)
PythonMagickWand.MagickWriteImage(wand ,f)

print(w,h,f)
end
\stopluacode

\def\testimagemagick[#1]{%
\getparameters[imagemagick][#1]%
\ctxlua{testimagemagick(\csname imagemagickbox\endcsname,"\csname imagemagickfilename\endcsname")}%
}

\newcount\shdw
\long\def\startShadowtext#1\stopShadowtext{%
\bgroup%
\setbox0=\vbox{#1}%
\testimagemagick[box=0,filename={shd-\the\shdw}]%
%%
\defineoverlay[backg][{\externalfigure[shd-\the\shdw.png]}]%
\framed[background=backg,frame=off,offset=4pt]{\box0}%
%%\framed{\box0}
\global\advance\shdw by 1%
\egroup%
}
\starttext
\startTEXpage%
\startShadowtext%
\input tufte
\stopShadowtext%
\stopTEXpage
\stoptext
</texcode>
And here is the result:

[[Image:Test-imagemagick.png]]
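The width and height passed to ImageMagick come from a unit conversion: TeX box dimensions are in scaled points (65536 sp = 1 pt, 72.27 pt = 1 in, 2.54 cm = 1 in) and the resolution is given in pixels per centimetre. A Python 3 sketch of that arithmetic follows; the function name <tt>sp_to_pixels</tt> is made up for this illustration.

```python
import math

SP_PER_PT = 65536      # scaled points per TeX point
PT_PER_INCH = 72.27    # TeX points per inch
CM_PER_INCH = 2.54


def sp_to_pixels(length_sp, pixels_per_cm=300 / CM_PER_INCH):
    """Convert a TeX length in scaled points to whole pixels.

    The default resolution is 300 dpi expressed in pixels per centimetre,
    i.e. about 118.11, as in the Lua code above.
    """
    points = length_sp / SP_PER_PT
    centimetres = points / PT_PER_INCH * CM_PER_INCH
    return math.floor(centimetres * pixels_per_cm)


# A box 100pt wide is about 1.3837in, so a little over 415 pixels at 300 dpi.
width_px = sp_to_pixels(100 * SP_PER_PT)
```

The height is obtained the same way from the sum of the box height and depth.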
== Fontforge ==
    def getcurve(self,letter):
        self.glname = letter
        res_Array = []
        res = dict()
        try :
            #glyph_letter = [ g for g in self.font.glyphs() if g.glyphname == self.glname][0]
            g = self.font[letter]
        except Exception ,e :
            res['err'] = str(e)
            res_Array.append(res)
            return res_Array
        layer_idx = 0;
        for layer_name in g.layers:
            layer = g.layers[layer_name]
            for contour_idx in range(len(layer)):
                res = dict()
                contour = layer[contour_idx]
                contour_name = contour.name
                res['name']= contour.name
                res['is_quadratic'] = contour.is_quadratic
                res['closed'] = contour.closed
                res['points'] = [(p.x,p.y,"%i" %p.on_curve) for p in contour ]
                res['design_size'] = self.font.design_size
                res['em'] = self.font.em
                res_Array.append(res)
        return res_Array

    def drawmpostpath(self,letter):
        res_Array = self.getcurve(letter)
        state = 0
        paths = ''
        for res in res_Array:
            temp = ''
            for p in res['points'] :
                if p[2]=='1' :
                    if state == 1 :
                        temp = temp + '-- (%s,%s)' %(p[0] ,p[1]) ; state = 1; continue
                    else:
                        temp = temp + '.. (%s,%s)' %(p[0] ,p[1]) ; state = 1; continue
                if state == 1 : temp = temp + ' .. controls (%s,%s)' %(p[0],p[1]) ; state =2; continue
                if state == 2 : temp = temp + ' and (%s,%s) ' %(p[0],p[1]) ; state =0; continue
            if res['closed'] :
                if state == 1 :
                    temp = 'draw ' + temp[2:] + " -- cycle;\n"
                else:
                    temp = 'draw ' + temp[2:] + " .. cycle;\n"
            else:
                temp = 'draw ' + temp[2:] + ";\n"
            paths = paths + temp
        return paths

    def getmpostoutline(self,letter):
        res = self.getcurve(letter)
        path = '..'.join( [str((p[0],p[1])) for p in res['points'] if p[2] == '1'] )
        return path

    def getmpostpoints(self,letter):
        res = self.getcurve(letter)
        path = [str((p[0],p[1])) for p in res['points'] if p[2] == '1']
        return path

    def drawmpostpoints(self,letter):
        res_Array = self.getcurve(letter)
        dots = ''
        for res in res_Array:
            temp = '\n'.join( ["drawdot %s;" %str((p[0],p[1])) for p in res['points'] if p[2] == '1'] )+ "\n"
            dots = dots + temp
        return dots

if __name__ == '__main__':
    s = simpledraw("lmmono10-regular.otf")
    #res = s.getmpostpointsSugar('C')
    #print res
    #print s.getmpostoutline('C')
    print s.getcurve('e')
    print s.drawmpostpath('e')
    print s.drawmpostpoints('e')
</pre>
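The core of <tt>drawmpostpath</tt> is a small state machine: on-curve points are joined by straight segments, and a pair of off-curve points between two on-curve points becomes the two control points of a cubic segment. The sketch below is a simplified Python 3 restatement of that idea, not the code above: it has no fontforge dependency, <tt>points_to_metapost</tt> is a hypothetical name, it takes <tt>on_curve</tt> as a boolean, and it assumes cubic contours that start on an on-curve point.

```python
def points_to_metapost(points, closed=True):
    """Turn [(x, y, on_curve), ...] into a single MetaPost draw statement."""
    segments = []
    controls = []  # off-curve points seen since the last on-curve point
    for x, y, on_curve in points:
        if not on_curve:
            controls.append((x, y))
            continue
        if not segments:
            if controls:
                raise ValueError("contour must start with an on-curve point")
            segments.append("(%s,%s)" % (x, y))
        elif not controls:
            segments.append(" -- (%s,%s)" % (x, y))
        elif len(controls) == 2:
            (ax, ay), (bx, by) = controls
            segments.append(" .. controls (%s,%s) and (%s,%s) .. (%s,%s)"
                            % (ax, ay, bx, by, x, y))
        else:
            raise ValueError("a cubic segment needs exactly two control points")
        controls = []
    if not segments:
        raise ValueError("empty contour")
    if closed:
        if not controls:
            segments.append(" -- cycle")
        elif len(controls) == 2:
            (ax, ay), (bx, by) = controls
            segments.append(" .. controls (%s,%s) and (%s,%s) .. cycle"
                            % (ax, ay, bx, by))
        else:
            raise ValueError("a cubic segment needs exactly two control points")
    elif controls:
        raise ValueError("open contour ends with off-curve points")
    return "draw " + "".join(segments) + ";"


triangle = points_to_metapost([(0, 0, True), (10, 0, True), (10, 10, True)])
arc = points_to_metapost(
    [(0, 0, True), (1, 2, False), (3, 2, False), (4, 0, True)], closed=False)
```

Quadratic (TrueType) contours have a single off-curve point per segment and would need a different translation; this sketch rejects them.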
Next comes the Lua layer, which in this case is embedded in a tex file:
<texcode>
 
\setupcolors[state=start]
 
 
\startluacode
function testFontforge(fontfile,letter)
testoutlines = python.import("test-fontforge")
s = testoutlines.simpledraw(fontfile)
g = s.drawmpostpath(letter)
p = s.drawmpostpoints(letter)
--print( string.format("\%s = \%s ==", letter,g ))
tex.sprint(tex.ctxcatcodes,"\\startMPcode")
tex.sprint(tex.ctxcatcodes,"pickup pencircle scaled 1pt;")
tex.sprint(tex.ctxcatcodes,string.format("\%s",g) )
tex.sprint(tex.ctxcatcodes,"pickup pencircle scaled 8pt;")
tex.sprint(tex.ctxcatcodes,string.format("\%s",p) )
tex.sprint(tex.ctxcatcodes,"\\stopMPcode")
end
\stopluacode
And this is the result: <br/>
[[Image:Test-fontforge.png|900px]]
 
...ok,it's not correct (why?), but it looks funny :)
== Ghostscript ==
As a first case, we consider an implementation of eps2pdf, ps2pdf being virtually the same.<br/>
Currently there is no Python binding for ghostscript, so we build a simple wrapper,
<tt>testgs.py</tt>, using the ctypes module:
<pre>
if __name__ == '__main__' :
    dens = density('u-random-int','test-001.pdf',10,7,'o')
    dens.run()
</pre>

<texcode>
\startluacode
function testR(samples,outpdf,w,h,kernel)
require("python")
testR = python.import("test-R")
dens = testR.density(samples,outpdf,w,h,kernel)
dens.run()
end
\stopluacode

\def\plotdenstiy[#1]{%
\getparameters[R][#1]%
\expanded{\ctxlua{testR("\Rsamples","\Routpdf",\Rwidth,\Rheight,"\Rkernel")}}%
}

\setupbodyfont[sans,10pt]
\starttext
\startTEXpage
\plotdenstiy[samples={u-random-int},outpdf={test-001.pdf},width={10},height={7},kernel={o}]
\setupcombinations[location=top]
\startcombination[1*2]
{\vbox{\hsize=400bp
This is a density plot of around {\tt 100 000} random numbers between $0$ and $2^{16}-1$
generated from {\tt \hbox{/dev/urandom}}}}{}
{\externalfigure[test-001.pdf][width={400bp}]}{}
\stopcombination
\stopTEXpage
\stoptext
</texcode>

And here is the plot: <br/>
[[Image:Test-R.png]]

== dbxml ==
From the site [http://www.oracle.com/database/berkeley-db/xml/index.html (see here)]: ''Oracle Berkeley DB XML is an open source, embeddable XML database with XQuery-based access to documents stored in containers and indexed based on their content. Oracle Berkeley DB XML is built on top of Oracle Berkeley DB and inherits its rich features and attributes. Like Oracle Berkeley DB, it runs in process with the application with no need for human administration. Oracle Berkeley DB XML adds a document parser, XML indexer and XQuery engine on top of Oracle Berkeley DB to enable the fastest, most efficient retrieval of data.''

As a test, we can use a dump from wikiversity [http://en.wikiversity.org/wiki/Getting_stats_out_of_Wikiversity_XML_dumps (see here)].

=== Build the container ===
First we build the container 'Data.dbxml' in the directory "wikienv" (which must exist):
<pre>
"""
---
"""
from bsddb3.db import *
from dbxml import *
import sys
import re
import time

def createEnvironment(home):
    """Create DBEnv and initialize XmlManager"""
    try:
        environment = DBEnv()
        # environment.set_cachesize(0,512 * 1024 *1024,1)
        environment.set_lk_max_lockers(10000)
        environment.set_lk_max_locks(10000)
        environment.set_lk_max_objects(10000)
        # initialize DBEnv for transactions
        environment.open(home, DB_RECOVER|DB_CREATE|DB_INIT_LOCK|DB_DSYNC_LOG|
                         DB_INIT_LOG|DB_INIT_MPOOL|DB_INIT_TXN, 0)
    except DBError, exc:
        print exc
        sys.exit()
    try:
        mgr = XmlManager(environment, 0)
        mgr.setDefaultPageSize(4096)
    except XmlException, se:
        print se
        sys.exit()
    return mgr

def createContainer(mgr, containerName, flags):
    """create/open a node container"""
    try:
        uc = mgr.createUpdateContext()
        container = mgr.openContainer(containerName, flags|DB_CREATE, XmlContainer.WholedocContainer)
        container.addIndex("","title","edge-element-substring-string",uc)
        container.addIndex("","username","edge-element-substring-string",uc)
        container.addIndex("","text","edge-element-substring-string",uc)
        return container
    except XmlException, ex:
        print ex
        sys.exit()


def loadcontent(mgr, container,content,printmsg,k):
    """ -- """
    id= re.compile(r"<id>(.*)</id>")
    title = re.compile(r"<title>(.*)</title>",re.MULTILINE|re.DOTALL)

    id_text = id.search(content,re.MULTILINE|re.DOTALL).group(1)
    title_text = title.search(content).group(1)
    docName = '_'.join(title_text.split()) + '_' +id_text
    txn = False
    try:
        # all Container modification operations need XmlUpdateContext
        uc = mgr.createUpdateContext()
        # create XmlTransaction for the operation
        txn = mgr.createTransaction()
        # use the DBXML_GEN_NAME flag to make sure this
        # succeeds by creating a new, unique name
        # Use a try/except block to allow the transaction to
        # be aborted in the proper scope upon error
        try:
            docName = container.putDocument(txn, docName, content, uc, DBXML_GEN_NAME)
            txn.commit()
        except XmlException, ex:
            print k,ex
            txn.abort()
        if printmsg:
            # now, get the document in a new transaction
            txn = mgr.createTransaction()
            doc = container.getDocument(txn, docName)
            name = doc.getName()
            docContent = doc.getContentAsString()
            txn.commit() # done with data
            # print the name and content
            print name
            pass
    except XmlException, inst:
        print inst
        if txn:
            txn.abort()

# "main"
def main():
    home = "wikienv"
    # some configuration...
    containerName = "Data.dbxml"
    # initialize...
    mgr = createEnvironment(home)
    # create/open a transactional container
    container = createContainer(mgr, containerName, DBXML_TRANSACTIONAL)

    startpage = re.compile(r"^\s*<page>\s*$")
    endpage = re.compile(r"^\s*</page>\s*$")
    id= re.compile(r"<id>(.*)</id>")
    title = re.compile(r"<title>(.*)</title>",re.MULTILINE|re.DOTALL)
    text = re.compile(r"<text ([^>]*)>(.*)</text>",re.MULTILINE|re.DOTALL)
    k,k1,k2 = 0,0,0
    startcollect = False
    #src = file("enwiki-latest-pages-articles.xml","rb")
    src = file("enwikiversity-20090627-pages-articles.xml",'rb')
    for line in src:
        try:
            k1 = k1 +1
        except:
            k1 = 0
        if divmod(k1,10000)[0]>0 and divmod(k1,10000)[1] == 0 :
            print "k1=%012d,k=%012d ,sleep 1 sec." % (k1,k)
            #time.sleep(1)
        if startcollect and endpage.match(line) is None:
            temp = ''.join((temp,line))
            continue
        if startpage.match(line) is not None:
            temp = line
            startcollect = True
            pos = src.tell()
            continue
        if endpage.match(line) is not None:
            content = ''.join((temp,line))
            startcollect = False
            if title.search(content) is not None and id.search(content) is not None:
                #title_text = title.search(temp).group(1)
                #id_text , content_len = id.search(temp,re.MULTILINE|re.DOTALL).group(1), len(temp)
                #text_text = ((text.search(temp) is not None and text.search(temp).group(2)) or '' )+ ' ' + title_text
                #keywords = [kk.lower() for kk in re.split("\W",text_text)
                #            if len(kk) >4 and kk.lower() != 'redirect'
                #            and kk.lower() != 'disambiguation' ]
                #keywords.append(title_text)
                #keywords = list(set(keywords))
                #keywords.sort()
                printmesg = False
                if divmod(k,100)[1] == 0 and divmod(k,100)[0] >0:
                    print "%012d sync" %k
                    container.sync()
                    #del container
                    #container = mgr.openContainer(containerName,DBXML_TRANSACTIONAL)
                #if divmod(k,1200)[1] == 0 and divmod(k,1200)[0] == 1:
                    #print k,title_text,id_text ,pos,content_len,keywords#,temp
                    #printmesg = True
                #print '%09d insert data...'%k,
                #return
                loadcontent(mgr,container,content,printmesg,k)
                k = k+1
    src.close()

if __name__ == "__main__":
    main()
</pre>

=== Make pdf ===
We use this module, <tt>wikidbxml_queryTxn.py</tt>,
to retrieve a page given a title (it can also be used as a basis to build more complex queries, but for now it's adequate):

<pre>
from lxml import etree
from bsddb3.db import *
from dbxml import *
import StringIO
import fcntl
import os
import pprint
import sys
import time
import mwlib.docbookwriter
from mwlib.dummydb import DummyDB
from mwlib.uparser import parseString


def getXML(title,res):
    db = DummyDB()
    r = parseString(title=title, raw=res, wikidb=db)
    dbw = mwlib.docbookwriter.DocBookWriter()
    dbw.writeBook(r)
    pprint.pprint( dbw.getTree() )
    return dbw.asstring()


def getConTeXt(title,res):
    db = DummyDB()
    r = parseString(title=title, raw=res, wikidb=db)
    dbw = mwlib.docbookwriter.DocBookWriter()
    dbw.writeBook(r)
    article = dbw.getTree()
    res = []

    def managepara(c,res):
        if c.tag == 'para' and (c.text is not None):
            res.append(c.text.strip()+r"\par")
        if c.tag == 'para' and (len(c.getchildren())>0):
            for c1 in c.iterchildren('ulink'):
                res.append(r'cfr~\type{%s}\par' %c1.get('url'))

    def managetable(c,res):
        if c.tag == 'informaltable' :
            res.append(r'\bTABLE')
            for row in c.iterchildren():
                res.append(r'\bTR')
                for col in row.iterchildren():
                    res.append(r'\bTD '+ col.text.strip()+ r'\eTD')
                res.append(r'\eTR')
            res.append(r'\eTABLE')

    def section_content(c,res):
        managepara(c,res)
        if c.tag == 'section' :
            subsection = c
            subsection_title = subsection.find("sectioninfo").find("title").text.strip()
            res.append(r"\subsection{%s}" % subsection_title)
            for sc in subsection.iterchildren():
                subsection_content(sc,res)

    def subsection_content(c,res):
        managepara(c,res)
        if c.tag == 'section' :
            subsubsection = c
            subsubsection_title = subsubsection.find("sectioninfo").find("title").text.strip()
            res.append(r"\subsubsection{%s}" % subsubsection_title)
            for sc in subsubsection.iterchildren():
                subsubsection_content(sc,res)
        if c.tag == 'informaltable' :
            managetable(c,res)

    def subsubsection_content(c,res):
        managepara(c,res)

    chapter_title = article.find("articleinfo").find("title").text
    res.append(r"\chapter{%s}" % chapter_title)

    section = article.find("section")
    section_title = section.find("sectioninfo").find("title").text
    res.append(r"\section{%s}" % section_title)

    for c in section.iterchildren():
        section_content(c,res)
    #pprint.pprint(res)
    return '\n'.join(res)


def query(env=None,mgr=None,container=None,querystring='Foo'):
    """
    Always check with queryPlan (for example matches is not optimized for indexes)
    """
    anID = env.lock_id()
    lock = env.lock_get(anID, "shared lock", DB_LOCK_READ)
    updateContext = mgr.createUpdateContext();
    try:
        txn = mgr.createTransaction()
        resultsContext = mgr.createQueryContext()
        #queryString = "collection('%s')/page[contains(title,'%s')]" % (container.getName(),data)
        #queryString = "collection('%s')%s" % (container.getName(),querystring)
        results = mgr.query(txn, querystring, resultsContext)
        res = [res.asString() for res in results]
        txn.commit()
        return res
        #print "START",book_name
        ##
        ##
    except XmlException, inst:
        txn.abort()
        print "XmlException (", inst.exceptionCode,"): ", inst.what#,'name=',theName
        if inst.exceptionCode == DATABASE_ERROR:
            print "Database error code:",inst.dbError
    env.lock_put(lock)
    env.lock_id_free(anID)
    print 'OK exit'

def getArtitleByTitle(title):
    pass
    env = DBEnv()
    env.set_cachesize(0, 64 * 1024 * 1024, 1)
    path2DbEnv ='wikienv'
    env.open(path2DbEnv,DB_THREAD|DB_REGISTER|DB_RECOVER|DB_INIT_MPOOL|DB_CREATE|DB_INIT_LOCK|DB_INIT_LOG|DB_INIT_TXN, 0)
    mgr = XmlManager(env,0)
    containerTxn = mgr.createTransaction()
    theContainer = "Data.dbxml"
    container = mgr.openContainer(containerTxn, theContainer)
    containerTxn.commit()
    ##
    lockfile = open("lock.kmgr", "w")
    fcntl.flock(lockfile, fcntl.LOCK_EX)
    try:
        res = set()
        querystring = 'collection("%s")/page[contains(title,"%s")]/revision/text/text()' % (theContainer,title)
        res = res.union(query(env,mgr,container,querystring=querystring))
        res = ''.join(list(res)).decode('utf8')
        #res = getXML(title,res)
        #open('res.dbk','w').write( " ".join(res.split()) )
        res = getConTeXt(title,res)
        return res
    except Exception,e:
        print "error on read:" ,e
    fcntl.flock(lockfile, fcntl.LOCK_UN)
    lockfile.close()

def writeres(title,preamble,postamble,filename):

    res = getArtitleByTitle(title=title)
    if res is not None :
        res = res.replace('&',r'\&')
        res = res.replace('#',r'\#')
    else:
        res = ''
    open(filename,'wb').write( '\n'.join((preamble,res,postamble)) )

    pass

if __name__ == '__main__':

    preamble = r"""\usetypescriptfile[type-gentium]
\usetypescript[gentium]
\setupbodyfont[gentium,10pt]
\setuppapersize[A5][A5]
\setuplayout[height=middle,topspace=1cm,header={2\lineheight},footer=0pt,backspace=1cm,margin=1cm, width=middle]
\starttext
"""

    postamble = r"""\stoptext
"""

    title="Primary mathematics/Numbers"
    filename = 'res.tex'
    writeres(title,preamble,postamble,filename)
</pre>

And finally the mkiv wrapper:
<texcode>
\usetypescriptfile[type-gentium]
\usetypescript[gentium]
\setupbodyfont[gentium,10pt]
\setuppapersize[A5][A5]
\setuplayout[height=middle,topspace=1cm,header={2\lineheight},footer=0pt,backspace=1cm,margin=1cm, width=middle]

\startluacode
function testdbxml(title,preamble,postamble,filename)
require("python")
pg = python.globals()
wikiversity = python.import("wikidbxml_queryTxn")
wikiversity.writeres(title,preamble,postamble,filename)
end
\stopluacode

\def\testdbxml[#1]{%
\getparameters[dbxml][#1]%
\ctxlua{testdbxml("\csname dbxmltitle\endcsname","\csname dbxmlpreamble\endcsname", "\csname dbxmlpostamble\endcsname","\csname dbxmlfilename\endcsname")}%
\input \csname dbxmlfilename\endcsname %
}

\starttext
\testdbxml[title={Primary mathematics/Numbers},
  preamble={},
  postamble={},
  filename={testres.tex}]
\stoptext
</texcode>

Here is the result:

<table class="wikitable">
<tr><td></td> <td>[[Image:Dbxml-1.png]]</td></tr>
<tr><td>[[Image:Dbxml-2.png]]</td> <td>[[Image:Dbxml-3.png]]</td></tr>
</table>

One can also use sqlite, which comes with Python, to query for titles in <tt>category.db</tt>,
made from (for example) <tt>enwikiversity-20090627-category.sql</tt>,
so reports become simpler: just put this in the Python code above, right before 'if __name__' ...:
<pre>
import sqlite3
def querycategory(title):
    conn = sqlite3.connect('category.db')
    c = conn.cursor()
    t = (title,)
    c.execute('select cat_title from category where cat_title like "%%%s%%" ;' % t)
    res = [row[0] for row in c]
    conn.commit()
    c.close()
    return res


def simplereports(title):
    res = querycategory(title)
    j = 0
    for r in res:
        g = r.replace('_',' ')
        print g
        title= g.encode('utf8')
        filename = 'reps%04d.tex' % j
        writeres(title,'','',filename)
        j = j+1
    return j
</pre>

Similarly, add this to the tex code:
<texcode>
\startluacode
function listtitles(title)
require("python")
pg = python.globals()
wikiversity = python.import("wikidbxml_queryTxn")
r = wikiversity.querycategory(title)
local j = 0
local res = r[j] or {}
while res do
local d = string.format("\%s\\par",string.gsub(tostring(res),'_',' '))
tex.sprint(tex.ctxcatcodes,d)
j = j+1
res = r[j]
end
end
\stopluacode

\startluacode
function simplereports(title)
require("python")
pg = python.globals()
wikiversity = python.import("wikidbxml_queryTxn")
r = wikiversity.simplereports(title)
local j = tonumber(r)
for v = 0,j-1 do
local d = string.format("\\input reps\%04d ",v)
tex.sprint(tex.ctxcatcodes,d)
end
print( j )
end
\stopluacode
</texcode>
and test it with
<texcode>
\starttext
\startTEXpage%
{\bfb Query for 'geometr':}
\ctxlua{listtitles("geometr")}%
\ctxlua{simplereports("geometr")}%
\stopTEXpage
\stoptext
</texcode>
(query results are stored in reps0001.tex, reps0002.tex, and so on.)
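
The category lookup above builds its SQL by string formatting. A possible variant, shown here only as a Python 3 sketch, passes the search fragment as a bound parameter instead, so quotes in a title cannot break the statement. The in-memory database, the one-column <tt>category</tt> table and the sample titles are assumptions made for the illustration; only the column name <tt>cat_title</tt> comes from the query above.

```python
import sqlite3


def query_category(conn, fragment):
    """Return the category titles containing `fragment` (LIKE match)."""
    pattern = "%" + fragment + "%"
    cursor = conn.execute(
        "SELECT cat_title FROM category WHERE cat_title LIKE ? "
        "ORDER BY cat_title",
        (pattern,))
    return [row[0] for row in cursor]


# Stand-in for category.db: an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE category (cat_title TEXT)")
conn.executemany(
    "INSERT INTO category (cat_title) VALUES (?)",
    [("Geometry",), ("Differential_geometry",), ("Algebra",)])

titles = query_category(conn, "geometr")
conn.close()
```

SQLite's <tt>LIKE</tt> is case-insensitive for ASCII letters by default, which is why the fragment "geometr" also matches "Geometry".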