Since the upgrade to LibreOffice 4, I have been experiencing more crashes with messages such as “Binary URP bridge disposed during call” (which as far as I can tell means that an exception was thrown somewhere in LibreOffice) and other more standard exceptions. Typically they are non-repeatable.

I have tried adding delays in my code (e.g. time.sleep(1)), with what appears to be some small success. Normally you might think this is a stupid idea that should do nothing, but some of LibreOffice’s processing happens asynchronously: you tell it to do something, it tells you it’s done it, but it actually does it a bit later. A good example is updating indexes. I found that if you add a lot of pages to a document and then instantly create an index, the index is built from a model of the document that is out of date. Delay for a few seconds first, and it will give you the correct index.  Bad programming technique, but it works.
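
A slightly less crude variant of the fixed delay is to poll until some condition holds, and give up after a timeout. This is only a sketch: wait_until is a hypothetical helper, not part of UNO, and how you decide that LibreOffice has caught up (the predicate) is an assumption left to the caller.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or `timeout` seconds pass.

    Returns True if the predicate succeeded, False if we gave up.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        # Give LibreOffice a moment to finish its asynchronous work.
        time.sleep(interval)
```

You would call it with whatever check suits your document, for example a function that compares the current page count with the count you expect, before asking for the index to be updated. If no sensible check exists, a plain time.sleep() is still the fallback.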

I have also found that having fewer LibreOffice windows open helps.

Today I found a more reliable solution, which has had instant and obvious success: run LibreOffice in headless mode. So start LibreOffice (in server mode, to work with Uno) with the command:

soffice --headless "--accept=socket,port=2002;urp;"

It has been working pretty well so far. YMMV.
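
For reference, here is a rough sketch of starting that headless instance from Python and connecting to it. The helper names (soffice_command, uno_url, start_office, connect) are my own for illustration. It assumes soffice is on your PATH and that the Python UNO bindings are installed, and it retries the connection because the office takes a moment to start listening.

```python
import subprocess
import time


def soffice_command(port=2002):
    """Return the argument list for the headless command quoted above."""
    return ["soffice", "--headless", "--accept=socket,port={};urp;".format(port)]


def uno_url(port=2002, host="localhost"):
    """Return the UNO URL a client resolves to reach that listening socket."""
    return "uno:socket,host={},port={};urp;StarOffice.ComponentContext".format(host, port)


def start_office(port=2002):
    """Launch soffice in headless server mode and return the process handle."""
    return subprocess.Popen(soffice_command(port))


def connect(port=2002, attempts=20, delay=0.5):
    """Connect to a running soffice, retrying while it is still starting up."""
    # Imported here so the helpers above still work without PyUNO installed.
    import uno
    from com.sun.star.connection import NoConnectException

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    for _ in range(attempts):
        try:
            return resolver.resolve(uno_url(port))
        except NoConnectException:
            time.sleep(delay)
    raise RuntimeError("could not connect to soffice on port {}".format(port))
```

Typical use would be office = start_office(), then ctx = connect(), then asking ctx.ServiceManager for "com.sun.star.frame.Desktop" as usual, and office.terminate() when you are finished.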

Thanks for all the comments lately.  Keep them coming.  A combination of site stats and comments shows that there is a steady stream of interest in this topic.

The current priority is to create a decent introductory tutorial to help people get started – the Cookbook isn’t much use if you can’t get started and there is a serious lack of good basic-level tutorials in the community.  However, much of the content is added when I have to find something out, which is why the Cookbook tends to be updated more often.

There is a reality check that I’m massively resource constrained at the moment, so keep your fingers crossed that I get appropriately distracted one weekend to make some decent progress.

I haven’t had much time to review the project’s current status, but the areas that currently come to mind include:

Introductory Tutorial to Python UNO

As I’m getting better at using Python with LibreOffice, I’m getting more qualified to write an introductory tutorial!  Some of the methods that I used to start with probably weren’t the best ones, so in some respects, it’s probably best I didn’t write this section first.  However, it clearly is the most useful part to the wider community.

Uploading Source Documents and Processing Code to Github

The current version is created by processing an ODT file to remove comments, empty sections etc.  This means there are two useful things straight-off: a nice big ODT file for people to experiment with, and some sample code.  Uploading them to Github would not only give people access to them, but would also allow external contributions.

Better Website and Online Readable Version to Allow Easier Discovery, Browsing and Comments

I don’t think that a super-flashy website is a great priority, but I think it would be good to create an online-viewable version, partly to help people find the content on search engines and browse it more casually.  It would also provide an interface for submitting comments next to items within the document.  In my day-to-day work I have recently been developing a website using Twitter Bootstrap, and I quite like the style of their documentation browser, with the sidebar navigation.  This would also provide some good example code for converting ODT to HTML.

Bug Identification and Fixing

I get quite a lot of crashes (and they really annoy me!).  Bulk processing a wide variety of documents inevitably leads to finding more edge cases.  I’d like to explore finding a way to better work with the LibreOffice developers to get bugs identified and fixed.  I think my time is better spent working on documentation, so it’s about exploring ideas like making a test environment that works for tutorials and Python UNO newbies, and which provides useful tests for LibreOffice developers which they can use independently.

More Sample Code

Most of the code that I write for my own projects could be published for people to look at, complete with some test files to allow easy experimentation.  It just requires me to do a bit of work checking there is nothing sensitive included, and separating out configuration from the actual processing code.  It’s not necessarily “good” code, but it could still be helpful.

As per usual, comments are more than welcome on what I should spend my time working on.

I’m publishing the second draft of DocumentHacker today.  The structure remains the same:

  1. Writing a Long Document
  2. Programming LibreOffice with Python Tutorial
  3. LibreOffice Python UNO Cookbook

It should be useful for anyone wanting to use Python + LibreOffice / OpenOffice via Python UNO.

This isn’t a major improvement on the last version – it’s only a few pages longer.  This time I am also including the “warts and all” version, which might be helpful until I write things up more completely.  Please excuse any comments that clearly weren’t meant to be made public!

First there was LibreOffice 4, now there is OpenOffice 4.0.

The headline new feature is a new context-sensitive sidebar (it shows stuff relevant to what you’re doing), but there are also bug fixes, and some fixes to import filters that show how import filters can lack basic features (e.g. they’ve just fixed using pictures for bullets from Word 2003).  Great to see progress happening however.

Read the full release notes for more info.

I’m unlikely to test it any time soon as it might cause conflicts with LibreOffice or be a non-trivial install on Ubuntu.

Good news that AMD is helping out The Document Foundation with the creation of LibreOffice.  This is on top of Intel.  Plan is to speed up Calc in part using the GPU (press release).

Looking at the blog there have been a few additions to the advisory board recently, maybe part of an active push?  The list of the advisory board is here.

I’ve submitted a bug report, but I thought it would be worth sharing my frustration.  ODT / OpenDocument, for all its supposed greatness as a common file format between open source and closed source programs, has proven pretty frustrating when using Abiword lately.  I’m not saying Abiword is “rubbish”, just warning that it will likely create corrupted ODT files.  ODT isn’t Abiword’s native file format, and it’s not a big project, but I still expected better.

The problems have been with files created initially in LibreOffice Writer, but they have been narrowed down to Abiword.  In both cases Abiword wasn’t even creating valid XML output (ODT is XML based internally).  This meant that the files couldn’t be opened, and it took hours of painstaking work to debug and fix them.

The two problems included:

  • Not properly escaping the ‘&’ character in URLs (it’s pretty common to have one in a URL)
  • Not correctly closing some XML tags.

The second problem could well have been a hard to find bug, but the first one I find surprising.  As every software developer knows, bugs do occur.  If the program crashes and you lose 20 minutes of work, that’s really annoying, but it will only take 20 minutes to fix and you tend to know about it when that happens so can recall what you did.  The issue with these bugs is that they corrupt the whole file (even if someone technical like me can recover them), and they do so without you knowing it’s just happened.

I suggested in my bug report that they run an XML parser over the output before finally saving, to check that it is at least well-formed XML.  I’d certainly rather know up front that something had gone horribly wrong, and not save my changes, than corrupt all my work up to that point.
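
To show how cheap that kind of check is, here is a small sketch in Python using only the standard library. It is not Abiword’s code; is_well_formed and safe_save are hypothetical names, and the sample strings simply mimic the two problems listed above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, str(exc)
    return True, None


def safe_save(path, xml_text):
    """Write xml_text to path only if it is well-formed XML."""
    ok, error = is_well_formed(xml_text)
    if not ok:
        # Fail loudly before touching the file on disk.
        raise ValueError("refusing to save malformed XML: " + error)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(xml_text)


# The two kinds of breakage described above, plus a correctly escaped URL.
BAD_AMPERSAND = '<a href="http://example.com/?x=1&y=2">link</a>'
BAD_NESTING = "<p><span>text</p>"
GOOD_LINK = '<a href="' + escape("http://example.com/?x=1&y=2") + '">link</a>'
```

Both bad samples are rejected by is_well_formed, while the escaped link passes. The check only proves that the XML parses, not that it is a sensible ODT, but it would have caught both of the bugs I hit.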

I’ve got confidence that these bugs will be fixed soon enough, but I would still be very careful about using Abiword with ODT given I came across two critical bugs having only used it for a few days.

Update: I have been using Abiword 1.9.2, which the maintainers say is a testing release that I shouldn’t have been using.  It is however the default version packaged with Ubuntu.

You can install LibreOffice 4 on Ubuntu 12.10, Quantal Quetzal, but my advice is not to.  Having used it for a few weeks, my experience is that it crashes too often, and doesn’t provide any obvious benefits.  There was even a stage where LibreOffice simply wouldn’t work, and I was using the supposedly stable repository!  Brief experiments using Abiword and Calligra Words, when LibreOffice wasn’t working, caused problems including data loss (which I did recover with some effort and luck).

I have now upgraded to Ubuntu 13.04, Raring Ringtail, which has LibreOffice 4 installed by default.  So far it’s been working flawlessly, but I really haven’t used it enough to be sure.

If you do want to install LibreOffice 4 on Ubuntu 12.10, then you need to add the repository at https://launchpad.net/~libreoffice/+archive/libreoffice-4-0 (instructions on that page) and upgrade.  You may need to do sudo apt-get dist-upgrade to install it properly.  It will remove LibreOffice 3 from your system.

Sebastian Sauer has released a very early alpha version of Calligra on Android called COffice, with plans to port to yet more mobile operating systems such as Blackberry.  It’s based on work done already on Calligra Mobile.

Calligra Suite is a competitor to LibreOffice / OpenOffice, which works on the same OpenDocument file formats.  It used to be called KOffice, and its history is as a project to create the best office suite for KDE.  From the perspective of creating text documents, it has interesting differences including its focus on the use of frames for greater control over formatting, and its user interface is superficially more user friendly.

I’ve quickly tried the Android version, and it correctly opens the two test ODT documents, but it’s too early to do much.  The idea of an open source office suite sharing code across desktop and mobile is really exciting however.  Given much of the code should be common across platforms, the project has the potential to develop quickly.  One to watch.

A question that has come up since releasing my first pre-release edition of DocumentHacker, on using Python and LibreOffice, is that annoying children-in-the-car question: “Why?” Why use LibreOffice when you could use any number of report generation tools? Why complicate typing up a document in a word processor with programming? Why use LibreOffice, rather than manipulating ODTs directly? Why use Python, rather than Java, C++, C# or Basic?

I’ll answer the last question first, because it’s the easiest one. Java, C++ etc. are unsuitable because they have complex and verbose interfaces, and they generally shouldn’t give any noticeable performance advantage if you are using UNO: the heavy lifting is done by the same fast LibreOffice code, whatever language you control it with. LibreOffice Basic is an option for macros, but it’s not a general purpose programming language. Python is simple to use, very flexible and can be used from the interactive console. This makes development a lot faster.

You should use LibreOffice rather than manipulating ODT files directly because it’s a high level interface. You can uncompress the files and manipulate XML, but can you do it as reliably or in as little time as going via LibreOffice?
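
For contrast, this is roughly what the direct route looks like for the simplest possible job, pulling the paragraph text out of an ODT. It is a sketch using only the standard library; paragraphs_from_odt and make_minimal_odt are hypothetical helpers, and the fixture is a bare zip with just a content.xml, not a complete ODT.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs_from_odt(odt_file):
    """Return the plain text of every text:p element in an ODT's content.xml."""
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]


def make_minimal_odt():
    """Build an in-memory zip holding only content.xml (a fixture for experiments)."""
    content = (
        '<office:document-content xmlns:office="%s" xmlns:text="%s">'
        "<office:body><office:text>"
        "<text:p>Hello <text:span>world</text:span></text:p>"
        "<text:p>Second</text:p>"
        "</office:text></office:body>"
        "</office:document-content>"
    ) % (OFFICE_NS, TEXT_NS)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer
```

Reading text like this is easy enough. The trouble starts when you want to change things: styles, headings, page counts and indexes all become your problem to keep consistent, which is exactly the work LibreOffice already does for you.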

The other questions are probably best answered with some examples of why you might decide to use this system. As always, use the right tool for the job, and this isn’t the right tool for all jobs:

  1. You want to automatically reformat a number of documents, for instance to enforce a company style policy.
  2. To perform an advanced search within several documents such as looking for bibliographic references to a particular author.
  3. You want to automatically create documents, such as invoices, but you want non-techies to be able to manually edit them later using a familiar user interface (LibreOffice).
  4. Bulk document conversion e.g. converting all ODTs to PDF. This can be done on a webserver.
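
As a rough illustration of the last item, here is a sketch of bulk conversion driven from Python. Note that it takes a shortcut: it calls LibreOffice’s command-line converter (--convert-to) once per file rather than going through UNO. convert_command and convert_all are hypothetical names, and it assumes soffice is on your PATH.

```python
import subprocess
from pathlib import Path


def convert_command(odt_path, out_dir):
    """Return the soffice command line that converts one ODT file to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(odt_path)]


def convert_all(src_dir, out_dir):
    """Convert each .odt file in src_dir to PDF, one soffice call per file."""
    converted = []
    for odt_path in sorted(Path(src_dir).glob("*.odt")):
        # check=True raises CalledProcessError if soffice reports failure.
        subprocess.run(convert_command(odt_path, out_dir), check=True)
        converted.append(odt_path.name)
    return converted
```

Calling convert_all("reports", "pdfs") would leave one PDF per ODT in the pdfs directory. Going via UNO instead makes sense once you want to modify the documents before exporting them.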

The use case that I actually discuss in the book is managing the creation of long documents, which are continuously changing and being deployed. It was my initial use case, and in part I am using it to stoke the debate on what user interfaces should be like in a world of programmers: is it better to encourage more expressive programming than the use of fixed-function buttons?

Click the link to download the first draft.  Although far from complete, it should be really useful for anyone who wants to get into using Python UNO with LibreOffice or OpenOffice, in particular the cookbook part at the back.

Download DocumentHacker – First Draft

For some background, the book is divided into three sections:

  1. Writing a Long Document
  2. Programming LibreOffice with Python Tutorial
  3. LibreOffice Python UNO Cookbook

The book doesn’t just dive into the Python, but gives a bit more context and purpose.  This is a book about using Python with LibreOffice Writer to create awesome documents that would be hard to create without using Python (or another programming language) with LibreOffice.

When going into the Python section, currently the best part is the cookbook, which covers processing headings, tables, indexes, frames, headers, footers and so on.  It even includes some hints about how to work out what to do given the poor state of the documentation.

Download it.  It’s free!  Let me know what you think of it so far.
