API Reference

xlutils.copy.copy(wb)

Copy an xlrd.Book into an xlwt.Workbook preserving as much information from the source object as possible.

See the xlutils.copy documentation for an example.

xlutils.display.quoted_sheet_name(sheet_name, encoding='ascii')

This returns a string version of the supplied sheet name that is safe to display, including encoding the unicode sheet name into a string:

>>> from xlutils.display import quoted_sheet_name
>>> quoted_sheet_name(u'Price(\xa3)','utf-8')
b'Price(\xc2\xa3)'

It also quotes the sheet name if it contains spaces:

>>> quoted_sheet_name(u'My Sheet')
b"'My Sheet'"

Single quotes are replaced with double quotes:

>>> quoted_sheet_name(u"John's Sheet")
b"'John''s Sheet'"
xlutils.display.cell_display(cell, datemode=0, encoding='ascii')

This returns a string representation of the supplied cell, no matter what type of cell it is. Here’s some example output:

>>> import xlrd
>>> from xlrd.sheet import Cell
>>> from xlutils.display import cell_display
>>> from xlutils.compat import PY3
>>> cell_display(Cell(xlrd.XL_CELL_EMPTY, ''))
'undefined'
>>> cell_display(Cell(xlrd.XL_CELL_BLANK, ''))
'blank'
>>> cell_display(Cell(xlrd.XL_CELL_NUMBER, 1.2))
'number (1.2000)'
>>> cell_display(Cell(xlrd.XL_CELL_BOOLEAN, 0))
'logical (FALSE)'
>>> cell_display(Cell(xlrd.XL_CELL_DATE, 36892.0))
'date (2001-01-01 00:00:00)'

Erroneous date values will be displayed like this:

>>> cell_display(Cell(xlrd.XL_CELL_DATE, 1.5))
'date? (1.500000)'

Note

To display dates correctly, make sure that datemode is passed and is taken from the datemode attribute of the xlrd.Book from which the cell originated as shown below:

>>> wb = open_workbook(join(test_files,'date.xls'))
>>> cell = wb.sheet_by_index(0).cell(0, 0)
>>> cell_display(cell, wb.datemode)
'date (2012-04-13 00:00:00)'

If non-unicode characters are to be displayed, they will be masked out:

>>> cd = cell_display(Cell(xlrd.XL_CELL_TEXT,u'Price (\xa3)'))
>>> if PY3:
...     str(cd) == "text (b'Price (?)')"
... else:
...     str(cd) == 'text (Price (?))'
True

If you want to see these characters, specify an encoding for the output string:

>>> cd = cell_display(Cell(xlrd.XL_CELL_TEXT,u'Price (\xa3)'), encoding='utf-8')
>>> if PY3:
...     str(cd) == "text (b'Price (\\xc2\\xa3)')"
... else:
...     str(cd) == 'text (Price (\xc2\xa3))'
True

Error cells will have their textual description displayed:

>>> cell_display(Cell(xlrd.XL_CELL_ERROR, 0))
'error (#NULL!)'
>>> cell_display(Cell(xlrd.XL_CELL_ERROR, 2000))
'unknown error code (2000)'

If you manage to pass a cell with an unknown cell type, an exception will be raised:

>>> cell_display(Cell(69, 0))
Traceback (most recent call last):
...
Exception: Unknown Cell.ctype: 69
class xlutils.filter.BaseFilter

A concrete filter that implements pass-through behaviour for BaseFilterInterface.

This often makes a great base class for your own filters.

class xlutils.filter.BaseFilterInterface

This is the filter interface that shows the correct way to call the next filter in the chain. The next attribute is set up by the process() function. It can make a good base class for a new filter, but subclassing BaseFilter is often a better idea!

cell(rdrowx, rdcolx, wtrowx, wtcolx)

This is called for every cell in the sheet being processed. This is the most common method in which filtering and queuing of onward calls to the next filter takes place.

Parameters:
  • rdrowx – the index of the row to be read from in the current sheet.
  • rdcolx – the index of the column to be read from in the current sheet.
  • wtrowx – the index of the row to be written to in the current output sheet.
  • wtcolx – the index of the column to be written to in the current output sheet.
finish()

This method is called once processing of all workbooks has been completed.

A filter should call this method on the next filter in the chain as an indication that no further calls will be made to any methods or that, if they are, any new calls should be treated as new batch of workbooks with no information retained from the previous batch.

row(rdrowx, wtrowx)

This is called every time processing of a new row in the current sheet starts. It is primarily for copying row-based formatting from the source row to the target row.

Parameters:
  • rdrowx – the index of the row in the current sheet from which information for the row to be written should be copied.
  • wtrowx – the index of the row in sheet to be written to which information should be written for the row being read.
set_rdsheet(rdsheet)

This is only ever called by a filter that wishes to change the source of cells mid-way through writing a sheet.

Parameters:rdsheet – the Sheet object from which cells from this point forward should be read from.
sheet(rdsheet, wtsheet_name)

This method is called every time processing of a new sheet in the current workbook starts.

Parameters:
  • rdsheet – the Sheet object from which the new sheet should be created.
  • wtsheet_name – the name of the sheet into which content should be written.
start()

This method is called before processing of a batch of input. This allows the filter to initialise any required data structures and dispose of any existing state from previous batches.

It is called once before the processing of any workbooks by the included reader implementations.

This method can be called at any time. One common use is to reset all the filters in a chain in the event of an error during the processing of a rdbook.

Implementations of this method should be extremely robust and must ensure that they call the start() method of the next filter in the chain regardless of any work they do.

workbook(rdbook, wtbook_name)

This method is called every time processing of a new workbook starts.

Parameters:
  • rdbook – the Book object from which the new workbook should be created.
  • wtbook_name – the name of the workbook into which content should be written.
class xlutils.filter.BaseReader

A base reader good for subclassing.

__call__(filter)

Once instantiated, a reader will be called and have the first filter in the chain passed to its __call__() method. The implementation of this method should call the appropriate methods on the filter based on the cells found in the Book objects returned from the get_workbooks() method.

get_filepaths()

This is the most common method to implement. It must return an interable sequence of paths to excel files.

get_workbooks()

If the data to be processed is not stored in files or if special parameters need to be passed to xlrd.open_workbook() then this method must be overriden. Any implementation must return an iterable sequence of tuples. The first element of which must be an xlrd.Book object and the second must be the filename of the file from which the book object came.

class xlutils.filter.BaseWriter

This is the base writer that copies all data and formatting from the specified sources. It is designed for sequential use so when, for example, writing two workbooks, the calls must be ordered as follows:

Usually, only the get_stream() method needs to be implemented in subclasses.

cell(rdrowx, rdcolx, wtrowx, wtcolx)

This should be called for every cell in the sheet being processed.

Parameters:
  • rdrowx – the index of the row to be read from in the current sheet.
  • rdcolx – the index of the column to be read from in the current sheet.
  • wtrowx – the index of the row to be written to in the current output sheet.
  • wtcolx – the index of the column to be written to in the current output sheet.
finish()

This method should be called once processing of all workbooks has been completed.

get_stream(filename)

This method is called once for each file written. The filename of the file to be created is passed and something with write() and close() methods that behave like a file object’s must be returned.

row(rdrowx, wtrowx)

This should be called every time processing of a new row in the current sheet starts.

Parameters:
  • rdrowx – the index of the row in the current sheet from which information for the row to be written will be copied.
  • wtrowx – the index of the row in sheet to be written to which information will be written for the row being read.
set_rdsheet(rdsheet)

This should only ever called by a filter that wishes to change the source of cells mid-way through writing a sheet.

Parameters:rdsheet – the Sheet object from which cells from this point forward will be read.
sheet(rdsheet, wtsheet_name)

This method should be called every time processing of a new sheet in the current workbook starts.

Parameters:
  • rdsheet – the Sheet object from which the new sheet will be created.
  • wtsheet_name – the name of the sheet into which content will be written.
start()

This method should be called before processing of a batch of input. This allows the filter to initialise any required data structures and dispose of any existing state from previous batches.

workbook(rdbook, wtbook_name)

This method should be called every time processing of a new workbook starts.

Parameters:
  • rdbook – the Book object from which the new workbook will be created.
  • wtbook_name – the name of the workbook into which content will be written.
class xlutils.filter.ColumnTrimmer(is_junk=None)

This filter will strip columns containing no useful data from the end of sheets.

See the ColumnTrimmer documentation for an example.

class xlutils.filter.DirectoryWriter(path)

A writer that stores files in a filesystem directory

get_stream(filename)

Returns a stream for the file in the configured directory with the specified name.

class xlutils.filter.Echo(name=None, methods=True)

This filter will print calls to the methods configured when the filter is created along with the arguments passed.

For more details, see the documentation.

class xlutils.filter.ErrorFilter(level=40, message='No output as errors have occurred.')

A filter that gates downstream writers or filters on whether or not any errors have occurred.

See ErrorFilter for details.

finish()

The method that triggers downstream filters and writers if no errors have occurred.

class xlutils.filter.GlobReader(spec)

A reader that emits events for all files that match the glob in the spec.

class xlutils.filter.MemoryLogger(path, methods=True)

This filter will dump stats to the path it was configured with using the heapy package if it is available.

class xlutils.filter.MethodFilter(methods=True)

This is a base class that implements functionality for filters that want to do a common task such as logging, printing or memory usage recording on certain calls configured at filter instantiation time.

Echo is an example of this.

method(name, *args)

This is the method that needs to be implemented. It is called with the name of the method that was called on the MethodFilter and the arguments that were passed to that method.

class xlutils.filter.StreamWriter(stream)

A writer for writing exactly one workbook to the supplied stream

get_stream(filename)

Returns the stream passed during instantiation.

class xlutils.filter.XLRDReader(wb, filename)

A reader that uses an in-memory xlrd.Book object as its source of events.

get_workbooks()

Yield the workbook passed during instantiation.

class xlutils.filter.XLWTWriter

A writer that writes to a sequence of in-memory xlwt.Workbook objects.

xlutils.filter.process(reader, *chain)

The driver function for the xlutils.filter module.

It takes a chain of one reader, followed by zero or more filters and ending with one writer.

All the components are chained together by the process() function setting their next attributes appropriately. The reader is then called with the first filter in the chain.

xlutils.margins.ispunc(character)

This little helper function returns True if called with a punctuation character and False with any other:

>>> from xlutils.margins import ispunc
>>> ispunc('u')
False
>>> ispunc(',')
True

It also works fine with unicode characters:

>>> ispunc(u',')
True
>>> ispunc(u'w')
False

It does not, however, return sensible answers if called with more than one character:

>>> ispunc(',,,')
False
xlutils.margins.cells_all_junk(cells, is_rubbish=None)

Return True if all cells in the sequence are junk. What qualifies as junk: – empty cell – blank cell – zero-length text – text is all whitespace – number cell and is 0.0 – text cell and is_rubbish(cell.value) returns True.

This function returns True if all the cells supplied are junk:

>>> from xlutils.margins import cells_all_junk
>>> from xlrd.sheet import Cell,empty_cell
>>> cells_all_junk([empty_cell,empty_cell,empty_cell])
True

But it returns False as soon as any of the cells supplied are not junk:

>>> from xlrd import XL_CELL_NUMBER
>>> cells_all_junk([empty_cell,Cell(XL_CELL_NUMBER,1),empty_cell])
False

The definition of ‘junk’ is as follows:

  • Empty cells are junk:

    >>> from xlrd import XL_CELL_EMPTY
    >>> cells_all_junk([Cell(XL_CELL_EMPTY,'')])
    True
    
  • Blank cells are junk:

    >>> from xlrd import XL_CELL_BLANK
    >>> cells_all_junk([Cell(XL_CELL_BLANK,'')])
    True
    
  • Number cells containing zero are considered junk:

    >>> from xlrd import XL_CELL_NUMBER
    >>> cells_all_junk([Cell(XL_CELL_NUMBER,0)])
    True
    

    However, if a number cell contains anything else, it’s not junk:

    >>> cells_all_junk([Cell(XL_CELL_NUMBER,1)])
    False
    
  • Text cells are junk if they don’t contain anything:

    >>> from xlrd import XL_CELL_TEXT
    >>> cells_all_junk([Cell(XL_CELL_TEXT,'')])
    True
    

    or if they contain only space characters:

    >>> cells_all_junk([Cell(XL_CELL_TEXT,' \t\n\r')])
    True
    

    otherwise they aren’t considered junk:

    >>> cells_all_junk([Cell(XL_CELL_TEXT,'not junk')])
    False
    

    However, you can also pass a checker function such as this one:

    >>> def isrubbish(cell): return cell.value=='rubbish'
    

    Which can then be used to check for junk conditions of your own choice:

    >>> cells_all_junk([Cell(XL_CELL_TEXT,'rubbish')],isrubbish)
    True
    >>> cells_all_junk([Cell(XL_CELL_TEXT,'not rubbish')],isrubbish)
    False
    

    Passing a function like this isn’t only limited to text cells:

    >>> def isnegative(cell): return isinstance(cell.value,float) and cell.value<0 or False
    
    >>> cells_all_junk([Cell(XL_CELL_NUMBER,-1.0)],isnegative)
    True
    >>> cells_all_junk([Cell(XL_CELL_NUMBER,1.0)],isnegative)
    False
    
  • Date, boolean, and error fields are all not considered to be junk:

    >>> from xlrd import XL_CELL_DATE, XL_CELL_BOOLEAN, XL_CELL_ERROR
    >>> cells_all_junk([Cell(XL_CELL_DATE,'')])
    False
    >>> cells_all_junk([Cell(XL_CELL_BOOLEAN,'')])
    False
    >>> cells_all_junk([Cell(XL_CELL_ERROR,'')])
    False
    

Be careful, though, as if you call cells_all_junk() with an empty sequence of cells, you’ll get True:

>>> cells_all_junk([])
True
xlutils.margins.number_of_good_rows(sheet, checker=None, nrows=None, ncols=None)

Return 1 + the index of the last row with meaningful data in it.

This function returns the number of rows in a sheet that contain anything other than junk, as defined by the cells_all_junk() function.

For example:

>>> from xlutils.tests.fixtures import make_sheet
>>> sheet = make_sheet((
...           ('X',' ',' ',' ',' '),
...           (' ',' ',' ','X',' '),
...           (' ',' ',' ',' ',' '),
...           ('X',' ',' ',' ',' '),
...           (' ',' ','X',' ',' '),
...           (' ',' ',' ',' ',' '),
...           ))
>>> from xlutils.margins import number_of_good_rows
>>> number_of_good_rows(sheet)
5

You can limit the area searched using the nrows and ncols parameters:

>>> number_of_good_rows(sheet,nrows=3)
2
>>> number_of_good_rows(sheet,ncols=2)
4
>>> number_of_good_rows(sheet,ncols=3,nrows=3)
1

You can also pass a checking function through to the cells_all_junk() calls:

>>> number_of_good_rows(sheet,checker=lambda c:c.value=='X')
0
xlutils.margins.number_of_good_cols(sheet, checker=None, nrows=None, ncols=None)

Return 1 + the index of the last column with meaningful data in it.

This function returns the number of columns in a sheet that contain anything other than junk, as defined by the cells_all_junk() function.

For example:

>>> sheet = make_sheet((
...           ('X',' ',' ','X',' ',' '),
...           (' ',' ',' ',' ',' ',' '),
...           (' ',' ',' ',' ','X',' '),
...           (' ','X',' ',' ',' ',' '),
...           (' ',' ',' ',' ',' ',' '),
...           (' ',' ',' ',' ',' ',' '),
...           ))
>>> from xlutils.margins import number_of_good_cols
>>> number_of_good_cols(sheet)
5

You can limit the area searched using the nrows and ncols parameters:

>>> number_of_good_cols(sheet,nrows=2)
4
>>> number_of_good_cols(sheet,ncols=2)
2
>>> number_of_good_cols(sheet,ncols=3,nrows=3)
1

You can also pass a checking function through to the cells_all_junk() calls:

>>> number_of_good_cols(sheet,checker=lambda c:c.value=='X')
0
xlutils.save.save(wb, filename_or_stream)

Save the supplied xlrd.Book to the supplied stream or filename.

class xlutils.styles.NamedStyle(name, xf)

An object with name and xf attributes representing a particular style in a workbook.

class xlutils.styles.Styles(book)

A mapping-like object that will return a NamedStyle instance for the cell passed to the __getitem__() method.

class xlutils.view.CheckSheet(book, sheet, row_slice=None, col_slice=None)

A special sheet view for use in automated tests.

compare(*expected)

Call to check whether this view contains the expected data. If it does not, a descriptive AssertionError will be raised. Requires testfixtures.

Parameters:expected – tuples containing the data that should be present in this view.
class xlutils.view.CheckerView(path, class_=None)

A special subclass of View for use in automated tests when you want to check the contents of a generated spreadsheet.

Views of sheets are returned as CheckSheet instances which have a handy compare() method.

class_

alias of CheckSheet

class xlutils.view.Col(name)

An end-inclusive column label index for use in slices, eg: [:, Col('A'), Col('B')]

class xlutils.view.Row(name)

A one-based, end-inclusive row index for use in slices, eg:: [Row(1):Row(2), :]

class xlutils.view.SheetView(book, sheet, row_slice=None, col_slice=None)

A view on a sheet in a workbook. Should be created by indexing a View.

These can be sliced to create smaller views.

Views can be iterated over to return a set of iterables, one for each row in the view. Data is returned as in the cell values with the exception of dates and times which are converted into datetime instances.

__weakref__

list of weak references to the object (if defined)

book = None

The workbook used by this view.

sheet = None

The sheet in the workbook used by this view.

class xlutils.view.View(path, class_=None)

A view wrapper around a Book that allows for easy iteration over the data in a group of cells.

Parameters:
  • path – The path of the .xls from which to create views.
  • class – An class to use instead of SheetView for views of sheets.
__getitem__(item)

Returns of a view of a sheet in the workbook this view is created for.

Parameters:item – either zero-based integer index or a sheet name.
__weakref__

list of weak references to the object (if defined)

class_

This can be replaced in a sub-class to use something other than SheetView for the views of sheets returned.

alias of SheetView