API Reference¶
-
xlutils.copy.
copy
(wb)¶ Copy an
xlrd.Book
into anxlwt.Workbook
preserving as much information from the source object as possible.See the xlutils.copy documentation for an example.
-
xlutils.display.
quoted_sheet_name
(sheet_name, encoding='ascii')¶ This returns a string version of the supplied sheet name that is safe to display, including encoding the unicode sheet name into a string:
>>> from xlutils.display import quoted_sheet_name >>> quoted_sheet_name(u'Price(\xa3)','utf-8') b'Price(\xc2\xa3)'
It also quotes the sheet name if it contains spaces:
>>> quoted_sheet_name(u'My Sheet') b"'My Sheet'"
Single quotes are replaced with double quotes:
>>> quoted_sheet_name(u"John's Sheet") b"'John''s Sheet'"
-
xlutils.display.
cell_display
(cell, datemode=0, encoding='ascii')¶ This returns a string representation of the supplied cell, no matter what type of cell it is. Here’s some example output:
>>> import xlrd >>> from xlrd.sheet import Cell >>> from xlutils.display import cell_display >>> from xlutils.compat import PY3
>>> cell_display(Cell(xlrd.XL_CELL_EMPTY, '')) 'undefined'
>>> cell_display(Cell(xlrd.XL_CELL_BLANK, '')) 'blank'
>>> cell_display(Cell(xlrd.XL_CELL_NUMBER, 1.2)) 'number (1.2000)'
>>> cell_display(Cell(xlrd.XL_CELL_BOOLEAN, 0)) 'logical (FALSE)'
>>> cell_display(Cell(xlrd.XL_CELL_DATE, 36892.0)) 'date (2001-01-01 00:00:00)'
Erroneous date values will be displayed like this:
>>> cell_display(Cell(xlrd.XL_CELL_DATE, 1.5)) 'date? (1.500000)'
Note
To display dates correctly, make sure that datemode is passed and is taken from the datemode attribute of the
xlrd.Book
from which the cell originated as shown below:>>> wb = open_workbook(join(test_files,'date.xls')) >>> cell = wb.sheet_by_index(0).cell(0, 0) >>> cell_display(cell, wb.datemode) 'date (2012-04-13 00:00:00)'
If non-unicode characters are to be displayed, they will be masked out:
>>> cd = cell_display(Cell(xlrd.XL_CELL_TEXT,u'Price (\xa3)')) >>> if PY3: ... str(cd) == "text (b'Price (?)')" ... else: ... str(cd) == 'text (Price (?))' True
If you want to see these characters, specify an encoding for the output string:
>>> cd = cell_display(Cell(xlrd.XL_CELL_TEXT,u'Price (\xa3)'), encoding='utf-8') >>> if PY3: ... str(cd) == "text (b'Price (\\xc2\\xa3)')" ... else: ... str(cd) == 'text (Price (\xc2\xa3))' True
Error cells will have their textual description displayed:
>>> cell_display(Cell(xlrd.XL_CELL_ERROR, 0)) 'error (#NULL!)'
>>> cell_display(Cell(xlrd.XL_CELL_ERROR, 2000)) 'unknown error code (2000)'
If you manage to pass a cell with an unknown cell type, an exception will be raised:
>>> cell_display(Cell(69, 0)) Traceback (most recent call last): ... Exception: Unknown Cell.ctype: 69
-
class
xlutils.filter.
BaseFilter
¶ A concrete filter that implements pass-through behaviour for
BaseFilterInterface
.This often makes a great base class for your own filters.
-
class
xlutils.filter.
BaseFilterInterface
¶ This is the filter interface that shows the correct way to call the next filter in the chain. The next attribute is set up by the
process()
function. It can make a good base class for a new filter, but subclassingBaseFilter
is often a better idea!-
cell
(rdrowx, rdcolx, wtrowx, wtcolx)¶ This is called for every cell in the sheet being processed. This is the most common method in which filtering and queuing of onward calls to the next filter takes place.
Parameters: - rdrowx – the index of the row to be read from in the current sheet.
- rdcolx – the index of the column to be read from in the current sheet.
- wtrowx – the index of the row to be written to in the current output sheet.
- wtcolx – the index of the column to be written to in the current output sheet.
-
finish
()¶ This method is called once processing of all workbooks has been completed.
A filter should call this method on the next filter in the chain as an indication that no further calls will be made to any methods or that, if they are, any new calls should be treated as new batch of workbooks with no information retained from the previous batch.
-
row
(rdrowx, wtrowx)¶ This is called every time processing of a new row in the current sheet starts. It is primarily for copying row-based formatting from the source row to the target row.
Parameters: - rdrowx – the index of the row in the current sheet from which information for the row to be written should be copied.
- wtrowx – the index of the row in sheet to be written to which information should be written for the row being read.
-
set_rdsheet
(rdsheet)¶ This is only ever called by a filter that wishes to change the source of cells mid-way through writing a sheet.
Parameters: rdsheet – the Sheet
object from which cells from this point forward should be read from.
-
sheet
(rdsheet, wtsheet_name)¶ This method is called every time processing of a new sheet in the current workbook starts.
Parameters: - rdsheet – the
Sheet
object from which the new sheet should be created. - wtsheet_name – the name of the sheet into which content should be written.
- rdsheet – the
-
start
()¶ This method is called before processing of a batch of input. This allows the filter to initialise any required data structures and dispose of any existing state from previous batches.
It is called once before the processing of any workbooks by the included reader implementations.
This method can be called at any time. One common use is to reset all the filters in a chain in the event of an error during the processing of a rdbook.
Implementations of this method should be extremely robust and must ensure that they call the
start()
method of the next filter in the chain regardless of any work they do.
-
workbook
(rdbook, wtbook_name)¶ This method is called every time processing of a new workbook starts.
Parameters: - rdbook – the
Book
object from which the new workbook should be created. - wtbook_name – the name of the workbook into which content should be written.
- rdbook – the
-
-
class
xlutils.filter.
BaseReader
¶ A base reader good for subclassing.
-
__call__
(filter)¶ Once instantiated, a reader will be called and have the first filter in the chain passed to its
__call__()
method. The implementation of this method should call the appropriate methods on the filter based on the cells found in theBook
objects returned from theget_workbooks()
method.
-
get_filepaths
()¶ This is the most common method to implement. It must return an interable sequence of paths to excel files.
-
get_workbooks
()¶ If the data to be processed is not stored in files or if special parameters need to be passed to
xlrd.open_workbook()
then this method must be overriden. Any implementation must return an iterable sequence of tuples. The first element of which must be anxlrd.Book
object and the second must be the filename of the file from which the book object came.
-
-
class
xlutils.filter.
BaseWriter
¶ This is the base writer that copies all data and formatting from the specified sources. It is designed for sequential use so when, for example, writing two workbooks, the calls must be ordered as follows:
workbook()
call for first workbooksheet()
call for first sheetrow()
call for first rowcell()
call for left-most cell of first rowcell()
call for second-left-most cell of first row- …
row()
call for second row- …
sheet()
call for second sheet- …
workbook()
call for second workbook- …
finish()
call
Usually, only the
get_stream()
method needs to be implemented in subclasses.-
cell
(rdrowx, rdcolx, wtrowx, wtcolx)¶ This should be called for every cell in the sheet being processed.
Parameters: - rdrowx – the index of the row to be read from in the current sheet.
- rdcolx – the index of the column to be read from in the current sheet.
- wtrowx – the index of the row to be written to in the current output sheet.
- wtcolx – the index of the column to be written to in the current output sheet.
-
finish
()¶ This method should be called once processing of all workbooks has been completed.
-
get_stream
(filename)¶ This method is called once for each file written. The filename of the file to be created is passed and something with
write()
andclose()
methods that behave like afile
object’s must be returned.
-
row
(rdrowx, wtrowx)¶ This should be called every time processing of a new row in the current sheet starts.
Parameters: - rdrowx – the index of the row in the current sheet from which information for the row to be written will be copied.
- wtrowx – the index of the row in sheet to be written to which information will be written for the row being read.
-
set_rdsheet
(rdsheet)¶ This should only ever called by a filter that wishes to change the source of cells mid-way through writing a sheet.
Parameters: rdsheet – the Sheet
object from which cells from this point forward will be read.
-
sheet
(rdsheet, wtsheet_name)¶ This method should be called every time processing of a new sheet in the current workbook starts.
Parameters: - rdsheet – the
Sheet
object from which the new sheet will be created. - wtsheet_name – the name of the sheet into which content will be written.
- rdsheet – the
-
start
()¶ This method should be called before processing of a batch of input. This allows the filter to initialise any required data structures and dispose of any existing state from previous batches.
-
workbook
(rdbook, wtbook_name)¶ This method should be called every time processing of a new workbook starts.
Parameters: - rdbook – the
Book
object from which the new workbook will be created. - wtbook_name – the name of the workbook into which content will be written.
- rdbook – the
-
class
xlutils.filter.
ColumnTrimmer
(is_junk=None)¶ This filter will strip columns containing no useful data from the end of sheets.
See the ColumnTrimmer documentation for an example.
-
class
xlutils.filter.
DirectoryWriter
(path)¶ A writer that stores files in a filesystem directory
-
get_stream
(filename)¶ Returns a stream for the file in the configured directory with the specified name.
-
-
class
xlutils.filter.
Echo
(name=None, methods=True)¶ This filter will print calls to the methods configured when the filter is created along with the arguments passed.
For more details, see the documentation.
-
class
xlutils.filter.
ErrorFilter
(level=40, message='No output as errors have occurred.')¶ A filter that gates downstream writers or filters on whether or not any errors have occurred.
See ErrorFilter for details.
-
finish
()¶ The method that triggers downstream filters and writers if no errors have occurred.
-
-
class
xlutils.filter.
GlobReader
(spec)¶ A reader that emits events for all files that match the glob in the spec.
-
class
xlutils.filter.
MemoryLogger
(path, methods=True)¶ This filter will dump stats to the path it was configured with using the heapy package if it is available.
-
class
xlutils.filter.
MethodFilter
(methods=True)¶ This is a base class that implements functionality for filters that want to do a common task such as logging, printing or memory usage recording on certain calls configured at filter instantiation time.
Echo is an example of this.
-
method
(name, *args)¶ This is the method that needs to be implemented. It is called with the name of the method that was called on the MethodFilter and the arguments that were passed to that method.
-
-
class
xlutils.filter.
StreamWriter
(stream)¶ A writer for writing exactly one workbook to the supplied stream
-
get_stream
(filename)¶ Returns the stream passed during instantiation.
-
-
class
xlutils.filter.
XLRDReader
(wb, filename)¶ A reader that uses an in-memory
xlrd.Book
object as its source of events.-
get_workbooks
()¶ Yield the workbook passed during instantiation.
-
-
class
xlutils.filter.
XLWTWriter
¶ A writer that writes to a sequence of in-memory
xlwt.Workbook
objects.
-
xlutils.filter.
process
(reader, *chain)¶ The driver function for the
xlutils.filter
module.It takes a chain of one reader, followed by zero or more filters and ending with one writer.
All the components are chained together by the
process()
function setting theirnext
attributes appropriately. The reader is then called with the first filter in the chain.
-
xlutils.margins.
ispunc
(character)¶ This little helper function returns
True
if called with a punctuation character andFalse
with any other:>>> from xlutils.margins import ispunc >>> ispunc('u') False >>> ispunc(',') True
It also works fine with unicode characters:
>>> ispunc(u',') True >>> ispunc(u'w') False
It does not, however, return sensible answers if called with more than one character:
>>> ispunc(',,,') False
-
xlutils.margins.
cells_all_junk
(cells, is_rubbish=None)¶ Return True if all cells in the sequence are junk. What qualifies as junk: – empty cell – blank cell – zero-length text – text is all whitespace – number cell and is 0.0 – text cell and is_rubbish(cell.value) returns True.
This function returns
True
if all the cells supplied are junk:>>> from xlutils.margins import cells_all_junk >>> from xlrd.sheet import Cell,empty_cell >>> cells_all_junk([empty_cell,empty_cell,empty_cell]) True
But it returns
False
as soon as any of the cells supplied are not junk:>>> from xlrd import XL_CELL_NUMBER >>> cells_all_junk([empty_cell,Cell(XL_CELL_NUMBER,1),empty_cell]) False
The definition of ‘junk’ is as follows:
Empty cells are junk:
>>> from xlrd import XL_CELL_EMPTY >>> cells_all_junk([Cell(XL_CELL_EMPTY,'')]) True
Blank cells are junk:
>>> from xlrd import XL_CELL_BLANK >>> cells_all_junk([Cell(XL_CELL_BLANK,'')]) True
Number cells containing zero are considered junk:
>>> from xlrd import XL_CELL_NUMBER >>> cells_all_junk([Cell(XL_CELL_NUMBER,0)]) True
However, if a number cell contains anything else, it’s not junk:
>>> cells_all_junk([Cell(XL_CELL_NUMBER,1)]) False
Text cells are junk if they don’t contain anything:
>>> from xlrd import XL_CELL_TEXT >>> cells_all_junk([Cell(XL_CELL_TEXT,'')]) True
or if they contain only space characters:
>>> cells_all_junk([Cell(XL_CELL_TEXT,' \t\n\r')]) True
otherwise they aren’t considered junk:
>>> cells_all_junk([Cell(XL_CELL_TEXT,'not junk')]) False
However, you can also pass a checker function such as this one:
>>> def isrubbish(cell): return cell.value=='rubbish'
Which can then be used to check for junk conditions of your own choice:
>>> cells_all_junk([Cell(XL_CELL_TEXT,'rubbish')],isrubbish) True >>> cells_all_junk([Cell(XL_CELL_TEXT,'not rubbish')],isrubbish) False
Passing a function like this isn’t only limited to text cells:
>>> def isnegative(cell): return isinstance(cell.value,float) and cell.value<0 or False
>>> cells_all_junk([Cell(XL_CELL_NUMBER,-1.0)],isnegative) True >>> cells_all_junk([Cell(XL_CELL_NUMBER,1.0)],isnegative) False
Date, boolean, and error fields are all not considered to be junk:
>>> from xlrd import XL_CELL_DATE, XL_CELL_BOOLEAN, XL_CELL_ERROR >>> cells_all_junk([Cell(XL_CELL_DATE,'')]) False >>> cells_all_junk([Cell(XL_CELL_BOOLEAN,'')]) False >>> cells_all_junk([Cell(XL_CELL_ERROR,'')]) False
Be careful, though, as if you call
cells_all_junk()
with an empty sequence of cells, you’ll get True:>>> cells_all_junk([]) True
-
xlutils.margins.
number_of_good_rows
(sheet, checker=None, nrows=None, ncols=None)¶ Return 1 + the index of the last row with meaningful data in it.
This function returns the number of rows in a sheet that contain anything other than junk, as defined by the
cells_all_junk()
function.For example:
>>> from xlutils.tests.fixtures import make_sheet >>> sheet = make_sheet(( ... ('X',' ',' ',' ',' '), ... (' ',' ',' ','X',' '), ... (' ',' ',' ',' ',' '), ... ('X',' ',' ',' ',' '), ... (' ',' ','X',' ',' '), ... (' ',' ',' ',' ',' '), ... )) >>> from xlutils.margins import number_of_good_rows >>> number_of_good_rows(sheet) 5
You can limit the area searched using the nrows and ncols parameters:
>>> number_of_good_rows(sheet,nrows=3) 2 >>> number_of_good_rows(sheet,ncols=2) 4 >>> number_of_good_rows(sheet,ncols=3,nrows=3) 1
You can also pass a checking function through to the
cells_all_junk()
calls:>>> number_of_good_rows(sheet,checker=lambda c:c.value=='X') 0
-
xlutils.margins.
number_of_good_cols
(sheet, checker=None, nrows=None, ncols=None)¶ Return 1 + the index of the last column with meaningful data in it.
This function returns the number of columns in a sheet that contain anything other than junk, as defined by the
cells_all_junk()
function.For example:
>>> sheet = make_sheet(( ... ('X',' ',' ','X',' ',' '), ... (' ',' ',' ',' ',' ',' '), ... (' ',' ',' ',' ','X',' '), ... (' ','X',' ',' ',' ',' '), ... (' ',' ',' ',' ',' ',' '), ... (' ',' ',' ',' ',' ',' '), ... )) >>> from xlutils.margins import number_of_good_cols >>> number_of_good_cols(sheet) 5
You can limit the area searched using the nrows and ncols parameters:
>>> number_of_good_cols(sheet,nrows=2) 4 >>> number_of_good_cols(sheet,ncols=2) 2 >>> number_of_good_cols(sheet,ncols=3,nrows=3) 1
You can also pass a checking function through to the
cells_all_junk()
calls:>>> number_of_good_cols(sheet,checker=lambda c:c.value=='X') 0
-
xlutils.save.
save
(wb, filename_or_stream)¶ Save the supplied
xlrd.Book
to the supplied stream or filename.
-
class
xlutils.styles.
NamedStyle
(name, xf)¶ An object with
name
andxf
attributes representing a particular style in a workbook.
-
class
xlutils.styles.
Styles
(book)¶ A mapping-like object that will return a
NamedStyle
instance for the cell passed to the__getitem__()
method.
-
class
xlutils.view.
CheckSheet
(book, sheet, row_slice=None, col_slice=None)¶ A special sheet view for use in automated tests.
-
compare
(*expected)¶ Call to check whether this view contains the expected data. If it does not, a descriptive
AssertionError
will be raised. Requires testfixtures.Parameters: expected – tuples containing the data that should be present in this view.
-
-
class
xlutils.view.
CheckerView
(path, class_=None)¶ A special subclass of
View
for use in automated tests when you want to check the contents of a generated spreadsheet.Views of sheets are returned as
CheckSheet
instances which have a handycompare()
method.-
class_
¶ alias of
CheckSheet
-
-
class
xlutils.view.
Col
(name)¶ An end-inclusive column label index for use in slices, eg:
[:, Col('A'), Col('B')]
-
class
xlutils.view.
Row
(name)¶ A one-based, end-inclusive row index for use in slices, eg::
[Row(1):Row(2), :]
-
class
xlutils.view.
SheetView
(book, sheet, row_slice=None, col_slice=None)¶ A view on a sheet in a workbook. Should be created by indexing a
View
.These can be sliced to create smaller views.
Views can be iterated over to return a set of iterables, one for each row in the view. Data is returned as in the cell values with the exception of dates and times which are converted into
datetime
instances.-
__weakref__
¶ list of weak references to the object (if defined)
-
book
= None¶ The workbook used by this view.
-
sheet
= None¶ The sheet in the workbook used by this view.
-
-
class
xlutils.view.
View
(path, class_=None)¶ A view wrapper around a
Book
that allows for easy iteration over the data in a group of cells.Parameters: - path – The path of the .xls from which to create views.
- class – An class to use instead of
SheetView
for views of sheets.
-
__getitem__
(item)¶ Returns of a view of a sheet in the workbook this view is created for.
Parameters: item – either zero-based integer index or a sheet name.
-
__weakref__
¶ list of weak references to the object (if defined)