content
column (#80)xlex()
(#57).include_blank_cells = FALSE
had a bug that
returned blank cells as an empty row in the xlsx_cells()
data frame.-F10
(#26 @cablegui)._
as dates when
the underscore is followed by a date-ish character like M
(#24).include_blank_cells = FALSE
in
xlsx_cells()
(#25).maybe_xlsx()
is
provided for checking whether a file might be in the xlsx format. It is
impossible to be sure from the magic number alone, because the magic numbers
are either common to all zip files, or common to other Microsoft Office files
(e.g. .doc, .ppt).xlsx_cells()
and xlsx_formats()
replace tidy_xlsx()
, which has been
deprecated. xlsx_cells()
returns a single data frame of all the cells in
scope (the whole workbook, or chosen sheets), rather than a list of separate
data frames for each sheet. xlsx_formats()
performs orders of magnitude
faster.xlsx_validation()
imports validation rules from cells that restrict data
input, such as cells that require a selection from a drop-down list. See
the
vignette
vignette("data-validation-rules", package = "tidyxl")
.xlsx_names()
imports defined names (aka named ranges/formulas), which can
be used to filter for particular ranges of cells by name. Use is_range()
to filter for ones that are named ranges, and then read
joining rules to cells
for how to join cell ranges to cell addresses. This will become easier in a
future release.is_range()
checks whether a formula is simply ranges of cells.xlex()
tokenises formulas. This is useful for detecting
spreadsheet smells like embedded constants and deep nesting. There is a
demo Shiny app, and a
vignette
vignette("smells", package = "tidyxl")
. A vector of Excel function names
excel_functions
can be used to separated built-in functions from custom
functions. More experimental features will be implemented in the off-CRAN
package lexl before becoming part of
tidyxl.xlsx_cells()$character_formatted
is a new column for the in-cell formatting
of text (#5). This is for when different parts of text in a single cell
have been formatted differently from one another.is_date_format()
checks whether a number format string is a date format.
This is useful if a cell formula contains a number formatting string (e.g.
TEXT(45678,"yyyy")
), and you need to know that the constant 45678 is a
date in order to recover it at full resolution (rather than parsing the
character output "2025" as a year).xlsx_color_theme()
and it's British alias xlsx_colour_theme()
returns the
theme colour palette used in a file. This is useful to monitor use of a
corporate standard theme.xlsx_color_standard
and it's British alias xlsx_colour_standard
are data
frames of the standard Excel palette (red
, blue
, etc.).=A1*2
in cell B1
becomes =A2*2
in cell B2
(for more details see
issue #7).tidy_xlsx()
has been deprecated in favour of xlsx_cells()
,
which returns a data frame of all the cells in the workbook (or in the
requested sheets), and xlsx_formats()
, which returns a lookup list of cell
formats.tidy_xlsx()
and one of it's replacments xlsx_cells()
content
has been replaced by is_blank
, a logical value
indicating whether the cell contains data. Please replace !is.na(content)
with !is_blank
to filter out blank cells (ones with formatting but no
value).formula_type
has been replaced by is_array
, a logical value
indicating whether the cell's formula is an array formula or not. In Excel
array formulas are represented visually by being surrounded by curly braces
{}
.xlsx_formats()
and tidy_xlsx()
, theme colours are given by name rather
than by number, e.g. "accent6"
instead of 4
."[Cyan]0%"
)
are no longer mis-detect as dates (#21). is_date_format()
tests whether a
number format is a date format.xlsx_formats()
is now thoroughly tested, and several relatively minor bugs
fixed. For example, xlsx_formats(path)$local$fill$patternFill$patternType
consistently returns NA
and never "none"
when a pattern fill has not
been set, and escape-backslashes are consistently omitted from numFmts.xlex()
, is_range()
and the handling of relative references in shared
formulas requires a dependency on the
piton package, which wraps the
PEGTL C++ parser generator.
date1904
attribute for "false"
or "1"
to support files
created by the openxlsx
package (#8)."Normal"
) instead of by index integer. All the vectors under
x$formats$style
are named according to the style names.
x$data$sheet$style_format_id
has been renamed to x$data$sheet$style_format
and its type changed from integer (index into style formats) to character
(still an index, but looking up the named vectors by name). There are
examples in the README and vignette.NEWS.md
file to track changes to the package.