Another possibility is to use a lambda function or a callable
object. This adds some call overhead, but it also lets you inject
extra arguments into the function call, and it does not require any
extra import.
obj.old_method_name = lambda *a, **kw: new_method_name(obj, *a, **kw)
A full example goes like this:

class C:
    def __init__(self):
        self.value = 21

    def get(self):
        return self.value

def new_get(self):
    return self.value * 2

obj = C()
print(obj.get())
obj.get = lambda *a, **kw: new_get(obj, *a, **kw)
print(obj.get())

This would first output 21 and then 42.
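
The remark about injecting new parameters can be sketched as follows.
The names scaled_get, factor, and BoundCall are made up for this
illustration; they are not part of the original question.

```python
class C:
    def __init__(self):
        self.value = 21

    def get(self):
        return self.value


def scaled_get(self, factor):
    # Hypothetical replacement that needs one extra argument.
    return self.value * factor


class BoundCall:
    """Callable object that remembers the instance and extra arguments."""

    def __init__(self, func, instance, **extra):
        self.func = func
        self.instance = instance
        self.extra = extra

    def __call__(self, *a, **kw):
        # Injected keyword arguments are merged with the caller's;
        # the caller's values win on a name clash.
        return self.func(self.instance, *a, **{**self.extra, **kw})


obj = C()

# Lambda variant: factor=2 is injected when the method is replaced.
obj.get = lambda *a, **kw: scaled_get(obj, *a, factor=2, **kw)
lambda_result = obj.get()

# Callable-object variant: same idea, but the injected values stay
# inspectable as attributes of the BoundCall instance.
obj.get = BoundCall(scaled_get, obj, factor=3)
callable_result = obj.get()

print(lambda_result, callable_result)
```

Only obj is affected here. Other instances of C keep the original get,
because the replacement is stored on the instance, not on the class.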
--
What you are trying to do requires more than just replacing the
function _convert_cell. By default, OpenpyxlReader loads the workbook
in read_only mode, discarding all links. This means that the cell
object passed to _convert_cell has no hyperlink attribute, and there
is no option to make the reader load the links. To force them to be
loaded, we need to replace load_workbook as well. This is the method
that asks openpyxl to load the workbook, and it decides whether the
links are discarded or not.
The second problem is that as soon as you instantiate an ExcelFile
object, it instantiates an OpenpyxlReader and loads the file, leaving
you no time to replace the functions. Fortunately, ExcelFile gets the
engine class from a class-level dictionary called _engines. This
means that we can subclass OpenpyxlReader, override those two methods,
and replace the reference in ExcelFile._engines. The full source is:

import pandas as pd

class MyOpenpyxlReader(pd.ExcelFile.OpenpyxlReader):
    def load_workbook(self, filepath_or_buffer):
        from openpyxl import load_workbook
        return load_workbook(
            filepath_or_buffer,
            read_only=False,
            data_only=False,
            keep_links=True
        )

    def _convert_cell(self, cell, convert_float: bool):
        value = super()._convert_cell(cell, convert_float)
        if cell.hyperlink is None:
            return value
        else:
            return (value, cell.hyperlink.target)

pd.ExcelFile._engines["openpyxl"] = MyOpenpyxlReader
df = pd.read_excel("links.xlsx")
print(df)

The source above worked on Python 3.8.10, pandas 1.5.0, and openpyxl
3.0.10. The output for a sample xlsx file with the columns id, page
(a name with a link), and last access is shown next. The first
element of each tuple in the page column is the cell's text and the
second element is the cell's link:

   id                                         page  last access
0   1            (google, https://www.google.com/)   2022-04-12
1   2                  (gmail, https://gmail.com/)   2022-02-06
2   3          (maps, https://www.google.com/maps)   2022-02-17
3   4                    (bbc, https://bbc.co.uk/)   2022-08-30
4   5            (reddit, https://www.reddit.com/)   2022-12-02
5   6  (stackoverflow, https://stackoverflow.com/)   2022-05-25

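
Why replacing the entry in ExcelFile._engines is enough can be shown
without pandas at all. The sketch below uses made-up stand-ins (Reader,
LinkReader, File) and not the real pandas classes; it only illustrates
that a class looked up in a class-level dictionary at instantiation
time can be swapped beforehand.

```python
class Reader:
    def __init__(self, path):
        # Like the real reader, the work happens on instantiation,
        # so there is no chance to patch the instance afterwards.
        self.book = self.load_workbook(path)

    def load_workbook(self, path):
        return {"path": path, "links": False}


class LinkReader(Reader):
    def load_workbook(self, path):
        # Override the loading step to keep the links.
        book = super().load_workbook(path)
        book["links"] = True
        return book


class File:
    # Class-level registry, playing the role of ExcelFile._engines.
    _engines = {"default": Reader}

    def __init__(self, path, engine="default"):
        self.reader = self._engines[engine](path)


before = File("links.xlsx").reader.book["links"]
File._engines["default"] = LinkReader  # swap the registered class
after = File("links.xlsx").reader.book["links"]
print(before, after)
```

The swap affects every File created afterwards, which is also why it
is a global side effect and not a local change.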
--
Should you do any of this? No.
1. What makes a good developer is the ability to write clear and
maintainable code. None of these options is clear; they all increase
cognitive complexity and reduce reliability.
2. We are manipulating internal class attributes and internal methods
(those starting with _). Internal elements are not guaranteed to stay
the same across versions, even minor updates. You should not
manipulate them unless you are working against a fixed library
version, for example when writing tests that check whether the
internal state has changed, when hacking, or when debugging. Python
assumes you will access these attributes wisely.
3. If you are working with other developers and you commit this code,
there is a high chance that another developer is using a slightly
different pandas version that lacks one of these elements. You will
break the build, and your team will complain and start thinking you
are a naive developer.
4. Even if you adapt your code for multiple pandas versions, you will
end up with multiple ifs and different implementations. You don't want
to maintain this over time.
5. It clearly takes more time to understand pandas' internals than to
write your own reader using openpyxl. Writing one is not cumbersome,
and if it changes the execution time from 20ms to 40ms but is much
more reliable and maintainable, we surely prefer your own reader.
The only scenario I see in which this would be acceptable is when you
or your boss have an important presentation in the next hour, and you
need a quick fix to make it work in order to demonstrate it. After the
presentation is over and people have validated the functionality you
should properly implement it.
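
A proper implementation along the lines of point 5 could look roughly
like the sketch below. It is a minimal sketch, not a drop-in
replacement for read_excel: the function names rows_with_links and
read_sheet_with_links are made up, it reads only the active sheet, and
it assumes every linked cell has an external target. The conversion is
kept separate from the file loading so it can be tried with stand-in
cells.

```python
from types import SimpleNamespace


def rows_with_links(rows):
    """Turn rows of cell-like objects into lists of values.

    A cell with a hyperlink becomes a (value, target) tuple. Cells only
    need a .value attribute and, optionally, a .hyperlink attribute.
    """
    result = []
    for row in rows:
        converted = []
        for cell in row:
            link = getattr(cell, "hyperlink", None)
            if link is None:
                converted.append(cell.value)
            else:
                converted.append((cell.value, link.target))
        result.append(converted)
    return result


def read_sheet_with_links(path):
    """Read the active sheet of an xlsx file, keeping hyperlink targets."""
    # Imported lazily; this function needs openpyxl to be installed.
    from openpyxl import load_workbook

    # Not read-only, so openpyxl loads the hyperlinks.
    workbook = load_workbook(path)
    return rows_with_links(workbook.active.iter_rows())


# Stand-in cells, so the conversion can be tried without a spreadsheet.
link = SimpleNamespace(target="https://www.google.com/")
sample = [[SimpleNamespace(value=1, hyperlink=None),
           SimpleNamespace(value="google", hyperlink=link)]]
converted_sample = rows_with_links(sample)
print(converted_sample)
```

The first row returned by read_sheet_with_links would hold the column
headers, so something like pd.DataFrame(rows[1:], columns=rows[0])
should turn the result into a DataFrame using only public APIs.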
Keep It Simple and Stupid (KISS)
--
Diego Souza
Wespa Intelligent Systems
Rio de Janeiro - Brasil
On Mon, Sep 19, 2022 at 1:00 PM wrote:
>
>
> From: "Weatherby,Gerard"
> Date: Mon, 19 Sep 2022 13:06:42 +
> Subject: Re: How to replace an instance method?