what to do with multiple BOMs

2021-08-19 Thread Robin Becker

Channeling unicode text experts and xml people:

I have an xml entity with initial bytes ff fe ff fe, which the file command says
is UTF-16, little-endian text.

I agree, but what should be done about the additional BOM?

A test output made many years ago seems to keep the extra BOM. The xml context 
is

xml file 014.xml

]>
&e;

the entity file 014.ent is bombomdata

b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'

The old saved test output of processing is

b'\xef\xbb\xbfdata'

which seems to imply that the extra BOM in the entity has been kept and 
processed into a different BOM meaning utf8.

I think the test file is wrong and that multiple BOM chars in the entity should 
have been removed.

Am I right?
--
Robin Becker

--
https://mail.python.org/mailman/listinfo/python-list


Re: on perhaps unloading modules?

2021-08-19 Thread Hope Rouselle
Chris Angelico  writes:

> On Tue, Aug 17, 2021 at 4:02 AM Greg Ewing
>  wrote:
>> The second best way would be to not use import_module, but to
>> exec() the student's code. That way you don't create an entry in
>> sys.modules and don't have to worry about somehow unloading the
>> module.
>
> I would agree with this. If you need to mess around with modules and
> you don't want them to be cached, avoid the normal "import" mechanism,
> and just exec yourself a module's worth of code.

Sounds like a plan.  Busy, haven't been able to try it out.  But I will.
Soon.  Thank you!
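
For reference, the exec() approach suggested above can be sketched roughly as follows. This is only a minimal illustration: the helper name `load_without_import`, the module name `student_submission`, and the sample source text are hypothetical, and real student code would presumably be read from a file.

```python
import sys
import types


def load_without_import(name, source):
    """Run source text inside a fresh module object.

    Hypothetical helper: the module is never registered in sys.modules,
    so there is nothing to unload afterwards.
    """
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


# Stand-in for a student's submission (an assumption for this sketch).
student_source = "def double(x):\n    return 2 * x\n"

student = load_without_import("student_submission", student_source)
print(student.double(21))
print("student_submission" in sys.modules)
```

Calling the helper again with new source simply builds another module object, so a resubmitted file never collides with a cached copy.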


Re: on perhaps unloading modules?

2021-08-19 Thread Hope Rouselle
Martin Di Paola  writes:

> This may not answer your question but it may provide an alternative 
> solution.
>
> I had the same challenge as you a year ago, so maybe my solution will 
> work for you too.
>
> Imagine that you have a Markdown file that *documents* the expected 
> results.
>
> This is the final exam, good luck!
>
> First I'm going to load your code (the student's code):
>
> ```python
> >>> import student
> ```
>
> Let's see if you programmed correctly a sort algorithm
>
> ```python
> >>> data = [3, 2, 1, 3, 1, 9]
> >>> student.sort_numbers(data)
> [1, 1, 2, 3, 3, 9]
> ```
>
> Let's see now if you can choose the correct answer:
>
> ```python
> >>> t = ["foo", "bar", "baz"]
> >>> student.question1(t)
> "baz"
> ```
>
> Now you can run the snippets of code with:
>
>     byexample -l python the_markdown_file.md
>
> What byexample does is to run the Python code, capture the output and 
> compare it with the expected result.
>
> In the above example "student.sort_numbers" must return the list
> sorted.
> That output is compared by byexample with the list written below.
>
> Advantages? Each byexample run is independent of the others and the 
> snippets of code are executed in a separate Python process. byexample 
> takes care of the IPC.
>
> I don't know the details of your questions so I'm not sure if byexample 
> will be the tool for you. In my case I evaluate my students giving them 
> the Markdown and asking them to code the functions so they return the 
> expected values.

Currently procedures in one question are used in another question.
Nevertheless, perhaps I could (in other tests) design something
different.  Although, to be honest, I would rather not have to use
something like Markdown because that means more syntax for students.
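
As a rough illustration of the run, capture, and compare idea without any Markdown, the standard library's doctest module does something similar on plain text. This is a sketch under assumptions, not how byexample itself works: the `run_exam` helper, the `EXAM` text, and the stand-in `sort_numbers` function are all hypothetical.

```python
import doctest


def sort_numbers(data):
    # Stand-in for a student's function (hypothetical).
    return sorted(data)


# Plain text holding the expected interaction, in interpreter-prompt form.
EXAM = """
>>> sort_numbers([3, 2, 1, 3, 1, 9])
[1, 1, 2, 3, 3, 9]
"""


def run_exam(text, namespace):
    """Run the examples in text against namespace.

    Returns (failed, attempted) as counted by doctest.
    """
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, dict(namespace), "exam", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow doctest's failure report; only the counts are used here.
    results = runner.run(test, out=lambda message: None)
    return results.failed, results.attempted


print(run_exam(EXAM, {"sort_numbers": sort_numbers}))
```

Unlike byexample, this runs everything in the same process, so it gives none of the isolation mentioned above.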

> Depending on how many students you have, you may consider
> complementing this with INGInious. It is designed to run students'
> assignments assuming nothing about the untrusted code.
>
> Links:
>
> https://byexamples.github.io/byexample/
> https://docs.inginious.org/en/v0.7/

INGInious looks pretty interesting.  Thank you!


Re: some problems for an introductory python test

2021-08-19 Thread Hope Rouselle
Chris Angelico  writes:

> On Tue, Aug 17, 2021 at 3:51 AM Hope Rouselle 
> wrote:
>>
>> Chris Angelico  writes:
>> >> Wow, I kinda feel the same as you here.  I think this justifies
>> >> perhaps using a hardware solution.  (Crazy idea?! Lol.)
>> >
>> > uhhh Yes. Very crazy idea. Can't imagine why anyone would ever
>> > think about doing that.
>>
>> Lol.  Really?  I mean a certain panic button.  You know the GNU Emacs.
>> It has this queue with the implications you mentioned --- as much as it
>> can.  (It must of course get the messages from the system, otherwise it
>> can't do anything about it.)  And it has the panic button C-g.  The
>> keyboard has one of the highest precedences in hardware interrupts,
>> doesn't it?  A certain very important system could have a panic button
>> that invokes a certain debugger, say, for a crisis-moment.
>>
>> But then this could be a lousy engineering strategy.  I am not an
>> expert at all in any of this.  But I'm surprised with your quick
>> dismissal. :-)
>>
>> > Certainly nobody in his right mind would have WatchCat listening on
>> > the serial port's Ring Indicator interrupt, and then grab a paperclip
>> > to bridge the DTR and RI pins on an otherwise-unoccupied serial port
>> > on the back of the PC. (The DTR pin was kept high by the PC, and could
>> > therefore be used as an open power pin to bring the RI high.)
>>
>> Why not?  Misuse of hardware?  Too precious of a resource?
>>
>> > If you're curious, it's pins 4 and 9 - diagonally up and in from the
>> > short corner.
>> > http://www.usconverters.com/index.php?main_page=page&id=61&chapter=0
>>
>> You know your pins!  That's impressive.  I thought the OS itself could
>> use something like that.  The fact that they never do... Says
>> something, doesn't it?  But it's not too obvious to me.
>>
>> > And of COURSE nobody would ever take an old serial mouse, take the
>> > ball out of it, and turn it into a foot-controlled signal... although
>> > that wasn't for WatchCat, that was for clipboard management between
>> > my app and a Windows accounting package that we used. But that's a
>> > separate story.
>>
>> Lol.  I feel you're saying you would. :-)
>
> This was all a figure of speech, and the denials were all tongue in
> cheek. Not only am I saying we would, but we *did*. All of the above.

Cool! :-) 

> The Ring Indicator trick was one of the best, since we had very little
> other use for serial ports, and it didn't significantly impact the
> system during good times, but was always reliable when things went
> wrong.
>
> (And when I posted it, I could visualize the port and knew which pins
> to bridge, but had to go look up a pinout to be able to say their pin
> numbers and descriptions.)

Nice!

>> I heard of Python for the first time in the 90s.  I worked at an ISP.
>> Only one guy was really programming there, Allaire ColdFusion.  But,
>> odd enough, we used to say we would ``write a script in Python'' when
>> we meant to say we were going out for a smoke.  I think that was
>> precisely because nobody knew what ``Python'' really was.  I never
>> expected it to be a great language.  I imagined it was something like
>> Tcl.  (Lol, no offense at all towards Tcl.)
>
> Haha, that's a weird idiom!

Clueless people --- from Rio de Janeiro area in Brazil. :-)  It was
effectively just an in-joke.

> Funny you should mention Tcl.
>
> https://docs.python.org/3/library/tkinter.html

Cool!  Speaking of GUIs and Python, that Google software called Backup
and Sync (which I think is about to be obsoleted by Google Drive) is
written in Python --- it feels a bit heavy.  The GUI too seems a bit
slow sometimes.  Haven't tried their ``Google Drive'' as a replacement
yet.


Re: what to do with multiple BOMs

2021-08-19 Thread MRAB

On 2021-08-19 14:07, Robin Becker wrote:

> Channeling unicode text experts and xml people:
>
> I have an xml entity with initial bytes ff fe ff fe, which the file command
> says is UTF-16, little-endian text.
>
> I agree, but what should be done about the additional BOM?
>
> A test output made many years ago seems to keep the extra BOM. The xml
> context is
>
> xml file 014.xml
>
> ]>
> &e;
>
> the entity file 014.ent is bombomdata
>
> b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'
>
> The old saved test output of processing is
>
> b'\xef\xbb\xbfdata'
>
> which seems to imply that the extra BOM in the entity has been kept and
> processed into a different BOM meaning utf8.
>
> I think the test file is wrong and that multiple BOM chars in the entity
> should have been removed.
>
> Am I right?

The use of a BOM b'\xef\xbb\xbf' at the start of a UTF-8 file is a 
Windows thing. It's not used on non-Windows systems. Putting it in the 
middle, e.g. b'\xef\xbb\xbfdata', just looks wrong.


It looks like the contents of a UTF-8 file, with a BOM because it 
originated on a Windows system, were read in without stripping the BOM 
first.
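
A small sketch of the difference between reading such bytes with and without BOM stripping, using Python's built-in "utf-8" and "utf-8-sig" codecs. The sample bytes are constructed here purely for illustration.

```python
import codecs

# A UTF-8 file's contents with a leading BOM (constructed for this sketch).
raw = codecs.BOM_UTF8 + "data".encode("utf-8")

# Plain "utf-8" keeps the BOM as a U+FEFF character at the front.
kept = raw.decode("utf-8")

# "utf-8-sig" drops one leading BOM if it is present.
stripped = raw.decode("utf-8-sig")

print(repr(kept))
print(repr(stripped))
```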



Re: what to do with multiple BOMs

2021-08-19 Thread Richard Damon
By the rules of Unicode, that character, if not the very first character of the 
file, should be treated as a “zero-width non-breaking space”; it is NOT a BOM 
character there.

Its presence in the files is almost certainly an error, caused by broken 
software or by software processing files in a manner that it wasn’t 
designed for.
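
A short sketch of how Python's own codecs treat the bytes from the original post (quoted below). It only shows what the "utf-16" and "utf-8" codecs do with them; whether a parser ought to drop the second U+FEFF is a separate policy question that this does not settle.

```python
# Bytes of the entity file as given in the original post.
entity_bytes = b"\xff\xfe\xff\xfed\x00a\x00t\x00a\x00"

# The "utf-16" codec consumes only the first BOM to pick the byte order;
# the second ff fe pair is decoded as an ordinary U+FEFF character.
text = entity_bytes.decode("utf-16")

# Re-encoding as UTF-8 turns that leftover U+FEFF into ef bb bf.
reencoded = text.encode("utf-8")

# One possible clean-up (an assumption, not a rule from the thread):
# drop any U+FEFF characters left at the start of the decoded text.
cleaned = text.lstrip("\ufeff")

print(repr(text))
print(reencoded)
print(repr(cleaned))
```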

> On Aug 19, 2021, at 1:48 PM, Robin Becker  wrote:
> 
> Channeling unicode text experts and xml people:
> 
> I have xml entity with initial bytes ff fe ff fe which the file command says 
> is
> UTF-16, little-endian text.
> 
> I agree, but what should be done about the additional BOM.
> 
> A test output made many years ago seems to keep the extra BOM. The xml 
> context is
> 
> 
> xml file 014.xml
>  
> 
> ]>
> &e; 
> the entity file 014.ent is bombomdata
> 
> b'\xff\xfe\xff\xfed\x00a\x00t\x00a\x00'
> 
> The old saved test output of processing is
> 
> b'\xef\xbb\xbfdata'
> 
> which seems to imply that the extra BOM in the entity has been kept and 
> processed into a different BOM meaning utf8.
> 
> I think the test file is wrong and that multiple BOM chars in the entity 
> should have been removed.
> 
> Am I right?
> --
> Robin Becker
> 



Re: ANN: Dogelog Runtime, Prolog to the Moon (2021)

2021-08-19 Thread Mostowski Collapse

Woa! The JavaScript JIT compiler is quite impressive. I now
ported Dogelog runtime to Python as well, so that I can compare
JavaScript and Python, and tested without clause indexing:

between(L,H,L) :- L =< H.
between(L,H,X) :- L < H, Y is L+1, between(Y,H,X).

setup :- between(1,255,N), M is N//2, assertz(edge(M,N)), fail.
setup :- edge(M,N), assertz(edge2(N,M)), fail.
setup.

anc(X,Y) :- edge(X, Y).
anc(X,Y) :- edge(X, Z), anc(Z, Y).

anc2(X,Y) :- edge2(Y, X).
anc2(X,Y) :- edge2(Y, Z), anc2(X, Z).

:- setup.
:- time((between(1,10,_), anc2(0,255), fail; true)).
:- time((between(1,10,_), anc(0,255), fail; true)).

The results are:

/* Python 3.10.0rc1 */
% Wall 188 ms, trim 0 ms
% Wall 5537 ms, trim 0 ms

/* JavaScript Chrome 92.0.4515.159 */
% Wall 5 ms, trim 0 ms
% Wall 147 ms, trim 0 ms
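
The gap between the two runtimes can be read off these wall times. A minimal sketch that just divides the figures posted above; the labels assume the timings are listed in the same order as the two time/1 goals (anc2 first, then anc).

```python
# Wall times in milliseconds, copied from the results above.
python_ms = (188, 5537)
javascript_ms = (5, 147)

# How many times longer the Python run took for each query.
ratios = [p / j for p, j in zip(python_ms, javascript_ms)]

for label, ratio in zip(("anc2", "anc"), ratios):
    print(f"{label}: Python took {ratio:.1f}x as long as JavaScript")
```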


Re: ANN: Dogelog Runtime, Prolog to the Moon (2021)

2021-08-19 Thread Mostowski Collapse

That's a factor 37.8 faster! I tested a variant of
the Albufeira instructions Prolog VM, aka ZIP, which
was also the inspiration for SWI-Prolog.

Open Source:

The Python Version of the Dogelog Runtime
https://github.com/jburse/dogelog-moon/tree/main/devel/runtimepy

The Python Test Harness
https://gist.github.com/jburse/bf6c01c7524f2611d606cb88983da9d6#file-test-py 




