Jianzhen Wu created LIVY-995:
--------------------------------

             Summary: JsonParseException is thrown when closing Livy session 
when using python profile
                 Key: LIVY-995
                 URL: https://issues.apache.org/jira/browse/LIVY-995
             Project: Livy
          Issue Type: Improvement
            Reporter: Jianzhen Wu
            Assignee: Jianzhen Wu


Startup  and enable spark.python.profile.
{code:java}
./bin/pyspark --master local --conf spark.python.profile=true
{code}
 
Execute code related to Spark RDD. When pyspark is closed, Pyspark will output 
profile information.
{code:java}
>>> rdd = sc.parallelize(range(100)).map(str)
>>> rdd.count()
[Stage 0:>                                                          (0 + 1) / 1]
100
>>>
============================================================
Profile of RDD<id=1>
============================================================
         244 function calls (241 primitive calls) in 0.001 seconds
 
   Ordered by: internal time, cumulative time
 
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      101    0.000    0.000    0.000    0.000 rdd.py:1237(<genexpr>)
      101    0.000    0.000    0.000    0.000 util.py:72(wrapper)
        1    0.000    0.000    0.000    0.000 serializers.py:255(dump_stream)
        1    0.000    0.000    0.000    0.000 serializers.py:213(load_stream)
        2    0.000    0.000    0.000    0.000 \{built-in method builtins.sum}
        1    0.000    0.000    0.001    0.001 worker.py:607(process)
        1    0.000    0.000    0.000    0.000 context.py:549(f)
        1    0.000    0.000    0.000    0.000 \{built-in method _pickle.dumps}
        1    0.000    0.000    0.000    0.000 serializers.py:561(read_int)
        1    0.000    0.000    0.000    0.000 serializers.py:568(write_int)
      4/1    0.000    0.000    0.000    0.000 rdd.py:2917(pipeline_func)
        1    0.000    0.000    0.000    0.000 serializers.py:426(dumps)
        1    0.000    0.000    0.000    0.000 rdd.py:1237(<lambda>)
        1    0.000    0.000    0.000    0.000 serializers.py:135(load_stream)
        2    0.000    0.000    0.000    0.000 rdd.py:1072(func)
        1    0.000    0.000    0.000    0.000 rdd.py:384(func)
        1    0.000    0.000    0.000    0.000 util.py:67(fail_on_stopiteration)
        1    0.000    0.000    0.000    0.000 
serializers.py:151(_read_with_length)
        2    0.000    0.000    0.000    0.000 context.py:546(getStart)
        3    0.000    0.000    0.000    0.000 rdd.py:416(func)
        1    0.000    0.000    0.000    0.000 
serializers.py:216(_load_stream_without_unbatching)
        2    0.000    0.000    0.000    0.000 \{method 'write' of 
'_io.BufferedWriter' objects}
        1    0.000    0.000    0.000    0.000 \{method 'read' of 
'_io.BufferedReader' objects}
        1    0.000    0.000    0.000    0.000 \{built-in method _operator.add}
        1    0.000    0.000    0.000    0.000 \{built-in method 
builtins.hasattr}
        3    0.000    0.000    0.000    0.000 \{built-in method builtins.len}
        1    0.000    0.000    0.000    0.000 \{built-in method _struct.unpack}
        1    0.000    0.000    0.000    0.000 rdd.py:1226(<lambda>)
        1    0.000    0.000    0.000    0.000 \{method 'close' of 'generator' 
objects}
        1    0.000    0.000    0.000    0.000 \{built-in method from_iterable}
        1    0.000    0.000    0.000    0.000 \{built-in method _struct.pack}
        1    0.000    0.000    0.000    0.000 \{method 'disable' of 
'_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 \{built-in method builtins.iter}
{code}
 
This is because Spark register show_profiles when Spark exit in profile.py 
{code:java}
    def add_profiler(self, id, profiler):
        """Add a profiler for RDD/UDF `id`"""
        if not self.profilers:
            if self.profile_dump_path:
                atexit.register(self.dump_profiles, self.profile_dump_path)
            else:
                atexit.register(self.show_profiles)
 
        self.profilers.append([id, profiler, False])
{code}
 
 
For Livy session, Livy does not convert the output to JSON format. And throw 
below exception:
 
{code:java}
24/01/17 11:17:30 INFO [shutdown-hook-0] ApplicationMaster: Unregistering 
ApplicationMaster with FAILED (diag message: User class threw exception: 
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('=' (code 
61)): expected a valid value (JSON String, Number, Array, Object or token 
'null', 'true' or 'false')
 at [Source: 
(String)"============================================================"; line: 
1, column: 2]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2337)
at 
com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:710)
at 
com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:635)
at 
com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1952)
at 
com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:781)
at 
com.fasterxml.jackson.databind.ObjectReader._initForReading(ObjectReader.java:355)
at 
com.fasterxml.jackson.databind.ObjectReader._bindAndClose(ObjectReader.java:2023)
at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:1491)
at 
org.livy.toolkit.shaded.org.json4s.jackson.JsonMethods.parse(JsonMethods.scala:33)
at 
org.livy.toolkit.shaded.org.json4s.jackson.JsonMethods.parse$(JsonMethods.scala:20)
at 
org.livy.toolkit.shaded.org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:71)
at 
org.apache.livy.repl.PythonInterpreter.$anonfun$sendRequest$1(PythonInterpreter.scala:288)
at scala.Option.map(Option.scala:230)
at 
org.apache.livy.repl.PythonInterpreter.sendRequest(PythonInterpreter.scala:287)
at 
org.apache.livy.repl.PythonInterpreter.sendShutdownRequest(PythonInterpreter.scala:277)
at org.apache.livy.repl.ProcessInterpreter.close(ProcessInterpreter.scala:62)
at org.apache.livy.repl.PythonInterpreter.close(PythonInterpreter.scala:234)
at org.apache.livy.repl.Session.$anonfun$close$1(Session.scala:232)
at org.apache.livy.repl.Session.$anonfun$close$1$adapted(Session.scala:232)
at 
scala.collection.mutable.HashMap$$anon$2.$anonfun$foreach$3(HashMap.scala:158)
at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:158)
at org.apache.livy.repl.Session.close(Session.scala:232)
at org.apache.livy.toolkit.IpynbBootstrap.close(IpynbBootstrap.scala:246)
at org.apache.livy.toolkit.IpynbBootstrap$.main(IpynbBootstrap.scala:72)
at org.apache.livy.toolkit.IpynbBootstrap.main(IpynbBootstrap.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:764)
)
{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to