May 13, 2023

PySpark describe() vs summary()

 describe() - takes column names as strings as optional args

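  • describe() takes column names as strings as optional arguments; with no arguments it reports count, mean, stddev, min, and max for every numeric and string column

    • df1.describe().show()

    • df1.describe('name').show()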

df1 = spark.createDataFrame([['False', 'test1', 30], ['True', 'test2', 40]], ['is_student', 'name', 'age'])

summary() - takes statistics as strings as optional args

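  • summary() takes statistic names (e.g. 'mean', 'count') as strings as optional arguments; with no arguments it reports count, mean, stddev, min, 25%, 50%, 75%, and max

    • df1.summary().show()

    • df1.summary('mean').show()

    • df1.summary('count').show()

    • df1.summary('stddev').show()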


df1 = spark.createDataFrame([['False', 'test1', 30], ['True', 'test2', 40]], ['is_student', 'name', 'age'])

