Sort() or OrderBy() #Both are synonyms
sort() - ascending order
df.sort(df.age.asc()).collect()
df.sort(df['age']).collect()
df.sort(col('age').asc()).collect()
df.sort('age', ascending=True).collect() # ascending order
For asc, If any null values, row come at the top
sort() - descending order
df.sort('age', ascending=False).collect() # descending order
df.sort(col('age').desc()).collect() # descending order
For desc, If any null values, row come at the bottom
using desc/asc functions from pyspark.sql
from pyspark.sql import asc, desc
df.sort(asc('age'), desc('name')).collect()
orderBy()
df.orderBy(df.age.asc()).collect()
df.orderBy(df['age']).collect()
df.orderBy(col('age').asc()).collect()
Dealing with nulls in sorting
Default, For asc, If any null values, row come at the top
asc_nulls_first()
Default, For desc, If any null values, row come at the bottom
desc_nulls_last()
transactionsDf.sort("predError", ascending=False)
Correct! When using DataFrame.sort() and setting ascending=False, the DataFrame will be sorted by the specified column in descending order, putting all missing values last.
Alternatively,
transactionsDf.sort("predError", ascending=False)
(or)
transactionsDf.sort(desc_nulls_last("predError"))
asc_nulls_last()
desc_nulls_first()
Composite sorting (multiple column sorting)
df.sort(['age', 'name']).collect() # pass as a list
df.sort(['age', 'name'], ascending=[0, 1]).collect() # pass as a list
df.sort(col('age').desc(), col('name').asc()).collect() # individual
Prioritized sorting
Suppose you want to order by levels or some logic - say Beginner, Intermediate, Advanced
Beginner = 0
Intermediate = 1
Advanced = 2
You have to sort by integer values 0/1/2
df1 = spark.createDataFrame([['False', 'test1', 'Beginner'], ['True', 'test2', 'Intermediate'], ['False', 'test3', 'Advanced']], ['is_student', 'name', 'type'])
df1.withColumn('new', when(col('type') == 'Beginner', 1)
.when(col('type') == 'Intermediate', 2)
.otherwise(3))\
.show()
df1.withColumn('new', expr("""
CASE
WHEN type == 'Beginner'
THEN 1
WHEN type == 'Intermediate'
THEN 2
ELSE 3
END
"""
) ).show()
No comments:
Post a Comment