Boosting Data Processing: Performance Tune Pandas

Rejected

Session Description

This talk is an in-depth exploration of techniques for improving the performance of Pandas, Python's data analysis library. It covers strategies, tips, and best practices for optimizing data processing workflows so that analysis runs faster and more efficiently.

Pandas can struggle with large datasets, a familiar challenge for anyone working with big data. This talk covers effective strategies for accelerating Pandas operations, from simple data transformations to exporting data to and importing it from databases. I will also cover alternative supporting libraries and simple modifications to existing code that speed up execution.
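As one possible illustration of the database export/import case, the sketch below writes a DataFrame to a database in batches and reads it back in chunks. It is not taken from the talk: it uses Python's built-in `sqlite3` with an in-memory database so it runs without extra setup (the talk itself names SQLAlchemy), and the table name `measurements`, the column names, and the chunk size are hypothetical.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical table and column names, used only for this illustration.
df = pd.DataFrame({
    "id": np.arange(10_000),
    "value": np.linspace(0.0, 1.0, 10_000),
})

# In-memory SQLite database (an assumption made to keep the sketch self-contained).
conn = sqlite3.connect(":memory:")

# Export: write in batches of 2,000 rows instead of a single large insert.
df.to_sql("measurements", conn, index=False, if_exists="replace", chunksize=2_000)

# Import: stream the table back in chunks and aggregate incrementally,
# so the full result never has to sit in memory at once.
row_count = 0
value_sum = 0.0
query = "SELECT id, value FROM measurements"
for chunk in pd.read_sql_query(query, conn, chunksize=2_000):
    row_count += len(chunk)
    value_sum += chunk["value"].sum()

conn.close()
print(row_count, round(value_sum, 2))
```

Whether chunking helps, and which chunk size works best, depends on the database, the driver, and the available memory, so it is worth measuring on real data.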

We will have a live demo with code snippets that demonstrate these techniques and compare their performance against the traditional methods in widespread use.

This session aims to address four key points:

1. Why is Pandas slow at handling big data?

2. Small modifications to existing Pandas code

3. Using other libraries to speed up execution, such as SQLAlchemy and NVIDIA’s RAPIDS cuDF

4. Performance comparison between proposed and existing methods
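As a rough illustration of points 2 and 4, the sketch below compares a row-by-row `apply` with an equivalent vectorized expression and times both. It is an example under stated assumptions rather than material from the talk: the column names `price` and `qty`, the data size, and the helper `time_call` are hypothetical.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: the column names and size are assumptions for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "price": rng.uniform(1.0, 100.0, size=20_000),
    "qty": rng.integers(1, 10, size=20_000),
})


def total_rowwise(frame):
    # Row-by-row: calls a Python function once per row.
    return frame.apply(lambda row: row["price"] * row["qty"], axis=1)


def total_vectorized(frame):
    # Vectorized: a single multiplication over whole columns.
    return frame["price"] * frame["qty"]


def time_call(func, frame):
    # Hypothetical helper: wall-clock time of a single call, in seconds.
    start = time.perf_counter()
    result = func(frame)
    return result, time.perf_counter() - start


slow_result, slow_seconds = time_call(total_rowwise, df)
fast_result, fast_seconds = time_call(total_vectorized, df)

# Both approaches should produce the same values.
assert np.allclose(slow_result.to_numpy(), fast_result.to_numpy())
print(f"row-wise apply: {slow_seconds:.4f}s, vectorized: {fast_seconds:.4f}s")
```

The measured gap varies with data size and hardware, and a single timed call is noisy; repeating the measurement, for example with `timeit`, gives a more reliable comparison.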

Drawing on personal experience, I'll share tried-and-tested methods for optimizing Python scripts that use Pandas and for reducing pipeline execution time, which in turn improves resource efficiency.

Key Takeaways

None

References

https://pandas.pydata.org/docs/user_guide/enhancingperf.html

Session Categories

FOSS

Speakers

Asha Holla

Analytics Developer, Bloom Value

Reviews

0 %

Approvability

This talk ticks a lot of boxes (real-world examples, deep technical problem solving), but we received other proposals that are better aligned with the conference and its scale.

Reviewer #1

Rejected