|
|
马上注册,结交更多好友,享用更多功能,让你轻松玩转社区。
您需要 登录 才可以下载或查看,没有账号?立即注册
x
在数据分析工作中,我们经常需要将分析结果以专业、易读的格式呈现给他人。PDF格式因其跨平台兼容性和格式稳定性,成为报告分享的首选。Pandas作为Python数据分析的核心库,提供了强大的数据处理能力,但本身并不直接支持PDF导出功能。本文将详细介绍如何结合pandas和其他Python库,将数据轻松导出为专业的PDF格式报告。
准备工作
在开始之前,我们需要安装一些必要的Python库。主要需要的库包括:
• pandas:用于数据处理
• matplotlib和seaborn:用于数据可视化
• reportlab:用于PDF生成
• PyPDF2:用于PDF操作
• tabulate:用于表格格式化
• weasyprint:用于HTML/CSS到PDF的转换
可以通过以下命令安装这些库:
- pip install pandas matplotlib seaborn reportlab PyPDF2 tabulate weasyprint
复制代码
基础操作
使用DataFrame.to_string()和reportlab
最简单的方法是使用DataFrame的to_string()方法将数据转换为字符串,然后使用reportlab将其写入PDF。
- import pandas as pd
- from reportlab.lib.pagesizes import letter
- from reportlab.pdfgen import canvas
- from reportlab.lib.units import inch
- # 创建示例DataFrame
- data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
- 'Age': [25, 30, 35, 40],
- 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
- df = pd.DataFrame(data)
- # 将DataFrame转换为字符串
- df_string = df.to_string()
- # 创建PDF
- pdf_file = "basic_output.pdf"
- c = canvas.Canvas(pdf_file, pagesize=letter)
- # 设置字体
- c.setFont("Helvetica", 12)
- # 添加标题
- c.drawString(inch, 10*inch, "Basic DataFrame Export")
- # 添加DataFrame内容
- text_object = c.beginText(inch, 9*inch)
- text_object.textLines(df_string)
- c.drawText(text_object)
- # 保存PDF
- c.save()
复制代码
使用tabulate和reportlab
tabulate库可以帮助我们创建更美观的表格格式:
- import pandas as pd
- from reportlab.lib.pagesizes import letter
- from reportlab.pdfgen import canvas
- from reportlab.lib.units import inch
- from tabulate import tabulate
- # 创建示例DataFrame
- data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
- 'Age': [25, 30, 35, 40],
- 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
- df = pd.DataFrame(data)
- # 使用tabulate格式化DataFrame
- table = tabulate(df, headers='keys', tablefmt='grid')
- # 创建PDF
- pdf_file = "tabulate_output.pdf"
- c = canvas.Canvas(pdf_file, pagesize=letter)
- # 设置字体
- c.setFont("Helvetica", 10)
- # 添加标题
- c.drawString(inch, 10*inch, "DataFrame Export with Tabulate")
- # 添加表格
- text_object = c.beginText(inch, 9*inch)
- text_object.textLines(table)
- c.drawText(text_object)
- # 保存PDF
- c.save()
复制代码
使用matplotlib和PDF后端
matplotlib支持将图形直接保存为PDF格式,我们可以利用这一点来导出表格:
- import pandas as pd
- import matplotlib.pyplot as plt
- import numpy as np
- # 创建示例DataFrame
- data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
- 'Age': [25, 30, 35, 40],
- 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
- df = pd.DataFrame(data)
- # 创建图形
- fig, ax = plt.subplots(figsize=(8, 6))
- ax.axis('off') # 隐藏坐标轴
- # 创建表格
- table = ax.table(cellText=df.values,
- colLabels=df.columns,
- cellLoc='center',
- loc='center')
- # 设置表格样式
- table.auto_set_font_size(False)
- table.set_fontsize(12)
- table.scale(1, 1.5)
- # 添加标题
- plt.title("DataFrame Export with Matplotlib", fontsize=16, pad=20)
- # 保存为PDF
- plt.savefig("matplotlib_output.pdf", bbox_inches='tight')
- plt.close()
复制代码
中级技巧
使用reportlab的Table类
reportlab提供了更高级的表格功能,可以让我们更好地控制表格的样式和布局:
- import pandas as pd
- from reportlab.lib.pagesizes import letter
- from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph
- from reportlab.lib.styles import getSampleStyleSheet
- from reportlab.lib import colors
- # 创建示例DataFrame
- data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
- 'Age': [25, 30, 35, 40],
- 'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
- df = pd.DataFrame(data)
- # 准备数据
- data_for_table = [df.columns.tolist()] + df.values.tolist()
- # 创建PDF文档
- doc = SimpleDocTemplate("reportlab_table.pdf", pagesize=letter)
- elements = []
- # 添加标题
- styles = getSampleStyleSheet()
- title = Paragraph("DataFrame Export with ReportLab Table", styles['Title'])
- elements.append(title)
- # 创建表格
- table = Table(data_for_table)
- # 设置表格样式
- style = TableStyle([
- ('BACKGROUND', (0, 0), (-1, 0), colors.grey),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
- ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 14),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
- ('BACKGROUND', (0, 1), (-1, -1), colors.beige),
- ('FONTNAME', (0, 1), (-1, -1), 'Helvetica'),
- ('FONTSIZE', (0, 1), (-1, -1), 12),
- ('GRID', (0, 0), (-1, -1), 1, colors.black)
- ])
- table.setStyle(style)
- # 添加表格到文档
- elements.append(table)
- # 构建PDF
- doc.build(elements)
复制代码
添加样式和格式化
我们可以进一步自定义表格的样式,包括颜色、字体、边框等:
- import pandas as pd
- from reportlab.lib.pagesizes import letter, landscape
- from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Spacer
- from reportlab.lib.styles import getSampleStyleSheet
- from reportlab.lib import colors
- from reportlab.lib.units import inch
- # 创建示例DataFrame
- data = {'Product': ['A', 'B', 'C', 'D', 'E'],
- 'Sales': [1200, 1800, 1500, 2100, 1900],
- 'Profit': [300, 450, 375, 525, 475],
- 'Margin': [0.25, 0.25, 0.25, 0.25, 0.25]}
- df = pd.DataFrame(data)
- # 格式化数值列
- df['Sales'] = df['Sales'].map('${:,.2f}'.format)
- df['Profit'] = df['Profit'].map('${:,.2f}'.format)
- df['Margin'] = df['Margin'].map('{:.2%}'.format)
- # 准备数据
- data_for_table = [df.columns.tolist()] + df.values.tolist()
- # 创建PDF文档(横向布局)
- doc = SimpleDocTemplate("styled_table.pdf", pagesize=landscape(letter))
- elements = []
- # 添加标题
- styles = getSampleStyleSheet()
- title = Paragraph("Sales Report with Styled Table", styles['Title'])
- elements.append(title)
- elements.append(Spacer(1, 0.25*inch))
- # 添加描述
- description = Paragraph("This report shows the sales performance of different products.", styles['Normal'])
- elements.append(description)
- elements.append(Spacer(1, 0.25*inch))
- # 创建表格
- table = Table(data_for_table, colWidths=[1.5*inch, 1.5*inch, 1.5*inch, 1.5*inch])
- # 设置表格样式
- style = TableStyle([
- # 标题行样式
- ('BACKGROUND', (0, 0), (-1, 0), colors.darkblue),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
- ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 14),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
-
- # 数据行样式
- ('ALIGN', (0, 1), (-1, -1), 'CENTER'),
- ('FONTNAME', (0, 1), (-1, -1), 'Helvetica'),
- ('FONTSIZE', (0, 1), (-1, -1), 12),
-
- # 交替行颜色
- ('BACKGROUND', (0, 1), (-1, 1), colors.lightgrey),
- ('BACKGROUND', (0, 2), (-1, 2), colors.white),
- ('BACKGROUND', (0, 3), (-1, 3), colors.lightgrey),
- ('BACKGROUND', (0, 4), (-1, 4), colors.white),
- ('BACKGROUND', (0, 5), (-1, 5), colors.lightgrey),
-
- # 边框
- ('GRID', (0, 0), (-1, -1), 1, colors.black),
- ('BOX', (0, 0), (-1, -1), 2, colors.black),
-
- # 特定单元格样式(高亮最高销售额)
- ('BACKGROUND', (1, 3), (1, 3), colors.lightgreen),
- ])
- table.setStyle(style)
- # 添加表格到文档
- elements.append(table)
- # 添加页脚
- footer = Paragraph("Generated on: " + pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S"), styles['Normal'])
- elements.append(Spacer(1, 0.5*inch))
- elements.append(footer)
- # 构建PDF
- doc.build(elements)
复制代码
使用HTML和CSS与weasyprint
weasyprint允许我们使用HTML和CSS来设计PDF,这为那些熟悉Web开发的人提供了更灵活的选项:
- import pandas as pd
- import weasyprint
- # 创建示例DataFrame
- data = {'Product': ['A', 'B', 'C', 'D', 'E'],
- 'Sales': [1200, 1800, 1500, 2100, 1900],
- 'Profit': [300, 450, 375, 525, 475],
- 'Margin': [0.25, 0.25, 0.25, 0.25, 0.25]}
- df = pd.DataFrame(data)
- # 格式化数值列
- df['Sales'] = df['Sales'].map('${:,.2f}'.format)
- df['Profit'] = df['Profit'].map('${:,.2f}'.format)
- df['Margin'] = df['Margin'].map('{:.2%}'.format)
- # 将DataFrame转换为HTML表格
- html_table = df.to_html(classes='styled-table', index=False)
- # 创建完整的HTML文档
- html_content = f"""
- <!DOCTYPE html>
- <html>
- <head>
- <title>Sales Report</title>
- <style>
- body {{
- font-family: Arial, sans-serif;
- margin: 20px;
- }}
- h1 {{
- color: #2c3e50;
- text-align: center;
- }}
- .styled-table {{
- border-collapse: collapse;
- margin: 25px 0;
- font-size: 0.9em;
- font-family: sans-serif;
- min-width: 400px;
- box-shadow: 0 0 20px rgba(0, 0, 0, 0.15);
- }}
- .styled-table thead tr {{
- background-color: #009879;
- color: #ffffff;
- text-align: left;
- }}
- .styled-table th,
- .styled-table td {{
- padding: 12px 15px;
- text-align: center;
- }}
- .styled-table tbody tr {{
- border-bottom: 1px solid #dddddd;
- }}
- .styled-table tbody tr:nth-of-type(even) {{
- background-color: #f3f3f3;
- }}
- .styled-table tbody tr:last-of-type {{
- border-bottom: 2px solid #009879;
- }}
- .styled-table tbody tr.active-row {{
- font-weight: bold;
- color: #009879;
- }}
- .footer {{
- text-align: right;
- font-size: 10px;
- color: #777;
- margin-top: 30px;
- }}
- </style>
- </head>
- <body>
- <h1>Sales Report</h1>
- {html_table}
- <div class="footer">
- Generated on: {pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S")}
- </div>
- </body>
- </html>
- """
- # 使用weasyprint将HTML转换为PDF
- weasyprint.HTML(string=html_content).write_pdf("html_css_output.pdf")
复制代码
高级技巧
创建多页PDF
当数据量很大时,我们需要将表格分布在多个页面上:
- import pandas as pd
- from reportlab.lib.pagesizes import letter
- from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, PageBreak
- from reportlab.lib.styles import getSampleStyleSheet
- from reportlab.lib import colors
- from reportlab.lib.units import inch
- import numpy as np
- # 创建大型示例DataFrame
- np.random.seed(42)
- products = [f'Product-{i}' for i in range(1, 101)]
- sales = np.random.randint(100, 5000, size=100)
- profit = np.random.randint(10, 1000, size=100)
- margin = profit / sales
- df = pd.DataFrame({
- 'Product': products,
- 'Sales': sales,
- 'Profit': profit,
- 'Margin': margin
- })
- # 格式化数值列
- df['Sales'] = df['Sales'].map('${:,.2f}'.format)
- df['Profit'] = df['Profit'].map('${:,.2f}'.format)
- df['Margin'] = df['Margin'].map('{:.2%}'.format)
- # 创建PDF文档
- doc = SimpleDocTemplate("multipage_table.pdf", pagesize=letter)
- elements = []
- # 添加标题
- styles = getSampleStyleSheet()
- title = Paragraph("Multi-Page Sales Report", styles['Title'])
- elements.append(title)
- # 每页显示的行数(不包括标题行)
- rows_per_page = 20
- # 计算总页数
- total_pages = (len(df) - 1) // rows_per_page + 1
- # 分页创建表格
- for page_num in range(total_pages):
- start_idx = page_num * rows_per_page
- end_idx = min((page_num + 1) * rows_per_page, len(df))
-
- # 获取当前页的数据
- page_data = df.iloc[start_idx:end_idx]
-
- # 准备表格数据
- if page_num == 0: # 第一页包含标题行
- data_for_table = [df.columns.tolist()] + page_data.values.tolist()
- else:
- data_for_table = page_data.values.tolist()
-
- # 创建表格
- table = Table(data_for_table, colWidths=[1.5*inch, 1.5*inch, 1.5*inch, 1.5*inch])
-
- # 设置表格样式
- style = TableStyle([
- # 标题行样式(仅第一页)
- *([('BACKGROUND', (0, 0), (-1, 0), colors.darkblue),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
- ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 14),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12)] if page_num == 0 else []),
-
- # 数据行样式
- ('ALIGN', (0, 0 if page_num > 0 else 1), (-1, -1), 'CENTER'),
- ('FONTNAME', (0, 0 if page_num > 0 else 1), (-1, -1), 'Helvetica'),
- ('FONTSIZE', (0, 0 if page_num > 0 else 1), (-1, -1), 10),
-
- # 交替行颜色
- *[('BACKGROUND', (0, i), (-1, i), colors.lightgrey)
- for i in range(1 if page_num == 0 else 0, len(data_for_table), 2)],
-
- # 边框
- ('GRID', (0, 0 if page_num > 0 else 1), (-1, -1), 1, colors.black),
- ('BOX', (0, 0 if page_num > 0 else 1), (-1, -1), 2, colors.black),
- ])
- table.setStyle(style)
-
- # 添加表格到文档
- elements.append(table)
-
- # 添加页码
- page_info = Paragraph(f"Page {page_num + 1} of {total_pages}", styles['Normal'])
- elements.append(page_info)
-
- # 如果不是最后一页,添加分页符
- if page_num < total_pages - 1:
- elements.append(PageBreak())
- # 构建PDF
- doc.build(elements)
复制代码
添加图表和可视化
在报告中添加图表可以使数据更加直观:
- import pandas as pd
- import numpy as np
- import matplotlib.pyplot as plt
- from reportlab.lib.pagesizes import letter
- from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Image, Spacer
- from reportlab.lib.styles import getSampleStyleSheet
- from reportlab.lib import colors
- from reportlab.lib.units import inch
- import io
- # 创建示例DataFrame
- np.random.seed(42)
- categories = ['Category A', 'Category B', 'Category C', 'Category D', 'Category E']
- values = np.random.randint(10, 100, size=5)
- df = pd.DataFrame({
- 'Category': categories,
- 'Value': values
- })
- # 创建图表
- plt.figure(figsize=(8, 6))
- bars = plt.bar(df['Category'], df['Value'], color='skyblue')
- plt.title('Category Values', fontsize=16)
- plt.xlabel('Category', fontsize=12)
- plt.ylabel('Value', fontsize=12)
- plt.grid(axis='y', linestyle='--', alpha=0.7)
- # 在柱状图上添加数值标签
- for bar in bars:
- height = bar.get_height()
- plt.text(bar.get_x() + bar.get_width()/2., height,
- f'{height}',
- ha='center', va='bottom')
- # 将图表保存到内存中
- img_buffer = io.BytesIO()
- plt.savefig(img_buffer, format='png', dpi=300, bbox_inches='tight')
- plt.close()
- # 将缓冲区位置重置到开始
- img_buffer.seek(0)
- # 创建PDF文档
- doc = SimpleDocTemplate("report_with_chart.pdf", pagesize=letter)
- elements = []
- # 添加标题
- styles = getSampleStyleSheet()
- title = Paragraph("Report with Chart and Table", styles['Title'])
- elements.append(title)
- elements.append(Spacer(1, 0.25*inch))
- # 添加图表
- chart = Image(img_buffer, width=6*inch, height=4.5*inch)
- elements.append(chart)
- elements.append(Spacer(1, 0.5*inch))
- # 准备表格数据
- data_for_table = [df.columns.tolist()] + df.values.tolist()
- # 创建表格
- table = Table(data_for_table, colWidths=[3*inch, 2*inch])
- # 设置表格样式
- style = TableStyle([
- # 标题行样式
- ('BACKGROUND', (0, 0), (-1, 0), colors.darkblue),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
- ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 14),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
-
- # 数据行样式
- ('ALIGN', (0, 1), (-1, -1), 'CENTER'),
- ('FONTNAME', (0, 1), (-1, -1), 'Helvetica'),
- ('FONTSIZE', (0, 1), (-1, -1), 12),
-
- # 交替行颜色
- ('BACKGROUND', (0, 1), (-1, 1), colors.lightgrey),
- ('BACKGROUND', (0, 2), (-1, 2), colors.white),
- ('BACKGROUND', (0, 3), (-1, 3), colors.lightgrey),
- ('BACKGROUND', (0, 4), (-1, 4), colors.white),
- ('BACKGROUND', (0, 5), (-1, 5), colors.lightgrey),
-
- # 边框
- ('GRID', (0, 0), (-1, -1), 1, colors.black),
- ('BOX', (0, 0), (-1, -1), 2, colors.black),
- ])
- table.setStyle(style)
- # 添加表格到文档
- elements.append(table)
- # 添加分析文本
- analysis_text = """
- This report presents the values across different categories. As shown in the chart and table above,
- Category D has the highest value, while Category A has the lowest. The data suggests a varied distribution
- of values across all categories.
- """
- analysis = Paragraph(analysis_text, styles['Normal'])
- elements.append(Spacer(1, 0.5*inch))
- elements.append(analysis)
- # 添加页脚
- footer = Paragraph("Generated on: " + pd.Timestamp.now().strftime("%Y-%m-%d %H:%M:%S"), styles['Normal'])
- elements.append(Spacer(1, 0.5*inch))
- elements.append(footer)
- # 构建PDF
- doc.build(elements)
复制代码
创建自定义模板和自动化报告
我们可以创建一个函数来自动化报告生成过程:
- import pandas as pd
- import numpy as np
- import matplotlib.pyplot as plt
- from reportlab.lib.pagesizes import letter, landscape
- from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Image, Spacer, PageBreak
- from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
- from reportlab.lib.units import inch
- from reportlab.lib import colors
- import io
- from datetime import datetime
- def generate_sales_report(data, output_filename, company_name="Your Company"):
- """
- 生成销售报告的函数
-
- 参数:
- data -- 包含销售数据的DataFrame
- output_filename -- 输出PDF文件名
- company_name -- 公司名称
- """
- # 创建PDF文档
- doc = SimpleDocTemplate(output_filename, pagesize=landscape(letter))
- elements = []
-
- # 获取样式
- styles = getSampleStyleSheet()
-
- # 创建自定义标题样式
- title_style = ParagraphStyle(
- 'CustomTitle',
- parent=styles['Title'],
- fontSize=24,
- textColor=colors.darkblue,
- alignment=1, # 居中
- spaceAfter=30
- )
-
- # 添加标题
- title = Paragraph(f"{company_name} Sales Report", title_style)
- elements.append(title)
-
- # 添加日期
- date_paragraph = Paragraph(f"Report Date: {datetime.now().strftime('%Y-%m-%d')}", styles['Normal'])
- elements.append(date_paragraph)
- elements.append(Spacer(1, 0.5*inch))
-
- # 创建总销售额图表
- plt.figure(figsize=(10, 6))
- plt.bar(data['Product'], data['Sales'], color='skyblue')
- plt.title('Sales by Product', fontsize=16)
- plt.xlabel('Product', fontsize=12)
- plt.ylabel('Sales ($)', fontsize=12)
- plt.grid(axis='y', linestyle='--', alpha=0.7)
-
- # 在柱状图上添加数值标签
- for i, v in enumerate(data['Sales']):
- plt.text(i, v + max(data['Sales']) * 0.01, f"${v:,.0f}", ha='center')
-
- # 将图表保存到内存中
- img_buffer = io.BytesIO()
- plt.savefig(img_buffer, format='png', dpi=300, bbox_inches='tight')
- plt.close()
- img_buffer.seek(0)
-
- # 添加图表
- chart = Image(img_buffer, width=7*inch, height=4*inch)
- elements.append(chart)
- elements.append(Spacer(1, 0.5*inch))
-
- # 准备表格数据
- table_data = [data.columns.tolist()] + data.values.tolist()
-
- # 创建表格
- table = Table(table_data, colWidths=[2*inch, 1.5*inch, 1.5*inch, 1.5*inch])
-
- # 设置表格样式
- style = TableStyle([
- # 标题行样式
- ('BACKGROUND', (0, 0), (-1, 0), colors.darkblue),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
- ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 14),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
-
- # 数据行样式
- ('ALIGN', (0, 1), (-1, -1), 'CENTER'),
- ('FONTNAME', (0, 1), (-1, -1), 'Helvetica'),
- ('FONTSIZE', (0, 1), (-1, -1), 12),
-
- # 交替行颜色
- *[('BACKGROUND', (0, i), (-1, i), colors.lightgrey)
- for i in range(1, len(table_data), 2)],
-
- # 边框
- ('GRID', (0, 0), (-1, -1), 1, colors.black),
- ('BOX', (0, 0), (-1, -1), 2, colors.black),
- ])
- table.setStyle(style)
-
- # 添加表格到文档
- elements.append(table)
- elements.append(Spacer(1, 0.5*inch))
-
- # 添加摘要信息
- total_sales = data['Sales'].sum()
- total_profit = data['Profit'].sum()
- avg_margin = data['Margin'].mean()
-
- summary_text = f"""
- <b>Summary:</b><br/>
- Total Sales: ${total_sales:,.2f}<br/>
- Total Profit: ${total_profit:,.2f}<br/>
- Average Margin: {avg_margin:.2%}<br/>
- Top Performing Product: {data.loc[data['Sales'].idxmax(), 'Product']} (${data['Sales'].max():,.2f})
- """
-
- summary = Paragraph(summary_text, styles['Normal'])
- elements.append(summary)
- elements.append(Spacer(1, 0.5*inch))
-
- # 创建利润率图表
- plt.figure(figsize=(10, 6))
- plt.bar(data['Product'], data['Margin'], color='lightgreen')
- plt.title('Profit Margin by Product', fontsize=16)
- plt.xlabel('Product', fontsize=12)
- plt.ylabel('Margin (%)', fontsize=12)
- plt.grid(axis='y', linestyle='--', alpha=0.7)
-
- # 在柱状图上添加数值标签
- for i, v in enumerate(data['Margin']):
- plt.text(i, v + max(data['Margin']) * 0.01, f"{v:.1%}", ha='center')
-
- # 将图表保存到内存中
- img_buffer2 = io.BytesIO()
- plt.savefig(img_buffer2, format='png', dpi=300, bbox_inches='tight')
- plt.close()
- img_buffer2.seek(0)
-
- # 添加图表
- chart2 = Image(img_buffer2, width=7*inch, height=4*inch)
- elements.append(chart2)
- elements.append(Spacer(1, 0.5*inch))
-
- # 添加页脚
- footer_text = f"This report was automatically generated on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} by {company_name}."
- footer = Paragraph(footer_text, styles['Normal'])
- elements.append(footer)
-
- # 构建PDF
- doc.build(elements)
- print(f"Report generated successfully: {output_filename}")
- # 示例使用
- if __name__ == "__main__":
- # 创建示例数据
- np.random.seed(42)
- products = [f'Product-{chr(65+i)}' for i in range(8)]
- sales = np.random.randint(10000, 50000, size=8)
- profit = np.random.randint(1000, 15000, size=8)
- margin = profit / sales
-
- sales_data = pd.DataFrame({
- 'Product': products,
- 'Sales': sales,
- 'Profit': profit,
- 'Margin': margin
- })
-
- # 生成报告
- generate_sales_report(sales_data, "automated_sales_report.pdf", "ABC Corporation")
复制代码
实际应用案例
销售数据分析报告
让我们创建一个完整的销售数据分析报告,包括数据汇总、趋势分析和可视化:
- import pandas as pd
- import numpy as np
- import matplotlib.pyplot as plt
- import seaborn as sns
- from reportlab.lib.pagesizes import letter, landscape
- from reportlab.platypus import SimpleDocTemplate, Table, TableStyle, Paragraph, Image, Spacer, PageBreak
- from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
- from reportlab.lib.units import inch
- from reportlab.lib import colors
- import io
- from datetime import datetime, timedelta
- # 创建模拟销售数据
- np.random.seed(42)
- date_range = pd.date_range(start='2023-01-01', end='2023-12-31', freq='D')
- products = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E']
- regions = ['North', 'South', 'East', 'West']
- # 生成数据
- data = []
- for date in date_range:
- for product in products:
- for region in regions:
- # 基础销量,带有季节性和随机性
- base_sales = np.random.randint(10, 100)
- # 季节性因素(夏季销量更高)
- seasonal_factor = 1 + 0.3 * np.sin(2 * np.pi * date.dayofyear / 365)
- # 周末因素
- weekend_factor = 1.2 if date.dayofweek >= 5 else 1.0
- # 计算最终销量
- sales = int(base_sales * seasonal_factor * weekend_factor)
- # 计算收入和利润
- price = np.random.uniform(10, 50)
- cost = price * np.random.uniform(0.4, 0.7)
- revenue = sales * price
- profit = sales * (price - cost)
-
- data.append([date, product, region, sales, price, cost, revenue, profit])
- # 创建DataFrame
- columns = ['Date', 'Product', 'Region', 'Sales', 'Price', 'Cost', 'Revenue', 'Profit']
- sales_df = pd.DataFrame(data, columns=columns)
- # 按月汇总数据
- sales_df['Month'] = sales_df['Date'].dt.to_period('M')
- monthly_summary = sales_df.groupby(['Month', 'Product']).agg({
- 'Sales': 'sum',
- 'Revenue': 'sum',
- 'Profit': 'sum'
- }).reset_index()
- # 按产品和地区汇总数据
- product_region_summary = sales_df.groupby(['Product', 'Region']).agg({
- 'Sales': 'sum',
- 'Revenue': 'sum',
- 'Profit': 'sum'
- }).reset_index()
- # 创建PDF报告
- doc = SimpleDocTemplate("sales_analysis_report.pdf", pagesize=letter)
- elements = []
- # 获取样式
- styles = getSampleStyleSheet()
- # 添加标题
- title = Paragraph("2023 Sales Analysis Report", styles['Title'])
- elements.append(title)
- elements.append(Spacer(1, 0.25*inch))
- # 添加报告日期
- date_paragraph = Paragraph(f"Report Date: {datetime.now().strftime('%Y-%m-%d')}", styles['Normal'])
- elements.append(date_paragraph)
- elements.append(Spacer(1, 0.5*inch))
- # 添加执行摘要
- exec_summary = """
- <b>Executive Summary:</b><br/><br/>
- This report provides a comprehensive analysis of sales performance for the year 2023.
- Key findings include:<br/>
- - Total revenue for the year was ${:,.2f}<br/>
- - Total profit was ${:,.2f}<br/>
- - The best performing product was {} with ${:,.2f} in revenue<br/>
- - Sales showed strong seasonal patterns, with peak performance in the summer months
- """.format(
- sales_df['Revenue'].sum(),
- sales_df['Profit'].sum(),
- sales_df.groupby('Product')['Revenue'].sum().idxmax(),
- sales_df.groupby('Product')['Revenue'].sum().max()
- )
- summary = Paragraph(exec_summary, styles['Normal'])
- elements.append(summary)
- elements.append(PageBreak())
- # 添加月度销售趋势图表
- plt.figure(figsize=(10, 6))
- monthly_trend = sales_df.groupby('Month')['Revenue'].sum()
- plt.plot(monthly_trend.index.astype(str), monthly_trend.values, marker='o', linewidth=2)
- plt.title('Monthly Revenue Trend', fontsize=16)
- plt.xlabel('Month', fontsize=12)
- plt.ylabel('Revenue ($)', fontsize=12)
- plt.grid(True, linestyle='--', alpha=0.7)
- plt.xticks(rotation=45)
- plt.tight_layout()
- # 将图表保存到内存中
- img_buffer = io.BytesIO()
- plt.savefig(img_buffer, format='png', dpi=300, bbox_inches='tight')
- plt.close()
- img_buffer.seek(0)
- # 添加图表
- chart = Image(img_buffer, width=7*inch, height=4*inch)
- elements.append(chart)
- elements.append(Spacer(1, 0.5*inch))
- # 添加产品销售比较图表
- plt.figure(figsize=(10, 6))
- product_sales = sales_df.groupby('Product')['Revenue'].sum()
- plt.bar(product_sales.index, product_sales.values, color=sns.color_palette('viridis', len(product_sales)))
- plt.title('Revenue by Product', fontsize=16)
- plt.xlabel('Product', fontsize=12)
- plt.ylabel('Revenue ($)', fontsize=12)
- plt.grid(axis='y', linestyle='--', alpha=0.7)
- # 在柱状图上添加数值标签
- for i, v in enumerate(product_sales.values):
- plt.text(i, v + max(product_sales.values) * 0.01, f"${v:,.0f}", ha='center')
- plt.tight_layout()
- # 将图表保存到内存中
- img_buffer2 = io.BytesIO()
- plt.savefig(img_buffer2, format='png', dpi=300, bbox_inches='tight')
- plt.close()
- img_buffer2.seek(0)
- # 添加图表
- chart2 = Image(img_buffer2, width=7*inch, height=4*inch)
- elements.append(chart2)
- elements.append(Spacer(1, 0.5*inch))
- # 添加地区销售热图
- region_product_pivot = sales_df.pivot_table(
- values='Revenue',
- index='Region',
- columns='Product',
- aggfunc='sum'
- )
- plt.figure(figsize=(10, 6))
- sns.heatmap(region_product_pivot, annot=True, fmt='.0f', cmap='YlGnBu', linewidths=.5)
- plt.title('Revenue Heatmap by Region and Product', fontsize=16)
- plt.tight_layout()
- # 将图表保存到内存中
- img_buffer3 = io.BytesIO()
- plt.savefig(img_buffer3, format='png', dpi=300, bbox_inches='tight')
- plt.close()
- img_buffer3.seek(0)
- # 添加图表
- chart3 = Image(img_buffer3, width=7*inch, height=4*inch)
- elements.append(chart3)
- elements.append(PageBreak())
- # 添加产品月度销售表格
- # 准备表格数据
- product_month_pivot = sales_df.pivot_table(
- values='Revenue',
- index='Product',
- columns='Month',
- aggfunc='sum'
- )
- # 格式化数据
- product_month_pivot = product_month_pivot.applymap(lambda x: f"${x:,.0f}")
- # 添加总计行
- product_month_pivot.loc['Total'] = product_month_pivot.sum(axis=0)
- product_month_pivot['Total'] = product_month_pivot.sum(axis=1)
- # 转换为列表格式
- table_data = [product_month_pivot.index.tolist()] + [
- [product_month_pivot.index[i]] + product_month_pivot.iloc[i].tolist()
- for i in range(len(product_month_pivot))
- ]
- # 创建表格
- table = Table(table_data)
- # 设置表格样式
- style = TableStyle([
- # 标题行样式
- ('BACKGROUND', (0, 0), (-1, 0), colors.darkblue),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
- ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 10),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
-
- # 产品列样式
- ('BACKGROUND', (0, 1), (0, -1), colors.lightgrey),
- ('ALIGN', (0, 1), (0, -1), 'LEFT'),
-
- # 总计行样式
- ('BACKGROUND', (0, -1), (-1, -1), colors.darkblue),
- ('TEXTCOLOR', (0, -1), (-1, -1), colors.white),
- ('FONTNAME', (0, -1), (-1, -1), 'Helvetica-Bold'),
-
- # 数据单元格样式
- ('ALIGN', (1, 1), (-2, -2), 'RIGHT'),
- ('FONTNAME', (1, 1), (-2, -2), 'Helvetica'),
- ('FONTSIZE', (1, 1), (-2, -2), 9),
-
- # 边框
- ('GRID', (0, 0), (-1, -1), 1, colors.black),
- ('BOX', (0, 0), (-1, -1), 2, colors.black),
- ])
- table.setStyle(style)
- # 添加表格标题
- table_title = Paragraph("Monthly Revenue by Product ($)", styles['Heading2'])
- elements.append(table_title)
- elements.append(Spacer(1, 0.25*inch))
- # 添加表格
- elements.append(table)
- elements.append(PageBreak())
- # 添加地区销售表格
- # 准备表格数据
- region_summary = sales_df.groupby('Region').agg({
- 'Sales': 'sum',
- 'Revenue': 'sum',
- 'Profit': 'sum'
- }).reset_index()
- # 计算利润率
- region_summary['Profit Margin'] = region_summary['Profit'] / region_summary['Revenue']
- # 格式化数据
- region_summary['Sales'] = region_summary['Sales'].map('{:,.0f}'.format)
- region_summary['Revenue'] = region_summary['Revenue'].map('${:,.2f}'.format)
- region_summary['Profit'] = region_summary['Profit'].map('${:,.2f}'.format)
- region_summary['Profit Margin'] = region_summary['Profit Margin'].map('{:.2%}'.format)
- # 转换为列表格式
- table_data2 = [region_summary.columns.tolist()] + region_summary.values.tolist()
- # 创建表格
- table2 = Table(table_data2, colWidths=[1.5*inch, 1.5*inch, 1.5*inch, 1.5*inch, 1.5*inch])
- # 设置表格样式
- style2 = TableStyle([
- # 标题行样式
- ('BACKGROUND', (0, 0), (-1, 0), colors.darkblue),
- ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
- ('ALIGN', (0, 0), (-1, 0), 'CENTER'),
- ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
- ('FONTSIZE', (0, 0), (-1, 0), 12),
- ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
-
- # 数据行样式
- ('ALIGN', (0, 1), (-1, -1), 'CENTER'),
- ('FONTNAME', (0, 1), (-1, -1), 'Helvetica'),
- ('FONTSIZE', (0, 1), (-1, -1), 11),
-
- # 交替行颜色
- *[('BACKGROUND', (0, i), (-1, i), colors.lightgrey)
- for i in range(1, len(table_data2), 2)],
-
- # 边框
- ('GRID', (0, 0), (-1, -1), 1, colors.black),
- ('BOX', (0, 0), (-1, -1), 2, colors.black),
- ])
- table2.setStyle(style2)
- # 添加表格标题
- table_title2 = Paragraph("Sales Summary by Region", styles['Heading2'])
- elements.append(table_title2)
- elements.append(Spacer(1, 0.25*inch))
- # 添加表格
- elements.append(table2)
- elements.append(Spacer(1, 0.5*inch))
- # 添加结论和建议
- conclusions = """
- <b>Conclusions and Recommendations:</b><br/><br/>
- Based on the analysis of 2023 sales data, we have identified several key insights and opportunities:<br/><br/>
- 1. <b>Seasonal Trends:</b> Sales show clear seasonal patterns with peak performance during summer months.
- We recommend increasing marketing efforts and inventory levels before these peak periods.<br/><br/>
- 2. <b>Product Performance:</b> {} has been the top performer throughout the year. Consider allocating
- more resources to this product line and analyzing its success factors for application to other products.<br/><br/>
- 3. <b>Regional Differences:</b> The {} region shows the highest profit margins. Investigate the factors
- contributing to this success and consider applying these strategies to underperforming regions.<br/><br/>
- 4. <b>Growth Opportunities:</b> The {} region shows potential for growth. Develop targeted marketing
- campaigns and consider introducing product bundles tailored to this market.
- """.format(
- sales_df.groupby('Product')['Revenue'].sum().idxmax(),
- region_summary.loc[region_summary['Profit Margin'].astype(str).str.strip('%').astype(float).idxmax(), 'Region'],
- region_summary.loc[region_summary['Profit Margin'].astype(str).str.strip('%').astype(float).idxmin(), 'Region']
- )
- conclusion_paragraph = Paragraph(conclusions, styles['Normal'])
- elements.append(conclusion_paragraph)
- elements.append(Spacer(1, 0.5*inch))
- # 添加页脚
- footer_text = f"This report was automatically generated on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}."
- footer = Paragraph(footer_text, styles['Normal'])
- elements.append(footer)
- # 构建PDF
- doc.build(elements)
- print("Sales analysis report generated successfully: sales_analysis_report.pdf")
复制代码
常见问题和解决方案
问题1:表格内容超出页面边界
当表格内容太多时,可能会超出PDF页面的边界。解决方案包括:
- # 调整表格大小和字体
- table = Table(data_for_table, colWidths=[1*inch, 1*inch, 1*inch]) # 减小列宽
- style = TableStyle([
- ('FONTSIZE', (0, 0), (-1, -1), 8), # 减小字体大小
- # 其他样式设置...
- ])
- table.setStyle(style)
- # 或者使用自动调整列宽
- table = Table(data_for_table)
- style = TableStyle([
- ('FONTSIZE', (0, 0), (-1, -1), 10),
- # 其他样式设置...
- ])
- table.setStyle(style)
- # 让表格自动调整大小以适应内容
- table._argW[0] = 1.5*inch # 设置第一列宽度
- table._argW[1] = 1.5*inch # 设置第二列宽度
- # 以此类推...
复制代码
问题2:中文字符显示问题
在PDF中显示中文字符时可能会遇到编码问题。解决方案:
- from reportlab.pdfbase import pdfmetrics
- from reportlab.pdfbase.ttfonts import TTFont
- # 注册中文字体
- pdfmetrics.registerFont(TTFont('SimSun', 'SimSun.ttf')) # 确保字体文件存在
- # 在样式中使用中文字体
- style = TableStyle([
- ('FONTNAME', (0, 0), (-1, -1), 'SimSun'),
- # 其他样式设置...
- ])
复制代码
问题3:图像分辨率过低
在PDF中插入的图像可能分辨率过低,影响打印质量。解决方案:
- # 提高保存图像时的DPI
- plt.savefig(img_buffer, format='png', dpi=600, bbox_inches='tight')
- # 或者使用矢量图形格式(如SVG)
- plt.savefig(img_buffer, format='svg', bbox_inches='tight')
复制代码
问题4:大型DataFrame处理缓慢
处理大型DataFrame时可能会遇到性能问题。解决方案:
- # 分批处理数据
- chunk_size = 1000 # 每批处理的行数
- for i in range(0, len(df), chunk_size):
- chunk = df.iloc[i:i+chunk_size]
- # 处理当前批次的数据
- process_chunk(chunk)
- # 或者使用数据采样
- sample_df = df.sample(n=1000) # 随机采样1000行
- # 处理采样后的数据
- process_sample(sample_df)
复制代码
问题5:PDF文件过大
生成的PDF文件可能过大,不利于分享和存储。解决方案:
- # 压缩图像
- from PIL import Image
- def compress_image(input_buffer, quality=85):
- img = Image.open(input_buffer)
- output_buffer = io.BytesIO()
- img.save(output_buffer, format='JPEG', quality=quality, optimize=True)
- output_buffer.seek(0)
- return output_buffer
- # 使用压缩后的图像
- compressed_buffer = compress_image(img_buffer)
- chart = Image(compressed_buffer, width=7*inch, height=4*inch)
复制代码
结论和最佳实践
在本文中,我们详细介绍了如何使用pandas和其他Python库将数据导出为PDF格式,从基础操作到高级技巧。以下是一些最佳实践建议:
1. 选择合适的工具:根据需求选择最适合的库。对于简单的表格导出,reportlab是一个不错的选择;对于复杂的布局和样式,weasyprint提供了更大的灵活性。
2. 规划布局:在开始编写代码之前,先规划好PDF的布局,包括标题、表格、图表和文本的位置。
3. 保持一致性:在整个报告中保持一致的样式,包括字体、颜色和间距,以创建专业的外观。
4. 优化性能:对于大型数据集,考虑分批处理或采样,以提高性能。
5. 添加元数据:在PDF中添加标题、作者、创建日期等元数据,以便于管理和搜索。
6. 测试不同设备:在不同的设备和PDF阅读器上测试生成的PDF,确保兼容性和显示效果。
7. 自动化流程:创建可重用的函数和模板,以自动化报告生成过程。
8. 错误处理:添加适当的错误处理机制,以应对数据格式问题或生成过程中的异常。
选择合适的工具:根据需求选择最适合的库。对于简单的表格导出,reportlab是一个不错的选择;对于复杂的布局和样式,weasyprint提供了更大的灵活性。
规划布局:在开始编写代码之前,先规划好PDF的布局,包括标题、表格、图表和文本的位置。
保持一致性:在整个报告中保持一致的样式,包括字体、颜色和间距,以创建专业的外观。
优化性能:对于大型数据集,考虑分批处理或采样,以提高性能。
添加元数据:在PDF中添加标题、作者、创建日期等元数据,以便于管理和搜索。
测试不同设备:在不同的设备和PDF阅读器上测试生成的PDF,确保兼容性和显示效果。
自动化流程:创建可重用的函数和模板,以自动化报告生成过程。
错误处理:添加适当的错误处理机制,以应对数据格式问题或生成过程中的异常。
通过遵循这些最佳实践,您可以创建专业、美观且信息丰富的PDF报告,有效地展示您的数据分析结果。
希望本文能帮助您掌握使用pandas将数据导出为PDF的技巧,提升您的数据分析报告质量!
版权声明
1、转载或引用本网站内容(使用pandas轻松将数据导出为PDF格式从基础操作到高级技巧全面解析让你的数据分析报告更专业)须注明原网址及作者(威震华夏关云长),并标明本网站网址(https://pixtech.org/)。
2、对于不当转载或引用本网站内容而引起的民事纷争、行政处理或其他损失,本网站不承担责任。
3、对不遵守本声明或其他违法、恶意使用本网站内容者,本网站保留追究其法律责任的权利。
本文地址: https://pixtech.org/thread-36669-1-1.html
|
|