3.3.Comprehensions

Author: xiaobai

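1.Overview

Comprehensions are a concise way to build data structures in Python: a single expression produces a list, dict, or set, or (with parentheses) a lazy generator. They usually state the intent more clearly than the equivalent loop, and in CPython they are typically somewhat faster than a for loop that calls append.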

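2.Core concepts

2.1.Advantages of comprehensions

Advantage | Description | Example
Concise | One expression covers both the loop and the condition | [x**2 for x in range(5)]
Readable | Clear structure that states the intent | More direct than the equivalent for loop
Fast | Usually faster than a loop with append | Optimized implementation in the interpreter
Expressive | Supports filtering, nested loops, and more | Complex data processing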

2.2.Types of comprehensions

Type | Syntax | Result type | Example
List comprehension | [expr for item in iterable] | list | [x**2 for x in range(5)]
Dict comprehension | {key: value for item in iterable} | dict | {x: x**2 for x in range(5)}
Set comprehension | {expr for item in iterable} | set | {x**2 for x in range(5)}
Generator expression | (expr for item in iterable) | generator | (x**2 for x in range(5))
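
To make the table concrete, here is a minimal sketch that writes the same squaring expression in all four forms and prints the type each one produces. The variable names are illustrative only.

```python
# The same expression in all four forms.
squares_list = [x**2 for x in range(5)]      # list
squares_dict = {x: x**2 for x in range(5)}   # dict
squares_set = {x**2 for x in range(5)}       # set
squares_gen = (x**2 for x in range(5))       # generator (lazy)

for value in (squares_list, squares_dict, squares_set, squares_gen):
    print(type(value).__name__)
# prints list, dict, set, generator (one per line)

# Empty braces create a dict, not a set; an empty set is written set().
print(type({}).__name__)  # dict
```

Only the brackets and the presence of a key: value pair distinguish the forms, which is why the dict and set variants are easy to confuse at a glance.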

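3.List Comprehensions

3.1.Basic syntax

[expression for variable in iterable (optional if condition)]

Components

  • Expression: the code applied to each element
  • for variable in iterable: iterates over the data source
  • if condition: an optional filter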

3.2.Basic example

# Traditional approach
squares = []
for x in range(5):
    squares.append(x**2)
print(squares)  # [0, 1, 4, 9, 16]

# List comprehension
squares = [x**2 for x in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

3.3.List comprehensions with conditions

# Squares of even numbers only
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

# Multiple conditions (consecutive if clauses are combined with "and")
numbers = [x for x in range(20) if x % 2 == 0 if x % 3 == 0]
print(numbers)  # [0, 6, 12, 18]

# Conditional expression (ternary operator)
results = [x if x % 2 == 0 else 'odd' for x in range(5)]
print(results)  # [0, 'odd', 2, 'odd', 4]
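
The two kinds of "if" above sit in different positions and do different jobs, which is a common source of confusion. The sketch below puts them side by side: an if after the for clause filters elements out, while an if ... else before the for clause transforms every element. The names are illustrative.

```python
numbers = range(6)

# Filter: "if" after the for clause drops elements.
kept = [x for x in numbers if x % 2 == 0]

# Conditional expression: "if ... else" before the for clause maps every element.
mapped = [x if x % 2 == 0 else -x for x in numbers]

# Both together: filter first (x > 0), then map each survivor.
both = ['even' if x % 2 == 0 else 'odd' for x in numbers if x > 0]

print(kept)    # [0, 2, 4]
print(mapped)  # [0, -1, 2, -3, 4, -5]
print(both)    # ['odd', 'even', 'odd', 'even', 'odd']
```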

3.4.Nested loops

# Flatten a two-dimensional list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = [num for row in matrix for num in row]
print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Equivalent to:
flattened = []
for row in matrix:
    for num in row:
        flattened.append(num)

# Build a multiplication table
multiplication_table = [[i * j for j in range(1, 6)] for i in range(1, 6)]
print(multiplication_table)
# [[1, 2, 3, 4, 5],
#  [2, 4, 6, 8, 10],
#  [3, 6, 9, 12, 15],
#  [4, 8, 12, 16, 20],
#  [5, 10, 15, 20, 25]]
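
The two examples above differ in structure: several for clauses in one comprehension produce a flat result, while a comprehension inside the expression produces a nested result. As a further illustration of the nested form, the following sketch transposes a small matrix. It assumes every row has the same length.

```python
matrix = [[1, 2, 3], [4, 5, 6]]

# The outer comprehension walks column indices; the inner one collects
# that column's value from every row.
transposed = [[row[col] for row in matrix] for col in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# zip(*matrix) is a common alternative that yields the columns directly.
via_zip = [list(column) for column in zip(*matrix)]
print(via_zip == transposed)  # True
```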

4.Dictionary Comprehensions

4.1.Basic syntax

{key expression: value expression for variable in iterable (optional if condition)}

4.2.Basic example

# The simplest dict comprehension
d = {x: x**2 for x in range(5)}
print(d)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# Dict comprehension with a condition
d = {x: x**2 for x in range(5) if x % 2 == 0}
print(d)  # {0: 0, 2: 4, 4: 16}

4.3.Practical use cases

4.3.1.Data filtering

# Keep only the students who passed (60 or above)
scores = {
    'Alice': 85,
    'Bob': 58,
    'Charlie': 71,
    'David': 49
}
passed = {name: score for name, score in scores.items() if score >= 60}
print(passed)  # {'Alice': 85, 'Charlie': 71}

4.3.2.Swapping keys and values

# Swap the keys and values of a dict
fruit_colors = {'apple': 'red', 'banana': 'yellow', 'grape': 'purple'}
color_fruits = {color: fruit for fruit, color in fruit_colors.items()}
print(color_fruits)  # {'red': 'apple', 'yellow': 'banana', 'purple': 'grape'}
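
Swapping works cleanly above because every color appears once. When values repeat, later entries silently overwrite earlier ones. The sketch below shows the effect and one possible way to keep everything by grouping; the sample data is made up for illustration.

```python
fruit_colors = {'apple': 'red', 'cherry': 'red', 'banana': 'yellow'}

# Naive inversion: 'cherry' overwrites 'apple' because both map to 'red'.
inverted = {color: fruit for fruit, color in fruit_colors.items()}
print(inverted)  # {'red': 'cherry', 'yellow': 'banana'}

# Grouping keeps every fruit by collecting a list per color.
grouped = {
    color: [fruit for fruit, c in fruit_colors.items() if c == color]
    for color in set(fruit_colors.values())
}
print(grouped['red'])     # ['apple', 'cherry']
print(grouped['yellow'])  # ['banana']
```

The grouped version rescans the dict once per distinct color, which is fine for small inputs; for large ones a single loop that appends to lists would likely be a better fit.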

4.3.3.Conditional transformation

# Mark failing scores as 'fail'
scores = {
    'Alice': 85,
    'Bob': 58,
    'Charlie': 71,
    'David': 49
}
result = {name: (score if score >= 60 else 'fail') for name, score in scores.items()}
print(result)  # {'Alice': 85, 'Bob': 'fail', 'Charlie': 71, 'David': 'fail'}

4.4.Dict comprehensions with conditions

# Keep only items whose value is greater than 2
numbers = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
filtered_dict = {k: v for k, v in numbers.items() if v > 2}
print(filtered_dict)  # {'c': 3, 'd': 4, 'e': 5}

# Change values depending on a condition
processed_dict = {k: v * 2 if v % 2 == 0 else v for k, v in numbers.items()}
print(processed_dict)  # {'a': 1, 'b': 4, 'c': 3, 'd': 8, 'e': 5}

# Build a dict from two lists (zip pairs the elements; no index arithmetic needed)
keys = ['name', 'age', 'city']
values = ['Alice', 25, 'New York']
person_dict = {key: value for key, value in zip(keys, values)}
print(person_dict)  # {'name': 'Alice', 'age': 25, 'city': 'New York'}

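5.Set Comprehensions

5.1.Basic syntax

{expression for variable in iterable (optional if condition)}

5.2.Basic example

# Set of unique squares (negative and positive inputs collapse together)
squares_set = {x**2 for x in range(-5, 6)}
print(squares_set)  # {0, 1, 4, 9, 16, 25}

# Remove duplicates from a list (set(words) does the same thing)
words = ['hello', 'world', 'hello', 'python', 'world']
unique_words = {word for word in words}
print(unique_words)  # {'hello', 'world', 'python'} (order may differ)

# Set comprehension with a condition
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print(even_squares)  # {0, 64, 4, 36, 16} (sets are unordered; order may differ)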
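
Because a set has no defined order, the printed output above can vary between runs and Python versions. Two possible ways to get a predictable result are sketched here: sorting the set, or de-duplicating with dict.fromkeys, which keeps first-seen order.

```python
words = ['hello', 'world', 'hello', 'python', 'world']

# Sorting a set gives a deterministic list.
unique_words = {word for word in words}
print(sorted(unique_words))  # ['hello', 'python', 'world']

# dict keys preserve insertion order, so this de-duplicates
# while keeping the order in which words first appeared.
ordered_unique = list(dict.fromkeys(words))
print(ordered_unique)  # ['hello', 'world', 'python']
```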

6.Generator Expressions

6.1.Basic syntax

(expression for variable in iterable (optional if condition))

6.2.Basic example

# Generator expression
squares_gen = (x**2 for x in range(5))
print(squares_gen)  # <generator object <genexpr> at 0x...>

# Convert to a list (this consumes the generator)
print(list(squares_gen))  # [0, 1, 4, 9, 16]

# Generator expression with a condition
even_squares_gen = (x**2 for x in range(10) if x % 2 == 0)
print(list(even_squares_gen))  # [0, 4, 16, 36, 64]
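
One consequence of lazy evaluation is easy to miss: a generator can be iterated only once. The short sketch below shows a second pass coming back empty, and also the common idiom of passing a generator expression straight into a function such as sum.

```python
squares_gen = (x**2 for x in range(5))

first_pass = list(squares_gen)
second_pass = list(squares_gen)  # the generator is already exhausted
print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# When a generator expression is the sole argument of a call,
# the extra pair of parentheses can be omitted.
total = sum(x**2 for x in range(5))
print(total)  # 30
```

If the values are needed more than once, building a list is usually the simpler choice.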

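6.3.Advantages of generator expressions

# Memory comparison
import sys

n = 100000
# List comprehension: creates every element immediately
list_comp = [x**2 for x in range(n)]
# Generator expression: computes elements lazily
gen_expr = (x**2 for x in range(n))

# sys.getsizeof counts only the container itself, not the int objects it refers to.
# The exact figures depend on the Python version and platform.
print(f"List comprehension size: {sys.getsizeof(list_comp)} bytes")  # e.g. 800984 bytes
print(f"Generator expression size: {sys.getsizeof(gen_expr)} bytes")  # e.g. 200 bytes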
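
Since the byte counts vary by interpreter, a comparison that does not hard-code them may be more robust. The sketch below checks only the relationship between the two sizes, and then illustrates a second benefit of laziness: stopping early without computing the rest. The threshold 50 is an arbitrary value chosen for the example.

```python
import sys

n = 100_000
as_list = [x**2 for x in range(n)]
as_gen = (x**2 for x in range(n))

# The generator object stays small because no squares have been computed yet.
print(sys.getsizeof(as_gen) < sys.getsizeof(as_list))  # True

# next() pulls values only until the first match,
# so the remaining elements are never computed.
first_big = next(x**2 for x in range(n) if x**2 > 50)
print(first_big)  # 64
```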

7.Nesting and more complex usage

7.1.Multi-level nested comprehensions

# Flatten a three-dimensional nested list
three_d = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
flattened_3d = [num for matrix in three_d for row in matrix for num in row]
print(flattened_3d)  # [1, 2, 3, 4, 5, 6, 7, 8]

# Build a nested dict with dict comprehensions
nested_dict = {
    f'group_{i}': {f'item_{j}': i * j for j in range(1, 4)}
    for i in range(1, 4)
}
print(nested_dict)
# {'group_1': {'item_1': 1, 'item_2': 2, 'item_3': 3},
#  'group_2': {'item_1': 2, 'item_2': 4, 'item_3': 6},
#  'group_3': {'item_1': 3, 'item_2': 6, 'item_3': 9}}

7.2.Complex conditional logic

# Filtering with a compound condition
numbers = range(20)
complex_filter = [
    x for x in numbers
    if (x % 2 == 0 and x < 10) or (x % 3 == 0 and x > 10)
]
print(complex_filter)  # [0, 2, 4, 6, 8, 12, 15, 18]

# Move a complicated test into a function
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

# Select the primes
primes = [x for x in range(2, 30) if is_prime(x)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

8.Practical use cases

8.1.Data cleaning and transformation

# Clean up strings
raw_data = ['  hello  ', 'WORLD', '  Python  ', 'PROGRAMMING']
cleaned_data = [word.strip().lower() for word in raw_data]
print(cleaned_data)  # ['hello', 'world', 'python', 'programming']

# Extract the number from each line (keeps digits and the decimal point)
text_lines = ['Price: $100', 'Weight: 2.5kg', 'Count: 25 items']
numbers = [float(''.join(c for c in line if c.isdigit() or c == '.'))
          for line in text_lines if any(c.isdigit() for c in line)]
print(numbers)  # [100.0, 2.5, 25.0]

8.2.File processing

def process_file_data(filename):
    """Process the lines of a text file."""
    with open(filename, 'r', encoding='utf-8') as file:
        # Read non-empty lines and strip surrounding whitespace
        lines = [line.strip() for line in file if line.strip()]

        # Select lines that contain the keyword
        keyword_lines = [line for line in lines if 'error' in line.lower()]

        # Map line numbers (starting at 1) to lines
        line_dict = {i: line for i, line in enumerate(lines, 1)}

        return lines, keyword_lines, line_dict

# Usage example (assumes a file named log.txt exists in the working directory)
lines, errors, numbered = process_file_data('log.txt')
print("All lines:", lines)
print("Error lines:", errors)
print("Numbered lines:", numbered)

8.3.Data analysis and processing

# Student score analysis
students = [
    {'name': 'Alice', 'scores': [85, 92, 78]},
    {'name': 'Bob', 'scores': [76, 88, 95]},
    {'name': 'Charlie', 'scores': [90, 85, 92]}
]

# Compute the average score per student
student_averages = {
    student['name']: sum(student['scores']) / len(student['scores'])
    for student in students
}
print(student_averages)  # {'Alice': 85.0, 'Bob': 86.33333333333333, 'Charlie': 89.0}

# Find students whose average is above 85
high_achievers = [
    student['name'] for student in students
    if sum(student['scores']) / len(student['scores']) > 85
]
print(high_achievers)  # ['Bob', 'Charlie']

# Find students who passed every subject
passing_students = [
    student['name'] for student in students
    if all(score >= 60 for score in student['scores'])
]
print(passing_students)  # ['Alice', 'Bob', 'Charlie']

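9.Performance comparison

9.1.Comprehensions vs traditional loops

import timeit

def test_performance():
    n = 10000

    # List comprehension
    list_comp_time = timeit.timeit(
        '[x**2 for x in range(n)]',
        globals={'n': n},
        number=1000
    )

    # Traditional for loop. The statement must start at column 0:
    # timeit compiles it as given, so leading indentation raises IndentationError.
    for_loop_stmt = (
        'result = []\n'
        'for x in range(n):\n'
        '    result.append(x**2)'
    )
    for_loop_time = timeit.timeit(
        for_loop_stmt,
        globals={'n': n},
        number=1000
    )

    # Generator expression, materialized with list()
    gen_expr_time = timeit.timeit(
        'list(x**2 for x in range(n))',
        globals={'n': n},
        number=1000
    )

    print(f"List comprehension: {list_comp_time:.4f} s")
    print(f"Traditional loop: {for_loop_time:.4f} s")
    print(f"Generator expression: {gen_expr_time:.4f} s")

test_performance()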

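9.2.Performance summary

Method | Speed | Memory use | Best for
List comprehension | Usually fastest | High (builds everything immediately) | When the full list is needed
Traditional loop | Medium | High (builds everything immediately) | Complex logic
Generator expression | Medium | Low (lazy evaluation) | Large datasets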

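10.Advanced techniques and patterns

10.1.Using the walrus operator (Python 3.8+)

The walrus operator (:=) assigns to a variable inside an expression, so a value computed in the filter condition can be reused in the output expression instead of being computed twice.

# Walrus operator inside a comprehension
data = [" apple ", "banana", "  ", "cherry", "date  "]
results = [clean.upper()
           for item in data
           if (clean := item.strip())]  # keep only non-empty items
print(results)  # ['APPLE', 'BANANA', 'CHERRY', 'DATE']

# For comparison: a simple filter that needs no walrus operator
numbers = ['42', 'not_a_num', '99', 'abc', '123']
valid_numbers = [int(s) for s in numbers if s.isdigit()]
print(valid_numbers)  # [42, 99, 123]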
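
The benefit of the walrus operator is clearest when the value tested in the condition is costly to compute. The sketch below compares the two spellings. The normalize function is a hypothetical stand-in for any more expensive transformation.

```python
def normalize(text):
    """Hypothetical helper: trim whitespace and lowercase the text."""
    return text.strip().lower()

data = [' Apple ', '', '  ', 'Banana']

# Without the walrus operator, normalize() runs twice for every kept item.
twice = [normalize(item) for item in data if normalize(item)]

# With it, normalize() runs once and the result is reused.
once = [cleaned for item in data if (cleaned := normalize(item))]

print(once)           # ['apple', 'banana']
print(once == twice)  # True

# Unlike the loop variable, a name bound with := is visible after the comprehension.
print(cleaned)  # banana
```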

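10.2.Exception handling in comprehensions

# Safe type conversion
raw_data = ['123', '456', 'abc', '789', 'def']

def safe_int_convert(value):
    try:
        return int(value)
    except ValueError:
        return None

# A comprehension cannot contain try/except, so wrap it in a function
numbers = [safe_int_convert(x) for x in raw_data]
clean_numbers = [x for x in numbers if x is not None]
print(clean_numbers)  # [123, 456, 789]

# A shorter alternative for plain unsigned digit strings
# (isdigit() rejects inputs such as '-5' or ' 7 ' that int() would accept)
numbers = [int(x) for x in raw_data if x.isdigit()]
print(numbers)  # [123, 456, 789]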
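
The isdigit() filter and the try/except helper are not interchangeable. The sketch below, using made-up input, shows where they diverge and combines the helper with the walrus operator so that each value is converted only once.

```python
def safe_int_convert(value):
    """Return int(value), or None when the text is not a valid integer."""
    try:
        return int(value)
    except ValueError:
        return None

raw_data = ['123', '-45', ' 67 ', 'abc']

# isdigit() only accepts strings made entirely of digit characters.
by_isdigit = [int(x) for x in raw_data if x.isdigit()]
print(by_isdigit)  # [123]

# The helper accepts everything int() accepts, including signs and padding.
by_helper = [n for x in raw_data if (n := safe_int_convert(x)) is not None]
print(by_helper)  # [123, -45, 67]

# The mismatch also runs the other way: some characters pass isdigit()
# but are rejected by int(), for example the superscript two.
print('²'.isdigit(), safe_int_convert('²'))  # True None
```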

10.3.Multi-step data processing pipeline

def data_processing_pipeline(data):
    """Process data in several steps."""
    # Step 1: clean and validate
    cleaned = [item.strip().lower() for item in data if item.strip()]

    # Step 2: filter and transform
    processed = [
        f"processed_{item}" for item in cleaned
        if len(item) > 3 and not item.startswith('test')
    ]

    # Step 3: build a lookup dict
    lookup = {item: len(item) for item in processed}

    return cleaned, processed, lookup

# Use the pipeline
raw_data = ['  Hello  ', '  test_data  ', 'WORLD  ', '  py  ', '  ']
cleaned, processed, lookup = data_processing_pipeline(raw_data)

print("Cleaned:", cleaned)      # ['hello', 'test_data', 'world', 'py']
print("Processed:", processed)  # ['processed_hello', 'processed_world']
print("Lookup:", lookup)        # {'processed_hello': 15, 'processed_world': 15}

11.Best practices and caveats

11.1.Readability

# Good: short and clear
squares = [x**2 for x in range(10)]

# Avoid overly complex comprehensions.
# Logic like this is clearer as ordinary loops.
def generate_triples():
    result = []
    for x in range(5):
        for y in range(5):
            if x != y:
                for z in range(5):
                    if x + y > z and (x % 2 == 0 or y % 2 == 1):
                        result.append((x, y, z))
    return result

11.2.When to use comprehensions

11.2.1.When comprehensions are a good fit:

# 1. Simple transformation and filtering
numbers = [1, 2, 3, 4, 5]
doubled_evens = [x * 2 for x in numbers if x % 2 == 0]
print(doubled_evens)  # [4, 8]

# 2. Building a new data structure
word_lengths = {word: len(word) for word in ['hello', 'world']}
print(word_lengths)  # {'hello': 5, 'world': 5}

# 3. Simple data extraction
people_list = [{'name': 'Alice', 'age': 25}, {'name': 'Bob', 'age': 30}]
names = [person['name'] for person in people_list if person['age'] > 18]
print(names)  # ['Alice', 'Bob']

11.2.2.When comprehensions are not a good fit:

  1. Complex logic: use an ordinary loop
  2. Intermediate results that must be referenced several times
  3. Complex operations that need exception handling
  4. Nesting deep enough to hurt readability
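
As one illustration of the cases listed above, consider parsing input where both the successful values and the failures need to be recorded. A comprehension produces a single result and cannot contain try/except, so an ordinary loop is a more natural fit. The input data here is invented for the example.

```python
raw = ['10', 'x', '30', '']

values, errors = [], []
for index, text in enumerate(raw):
    try:
        values.append(int(text))
    except ValueError:
        # Keep the position and the offending text for later reporting.
        errors.append((index, text))

print(values)  # [10, 30]
print(errors)  # [(1, 'x'), (3, '')]
```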

12.Worked examples

12.1.Data analysis task

def analyze_sales_data(sales_records):
    """Analyze sales data."""

    # Data cleaning: keep only valid records
    valid_records = [
        record for record in sales_records
        if record.get('amount', 0) > 0 and record.get('product')
    ]

    # Total sales per product (an accumulating loop is clearer than a comprehension here)
    sales_by_product = {}
    for record in valid_records:
        product = record['product']
        amount = record['amount']
        sales_by_product[product] = sales_by_product.get(product, 0) + amount

    # Products with high total sales
    high_sales_products = [
        product for product, total in sales_by_product.items()
        if total > 1000
    ]

    # Build the sales report
    sales_report = {
        product: {
            'total_sales': total,
            'average_sale': total / len([r for r in valid_records if r['product'] == product]),
            'is_high_sales': total > 1000
        }
        for product, total in sales_by_product.items()
    }

    return {
        'total_records': len(valid_records),
        'high_sales_products': high_sales_products,
        'sales_report': sales_report
    }

# Sample data
sample_sales = [
    {'product': 'A', 'amount': 100},
    {'product': 'B', 'amount': 200},
    {'product': 'A', 'amount': 150},
    {'product': 'C', 'amount': 300},
    {'product': 'B', 'amount': 250},
    {'product': 'A', 'amount': 1200},  # large sale
]

# Run the analysis
result = analyze_sales_data(sample_sales)
print(result)
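
The report above rescans valid_records once per product to compute each average. A possible alternative, sketched here on the assumption that records have the same 'product' and 'amount' keys as sample_sales, collects totals and counts in a single pass and leaves only the final division to a comprehension.

```python
from collections import defaultdict

sample_sales = [
    {'product': 'A', 'amount': 100},
    {'product': 'B', 'amount': 200},
    {'product': 'A', 'amount': 150},
    {'product': 'C', 'amount': 300},
    {'product': 'B', 'amount': 250},
    {'product': 'A', 'amount': 1200},
]

# One pass over the records accumulates both the sum and the count.
totals = defaultdict(int)
counts = defaultdict(int)
for record in sample_sales:
    totals[record['product']] += record['amount']
    counts[record['product']] += 1

# The comprehension only combines the two precomputed dicts.
averages = {product: totals[product] / counts[product] for product in totals}

print(dict(totals))   # {'A': 1450, 'B': 450, 'C': 300}
print(averages['B'])  # 225.0
```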

12.2.Text processing task

def process_text_data(texts):
    """Process text data."""

    # 1. Clean the texts
    cleaned_texts = [
        text.strip().lower() for text in texts if text.strip()
    ]

    # 2. Extract keywords (words longer than 3 characters)
    keywords = [
        word for text in cleaned_texts
        for word in text.split()
        if len(word) > 3
    ]

    # 3. Count word frequency
    word_freq = {
        word: keywords.count(word)
        for word in set(keywords)
    }

    # 4. Find words that occur more than once
    high_freq_words = [
        word for word, freq in word_freq.items()
        if freq > 1
    ]

    return {
        'cleaned_texts': cleaned_texts,
        'keywords': keywords,
        'word_frequency': word_freq,
        'high_frequency_words': high_freq_words
    }

# Usage example
texts = [
    "Python is great for data analysis",
    "Data analysis with Python is powerful",
    "Python programming is fun and useful"
]

result = process_text_data(texts)
print("Cleaned texts:", result['cleaned_texts'])
print("Keywords:", result['keywords'])
print("Word frequency:", result['word_frequency'])
print("High-frequency words:", result['high_frequency_words'])
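
Calling keywords.count(word) inside the comprehension rescans the whole keyword list for every distinct word. For larger inputs, collections.Counter from the standard library counts everything in one pass. The sketch below applies it to the same sample texts; it is one possible variant, not a replacement the original requires.

```python
from collections import Counter

texts = [
    "Python is great for data analysis",
    "Data analysis with Python is powerful",
    "Python programming is fun and useful",
]

# Same keyword rule as above: lowercase words longer than 3 characters.
keywords = [
    word for text in texts
    for word in text.lower().split()
    if len(word) > 3
]

# Counter tallies every word in a single pass.
word_freq = Counter(keywords)
high_freq_words = [word for word, freq in word_freq.items() if freq > 1]

print(word_freq['python'])       # 3
print(high_freq_words)           # ['python', 'data', 'analysis']
print(word_freq.most_common(1))  # [('python', 3)]
```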

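13.Summary

Comprehensions are a powerful data-processing tool in Python. Used sensibly, they let us:

  • Improve efficiency: comprehensions are usually faster than the equivalent loop
  • Simplify logic: express an operation in less code
  • Improve readability: clearer structure and more explicit intent
  • Reduce errors: avoid common mistakes of hand-written loops

13.1.Learning suggestions

  1. Start simple: master basic list comprehensions first
  2. Go deeper step by step: move on to dict and set comprehensions
  3. Practice: use comprehensions in real projects
  4. Mind readability: avoid overly complex comprehensions
  5. Consider performance: know the trade-offs of each form

Mastering comprehensions is an important skill for any Python programmer: they make code both more elegant and more efficient.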