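Machine learning and deep learning work often requires traversing a directory to read files, and in particular reading only files with certain extensions; for images, that might mean jpg, png, and bmp files. Python's standard library functions are not tailored to this use case, so they need a thin wrapper.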
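Example
Suppose we have the following directory structure, where names ending in bmp are files and everything else is a folder: D:\data contains 1.bmp, 2.bmp, and two folders, a and b; folder a contains a1.bmp and folder b contains b1.bmp. All the programs below use this directory structure as the example.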
os.listdir
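os.listdir reads only the files and folders directly under the given path and returns their names as a list. The code for reading the demo directory, together with its result, is as follows: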
import os

path = r'D:\data'
items = os.listdir(path)  # ==> ['1.bmp', '2.bmp', 'a', 'b']
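Because os.listdir returns bare names and does not say which ones are files and which are folders, a common next step is to test each entry. The following is a minimal sketch of that step, not part of the original code: the helper name split_entries is hypothetical, and the demo layout is recreated in a temporary directory (standing in for D:\data) so that the snippet runs anywhere.

```python
import os
import tempfile


def split_entries(path):
    """Return (files, folders) found directly under path, as sorted names."""
    files, folders = [], []
    for name in os.listdir(path):
        # os.listdir returns bare names, so join with path before testing
        if os.path.isdir(os.path.join(path, name)):
            folders.append(name)
        else:
            files.append(name)
    return sorted(files), sorted(folders)


# Build a throwaway copy of the demo layout
with tempfile.TemporaryDirectory() as root:
    for sub in ('a', 'b'):
        os.mkdir(os.path.join(root, sub))
    for rel in ('1.bmp', '2.bmp',
                os.path.join('a', 'a1.bmp'), os.path.join('b', 'b1.bmp')):
        open(os.path.join(root, rel), 'w').close()
    files, folders = split_entries(root)

print(files)    # ['1.bmp', '2.bmp']
print(folders)  # ['a', 'b']
```

The names are sorted explicitly because os.listdir makes no promise about the order of its result.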
os.walk
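os.walk already traverses recursively, visiting every subfolder and file. Unlike os.listdir, however, it does not return a list. It returns a generator that yields one tuple per directory, which is awkward to use directly, so the result usually needs further processing. We can print it with a for statement to take a look: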
import os

path = r'D:\data'

# part 1: print each item as yielded
for items in os.walk(path):
    print(items)

# part 2: unpack each item into its three parts
for main_dir, sub_dir_list, sub_file_list in os.walk(path):
    print(main_dir, sub_dir_list, sub_file_list)
The result is:
# part 1
('D:\\data', ['a', 'b'], ['1.bmp', '2.bmp'])
('D:\\data\\a', [], ['a1.bmp'])
('D:\\data\\b', [], ['b1.bmp'])
# part 2
D:\data ['a', 'b'] ['1.bmp', '2.bmp']
D:\data\a [] ['a1.bmp']
D:\data\b [] ['b1.bmp']
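Iterating over the result of os.walk() shows that each item has three parts (part 1). In part 2 we name the three parts main_dir, sub_dir_list, and sub_file_list, briefly explained below:
- main_dir: the directory currently being visited; over the whole traversal it takes the value of path itself and of every folder beneath it
- sub_dir_list: the folders directly under main_dir
- sub_file_list: the files directly under main_dir
Joining main_dir with each file name in sub_file_list yields every file under the path.
sub_dir_list is not needed here: there is no need to traverse it separately, because each of those folders shows up as main_dir in a later iteration.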
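The joining step described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the helper name walk_files is hypothetical, and the demo layout is rebuilt in a temporary directory so the snippet does not depend on D:\data existing.

```python
import os
import tempfile


def walk_files(path):
    """Collect every file under path by joining each main_dir with its file names."""
    result = []
    for main_dir, _sub_dir_list, sub_file_list in os.walk(path):
        for filename in sub_file_list:
            result.append(os.path.join(main_dir, filename))
    return result


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'a'))
    os.mkdir(os.path.join(root, 'b'))
    for rel in ('1.bmp', '2.bmp',
                os.path.join('a', 'a1.bmp'), os.path.join('b', 'b1.bmp')):
        open(os.path.join(root, rel), 'w').close()
    # Report paths relative to root so the output does not depend on the temp location
    relative = sorted(
        os.path.relpath(p, root).replace(os.sep, '/') for p in walk_files(root)
    )

print(relative)  # ['1.bmp', '2.bmp', 'a/a1.bmp', 'b/b1.bmp']
```

Note that sub_dir_list is unpacked but never used, which matches the observation that every subfolder is eventually visited as main_dir.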
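Traversal code
The code is designed around the following requirements:
- It must be able to filter by extension, and to match several extensions at once
- It must support both recursive and non-recursive reading
- The returned paths are prefixed with the path argument, so they are full paths only if path itself is a full path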
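To make the first requirement concrete, here is a minimal sketch of matching one file name against one or several extensions. It is an assumption-laden illustration rather than the article's implementation: the helper name has_extension is hypothetical, it uses os.path.splitext, and it compares case-insensitively and tolerates a leading dot, whereas the article's own code compares extensions exactly as given.

```python
import os


def has_extension(filename, extensions=None):
    """Return True if filename ends with one of extensions (None matches everything)."""
    if extensions is None:
        return True
    if isinstance(extensions, str):
        extensions = [extensions]
    # Normalize to lower case without the leading dot: '.JPG' -> 'jpg'
    wanted = {ext.lower().lstrip('.') for ext in extensions}
    ext = os.path.splitext(filename)[1].lower().lstrip('.')
    return ext in wanted


print(has_extension('cat.JPG', ['jpg', 'png']))  # True
print(has_extension('cat.bmp', 'png'))           # False
print(has_extension('notes.txt'))                # True
```

Accepting either a single string or a list keeps the call site short for the common one-extension case.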
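# -*- coding: utf-8 -*-
import os


def file_ext(filename, level=1):
    """
    return extension of filename

    Parameters:
    -----------
    filename: str
        name of file, path can be included
    level: int
        level of extension.
        for example, if filename is 'sky.png.bak', the 1st level extension
        is 'bak', and the 2nd level extension is 'png'

    Returns:
    --------
    extension of filename, or '' if the file name has no such extension
    """
    # split only the base name so that dots in folder names are ignored
    parts = os.path.basename(filename).split('.')
    # parts[0] is the name itself, so a valid extension needs level < len(parts)
    return parts[-level] if 1 <= level < len(parts) else ''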


def _contain_file(path, extensions):
    """
    check whether path contains any file whose extension is in extensions list

    Parameters:
    -----------
    path: str
        path to be checked
    extensions: str or list/tuple of str
        extension or extensions list

    Returns:
    --------
    return True if contains, else return False
    """
    assert os.path.exists(path), 'path must exist'
    assert os.path.isdir(path), 'path must be dir'
    if isinstance(extensions, str):
        extensions = [extensions]
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            if (extensions is None) or (file_ext(file) in extensions):
                return True
    return False


def _process_extensions(extensions=None):
    """
    preprocess and check extensions, if extensions is str, convert it to list.

    Parameters:
    -----------
    extensions: str or list/tuple of str
        file extensions

    Returns:
    --------
    extensions: list/tuple of str
        file extensions
    """
    if extensions is not None:
        if isinstance(extensions, str):
            extensions = [extensions]
        assert isinstance(extensions, (list, tuple)), \
            'extensions must be str or list/tuple of str'
        for ext in extensions:
            assert isinstance(ext, str), 'extension must be str'
    return extensions


def get_files(path, extensions=None, is_recursive=True):
    """
    read files in path. if extensions is None, read all files; if extensions
    are specified, only read the files that have one of the extensions. if
    is_recursive is True, recursively read all files; if is_recursive is False,
    only read files in current path.

    Parameters:
    -----------
    path: str
        path to be read
    extensions: str or list/tuple of str
        file extensions
    is_recursive: bool
        whether to read files recursively: recurse if True, read only the
        files in current path if False

    Returns:
    --------
    files: the obtained files in path
    """
    extensions = _process_extensions(extensions)
    files = []

    # get files in current path
    if not is_recursive:
        for name in os.listdir(path):
            fullname = os.path.join(path, name)
            if os.path.isfile(fullname):
                if (extensions is None) or (file_ext(fullname) in extensions):
                    files.append(fullname)
        return files

    # get files recursively
    for main_dir, _, sub_file_list in os.walk(path):
        for filename in sub_file_list:
            fullname = os.path.join(main_dir, filename)
            if (extensions is None) or (file_ext(fullname) in extensions):
                files.append(fullname)
    return files


def get_folders(path, extensions=None, is_recursive=True):
    """
    read folders in path. if extensions is None, read all folders; if
    extensions are specified, only read the folders that contain any file
    with one of the extensions. if is_recursive is True, recursively read all
    folders; if is_recursive is False, only read folders in current path.

    Parameters:
    -----------
    path: str
        path to be read
    extensions: str or list/tuple of str
        file extensions
    is_recursive: bool
        whether to read folders recursively: recurse if True, read only the
        folders in current path if False

    Returns:
    --------
    folders: the obtained folders in path
    """
    extensions = _process_extensions(extensions)
    folders = []

    # get folders in current path
    if not is_recursive:
        for name in os.listdir(path):
            fullname = os.path.join(path, name)
            if os.path.isdir(fullname):
                if (extensions is None) or \
                        (_contain_file(fullname, extensions)):
                    folders.append(fullname)
        return folders

    # get folders recursively (the walk includes path itself)
    for main_dir, _, _ in os.walk(path):
        if (extensions is None) or (_contain_file(main_dir, extensions)):
            folders.append(main_dir)
    return folders


if __name__ == '__main__':
    # returned paths are prefixed with path, so pass the full path here
    path = r'D:\data'

    files = get_files(path)
    print(files)  # ==> ['D:\\data\\1.bmp', 'D:\\data\\2.bmp', 'D:\\data\\a\\a1.bmp', 'D:\\data\\b\\b1.bmp']

    folders = get_folders(path)
    print(folders)  # ==> ['D:\\data', 'D:\\data\\a', 'D:\\data\\b']
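For comparison, the same file-listing behaviour can be approximated with the standard pathlib module. This is a sketch of an alternative technique, not the article's code: the function name list_files is hypothetical, it relies on Path.rglob and Path.iterdir instead of os.walk and os.listdir, and it sorts its result for reproducible output.

```python
import tempfile
from pathlib import Path


def list_files(path, extensions=None, is_recursive=True):
    """Return files under path, optionally filtered by extension (without the dot)."""
    if isinstance(extensions, str):
        extensions = [extensions]
    root = Path(path)
    # rglob('*') descends into subfolders; iterdir() stays in the current folder
    candidates = root.rglob('*') if is_recursive else root.iterdir()
    return sorted(
        str(p) for p in candidates
        if p.is_file() and (extensions is None or p.suffix.lstrip('.') in extensions)
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'a').mkdir()
    (root / 'a' / 'a1.bmp').touch()
    (root / '1.bmp').touch()
    (root / 'readme.txt').touch()
    all_bmp = [Path(p).relative_to(root).as_posix()
               for p in list_files(root, 'bmp')]
    top_only = [Path(p).relative_to(root).as_posix()
                for p in list_files(root, is_recursive=False)]

print(all_bmp)   # ['1.bmp', 'a/a1.bmp']
print(top_only)  # ['1.bmp', 'readme.txt']
```

As with get_files, the returned paths are prefixed with whatever path was passed in.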
Original link: https://blog.csdn.net/bby1987/article/details/108764387