itheima (黑马程序员) Technical Exchange Community

Title: Python Basic Syntax_File Operations_Read/Write Functions Explained

Author: 易大帅 Time: 2017-3-14 09:46

Python Basic Syntax_File Operations_Read/Write Functions Explained (Jmilk)
Contents

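Software environment
file() file object
open() file operations
      Reading files
         read(): read the entire file
         readline(): read one line
         readlines(): read all lines
         Differences between read(), readline() and readlines()
      Writing files
         write()
         writelines(): write multiple lines
         Differences between write() and writelines()
         Redirecting standard output to a file
      File pointer
         tell(): get the current file pointer position
         truncate(): truncate the file
         seek(): move the file pointer
Finally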

Software environment

System
      UbuntuKylin 14.01
Software
      Python 2.7.3
      IPython 4.0.0

file() file object

file(name[, mode[, buffering]]) -> file object
Open a file. The mode can be ‘r’, ‘w’ or ‘a’ for reading (default),writing or appending. The file will be created if it doesn’t exist when opened for writing or appending; it will be truncated when opened for writing. Add a ‘b’ to the mode for binary files.
Add a ‘+’ to the mode to allow simultaneous reading and writing.
If the buffering argument is given, 0 means unbuffered, 1 means line buffered, and larger numbers specify the buffer size. The preferred way to open a file is with the builtin open() function.
Add a ‘U’ to mode to open the file for input with universal newline support. Any line ending in the input file will be seen as a ‘\n’ in Python. Also, a file so opened gains the attribute ‘newlines’;the value for this attribute is one of None (no newline read yet), ‘\r’, ‘\n’, ‘\r\n’ or a tuple containing all the newline types seen.
‘U’ cannot be combined with ‘w’ or ‘+’ mode.
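file() and open() do the same thing: they open a file, or create it. Both are built-in functions. open() is the preferred one, and Python 3 no longer has a file() built-in.
Attributes and methods of file: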

In [324]: dir(file)
Out[324]:
['__class__',
'__delattr__',
'__doc__',
'__enter__',
'__exit__',
'__format__',
'__getattribute__',
'__hash__',
'__init__',
'__iter__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'close',
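'closed',    #whether the file has been closed
'encoding', #the file's encoding
'errors',
'fileno',    #returns the integer file descriptor used by the operating system
'flush',
'isatty',    #whether the file is connected to a terminal (tty) device
'mode',    #the mode the file was opened with
'name',    #file name
'newlines', #newline type(s) read so far (set when the file is opened with 'U')
'next',    #returns the next line and advances the file pointer. A for loop over a file calls next() to iterate; calling next() at EOF raises StopIteration.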
'read',
'readinto',
'readline',
'readlines',
'seek',
'softspace', #flag used internally by the print statement, 0 by default
'tell',
'truncate',
'write',
'writelines',
'xreadlines']


open() file operations

open(…)
open(name[, mode[, buffering]]) -> file object
Open a file using the file() type, returns a file object. This is the
preferred way to open a file. See file.__doc__ for further information.
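open() is essentially an alias for file(). It opens a file and returns a file object, not the file's contents. Think of the file object as the object through which the content is accessed: to get the content, you call its methods. You can pick the open mode (r, w and so on). After calling open(), you must call the file object's close() method to close the file. This is usually done in a try..finally statement, which guarantees that the file object gets closed.
Note: open() does not load the whole file into memory. It returns a buffered file object, and data is only fetched from disk when you call a read method. Content that has already been read into a variable (for example by read()) does not change if the file is modified afterwards.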
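The file object implements __enter__ and __exit__, as the dir(file) listing above shows. That means the with statement (available since Python 2.6) can replace the try..finally pattern: the file is closed when the block exits, even if an exception is raised. Below is a minimal sketch that runs on both Python 2.7 and Python 3. The temporary directory and file name are assumptions made for the demo.

```python
import os
import tempfile

# Hypothetical demo file inside a fresh temporary directory
demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'fileOperation.txt')

with open(path, 'w') as f:          # closed automatically when the block exits
    f.write('stack:x:1001:1001::/opt/stack:/bin/bash\n')

with open(path, 'r') as f:
    content = f.read()

print(f.closed)   # True: no explicit close() was needed
print(content)
```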
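Common modes:
1. r (read, the default): opens the file for reading. write() cannot be called, and an error is raised if the file does not exist.
2. w (write): opens the file for writing. Any existing content is truncated as soon as the file is opened, and read() cannot be called. If the file does not exist, a new file with that name is created.
3. a (append): opens the file for appending, so every write goes to the end of the file. If the file does not exist, a new file with that name is created.
4. +: added to another mode (r+, w+, a+) to allow both reading and writing. The 'rw+' used in the file pointer examples below is not a documented mode string. In the Python 2 on Linux environment used here it happens to behave like r+, but Python 3 rejects it with a ValueError, so r+ is the portable spelling.
5. U: universal newline support. \n, \r and \r\n are all read as \n.
6. b: binary mode, for binary files (images, videos).
7. t: text mode, for text files.
These modes can be combined (e.g. rb, r+, ab), except that U cannot be combined with w or +.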
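A short sketch can check the practical differences between the common modes (Python 2.7 and 3; the file name is made up). It shows that w empties the file at open time, a always writes at the end, and r+ writes from the start without truncating.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'modes.txt')

with open(path, 'w') as f:     # 'w': create the file, or empty an existing one
    f.write('0123456789')

with open(path, 'a') as f:     # 'a': every write goes to the end of the file
    f.write('ABC')

with open(path, 'r+') as f:    # 'r+': read and write, no truncation, pointer at 0
    f.write('xy')              # overwrites only the first two characters
    f.seek(0)
    after_r_plus = f.read()

with open(path, 'w') as f:     # reopening with 'w' empties the file right away
    pass

with open(path, 'r') as f:
    after_w = f.read()

print(after_r_plus)   # xy23456789ABC
print(repr(after_w))  # ''
```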
Reading files

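After opening a file for reading, you can call three methods: read(), readline() and readlines().
Each accepts an optional int argument that limits how much is read (in bytes).
Note: without a size argument, read() and readlines() load the whole file into memory at once. For a file that may be larger than the available memory, read it in pieces by passing a size each time, or iterate over it line by line.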
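Reading a large file piece by piece, as the note above suggests, can be sketched as a generator that calls read(size) until it returns an empty string. The helper name read_in_chunks is hypothetical. The chunk size of 4 is only for the demo; real code would use something like 64 KB. In Python 3 text mode the size counts characters rather than bytes.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=4):
    """Yield successive pieces of at most chunk_size from the file at path."""
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)   # '' once EOF has been reached
            if not chunk:
                break
            yield chunk

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'big.txt')
with open(path, 'w') as f:
    f.write('0123456789')

chunks = list(read_in_chunks(path))
print(chunks)   # ['0123', '4567', '89']
```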

In [15]: !tail /etc/passwd > fileOperation.txt

In [20]: pswd = open('/usr/local/src/pyScript/fileOperation.txt','r')

In [21]: type(pswd)
Out[21]: file

In [32]: pswd
Out[32]: <open file '/usr/local/src/pyScript/fileOperation.txt', mode 'r' at 0x7f048314a420>


read(): read the entire file

read(…)
read([size]) -> read at most size bytes, returned as a string.
If the size argument is negative or omitted, read until EOF is reached.
Notice that when in non-blocking mode, less data than what was requested
may be returned, even if no size parameter was given.
Reads up to the given size. If the argument is omitted, it reads the whole content. Either way, it returns a String object.

In [34]: content = pswd.read()

In [48]: print content
stack:x:1001:1001::/opt/stack:/bin/bash
memcache:x:116:125:Memcached,,,:/nonexistent:/bin/false
sshd:x:117:65534::/var/run/sshd:/usr/sbin/nologin
postgres:x:118:126:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
rabbitmq:x:119:127:RabbitMQ messaging server,,,:/var/lib/rabbitmq:/bin/false
mysql:x:120:128:MySQL Server,,,:/nonexistent:/bin/false
haproxy:x:121:129::/var/lib/haproxy:/bin/false
libvirt-qemu:x:122:130:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false
libvirt-dnsmasq:x:123:131:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/bin/false
guest-5LawJh:x:124:132:Guest,,,:/tmp/guest-5LawJh:/bin/bash


readline(): read one line

readline(…)
readline([size]) -> next line from the file, as a string.
Retain newline. A non-negative size argument limits the maximum
number of bytes to return (an incomplete line may be returned then).
Return an empty string at EOF.
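Reads one line from the file, including its line terminator, and returns it as a String. Each call automatically moves on to the next line. Once the last line has been read, further calls return an empty String instead of raising an error.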

In [62]: pwd = open('fileOperation.txt','r')

In [70]: content = pwd.readline()

In [71]: content
Out[71]: 'stack:x:1001:1001::/opt/stack:/bin/bash\n'

In [72]: content = pwd.readline()

In [73]: content
Out[73]: 'memcache:x:116:125:Memcached,,,:/nonexistent:/bin/false\n'
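Because readline() returns an empty string at EOF instead of raising an error, a loop can read a file line by line until it sees ''. The sketch below uses a two-line sample file with made-up contents.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'fileOperation.txt')
with open(path, 'w') as f:
    f.write('stack:x:1001\nmemcache:x:116\n')

lines = []
with open(path, 'r') as f:
    while True:
        line = f.readline()   # includes the trailing '\n'
        if line == '':        # '' only at EOF; a blank line would be '\n'
            break
        lines.append(line)
    extra = f.readline()      # still '' after EOF, no exception

print(lines)   # ['stack:x:1001\n', 'memcache:x:116\n']
```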


A comprehensive example:
open()+fileObject.readline()+try..finally+String.split()+os.path.exists()
Since readline() returns a String, we can split() it on ':' and loop over all the fields of that line.

import os
def ergodicIndex(fileName):
    pwd = open(fileName,'r')   #open the file that was passed in
    try:
        content = pwd.readline()
        index = content.split(':')
        for i in index:
            print i,
    finally:
        pwd.close()

if __name__ == '__main__':
    fileName='/usr/local/src/pyScript/fileOperation.txt'
    if os.path.exists(fileName):
        ergodicIndex(fileName)
    else:
        print "The file does not exist"


This is a very common technique when processing file data.

In [99]: %run testReadline.py
stack x 1001 1001  /opt/stack /bin/bash
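The script above only processes the first line. Iterating over the file object itself, which calls next() behind the scenes as noted in the dir(file) listing, applies the same split(':') to every line. The sketch below uses a hypothetical helper, parse_passwd, and two sample passwd-style lines. The empty field between '::' comes back as '', which is why two spaces appear in the output above.

```python
import os
import tempfile

def parse_passwd(path):
    """Return a list with the ':'-separated fields of every line in path."""
    records = []
    with open(path, 'r') as f:
        for line in f:                       # one line per iteration, '\n' included
            records.append(line.rstrip('\n').split(':'))
    return records

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'fileOperation.txt')
with open(path, 'w') as f:
    f.write('stack:x:1001:1001::/opt/stack:/bin/bash\n')
    f.write('sshd:x:117:65534::/var/run/sshd:/usr/sbin/nologin\n')

records = parse_passwd(path)
print(records[0])   # ['stack', 'x', '1001', '1001', '', '/opt/stack', '/bin/bash']
```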


readlines(): read all lines

readlines(…)
readlines([size]) -> list of strings, each a line from the file.
Call readline() repeatedly and return a list of the lines so read.
The optional size argument, if given, is an approximate bound on the
total number of bytes in the lines returned.
Reads the entire file and returns a List with one String element per line. Internally it works by calling readline() repeatedly.

In [106]: pwd = open('fileOperation.txt','r')

In [108]: content = pwd.readlines()

In [109]: print content
['stack:x:1001:1001::/opt/stack:/bin/bash\n', 'memcache:x:116:125:Memcached,,,:/nonexistent:/bin/false\n', 'sshd:x:117:65534::/var/run/sshd:/usr/sbin/nologin\n', 'postgres:x:118:126:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash\n', 'rabbitmq:x:119:127:RabbitMQ messaging server,,,:/var/lib/rabbitmq:/bin/false\n', 'mysql:x:120:128:MySQL Server,,,:/nonexistent:/bin/false\n', 'haproxy:x:121:129::/var/lib/haproxy:/bin/false\n', 'libvirt-qemu:x:122:130:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false\n', 'libvirt-dnsmasq:x:123:131:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/bin/false\n', 'guest-5LawJh:x:124:132:Guest,,,:/tmp/guest-5LawJh:/bin/bash\n']

In [110]: content[0]
Out[110]: 'stack:x:1001:1001::/opt/stack:/bin/bash\n'

In [111]: content[0][0]
Out[111]: 's'


Modifying the content of a specified line:

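cfg = open(cfgUrl,'r+')
cfgFile = cfg.readlines()  #read all lines into a List
cfg.close()                #close the first file object before reopening the file
cfgFile[lineNum] = cfgStr  #cfgStr should end with '\n' to keep the line break
cfg = open(cfgUrl,'w+')    #'w+' truncates the file
cfg.writelines(cfgFile)
cfg.flush() #flush the in-memory buffer to disk without closing the file
cfg.close()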


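The file is opened in r+ mode and its lines are read into a List. After the List is modified, the file is reopened in w+ mode, which truncates it, and the List is written back. This changes the content of the specified line.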
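Reopening the file is not the only option. Here is a sketch of an alternative (not the author's code) that uses a single r+ file object. It reads the lines, uses seek(0) to go back to the start, writes the lines back, then calls truncate() to remove whatever is left over if the new content is shorter. The helper name replace_line and the sample config content are assumptions. Line numbers count from 0, like list indices.

```python
import os
import tempfile

def replace_line(path, line_num, new_line):
    """Replace line line_num (0-based) of the file at path with new_line."""
    if not new_line.endswith('\n'):
        new_line += '\n'           # keep the line break, as readlines() lines have one
    with open(path, 'r+') as f:
        lines = f.readlines()
        lines[line_num] = new_line
        f.seek(0)                  # back to the start before rewriting
        f.writelines(lines)
        f.truncate()               # cut off leftovers if the new content is shorter

# Usage with a made-up config file
demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'app.cfg')
with open(path, 'w') as f:
    f.write('host=localhost\nport=8080\ndebug=true\n')

replace_line(path, 1, 'port=80')

with open(path) as f:
    result = f.read()
print(repr(result))   # 'host=localhost\nport=80\ndebug=true\n'
```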
Differences between read(), readline() and readlines()

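By default, both read() and readlines() read the entire file. However, read() returns a single String (a sequence of characters), while readlines() returns a List whose elements are Strings, one per line. readline() reads only one line of the file and returns it as a String.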
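The return types described above can be checked side by side. Here is a sketch with a made-up three-line file:

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'three.txt')
with open(path, 'w') as f:
    f.write('a\nb\nc\n')

with open(path) as f:
    whole = f.read()           # one string holding the whole file
with open(path) as f:
    first = f.readline()       # one string holding only the first line
with open(path) as f:
    all_lines = f.readlines()  # a list with one string per line

print(repr(whole))   # 'a\nb\nc\n'
print(repr(first))   # 'a\n'
print(all_lines)     # ['a\n', 'b\n', 'c\n']
```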
Writing files

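Note: when a file is opened in w mode, its existing content is emptied at the moment open() is called, not by write() or writelines() themselves. Writing always starts at the position the file pointer currently points to:
- In w mode, that is the beginning of the now-empty file.
- In r+ mode, the pointer also starts at the beginning, so writing overwrites only as many bytes as are written and leaves the rest of the file intact.
- In a mode, every write goes to the end of the file.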
write()

write(…)
write(str) -> None. Write string str to file.
Note that due to buffering, flush() or close() may be needed before
the file on disk reflects the data written.
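Writes the given String at the current file pointer position and returns None. Because of buffering, the data may only reach the file on disk after close() or flush() is called.
Note: until close() is called, you can call write() several times. Each call continues where the previous one stopped, so later writes do not overwrite what earlier write() calls wrote, which gives an appending effect within the same open. However, no newline is added automatically.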

In [153]: pwd = open('fileOperation.txt','w')

In [155]: pwd.write('My name is JMilk')

In [157]: pwd.flush()

In [159]: pwd.write('My name is chocolate')

In [161]: pwd.flush()

In [163]: pwd.write('123')

In [165]: pwd.write('456')

In [167]: pwd.close()


Result:

My name is JMilkMy name is chocolate123456
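The result above shows that consecutive write() calls are simply concatenated. To get separate lines, you have to write the '\n' explicitly. Here is a sketch for Python 2.7 and 3 (a temporary file is assumed):

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'fileOperation.txt')

f = open(path, 'w')
try:
    f.write('My name is JMilk')
    f.write('\n')                      # no newline is added automatically, so write one
    f.write('My name is chocolate\n')  # or include it in the string itself
    f.write('123')
    f.write('456')                     # continues right after '123' on the same line
finally:
    f.close()                          # close() also flushes the buffer to disk

with open(path) as f:
    result = f.read()
print(repr(result))   # 'My name is JMilk\nMy name is chocolate\n123456'
```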


A comprehensive example:
open()+fileObject.write()+os.path.exists()+dictionary traversal

import os
def write_test(fileName,content_iterable):
    pwd = open(fileName,'w')   #open before try, so pwd is always defined in finally
    try:
        for key,value in content_iterable.items():
            pwd.write(key+'\t'+value+'\n')  #pass a String argument with a newline added
    finally:
        pwd.close()

if __name__ == '__main__':
    fileName = '/usr/local/src/pyScript/fileOperation.txt'
    dic = {'name':'Jmilk','age':'23','city':'BJ'}
    if os.path.exists(fileName):
        write_test(fileName,dic)
    else:
        print 'File does not exist!'


Result:

city BJ
age    23
name Jmilk


writelines(): write multiple lines

writelines(…)
writelines(sequence_of_strings) -> None. Write the strings to the file.
Note that newlines are not added. The sequence can be any iterable object
producing strings. This is equivalent to calling write() for each string.
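Writes the String elements of the given iterable to the file one by one. This is equivalent to calling write() for each element, and no newline is added automatically.
Modifying the comprehensive example above: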

import os
def write_lines(fileName,content_iterable):
    pwd = open(fileName,'w')   #open before try, so pwd is always defined in finally
    try:
        pwd.writelines(content_iterable) #pass a List argument
    finally:
        pwd.close()

if __name__ == '__main__':
    fileName = '/usr/local/src/pyScript/fileOperation.txt'
    li = ['my name is Jmilk'+'\n','My name is chocolate'+'\n']  #add the newlines when defining the List
    if os.path.exists(fileName):
        write_lines(fileName,li)
    else:
        print 'File does not exist!'


Result:

my name is Jmilk
My name is chocolate


Differences between write() and writelines()

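As the two examples above show, write() takes a single String argument, so you can add '\n' to the argument right in the call. writelines() takes an iterable whose elements must be Strings, so the '\n' has to be part of the elements when the iterable is built. When writing many lines, one writelines() call may be more efficient than many separate write() calls, mainly because it avoids one method call per line. This once again reflects the saying that data structures determine the operations you can perform on objects, which is why understanding data structures is so important. For Python data structures, see: http://blog.csdn.net/jmilk/article/details/48391283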
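Here is a sketch of the point above. writelines() adds no separators. Because it accepts any iterable of strings, the newlines can also be added on the fly with a generator expression instead of being stored in the list. The names and file are made up for the demo.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'fileOperation.txt')
names = ['my name is Jmilk', 'My name is chocolate']

with open(path, 'w') as f:
    f.writelines(names)                      # strings are written back to back
with open(path) as f:
    joined = f.read()

with open(path, 'w') as f:
    f.writelines(n + '\n' for n in names)    # any iterable of strings works
with open(path) as f:
    separated = f.read()

print(repr(joined))      # 'my name is JmilkMy name is chocolate'
print(repr(separated))   # 'my name is Jmilk\nMy name is chocolate\n'
```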
Redirecting standard output to a file

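The system's standard input, output and error streams (sys.stdin, sys.stdout, sys.stderr) are essentially file-like objects. Redirecting output means:
sys.stdout = fileObject_write
Example: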

In [59]: pycat stdoTest.py
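#!/usr/bin/env python
#coding=utf8
#Filename:stdoTest.py
import sys

fristOut = sys.stdout  #back up the original output file object
print type(fristOut)

logOut = open('/usr/local/src/pyScript/out.log','w')
sys.stdout = logOut  #redirect output to the new file object
print 'Test stdout.'  #after redirection, this is not printed to the screen

sys.stdout = fristOut  #restore the original output file object first
logOut.close() #then close the file object returned by open()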

In [60]: run stdoTest.py
<type 'file'>

In [61]: cat out.log
Test stdout.
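If a print statement raised an exception while sys.stdout was redirected, the script above would never restore the original stream. A more defensive sketch (Python 2.7 and 3, temporary log path assumed) restores sys.stdout in a finally clause before closing the file. Python 3.4+ also provides contextlib.redirect_stdout for the same purpose.

```python
import os
import sys
import tempfile

demo_dir = tempfile.mkdtemp()
log_path = os.path.join(demo_dir, 'out.log')

original_stdout = sys.stdout        # keep a reference for restoring later
log_file = open(log_path, 'w')
try:
    sys.stdout = log_file           # print now writes into log_file
    print('Test stdout.')
finally:
    sys.stdout = original_stdout    # restore first, so nothing prints to a closed file
    log_file.close()

with open(log_path) as f:
    logged = f.read()
print(repr(logged))   # 'Test stdout.\n'
```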


File pointer

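File pointer: when open() opens a file and returns a file object, the file object stores the current "cursor" position in the file. This position is called the file pointer, and it is the number of bytes counted from the beginning of the file. Like a pointer in C, it is essentially a marker for a position, here a position in the file. Reads, writes and truncation all operate relative to the file pointer, and they start exactly at the byte offset it indicates. Offsets start at 0, so a pointer of 0 means the first byte. Most file operations are implemented on top of the file pointer.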
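The following sketch shows that reads and writes start exactly at the file pointer and then advance it by the number of bytes processed. Binary mode is used so that positions are plain byte offsets on both Python 2.7 and 3. The file name is made up.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'digits.txt')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'r+b') as f:
    start = f.tell()        # 0: the pointer starts at the first byte
    first = f.read(3)       # reads offsets 0, 1 and 2
    after_read = f.tell()   # 3: advanced by the 3 bytes read
    nxt = f.read(1)         # starts at offset 3, not 4
    f.seek(3)
    f.write(b'X')           # overwrites the byte at offset 3
    after_write = f.tell()  # 4: advanced by the 1 byte written

with open(path, 'rb') as f:
    final = f.read()        # the bytes 012X456789
```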
tell(): get the current file pointer (position)

tell(…)
tell() -> current file position, an integer (may be a long integer).

In [283]: pwd = open('fileOperation.txt','rw+')

In [285]: pwd.tell()
Out[285]: 0


truncate(): truncate the file

truncate(…)
truncate([size]) -> None. Truncate the file to at most size bytes.
Size defaults to the current file position, as returned by tell().
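By default, truncate() cuts the file off at the current file pointer position. You can also pass an int n to truncate the file to n bytes. Everything from byte offset n onwards is deleted, so bytes 0 to n-1 are kept (truncate(5) keeps 01234). The file pointer itself is not moved. This method only works on files opened in a mode that allows writing.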

In [273]: cat fileOperation.txt
0123456789

In [274]: pwd = open('fileOperation.txt','rw+')

In [275]: pwd.truncate(5)

In [276]: pwd.close()

In [277]: cat fileOperation.txt
01234
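To check that truncate(n) does not move the file pointer, you can place the pointer somewhere else first. Here is a sketch using binary mode, so positions are plain byte offsets (a temporary file is assumed):

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'fileOperation.txt')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'r+b') as f:
    f.seek(2)
    f.truncate(5)          # keep offsets 0-4, delete offset 5 onwards
    pos_after = f.tell()   # still 2: truncate() leaves the pointer where it was

with open(path, 'rb') as f:
    content = f.read()     # the bytes 01234
```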


seek(): move the file pointer

seek(…)
seek(offset[, whence]) -> None. Move to new file position.
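Takes an offset and an optional whence argument, and returns None. Offsets are counted in bytes.
When whence==0 (the default), the file pointer is moved to offset bytes from the beginning of the file.
When whence==1, the file pointer is moved offset bytes relative to its current position (a negative offset moves backwards).
When whence==2, the file pointer is moved offset bytes relative to the end of the file. Use a negative offset to move backwards from the end.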
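You can try the three whence values on the same ten-byte file. The sketch uses binary mode because Python 3 rejects non-zero seeks relative to the current position or the end (whence 1 or 2) on files opened in text mode. os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are named constants for 0, 1 and 2.

```python
import os
import tempfile

demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, 'digits.txt')
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(5)              # whence=0 (os.SEEK_SET): 5 bytes from the start
    at_start = f.read(1)   # the byte 5; the pointer is now 6
    f.seek(2, 1)           # whence=1 (os.SEEK_CUR): 2 bytes forward, 6 -> 8
    at_cur = f.read(1)     # the byte 8
    f.seek(-3, 2)          # whence=2 (os.SEEK_END): 3 bytes back from the end
    at_end = f.read()      # the bytes 789
```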
A comprehensive example:
truncate()+tell()+seek()

In [308]: %cat fileOperation.txt
0123456789

In [309]: pwd = open('fileOperation.txt','rw+')

In [310]: pwd.tell()
Out[310]: 0

In [311]: pwd.seek(5)

In [312]: pwd.tell()
Out[312]: 5

In [313]: pwd.truncate()

In [314]: pwd.close()

In [315]: %cat fileOperation.txt
01234


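Summary: as the example above shows, you can move the file pointer with seek() and then call truncate() to delete everything after the position the pointer indicates. Passing an int n to truncate(n) instead truncates at offset n without moving the file pointer.
Note: reading and writing both advance the file pointer by the number of bytes read or written (for a Python 2 str, len(String)).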

Author: Mr_Maty Time: 2017-3-14 12:19
66666666

Welcome to the itheima (黑马程序员) Technical Exchange Community (http://bbs.itheima.com/)
