依存分析

背景

什么是 dependency parsing

  • 通过分析语言单位内成分之间的依存关系来揭示其句法结构,即识别句子中“主谓宾”“定状补”这些语法成分,并分析各成分之间的关系。
  • 句法分析(syntactic parsing)= 句法结构分析(syntactic structure parsing)+ 依存关系分析(dependency parsing)。
  • 成分结构分析(constituent structure parsing):以获取整个句子的句法结构或完整短语结构为目的的句法分析,又称短语结构分析(phrase structure parsing);
  • 依存分析(dependency parsing):以获取词与词之间的依存关系为目的的句法分析。

目前的句法分析已经从句法结构分析转向依存句法分析,原因主要有三点:一是通用依存树库(Universal Dependencies treebanks)的发展,虽然其标注较为复杂,但标注结果可以复用于多种任务(如命名实体识别、词性标注)并作为不同任务的评估数据,因而应用越来越广;二是句法结构分析依赖一套固定的语法规则集,较为僵化;三是依存句法树标注简单,且 parser 的准确率高。

有哪些应用

主要是句法分析的应用,如机器翻译、信息检索与抽取、问答系统、语音识别。
在项目中使用的:NER(挖掘新词)

如何描述语法

有两种主流观点,其中一种是短语结构文法,英文术语是:Constituency = phrase structure grammar = context-free grammars (CFGs)。
这种文法用一组固定的规则(rule)把句子分解为短语和单词,再把短语分解为更小的短语或单词……
例如从 noun phrase 开始,里面可能还有 adjective、prepositional phrase(介词短语中可能又嵌套其他名词短语)等,如下面的小例子。
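
下面用一小段 Python 示意这种“固定规则集”的感觉:文法规则和例句都是为了说明随手编的,并非课程或某个工具包的代码。

import random

# 一个极简的短语结构文法(CFG):每个非终结符映射到若干候选展开(产生式右部)
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "Adj", "N"], ["NP", "PP"]],
    "VP":  [["V", "NP"]],
    "PP":  [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["cuddly"], ["large"]],
    "N":   [["cat"], ["door"], ["crate"]],
    "V":   [["saw"], ["opened"]],
    "P":   [["by"], ["in"]],
}

def generate(symbol="S"):
    """随机展开一个非终结符,返回生成的词序列(演示规则如何递归分解句子)。"""
    if symbol not in GRAMMAR:              # 终结符:直接返回单词本身
        return [symbol]
    words = []
    for sym in random.choice(GRAMMAR[symbol]):
        words.extend(generate(sym))
    return words

print(" ".join(generate()))                # 例如:the cuddly cat saw a door in the crate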

另一种是依存结构(dependency structure),用单词之间的依存关系来表达语法:如果一个单词修饰另一个单词,则称前者依存于(depends on)后者。

细节

人们画依存句法树中弧的方向有不同习惯,这门课采用 head 指向 dependent 的画法:箭头指向的词是 dependent(依赖者),箭头尾部的词是 head(被依赖者/中心词)。

每个句子都会添加一个虚根(ROOT),代表句子之外的起点,这样句子中的每个单词(包括真正的核心词)都恰好依存于一个 head。
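
举个最小的例子(示意,非课程代码):句子 "I ate fish" 的依存结构可以表示成一组 (head, dependent) 二元组,虚根 ROOT 指向句子的核心动词,于是每个词都恰好有一个 head。

# "I ate fish" 的依存弧,用 (head, dependent) 二元组表示
sentence = ["I", "ate", "fish"]
dependencies = [
    ("ROOT", "ate"),    # 虚根指向核心动词
    ("ate", "I"),       # "I" 是 "ate" 的主语,依存于 "ate"
    ("ate", "fish"),    # "fish" 是 "ate" 的宾语,依存于 "ate"
]
heads = {dep: head for (head, dep) in dependencies}
assert all(word in heads for word in sentence)   # 每个词都有且只有一个 head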

(图:依存句法分析的形式化定义)

可用的特征

  • 双词汇亲和(Bilexical affinities),比如discussion与issues。
  • 词语间距(distance),一般相邻的词语才更可能存在依存关系。
  • 中间词语(intervening material),如果两个词之间隔着动词或标点,则它们不太可能有依存关系。
  • 词语配价(valency),即一个词语最多能带几个依赖者(一个简化的特征抽取示意见下)。
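
下面用一小段 Python 示意如何为一个候选依存对 (head, dependent) 抽取上述几类特征。特征模板、词性标记都是为说明随手定义的,不对应任何真实 parser 的实现;词语配价依赖于已生成的部分树,这里略去。

def pair_features(words, pos_tags, head_i, dep_i):
    """为候选依存对 (words[head_i], words[dep_i]) 抽取几个简化特征(仅为示意)。"""
    lo, hi = sorted((head_i, dep_i))
    between = pos_tags[lo + 1:hi]                       # 两词之间的词性序列
    return {
        "head_word": words[head_i],                     # 双词汇亲和:head 词
        "dep_word": words[dep_i],                       # 双词汇亲和:dependent 词
        "distance": abs(head_i - dep_i),                # 词语间距
        "verb_or_punct_between": any(t in ("V", "PUNCT") for t in between),   # 中间词语
    }

words = ["the", "discussion", "of", "the", "issues", "ended"]
tags  = ["D", "N", "P", "D", "N", "V"]
print(pair_features(words, tags, head_i=1, dep_i=4))    # discussion 与 issues 这一对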

约束

  • ROOT 只能有一个依赖者,即整个句子只有一个词直接依存于 ROOT。
  • 依存弧不能构成环(无环)。合法性检查的示意代码见下。
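
可以用一小段 Python 检查一组 (head, dependent) 依存弧是否满足这些约束(示意代码,为简单起见直接用词字符串做标识,真实实现应使用词的下标以区分重复出现的词):

def is_well_formed(words, dependencies):
    """检查:每个词恰好一个 head、只有一个词依存于 ROOT、从任何词沿 head 链都能走到 ROOT(即无环)。"""
    heads = {}
    for head, dep in dependencies:
        if dep in heads:                                    # 每个词只能有一个 head
            return False
        heads[dep] = head
    if sum(1 for h in heads.values() if h == "ROOT") != 1:  # ROOT 只能有一个依赖者
        return False
    for word in words:
        seen, node = set(), word
        while node != "ROOT":                               # 沿 head 链向上走
            if node in seen or node not in heads:           # 有环或缺少 head
                return False
            seen.add(node)
            node = heads[node]
    return True

print(is_well_formed(["I", "ate", "fish"],
                     [("ROOT", "ate"), ("ate", "I"), ("ate", "fish")]))   # True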

方法

Dynamic programming

大致思路:用动态规划求出以某个 head 为端点的子串所对应的最优子树,再组合出整句的句法树(如 Eisner (1996) 的 O(n³) 算法)。

Graph algorithms

最小生成树(MST)方法:把句子中的词(加上 ROOT)看作图的节点,用分类器为每条候选依存弧独立打分,然后在图上求最优生成树作为依存树,如 McDonald et al. (2005) 的 MSTParser。打分的示意见下。
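
下面的小例子只演示“给每条候选弧独立打分”这一步:打分函数是随手编的玩具函数,而且为每个词各自取最高分 head 的做法并不保证结果是一棵树;真正的图方法还要用 Chu-Liu/Edmonds 这类最大生成树(arborescence)算法做全局解码。

words = ["ROOT", "I", "ate", "fish"]
tags  = {"ROOT": "ROOT", "I": "PRON", "ate": "VERB", "fish": "NOUN"}

def arc_score(head, dep, head_i, dep_i):
    """玩具打分函数(随手定义,仅为示意)。"""
    s = -abs(head_i - dep_i)                   # 距离越近分越高
    if tags[head] == "VERB":
        s += 2                                 # 动词更可能做 head
    if head == "ROOT" and tags[dep] == "VERB":
        s += 3                                 # ROOT 倾向指向核心动词
    return s

parse = []
for dep_i, dep in enumerate(words):
    if dep == "ROOT":
        continue
    best_i = max((i for i in range(len(words)) if i != dep_i),
                 key=lambda i: arc_score(words[i], dep, i, dep_i))
    parse.append((words[best_i], dep))
print(parse)    # [('ate', 'I'), ('ROOT', 'ate'), ('ate', 'fish')]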

Constraint Satisfaction

在包含所有候选依存边的图上,不断删除违反硬性约束的边,直到剩下一棵合法的依存树(如 Karlsson (1990) 的 Constraint Grammar)。

“Transition-based parsing” or “deterministic dependency parsing”

greedy choice of attachments guided by machine learning classifiers,即由机器学习分类器引导、对依存关系做贪心选择。

主流方法,基于贪心决策动作拼装句法树。
(图:transition-based parsing 的形式化定义)

step 1: build up structure
step 2: train a classifier

已经有了 a treebank with parses of sentences,就可以根据 treebank 确定 which sequence of transitions would give the correct parse of a sentence,把它转化为对 transition 的分类问题(推导 gold 动作序列的过程见下面的 oracle 示意)。
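
下面是一个简化的 oracle 示意:给定 gold 依存树(用每个词的 head 下标表示,0 代表 ROOT),推导出能重建这棵树的 S / LA / RA 动作序列。假设树是投影的(projective);这只是为说明思路写的草稿,并非作业的参考实现。

def oracle_transitions(words, gold_heads):
    """gold_heads[i] 是第 i 个词(从 1 开始计数)的 head 下标,gold_heads[0] 只是占位。"""
    n = len(words)
    n_deps = [0] * (n + 1)                      # gold 树中每个词的依赖者个数
    for h in gold_heads[1:]:
        n_deps[h] += 1
    attached = [0] * (n + 1)                    # 解析过程中已经挂上的依赖者个数
    stack, buf, trans = [0], list(range(1, n + 1)), []
    while len(stack) > 1 or buf:
        if len(stack) >= 2:
            s1, s2 = stack[-1], stack[-2]
            if s2 != 0 and gold_heads[s2] == s1:                       # LA:s2 依存于 s1
                trans.append("LA")
                stack.pop(-2)
                attached[s1] += 1
                continue
            if gold_heads[s1] == s2 and attached[s1] == n_deps[s1]:    # RA:s1 依存于 s2,且 s1 的依赖者已收齐
                trans.append("RA")
                stack.pop()
                attached[s2] += 1
                continue
        trans.append("S")                                              # 否则 SHIFT
        stack.append(buf.pop(0))
    return trans

# "I ate fish":ate(2) 依存于 ROOT(0),I(1) 和 fish(3) 依存于 ate
print(oracle_transitions(["I", "ate", "fish"], [0, 2, 0, 2]))
# ['S', 'S', 'LA', 'S', 'RA', 'RA']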

step 3: evaluation

评估指标:

Unlabeled attachment score(UAS):只统计每个词的 head 是否预测正确。

Labeled attachment score(LAS):要求 head 和依存关系标签(label)都预测正确。
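
一个最小的 UAS / LAS 计算示意(假设 gold 与预测结果都按词的位置对齐、以 (head, label) 二元组列表给出;非课程代码):

def attachment_scores(gold, pred):
    """gold / pred:每个词一个 (head, label) 二元组,按词位置对齐。"""
    assert len(gold) == len(pred)
    uas_hits = sum(1 for (gh, _), (ph, _) in zip(gold, pred) if gh == ph)   # 只比较 head
    las_hits = sum(1 for g, p in zip(gold, pred) if g == p)                 # head 和 label 都要对
    n = float(len(gold))
    return uas_hits / n, las_hits / n

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]      # head 全对,最后一个 label 错了
print(attachment_scores(gold, pred))                 # (1.0, 0.666...)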

简单实现

import sys
import cPickle
import os
import time
import tensorflow as tf
import numpy as np
from model import Model
from utils.parser_utils import minibatches, load_and_preprocess_data

def xavier_weight_init():
    """Returns function that creates random tensor.

    The specified function will take in a shape (tuple or 1-d array) and
    returns a random tensor of the specified shape drawn from the
    Xavier initialization distribution.

    Hint: You might find tf.random_uniform useful.
    """
    def _xavier_initializer(shape, **kwargs):
        """Defines an initializer for the Xavier distribution.
        Specifically, the output should be sampled uniformly from [-epsilon, epsilon] where
            epsilon = sqrt(6) / <sum of the sizes of shape's dimensions>
        e.g., if shape = (2, 3), epsilon = sqrt(6 / (2 + 3))

        This function will be used as a variable initializer.

        Args:
            shape: Tuple or 1-d array that specifies the dimensions of the requested tensor.
        Returns:
            out: tf.Tensor of specified shape sampled from the Xavier distribution.
        """
        # 用 6.0 而不是 6,避免 Python 2 下整数除法把 epsilon 算成 0
        epsilon = np.sqrt(6.0 / np.sum(shape))
        out_t = tf.Variable(tf.random_uniform(shape=shape, minval=-epsilon, maxval=epsilon))
        return out_t
    return _xavier_initializer

class Config(object):
    """Holds model hyperparams and data information.

    The config class is used to store various hyperparameters and dataset
    information parameters. Model objects are passed a Config() object at
    instantiation. They can then call self.config.<hyperparameter_name> to
    get the hyperparameter settings.
    """
    n_features = 36
    n_classes = 3
    dropout = 0.5  # (p_drop in the handout)
    embed_size = 50
    hidden_size = 200
    batch_size = 1024
    n_epochs = 10
    lr = 0.0005

class ParserModel(Model):
    """
    Implements a feedforward neural network with an embedding layer and single hidden layer.
    This network will predict which transition should be applied to a given partial parse
    configuration.
    """

    def add_placeholders(self):
        """Generates placeholder variables to represent the input tensors"""
        self.input_placeholder = tf.placeholder(tf.int32, [None, self.config.n_features])
        self.labels_placeholder = tf.placeholder(tf.float32, [None, self.config.n_classes])
        self.dropout_placeholder = tf.placeholder(tf.float32)

    def create_feed_dict(self, inputs_batch, labels_batch=None, dropout=0):
        """Creates the feed_dict for the dependency parser.

        A feed_dict takes the form of:

        feed_dict = {
            <placeholder>: <tensor of values to be passed for placeholder>,
            ....
        }

        Hint: The keys for the feed_dict should be a subset of the placeholder tensors created in add_placeholders.
        Hint: When an argument is None, don't add it to the feed_dict.

        Args:
            inputs_batch: A batch of input data.
            labels_batch: A batch of label data.
            dropout: The dropout rate.
        Returns:
            feed_dict: The feed dictionary mapping from placeholders to values.
        """
        feed_dict = {self.input_placeholder: inputs_batch,
                     self.dropout_placeholder: dropout}
        if labels_batch is not None:   # 预测时没有标签,不放进 feed_dict
            feed_dict[self.labels_placeholder] = labels_batch
        return feed_dict

    def add_embedding(self):
        """Adds an embedding layer that maps from input tokens (integers) to vectors and then
        concatenates those vectors:
        - Creates a tf.Variable and initializes it with self.pretrained_embeddings.
        - Uses the input_placeholder to index into the embeddings tensor, resulting in a
          tensor of shape (None, n_features, embedding_size).
        - Concatenates the embeddings by reshaping the embeddings tensor to shape
          (None, n_features * embedding_size).

        Hint: You might find tf.nn.embedding_lookup useful.
        Hint: You can use tf.reshape to concatenate the vectors.

        Returns:
            embeddings: tf.Tensor of shape (None, n_features*embed_size)
        """
        embedding = tf.Variable(self.pretrained_embeddings)
        embeddings = tf.nn.embedding_lookup(embedding, self.input_placeholder)
        embeddings = tf.reshape(embeddings, [-1, self.config.n_features * self.config.embed_size])
        return embeddings

    def add_prediction_op(self):
        """Adds the 1-hidden-layer NN:
            h = Relu(xW + b1)
            h_drop = Dropout(h, dropout_rate)
            pred = h_drop U + b2

        Note that we are not applying a softmax to pred. The softmax will instead be done in
        the add_loss_op function, which improves efficiency because we can use
        tf.nn.softmax_cross_entropy_with_logits

        Use the initializer from q2_initialization.py to initialize W and U (you can initialize b1
        and b2 with zeros)

        Hint: Note that tf.nn.dropout takes the keep probability (1 - p_drop) as an argument.
              Therefore the keep probability should be set to the value of
              (1 - self.dropout_placeholder)

        Returns:
            pred: tf.Tensor of shape (batch_size, n_classes)
        """
        x = self.add_embedding()
        xavier = xavier_weight_init()
        with tf.variable_scope("transformation"):
            b1 = tf.Variable(tf.random_uniform([self.config.hidden_size]))
            b2 = tf.Variable(tf.random_uniform([self.config.n_classes]))

            W = xavier([self.config.n_features * self.config.embed_size, self.config.hidden_size])
            U = xavier([self.config.hidden_size, self.config.n_classes])

            z1 = tf.matmul(x, W) + b1
            h = tf.nn.relu(z1)
            h_dropout = tf.nn.dropout(h, keep_prob=(1 - self.dropout_placeholder))
            pred = tf.matmul(h_dropout, U) + b2
        return pred

    def add_loss_op(self, pred):
        """Adds Ops for the loss function to the computational graph.
        In this case we are using cross entropy loss.
        The loss should be averaged over all examples in the current minibatch.

        Hint: You can use tf.nn.softmax_cross_entropy_with_logits to simplify your
              implementation. You might find tf.reduce_mean useful.
        Args:
            pred: A tensor of shape (batch_size, n_classes) containing the output of the neural
                  network before the softmax layer.
        Returns:
            loss: A 0-d tensor (scalar)
        """
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=self.labels_placeholder)
        loss = tf.reduce_mean(loss)
        return loss

    def add_training_op(self, loss):
        adam_optim = tf.train.AdamOptimizer(self.config.lr)
        train_op = adam_optim.minimize(loss)
        return train_op

    def train_on_batch(self, sess, inputs_batch, labels_batch):
        feed = self.create_feed_dict(inputs_batch, labels_batch=labels_batch,
                                     dropout=self.config.dropout)
        _, loss = sess.run([self.train_op, self.loss], feed_dict=feed)
        return loss

    def run_epoch(self, sess, parser, train_examples, dev_set):
        n_minibatches = 1 + len(train_examples) / self.config.batch_size
        prog = tf.keras.utils.Progbar(target=n_minibatches)
        for i, (train_x, train_y) in enumerate(minibatches(train_examples, self.config.batch_size)):
            loss = self.train_on_batch(sess, train_x, train_y)
            prog.update(i + 1, [("train loss", loss)])

        print "Evaluating on dev set",
        dev_UAS, _ = parser.parse(dev_set)
        print "- dev UAS: {:.2f}".format(dev_UAS * 100.0)
        return dev_UAS

    def fit(self, sess, saver, parser, train_examples, dev_set):
        best_dev_UAS = 0
        for epoch in range(self.config.n_epochs):
            print "Epoch {:} out of {:}".format(epoch + 1, self.config.n_epochs)
            dev_UAS = self.run_epoch(sess, parser, train_examples, dev_set)
            if dev_UAS > best_dev_UAS:
                best_dev_UAS = dev_UAS
                if saver:
                    print "New best dev UAS! Saving model in ./data/weights/parser.weights"
                    saver.save(sess, './data/weights/parser.weights')
            print

    def __init__(self, config, pretrained_embeddings):
        self.pretrained_embeddings = pretrained_embeddings
        self.config = config
        self.build()

def main(debug=True):
    print 80 * "="
    print "INITIALIZING"
    print 80 * "="
    config = Config()
    parser, embeddings, train_examples, dev_set, test_set = load_and_preprocess_data(debug)
    if not os.path.exists('./data/weights/'):
        os.makedirs('./data/weights/')

    with tf.Graph().as_default() as graph:
        print "Building model...",
        start = time.time()
        model = ParserModel(config, embeddings)
        parser.model = model
        init_op = tf.global_variables_initializer()
        saver = None if debug else tf.train.Saver()
        print "took {:.2f} seconds\n".format(time.time() - start)
    graph.finalize()

    with tf.Session(graph=graph) as session:
        parser.session = session
        session.run(init_op)

        print 80 * "="
        print "TRAINING"
        print 80 * "="
        model.fit(session, saver, parser, train_examples, dev_set)

        if not debug:
            print 80 * "="
            print "TESTING"
            print 80 * "="
            print "Restoring the best model weights found on the dev set"
            saver.restore(session, './data/weights/parser.weights')
            print "Final evaluation on test set",
            UAS, dependencies = parser.parse(test_set)
            print "- test UAS: {:.2f}".format(UAS * 100.0)
            print "Writing predictions"
            with open('q2_test.predicted.pkl', 'w') as f:
                cPickle.dump(dependencies, f, -1)
            print "Done!"

class PartialParse(object):
    def __init__(self, sentence):
        """Initializes this partial parse.

        Your code should initialize the following fields:
            self.stack: The current stack represented as a list with the top of the stack as the
                        last element of the list.
            self.buffer: The current buffer represented as a list with the first item on the
                         buffer as the first item of the list
            self.dependencies: The list of dependencies produced so far. Represented as a list of
                               tuples where each tuple is of the form (head, dependent).
                               Order for this list doesn't matter.

        The root token should be represented with the string "ROOT"

        Args:
            sentence: The sentence to be parsed as a list of words.
                      Your code should not modify the sentence.
        """
        # The sentence being parsed is kept for bookkeeping purposes. Do not use it in your code.
        self.sentence = sentence

        self.stack = ['ROOT']
        self.buffer = sentence[:]
        self.dependencies = []

    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        Args:
            transition: A string that equals "S", "LA", or "RA" representing the shift, left-arc,
                        and right-arc transitions. You can assume the provided transition is a legal
                        transition.
        """
        if transition == 'S':
            # SHIFT:把 buffer 的第一个词移到 stack 顶
            self.stack.append(self.buffer[0])
            self.buffer.pop(0)
        elif transition == 'LA':
            # LEFT-ARC:次栈顶成为栈顶的依赖者,并从 stack 中移除
            self.dependencies.append((self.stack[-1], self.stack[-2]))
            self.stack.pop(-2)
        else:
            # RIGHT-ARC:栈顶成为次栈顶的依赖者,并从 stack 中移除
            self.dependencies.append((self.stack[-2], self.stack[-1]))
            self.stack.pop(-1)

    def parse(self, transitions):
        """Applies the provided transitions to this PartialParse

        Args:
            transitions: The list of transitions in the order they should be applied
        Returns:
            dependencies: The list of dependencies produced when parsing the sentence. Represented
                          as a list of tuples where each tuple is of the form (head, dependent)
        """
        for transition in transitions:
            self.parse_step(transition)
        return self.dependencies

def test_step(name, transition, stack, buf, deps,
              ex_stack, ex_buf, ex_deps):
    """Tests that a single parse step returns the expected output"""
    pp = PartialParse([])
    pp.stack, pp.buffer, pp.dependencies = stack, buf, deps

    pp.parse_step(transition)
    stack, buf, deps = (tuple(pp.stack), tuple(pp.buffer), tuple(sorted(pp.dependencies)))
    assert stack == ex_stack, \
        "{:} test resulted in stack {:}, expected {:}".format(name, stack, ex_stack)
    assert buf == ex_buf, \
        "{:} test resulted in buffer {:}, expected {:}".format(name, buf, ex_buf)
    assert deps == ex_deps, \
        "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps)
    print "{:} test passed!".format(name)

调用函数试一下:

(图:测试运行结果)

Our network will predict which transition should be applied next to a partial parse. We could use it to parse a single sentence by applying predicted transitions until the parse is complete. However, neural networks run much more efficiently when making predictions about batches of data at a time (i.e., predicting the next transition for many different partial parses simultaneously). We can parse sentences in minibatches with the following algorithm.

也就是说,一个 parser 一次只处理一个 sentence;要批量处理 sentences,就需要下面的 minibatch parsing 算法。

(图:minibatch parsing 算法)

写个程序试一下

import copy

def minibatch_parse(sentences, model, batch_size):
    """Parses a list of sentences in minibatches using a model.

    Args:
        sentences: A list of sentences to be parsed (each sentence is a list of words)
        model: The model that makes parsing decisions. It is assumed to have a function
               model.predict(partial_parses) that takes in a list of PartialParses as input and
               returns a list of transitions predicted for each parse. That is, after calling
                   transitions = model.predict(partial_parses)
               transitions[i] will be the next transition to apply to partial_parses[i].
        batch_size: The number of PartialParses to include in each minibatch
    Returns:
        dependencies: A list where each element is the dependencies list for a parsed sentence.
                      Ordering should be the same as in sentences (i.e., dependencies[i] should
                      contain the parse for sentences[i]).
    """
    partial_parses = [PartialParse(s) for s in sentences]

    unfinished_parse = copy.copy(partial_parses)

    while len(unfinished_parse) > 0:
        minibatch = unfinished_parse[0:batch_size]
        # perform transition and single step parser on the minibatch until it is empty
        while len(minibatch) > 0:
            transitions = model.predict(minibatch)
            for index, action in enumerate(transitions):
                minibatch[index].parse_step(action)
            minibatch = [parse for parse in minibatch if len(parse.stack) > 1 or len(parse.buffer) > 0]

        # move to the next batch
        unfinished_parse = unfinished_parse[batch_size:]

    dependencies = []
    for n in range(len(sentences)):
        dependencies.append(partial_parses[n].dependencies)

    return dependencies

class DummyModel(object):
    """Dummy model for testing the minibatch_parse function
    First shifts everything onto the stack and then does exclusively right arcs if the first word of
    the sentence is "right", "left" if otherwise.
    """
    def predict(self, partial_parses):
        # 字符串比较用 == 而不是 is(is 比较的是对象身份,结果不可靠)
        return [("RA" if pp.stack[1] == "right" else "LA") if len(pp.buffer) == 0 else "S"
                for pp in partial_parses]

def test_dependencies(name, deps, ex_deps):
    """Tests the provided dependencies match the expected dependencies"""
    deps = tuple(sorted(deps))
    assert deps == ex_deps, \
        "{:} test resulted in dependency list {:}, expected {:}".format(name, deps, ex_deps)

(图:minibatch_parse 的测试运行结果)