An Introduction to Conditional Random Fields

References

[1] S. M. Aji and R. J. McEliece, “The generalized distributive law,” IEEE Transactions on Information Theory, vol. 46, no. 2, pp. 325–343, 2000.

[2] Y. Altun, I. Tsochantaridis, and T. Hofmann, “Hidden Markov support vector machines,” in International Conference on Machine Learning (ICML), 2003.

[3] G. Andrew and J. Gao, “Scalable training of l1-regularized log-linear models,” in International Conference on Machine Learning (ICML), 2007.

[4] L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer, “Maximum mutual information estimation of hidden Markov model parameters for speech recognition,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 11, pp. 49–52, 1986.

[5] G. H. Bakir, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, eds., Predicting Structured Data. MIT Press, 2007.

[6] T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero, and D. Klein, “Painless unsupervised learning with features,” in Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL), pp. 582–590.

[7] A. Bernal, K. Crammer, A. Hatzigeorgiou, and F. Pereira, “Global discriminative learning for higher-accuracy computational gene prediction,” PLoS Computational Biology, vol. 3, no. 3, 2007.

[8] D. P. Bertsekas, Nonlinear Programming. Athena Scientific, 2nd ed., 1999.

[9] J. Besag, “Statistical analysis of non-lattice data,” The Statistician, vol. 24, no. 3, pp. 179–195, 1975.

[10] A. Blake, P. Kohli, and C. Rother, eds., Markov Random Fields for Vision and Image Processing. MIT Press, 2011.

[11] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” Journal of Machine Learning Research, vol. 3, p. 993, 2003.

[12] P. Blunsom and T. Cohn, “Discriminative word alignment with conditional random fields,” in International Conference on Computational Linguistics and Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pp. 65–72, 2006.

[13] L. Bottou, “Stochastic gradient descent examples on toy problems,” 2010.

[14] Y. Boykov and M.-P. Jolly, “Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images,” in International Conference on Computer Vision (ICCV), vol. 1, pp. 105–112, 2001.

[15] J. K. Bradley and C. Guestrin, “Learning tree conditional random fields,” in International Conference on Machine Learning (ICML), 2010.

[16] R. Bunescu and R. J. Mooney, “Collective information extraction with relational Markov networks,” in Annual Meeting of the Association for Computational Linguistics (ACL), 2004.

[17] R. H. Byrd, J. Nocedal, and R. B. Schnabel, “Representations of quasi-Newton matrices and their use in limited memory methods,” Mathematical Programming, vol. 63, no. 2, pp. 129–156, 1994.

[18] R. Caruana, “Multitask learning,” Machine Learning, vol. 28, no. 1, pp. 41–75, 1997.

[19] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms using different performance metrics,” Technical Report TR2005-1973, Cornell University, 2005.

[20] H. L. Chieu and H. T. Ng, “Named entity recognition with a maximum entropy approach,” in Conference on Natural Language Learning (CoNLL), pp. 160–163, 2003.

[21] Y. Choi, C. Cardie, E. Riloff, and S. Patwardhan, “Identifying sources of opinions with conditional random fields and extraction patterns,” in Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT-EMNLP), 2005.

[22] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, A. Y. Ng, and K. Olukotun, “Map-reduce for machine learning on multicore,” in Advances in Neural Information Processing Systems 19, pp. 281–288, MIT Press, 2007.

[23] S. Clark and J. R. Curran, “Parsing the WSJ using CCG and log-linear models,” in Proceedings of the Meeting of the Association for Computational Linguistics (ACL), pp. 103–110, 2004.

[24] T. Cohn, “Efficient inference in large conditional random fields,” in European Conference on Machine Learning (ECML), pp. 606–613, Berlin, Germany, September 2006.

[25] M. Collins, “Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms,” in Conference on Empirical Methods in Natural Language Processing (EMNLP), 2002.

[26] P. J. Cowans and M. Szummer, “A graphical model for simultaneous partitioning and labeling,” in Conference on Artificial Intelligence and Statistics (AISTATS), 2005.

[27] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online passive-aggressive algorithms,” Journal of Machine Learning Research, 2006.

[28] K. Crammer and Y. Singer, “Ultraconservative online algorithms for multiclass problems,” Journal of Machine Learning Research, vol. 3, pp. 951–991, January 2003.

[29] A. Culotta, R. Bekkerman, and A. McCallum, “Extracting social networks and contact information from email and the web,” in First Conference on Email and Anti-Spam (CEAS), Mountain View, CA, 2004.

[30] A. Culotta and A. McCallum, “Confidence estimation for information extraction,” in Human Language Technology Conference (HLT), 2004.

[31] H. Daumé III, J. Langford, and D. Marcu, “Search-based structured prediction,” Machine Learning Journal, 2009.

[32] H. Daumé III and D. Marcu, “Learning as search optimization: Approximate large margin methods for structured prediction,” in International Conference on Machine Learning (ICML), Bonn, Germany, 2005.

[33] T. Deselaers, B. Alexe, and V. Ferrari, “Localizing objects while learning their appearance,” in European Conference on Computer Vision (ECCV), 2010.

[34] J. V. Dillon and G. Lebanon, “Stochastic composite likelihood,” Journal of Machine Learning Research, vol. 11, pp. 2597–2633, October 2010.

[35] G. Elidan, I. McGraw, and D. Koller, “Residual belief propagation: Informed scheduling for asynchronous message passing,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2006.

[36] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part based models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010.

[37] J. Finkel, T. Grenager, and C. D. Manning, “Incorporating non-local information into information extraction systems by Gibbs sampling,” in Annual Meeting of the Association for Computational Linguistics (ACL), 2005.

[38] J. R. Finkel, A. Kleeman, and C. D. Manning, “Efficient, feature-based, conditional random field parsing,” in Annual Meeting of the Association for Computational Linguistics (ACL/HLT), pp. 959–967, 2008.

[39] K. Ganchev, J. Graca, J. Gillenwater, and B. Taskar, “Posterior regularization for structured latent variable models,” Technical Report MS-CIS-09-16, University of Pennsylvania Department of Computer and Information Science, 2009.

[40] A. E. Gelfand and A. F. M. Smith, “Sampling-based approaches to calculating marginal densities,” Journal of the American Statistical Association, vol. 85, pp. 398–409, 1990.

[41] S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, pp. 721–741, 1984.

[42] N. Ghamrawi and A. McCallum, “Collective multi-label classification,” in Conference on Information and Knowledge Management (CIKM), 2005.

[43] A. Globerson, T. Koo, X. Carreras, and M. Collins, “Exponentiated gradient algorithms for log-linear structured prediction,” in International Conference on Machine Learning (ICML), 2007.

[44] J. Goodman, “Exponential priors for maximum entropy models,” in Proceedings of the Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics (HLT/NAACL), 2004.

[45] J. Graca, K. Ganchev, B. Taskar, and F. Pereira, “Posterior vs parameter sparsity in latent variable models,” in Advances in Neural Information Processing Systems 22, (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, eds.), pp. 664–672, 2009.

[46] Y. Grandvalet and Y. Bengio, “Semi-supervised learning by entropy minimization,” in Advances in Neural Information Processing Systems (NIPS), 2004.

[47] M. L. Gregory and Y. Altun, “Using conditional random fields to predict pitch accents in conversational speech,” in Annual Meeting of the Association for Computational Linguistics (ACL), pp. 677–683, 2004.

[48] A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt, “Hidden conditional random fields for phone classification,” in International Conference on Speech Communication and Technology, 2005.

[49] X. He, R. S. Zemel, and M. A. Carreira-Perpiñán, “Multiscale conditional random fields for image labelling,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2004.

[50] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, pp. 1771–1800, 2002.

[51] L. Hirschman, A. Yeh, C. Blaschke, and A. Valencia, “Overview of BioCreAtIvE: critical assessment of information extraction for biology,” BMC Bioinformatics, vol. 6, no. Suppl 1, 2005.

[52] F. Jiao, S. Wang, C.-H. Lee, R. Greiner, and D. Schuurmans, “Semi-supervised conditional random fields for improved sequence segmentation and labeling,” in Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL), 2006.

[53] S. S. Keerthi and S. Sundararajan, “CRF versus SVM-struct for sequence labeling,” Technical report, Yahoo! Research, 2007.

[54] J. Kiefer and J. Wolfowitz, “Stochastic estimation of the maximum of a regression function,” Annals of Mathematical Statistics, vol. 23, pp. 462–466, 1952.

[55] J.-D. Kim, T. Ohta, Y. Tsuruoka, Y. Tateisi, and N. Collier, “Introduction to the bio-entity recognition task at JNLPBA,” in International joint workshop on natural language processing in biomedicine and its applications, pp. 70–75, Association for Computational Linguistics, 2004.

[56] P. Kohli, L. Ladicky, and P. H. S. Torr, “Robust higher order potentials for enforcing label consistency,” International Journal of Computer Vision, vol. 82, no. 3, pp. 302–324, 2009.

[57] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.

[58] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.

[59] T. Kudo, K. Yamamoto, and Y. Matsumoto, “Applying conditional random fields to Japanese morphological analysis,” in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2004.

[60] A. Kulesza and F. Pereira, “Structured learning with approximate inference,” in Advances in Neural Information Processing Systems, 2008.

[61] S. Kumar and M. Hebert, “Discriminative fields for modeling spatial dependencies in natural images,” in Advances in Neural Information Processing Systems (NIPS), 2003.

[62] S. Kumar and M. Hebert, “Discriminative random fields,” International Journal of Computer Vision, vol. 68, no. 2, pp. 179–201, 2006.

[63] J. Lafferty, A. McCallum, and F. Pereira, “Conditional random fields: Probabilistic models for segmenting and labeling sequence data,” in International Conference on Machine Learning (ICML), 2001.

[64] J. Langford, A. Smola, and M. Zinkevich, “Slow learners are fast,” in Advances in Neural Information Processing Systems 22, (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, eds.), pp. 2331–2339, 2009.

[65] T. Lavergne, O. Cappé, and F. Yvon, “Practical very large scale CRFs,” in Annual Meeting of the Association for Computational Linguistics (ACL), pp. 504–513, 2010.

[66] Y. Le Cun, L. Bottou, G. B. Orr, and K.-R. Müller, “Efficient backprop,” in Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science LNCS 1524, Springer Verlag, 1998.

[67] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, November 1998.

[68] Y. LeCun, S. Chopra, R. Hadsell, R. Marc’Aurelio, and F.-J. Huang, “A tutorial on energy-based learning,” in Predicting Structured Data, (G. Bakir, T. Hofman, B. Schölkopf, A. Smola, and B. Taskar, eds.), MIT Press, 2007.

[69] S. Z. Li, Markov Random Field Modeling in Image Analysis. Springer-Verlag, 2001.

[70] W. Li and A. McCallum, “A note on semi-supervised learning using Markov random fields,” 2004.

[71] P. Liang, H. Daumé III, and D. Klein, “Structure compilation: Trading structure for features,” in International Conference on Machine Learning (ICML), pp. 592–599, 2008.

[72] P. Liang and M. I. Jordan, “An asymptotic analysis of generative, discriminative, and pseudolikelihood estimators,” in International Conference on Machine Learning (ICML), pp. 584–591, 2008.

[73] P. Liang, M. I. Jordan, and D. Klein, “Learning from measurements in exponential families,” in International Conference on Machine Learning (ICML), 2009.

[74] C.-J. Lin, R. C.-H. Weng, and S. Keerthi, “Trust region Newton methods for large-scale logistic regression,” in International Conference on Machine Learning (ICML), 2007.

[75] B. G. Lindsay, “Composite likelihood methods,” Contemporary Mathematics, pp. 221–239, 1988.

[76] Y. Liu, J. Carbonell, P. Weigele, and V. Gopalakrishnan, “Protein fold recognition using segmentation conditional random fields (SCRFs),” Journal of Computational Biology, vol. 13, no. 2, pp. 394–406, 2006.

[77] D. G. Lowe, “Object recognition from local scale-invariant features,” in International Conference on Computer Vision (ICCV), vol. 2, pp. 1150–1157, 1999.

[78] D. J. Lunn, A. Thomas, N. Best, and D. Spiegelhalter, “WinBUGS — a Bayesian modelling framework: Concepts, structure, and extensibility,” Statistics and Computing, vol. 10, no. 4, pp. 325–337, 2000.

[79] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

[80] R. Malouf, “A comparison of algorithms for maximum entropy parameter estimation,” in Conference on Natural Language Learning (CoNLL), (D. Roth and A. van den Bosch, eds.), pp. 49–55, 2002.

[81] G. Mann and A. McCallum, “Generalized expectation criteria for semi-supervised learning of conditional random fields,” in Proceedings of Association of Computational Linguistics, 2008.

[82] M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz, “Building a large annotated corpus of English: The Penn Treebank,” Computational Linguistics, vol. 19, no. 2, pp. 313–330, 1993.

[83] A. McCallum, “Efficiently inducing features of conditional random fields,” in Conference on Uncertainty in AI (UAI), 2003.

[84] A. McCallum, K. Bellare, and F. Pereira, “A conditional random field for discriminatively-trained finite-state string edit distance,” in Conference on Uncertainty in AI (UAI), 2005.

[85] A. McCallum, D. Freitag, and F. Pereira, “Maximum entropy Markov models for information extraction and segmentation,” in International Conference on Machine Learning (ICML), pp. 591–598, San Francisco, CA, 2000.

[86] A. McCallum and W. Li, “Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons,” in Seventh Conference on Natural Language Learning (CoNLL), 2003.

[87] A. McCallum, K. Schultz, and S. Singh, “FACTORIE: Probabilistic programming via imperatively defined factor graphs,” in Advances in Neural Information Processing Systems (NIPS), 2009.

[88] A. McCallum and B. Wellner, “Conditional models of identity uncertainty with application to noun coreference,” in Advances in Neural Information Processing Systems 17, (L. K. Saul, Y. Weiss, and L. Bottou, eds.), pp. 905–912, Cambridge, MA: MIT Press, 2005.

[89] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng, “Turbo decoding as an instance of Pearl’s “belief propagation” algorithm,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 2, pp. 140–152, 1998.

[90] S. Miller, J. Guinness, and A. Zamanian, “Name tagging with word clusters and discriminative training,” in HLT-NAACL 2004: Main Proceedings, (D. Marcu, S. Dumais, and S. Roukos, eds.), pp. 337–342, Boston, Massachusetts, USA: Association for Computational Linguistics, May 2–May 7 2004.

[91] T. P. Minka, “The EP energy function and minimization schemes,” Technical report, 2001.

[92] T. P. Minka, “A comparison of numerical optimizers for logistic regression,” Technical report, 2003.

[93] T. P. Minka, “Discriminative models, not discriminative training,” Technical Report MSR-TR-2005-144, Microsoft Research, October 2005.

[94] T. P. Minka, “Divergence measures and message passing,” Technical Report MSR-TR-2005-173, Microsoft Research, 2005.

[95] I. Murray, “Advances in Markov chain Monte Carlo methods,” PhD thesis, Gatsby computational neuroscience unit, University College London, 2007.

[96] I. Murray, Z. Ghahramani, and D. J. C. MacKay, “MCMC for doubly-intractable distributions,” in Uncertainty in Artificial Intelligence (UAI), pp. 359–366, AUAI Press, 2006.

[97] A. Y. Ng, “Feature selection, l1 vs. l2 regularization, and rotational invariance,” in International Conference on Machine Learning (ICML), 2004.

[98] A. Y. Ng and M. I. Jordan, “On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes,” in Advances in Neural Information Processing Systems 14, (T. G. Dietterich, S. Becker, and Z. Ghahramani, eds.), pp. 841–848, Cambridge, MA: MIT Press, 2002.

[99] N. Nguyen and Y. Guo, “Comparisons of sequence labeling algorithms and extensions,” in International Conference on Machine Learning (ICML), 2007.

[100] J. Nocedal and S. J. Wright, Numerical Optimization. New York: Springer-Verlag, 1999.

[101] S. Nowozin and C. H. Lampert, “Structured prediction and learning in computer vision,” Foundations and Trends in Computer Graphics and Vision, vol. 6, no. 3–4, 2011.

[102] C. Pal, C. Sutton, and A. McCallum, “Sparse forward-backward using minimum divergence beams for fast training of conditional random fields,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2006.

[103] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[104] F. Peng, F. Feng, and A. McCallum, “Chinese segmentation and new word detection using conditional random fields,” in International Conference on Computational Linguistics (COLING), pp. 562–568, 2004.

[105] F. Peng and A. McCallum, “Accurate information extraction from research papers using conditional random fields,” in Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2004.

[106] D. Pinto, A. McCallum, X. Wei, and W. B. Croft, “Table extraction using conditional random fields,” in ACM SIGIR Conference on Research and Development in Information Retrieval, 2003.

[107] Y. Qi, M. Szummer, and T. P. Minka, “Bayesian conditional random fields,” in Conference on Artificial Intelligence and Statistics (AISTATS), 2005.

[108] Y. Qi, M. Szummer, and T. P. Minka, “Diagram structure recognition by Bayesian conditional random fields,” in International Conference on Computer Vision and Pattern Recognition, 2005.

[109] A. Quattoni, M. Collins, and T. Darrell, “Conditional random fields for object recognition,” in Advances in Neural Information Processing Systems (NIPS), pp. 1097–1104, 2005.

[110] A. Quattoni, S. Wang, L.-P. Morency, M. Collins, and T. Darrell, “Hidden-state conditional random fields,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.

[111] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.

[112] N. Ratliff, J. A. Bagnell, and M. Zinkevich, “Maximum margin planning,” in International Conference on Machine Learning, July 2006.

[113] M. Richardson and P. Domingos, “Markov logic networks,” Machine Learning, vol. 62, no. 1–2, pp. 107–136, 2006.

[114] S. Riezler, T. King, R. Kaplan, R. Crouch, J. T. Maxwell III, and M. Johnson, “Parsing the Wall Street Journal using a lexical-functional grammar and discriminative estimation techniques,” in Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2002.

[115] H. Robbins and S. Monro, “A stochastic approximation method,” Annals of Mathematical Statistics, vol. 22, pp. 400–407, 1951.

[116] C. Robert and G. Casella, Monte Carlo Statistical Methods. Springer, 2004.

[117] D. Rosenberg, D. Klein, and B. Taskar, “Mixture-of-parents maximum entropy Markov models,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2007.

[118] D. Roth and W. Yih, “Integer linear programming inference for conditional random fields,” in International Conference on Machine Learning (ICML), pp. 737–744, 2005.

[119] C. Rother, V. Kolmogorov, and A. Blake, “Grabcut: Interactive foreground extraction using iterated graph cuts,” ACM Transactions on Graphics (SIGGRAPH), vol. 23, no. 3, pp. 309–314, 2004.

[120] E. F. T. K. Sang and S. Buchholz, “Introduction to the CoNLL-2000 shared task: Chunking,” in Proceedings of CoNLL-2000 and LLL-2000, 2000. See http://lcg-www.uia.ac.be/~erikt/research/np-chunking.html.

[121] E. F. T. K. Sang and F. D. Meulder, “Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition,” in Proceedings of CoNLL-2003, (W. Daelemans and M. Osborne, eds.), pp. 142–147, Edmonton, Canada, 2003.

[122] S. Sarawagi and W. W. Cohen, “Semi-Markov conditional random fields for information extraction,” in Advances in Neural Information Processing Systems 17, (L. K. Saul, Y. Weiss, and L. Bottou, eds.), pp. 1185–1192, Cambridge, MA: MIT Press, 2005.

[123] K. Sato and Y. Sakakibara, “RNA secondary structural alignment with conditional random fields,” Bioinformatics, vol. 21, pp. ii237–242, 2005.

[124] B. Settles, “ABNER: An open source tool for automatically tagging genes, proteins, and other entity names in text,” Bioinformatics, vol. 21, no. 14, pp. 3191–3192, 2005.

[125] F. Sha and F. Pereira, “Shallow parsing with conditional random fields,” in Conference on Human Language Technology and North American Association for Computational Linguistics (HLT-NAACL), pp. 213–220, 2003.

[126] S. Shalev-Shwartz, Y. Singer, and N. Srebro, “Pegasos: Primal estimated sub-gradient solver for SVM,” in International Conference on Machine Learning (ICML), 2007.

[127] J. Shotton, J. Winn, C. Rother, and A. Criminisi, “TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation,” in European Conference on Computer Vision (ECCV), 2006.

[128] P. Singla and P. Domingos, “Discriminative training of Markov logic networks,” in Proceedings of the National Conference on Artificial Intelligence, pp. 868–873, Pittsburgh, PA, 2005.

[129] F. K. Soong and E.-F. Huang, “A tree-trellis based fast search for finding the n-best sentence hypotheses in continuous speech recognition,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1991.

[130] D. H. Stern, T. Graepel, and D. J. C. MacKay, “Modelling uncertainty in the game of go,” in Advances in Neural Information Processing Systems 17, (L. K. Saul, Y. Weiss, and L. Bottou, eds.), pp. 1353–1360, Cambridge, MA: MIT Press, 2005.

[131] I. Sutskever and T. Tieleman, “On the convergence properties of contrastive divergence,” in Conference on Artificial Intelligence and Statistics (AISTATS), 2010.

[132] C. Sutton, “Efficient Training Methods for Conditional Random Fields,” PhD thesis, University of Massachusetts, 2008.

[133] C. Sutton and A. McCallum, “Collective segmentation and labeling of distant entities in information extraction,” in ICML Workshop on Statistical Relational Learning and Its Connections to Other Fields, 2004.

[134] C. Sutton and A. McCallum, “Piecewise training of undirected models,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2005.

[135] C. Sutton and A. McCallum, “Improved dynamic schedules for belief propagation,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2007.

[136] C. Sutton and A. McCallum, “An introduction to conditional random fields for relational learning,” in Introduction to Statistical Relational Learning, (L. Getoor and B. Taskar, eds.), MIT Press, 2007.

[137] C. Sutton and A. McCallum, “Piecewise training for structured prediction,” Machine Learning, vol. 77, no. 2–3, pp. 165–194, 2009.

[138] C. Sutton, A. McCallum, and K. Rohanimanesh, “Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data,” Journal of Machine Learning Research, vol. 8, pp. 693–723, March 2007.

[139] C. Sutton and T. Minka, “Local training and belief propagation,” Technical Report TR-2006-121, Microsoft Research, 2006.

[140] C. Sutton, K. Rohanimanesh, and A. McCallum, “Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data,” in International Conference on Machine Learning (ICML), 2004.

[141] C. Sutton, M. Sindelar, and A. McCallum, “Reducing weight undertraining in structured discriminative learning,” in Conference on Human Language Technology and North American Association for Computational Linguistics (HLT-NAACL), 2006.

[142] B. Taskar, P. Abbeel, and D. Koller, “Discriminative probabilistic models for relational data,” in Conference on Uncertainty in Artificial Intelligence (UAI), 2002.

[143] B. Taskar, C. Guestrin, and D. Koller, “Max-margin Markov networks,” in Advances in Neural Information Processing Systems 16, (S. Thrun, L. Saul, and B. Schölkopf, eds.), Cambridge, MA: MIT Press, 2004.

[144] B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning, “Max-margin parsing,” in Empirical Methods in Natural Language Processing (EMNLP04), 2004.

[145] B. Taskar, S. Lacoste-Julien, and D. Klein, “A discriminative matching approach to word alignment,” in Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT-EMNLP), pp. 73–80, 2005.

[146] K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, “Feature-rich part-of-speech tagging with a cyclic dependency network,” in HLT-NAACL, 2003.

[147] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, “Support vector machine learning for interdependent and structured output spaces,” in International Conference on Machine Learning (ICML), 2004.

[148] P. Viola and M. Narasimhan, “Learning to extract information from semi-structured text using a discriminative context free grammar,” in Proceedings of the ACM SIGIR, 2005.

[149] S. V. N. Vishwanathan, N. N. Schraudolph, M. W. Schmidt, and K. Murphy, “Accelerated training of conditional random fields with stochastic meta-descent,” in International Conference on Machine Learning (ICML), pp. 969–976, 2006.

[150] M. J. Wainwright and M. I. Jordan, “Graphical models, exponential families, and variational inference,” Foundations and Trends in Machine Learning, vol. 1, no. 1–2, pp. 1–305, 2008.

[151] M. J. Wainwright, “Estimating the wrong Markov random field: Benefits in the computation-limited setting,” in Advances in Neural Information Processing Systems 18, (Y. Weiss, B. Schölkopf, and J. Platt, eds.), Cambridge, MA: MIT Press, 2006.

[152] M. J. Wainwright, T. Jaakkola, and A. S. Willsky, “Tree-based reparameterization framework for analysis of sum-product and related algorithms,” IEEE Transactions on Information Theory, vol. 45, no. 9, pp. 1120–1146, 2003.

[153] H. Wallach, “Efficient training of conditional random fields,” M.Sc. thesis, University of Edinburgh, 2002.

[154] M. Welling and S. Parise, “Bayesian random fields: The Bethe-Laplace approximation,” in Uncertainty in Artificial Intelligence (UAI), 2006.

[155] M. Wick, K. Rohanimanesh, A. Culotta, and A. McCallum, “SampleRank: Learning preferences from atomic gradients,” in Neural Information Processing Systems (NIPS) Workshop on Advances in Ranking, 2009.

[156] M. Wick, K. Rohanimanesh, A. McCallum, and A. Doan, “A discriminative approach to ontology alignment,” in International Workshop on New Trends in Information Integration (NTII), 2008.

[157] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free energy approximations and generalized belief propagation algorithms,” Technical Report TR2004-040, Mitsubishi Electric Research Laboratories, 2004.

[158] J. S. Yedidia, W. T. Freeman, and Y. Weiss, “Constructing free-energy approximations and generalized belief propagation algorithms,” IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2282–2312, July 2005.

[159] C.-N. Yu and T. Joachims, “Learning structural SVMs with latent variables,” in International Conference on Machine Learning (ICML), 2009.

[160] J. Yu, S. V. N. Vishwanathan, S. Günter, and N. N. Schraudolph, “A quasi-Newton approach to nonsmooth convex optimization problems in machine learning,” Journal of Machine Learning Research, vol. 11, pp. 1145–1200, March 2010.

[161] Y. Zhang and C. Sutton, “Quasi-Newton Markov chain Monte Carlo,” in Advances in Neural Information Processing Systems (NIPS), 2011.
