Topological Relation Aware Transformer

DOI: 10.21522/TIJAR.2014.11.01.Art015

Authors: Nathan Manzambi Ndongala, Taratisio Ndwiga

Abstract:

We present the Topological Relation Aware Transformer (T-RAT), a transformer whose attention heads are specialized to open sets, the elements of the topology τ generated by S, the set of all pre-existing relations between the input tokens of the model. From this topological space (S, τ), we show how to spread each open set to one head of our Transformer. T-RAT improves exact-match accuracy on the Text-to-SQL challenge (62.09%) without any enhancement from large language models, compared to the baseline models RAT-SQL (57.2%) and Light RAT-SQL (60.25%).
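As a rough illustration of the construction sketched in the abstract, the Python fragment below builds the topology generated by a finite set S of pre-existing relations (treating S as a subbasis: finite intersections form a basis, unions of basis elements form the open sets) and then spreads the resulting open sets across a fixed number of attention heads. This is a minimal sketch under stated assumptions, not the authors' released code: the function names, the toy relation labels, and the round-robin assignment are all hypothetical.

from itertools import combinations

def generate_topology(subbasis, universe):
    """Topology generated by `subbasis` on the finite set `universe`:
    finite intersections of subbasis elements form a basis; arbitrary
    unions of basis elements (plus the empty set) form the open sets."""
    subbasis = [frozenset(s) for s in subbasis]
    universe = frozenset(universe)
    # Basis: all finite intersections (the empty intersection is the universe).
    basis = {universe}
    for r in range(1, len(subbasis) + 1):
        for combo in combinations(subbasis, r):
            basis.add(frozenset.intersection(*combo))
    # Open sets: all unions of basis elements, plus the empty set.
    open_sets = {frozenset()}
    for r in range(1, len(basis) + 1):
        for combo in combinations(basis, r):
            open_sets.add(frozenset().union(*combo))
    return open_sets

def assign_open_sets_to_heads(open_sets, num_heads):
    """Spread the non-trivial open sets across attention heads (round-robin),
    so each head is associated with its own subset of relations."""
    nontrivial = sorted((o for o in open_sets if o), key=lambda o: (len(o), sorted(o)))
    head_relations = [set() for _ in range(num_heads)]
    for idx, open_set in enumerate(nontrivial):
        head_relations[idx % num_heads] |= open_set
    return head_relations

# Toy example: S holds schema-linking relations between question, column, and table tokens.
S = [{"q-col-exact", "q-col-partial"}, {"q-col-partial", "q-tab-exact"}, {"col-pk", "col-fk"}]
universe = set().union(*S)
tau = generate_topology(S, universe)
print(len(tau), "open sets")
print(assign_open_sets_to_heads(tau, num_heads=8))

In a full model, each head would then apply relation-aware self-attention in the style of RAT-SQL [7], with the relation bias restricted to the relations assigned to that head.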

Keywords: Deep learning, Natural Language Processing, Neural Semantic Parsing, Relation Aware Transformer, RAT-SQL, Text-To-SQL Transformer.

References:

[1] T. Scholak, R. Li, D. Bahdanau, H. de Vries, and C. Pal, “DuoRAT: Towards Simpler Text-to-SQL Models,” Oct. 2020, doi: 10.18653/v1/2021.naacl-main.103.

[2] W. Hou and Y. Nie, “Seq2seq-Attention Question Answering Model.”

[3] O. Goldman, V. Latcinnik, U. Naveh, A. Globerson, and J. Berant, “Weakly-supervised Semantic Parsing with Abstract Examples,” Nov. 2017, [Online]. Available: http://arxiv.org/abs/1711.05240.

[4] X. V. Lin, R. Socher, and C. Xiong, “Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing,” Dec. 2020, [Online]. Available: http://arxiv.org/abs/2012.12627.

[5] I. Gur, S. Yavuz, Y. Su, and X. Yan, “DialSQL: Dialogue Based Structured Query Generation.”

[6] X. Xu, C. Liu, and D. Song, “SQLNet: Generating Structured Queries from Natural Language Without Reinforcement Learning,” Nov. 2017, [Online]. Available: http://arxiv.org/abs/1711.04436.

[7] B. Wang, R. Shin, X. Liu, O. Polozov, and M. Richardson, “RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers,” 2020. [Online]. Available: https://github.com/Microsoft/rat-sql.

[8] N. M. Ndongala, “Light RAT-SQL: A RAT-SQL with More Abstraction and Less Embedding of Pre-existing Relations,” Texila Int. J. Acad. Res., vol. 10, no. 2, pp. 1–11, 2023, doi: 10.21522/tijar.2014.10.02.art001.

[9] G. Huilin, G. Tong, W. Fan, and M. Chao, “Bidirectional attention for SQL generation,” in 2019 IEEE 4th International Conference on Cloud Computing and Big Data Analytics, ICCCBDA 2019, Institute of Electrical and Electronics Engineers Inc., Apr. 2019, pp. 676–682. doi: 10.1109/ICCCBDA.2019.8725626.

[10] K. Clark, M.-T. Luong, Q. V. Le, and C. D. Manning, “ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators,” Mar. 2020, [Online]. Available: http://arxiv.org/abs/2003.10555.

[11] M. Lewis et al., “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.13461.

[12] M. Shoeybi, M. Patwary, R. Puri, P. Legresley, J. Casper, and B. Catanzaro, “Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism,” 2020. [Online]. Available: https://github.com/.

[13] T. B. Brown et al., “Language Models are Few-Shot Learners,” 2020. [Online]. Available: https://commoncrawl.org/the-data/.

[14] Z. Lan et al., “ALBERT: A Lite Bert for Self-Supervised Learning Of Language Representations,” 2020. [Online]. Available: https://github.com/google-research/ALBERT.

[15] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, “Language Models are Unsupervised Multitask Learners,” 2019. [Online]. Available: https://github.com/codelucas/newspaper.

[16] V. Zhong, C. Xiong, and R. Socher, “Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning,” Aug. 2017, [Online]. Available: http://arxiv.org/abs/1709.00103.

[17] A. Vaswani et al., “Attention Is All You Need,” Jun. 2017, [Online]. Available: http://arxiv.org/abs/1706.03762.

[18] O. Vinyals, M. Fortunato, and N. Jaitly, “Pointer Networks,” Jun. 2015, [Online]. Available: http://arxiv.org/abs/1506.03134.

[19] Z. Tu, Z. Lu, L. Yang, X. Liu, and H. Li, “Modeling coverage for neural machine translation,” in 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Long Papers, Jan. 2016, pp. 76–85. doi: 10.18653/v1/p16-1008.

[20] P. Shaw, J. Uszkoreit, G. Brain, and A. Vaswani, “Self-Attention with Relative Position Representations,” 2018.

[21] L. Zehui, P. Liu, L. Huang, J. Chen, X. Qiu, and X. Huang, “DropAttention: A Regularization Method for Fully Connected Self-Attention Networks,” Jul. 2019, Accessed: Apr. 04, 2022. [Online]. Available: http://arxiv.org/abs/1907.11065.

[22] E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the dangers of stochastic parrots: Can language models be too big?” in FAccT 2021 - Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Association for Computing Machinery, Inc, Mar. 2021, pp. 610–623. doi: 10.1145/3442188.3445922.

[23] P. He, X. Liu, J. Gao, and W. Chen, “DeBERTa: Decoding-Enhanced BERT with Disentangled Attention,” 2021.

[24] W. Fedus, B. Zoph, and N. Shazeer, “Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity,” 2022.

[25] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” 2018.

[26] V. Sanh, L. Debut, J. Chaumond, and T. Wolf, “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter,” Oct. 2019, [Online]. Available: http://arxiv.org/abs/1910.01108.

[27] Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” Jul. 2019, [Online]. Available: http://arxiv.org/abs/1907.11692.

[28] H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL,” Feb. 2023, [Online]. Available: http://arxiv.org/abs/2302.05965.

[29] L. Zhao, H. Cao, and Y. Zhao, “GP: Context-free Grammar Pre-training for Text-to-SQL Parsers,” Jan. 2021, [Online]. Available: http://arxiv.org/abs/2101.09901.

[30] P. Shi et al., “Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training,” Dec. 2020, [Online]. Available: http://arxiv.org/abs/2012.10309.

[31] T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing,” Sep. 2020, [Online]. Available: http://arxiv.org/abs/2009.13845.

[32] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” Oct. 2020, doi: 10.18653/v1/2021.naacl-main.105.

[33] P. Yin and G. Neubig, “TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation,” Oct. 2018, [Online]. Available: http://arxiv.org/abs/1810.02720.

[34] P. Yin, C. Zhou, J. He, and G. Neubig, “STRUCTVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing.” [Online]. Available: http://pcyin.me/struct.

[35] L. Dong and M. Lapata, “Language to Logical Form with Neural Attention,” Jan. 2016, [Online]. Available: http://arxiv.org/abs/1601.01280.

[36] L. Dong and M. Lapata, “Coarse-to-Fine Decoding for Neural Semantic Parsing,” May 2018, [Online]. Available: http://arxiv.org/abs/1805.04793.

[37] A. Gopalan et al., “Neural Structured Learning: Training Neural Networks with Structured Signals,” in WSDM 2021 - Proceedings of the 14th ACM International Conference on Web Search and Data Mining, 2021. doi: 10.1145/3437963.3441666.

[38] A. Gopalan et al., “Neural Structured Learning,” 2020. doi: 10.1145/3394486.3406701.

[39] B. Bogin, M. Gardner, and J. Berant, “Global Reasoning over Database Structures for Text-to-SQL Parsing,” 2019.

[40] Y. Ma and J. Tang, “Graph Neural Networks in Natural Language Processing,” in Deep Learning on Graphs, 2021. doi: 10.1017/9781108924184.015.

[41] B. Hui et al., “S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers,” Mar. 2022, [Online]. Available: http://arxiv.org/abs/2203.06958.

[42] R. Cai, J. Yuan, B. Xu, and Z. Hao, “SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL,” Oct. 2021, [Online]. Available: http://arxiv.org/abs/2111.00653.

[43] R. Cao, L. Chen, Z. Chen, Y. Zhao, S. Zhu, and K. Yu, “LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations,” Jun. 2021, [Online]. Available: http://arxiv.org/abs/2106.01093.

[44] OpenAI et al., “GPT-4 Technical Report,” vol. 4, pp. 1–100, 2023, [Online]. Available: http://arxiv.org/abs/2303.08774.

[45] M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction,” NeurIPS, 2023, [Online]. Available: http://arxiv.org/abs/2304.11015.

[46] D. Gao et al., “Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation,” 2023, [Online]. Available: http://arxiv.org/abs/2308.15363.

[47] D. A. Dahl et al., “Expanding the Scope of the ATIS Task: The ATIS-3 Corpus,” 1994.

[48] Y. Gan et al., “Towards robustness of text-to-SQL models against synonym substitution,” ACL-IJCNLP 2021 - 59th Annu. Meet. Assoc. Comput. Linguist. 11th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp. 2505–2515, 2021, doi: 10.18653/v1/2021.acl-long.195.

[49] P. Utama et al., “An End-to-end Neural Natural Language Interface for Databases,” 2018, [Online]. Available: http://arxiv.org/abs/1804.00401.

[50] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task,” Sep. 2018, [Online]. Available: http://arxiv.org/abs/1809.08887.

[51] T. Yu et al., “SPARC: Cross-domain semantic parsing in context,” ACL 2019 - 57th Annu. Meet. Assoc. Comput. Linguist. Proc. Conf., pp. 4511–4523, 2020, doi: 10.18653/v1/p19-1443.

[52] X. Yu et al., “Dataset and enhanced model for eligibility criteria-to-SQL semantic parsing,” Lr. 2020 - 12th Int. Conf. Lang. Resour. Eval. Conf. Proc., no. May, pp. 5829–5837, 2020.

[53] H. Zhang et al., “CSS: A Large-scale Cross-schema Chinese Text-to-SQL Medical Dataset,” Proc. Annu. Meet. Assoc. Comput. Linguist., pp. 6970–6983, 2023, doi: 10.18653/v1/2023.findings-acl.435.

[54] Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization,” EMNLP 2021 - 2021 Conf. Empir. Methods Nat. Lang. Process. Proc., pp. 8926–8931, 2021, doi: 10.18653/v1/2021.emnlp-main.702.

[55] C. T. Hemphill, J. J. Godfrey, and G. R. Doddington, “The ATIS Spoken Language Systems Pilot Corpus,” 1990.

[56] Q. Min, Y. Shi, and Y. Zhang, “A pilot study for Chinese SQL semantic parsing,” EMNLP-IJCNLP 2019 - 2019 Conf. Empir. Methods Nat. Lang. Process. 9th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp. 3652–3658, 2019, doi: 10.18653/v1/d19-1377.

[57] S. Davis and P. S. Meltzer, “GEOquery: A bridge between the Gene Expression Omnibus (GEO) and BioConductor,” Bioinformatics, vol. 23, no. 14, pp. 1846–1847, Jul. 2007, doi: 10.1093/bioinformatics/btm254.

[58] T. Shi, C. Zhao, J. Boyd-Graber, H. Daumé, and L. Lee, “On the potential of lexico-logical alignments for semantic parsing to SQL queries,” Find. Assoc. Comput. Linguist. Find. ACL EMNLP 2020, pp. 1849–1864, 2020, doi: 10.18653/v1/2020.findings-emnlp.167.

[59] M. Singh et al., “CL Scholar: The ACL Anthology Knowledge Graph Miner,” NAACL HLT 2018 - 2018 Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol. Proc. Demonstr. Sess., pp. 16–20, 2018, doi: 10.18653/v1/n18-5004.

[60] A. Suhr, M. W. Chang, P. Shaw, and K. Lee, “Exploring unexplored generalization challenges for cross-database semantic parsing,” Proc. Annu. Meet. Assoc. Comput. Linguist., pp. 8372–8388, 2020, doi: 10.18653/v1/2020.acl-main.742.

[61] L. R. Tang and R. J. Mooney, “Automated Construction of Database Interfaces: Integrating Statistical and Relational Learning for Semantic Parsing,” 1996.

[62] J. Munkres, Topology. Pearson College Div, 2000. [Online]. Available: https://www.amazon.com/Topology-2nd-James-Munkres/dp/0131816292.

[63] P. Yin and G. Neubig, “A Syntactic Neural Model for General-Purpose Code Generation,” Apr. 2017, [Online]. Available: http://arxiv.org/abs/1704.01696.

[64] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, “The Stanford CoreNLP Natural Language Processing Toolkit,” 2014.

[65] J. Pennington, R. Socher, and C. D. Manning, “GloVe: Global Vectors for Word Representation,” 2014. [Online]. Available: http://nlp.

[66] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Comput., vol. 9, no. 8, 1997, doi: 10.1162/neco.1997.9.8.1735.

[67] A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” 2019.

[68] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” Dec. 2014, [Online]. Available: http://arxiv.org/abs/1412.6980.

[69] J. Guo et al., “Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation,” 2019.

[70] B. L. Edelman, S. Goel, S. Kakade, and C. Zhang, “Inductive Biases and Variable Creation in Self-Attention Mechanisms,” Proc. Mach. Learn. Res., vol. 162, pp. 5793–5831, 2022.