Hayim Lapin – Maryland Institute for Technology in the Humanities https://mith.umd.edu Thu, 08 Oct 2020 20:02:39 +0000

New Version of Digital Mishnah Demo https://mith.umd.edu/new-version-of-digital-mishnah-demo/ Mon, 25 Feb 2013 19:49:03 +0000

We have released a new version of the demo. Much of the change is in styling and branding, but there are new texts added, some new views, and a new naming convention.

New texts. I am gradually replacing the sample files, which contained only Bava Metsi’a Ch. 2, with transcriptions covering all of tractate Neziqin (the Bavot). Currently, this applies to the Maimonides autograph, Paris BNF Héb. 328-329, and the Naples editio princeps (with the marginalia from the copy in the National Library of Israel). Work is ongoing on other witnesses. Some new Genizah fragments have been added, and, in the next release, I hope to be able to show some samples of virtually joined manuscripts that can be broken out into the individual fragments.

New views. Users can now browse through documents page by page or column by column, and they can see witnesses chunked by chapter in a compact view.

New naming convention. Sigla for the manuscripts will now be based on the recent Thesaurus of Talmudic Manuscripts. Print editions will be based on serial numbers in similar format. We are experimenting with a convention for sigla that is slightly more informative, so that it will be possible to tell that a given witness includes the Mishnah alone, or a commentary in Hebrew or Arabic, and perhaps other data such as region and date of hand. (This last will require expert typing of the manuscripts.)

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on February 23, 2013.

The post New Version of Digital Mishnah Demo appeared first on Maryland Institute for Technology in the Humanities.

Answering the Mail: Digital Mishnah Project Update https://mith.umd.edu/answering-the-mail-digital-mishnah-project-update/ Thu, 15 Nov 2012 13:30:14 +0000

I had promised to respond to comments on the Digital Mishnah demo, so, at long last, here goes.

  1. Request for greater highlighting of collation options (Tim Finney). In fact, CollateX has several alignment methods built into libraries that can be utilized. This is outside of what I feel comfortable talking about (I don’t really read Java … yet) but there is no reason we can’t allow users to select methods and see what yields the best results.
  2. Don’t build unnecessary mechanisms (Desmond Schmidt). Well taken. As a non-programmer, I’m not always the best judge of what is difficult or simple to build. The point, though, was to allow manual error-correction of the alignment by adding or deleting cells in a table row. As for the order of witnesses, my own sense is that it is extremely useful for visually examining groupings of manuscripts.
  3. Apparatus unnecessary (Desmond Schmidt), or unwieldy (Daniel Stoekl, Naftali Cohn). Well, Stoekl, a potential user, suggests that the print-type apparatus is useful. It is a way of compactly summarizing data. My include-everything model is in fact unwieldy, and the suggestion to leave out readings that are identical with the base text would simplify the situation. Just how text families can be generated and then used in the apparatus is a discussion for a later day, but it is definitely a desideratum.
  4. Additional textual detail; handling absence of evidence (Daniel Stoekl, Naftali Cohn). These are important points. For collation, I made the decision to present a simplified text, but obviously this will have to be made more complex. I don’t think additional tagging is necessary in most cases; different processing is. For additions and corrections in a second hand, we effectively generate an additional witness, but ignore the readings of that secondary witness except when they differ from the primary witness. For dealing with highly lacunose texts, the method will be to have a reference text that includes individual addressing for each word in the Mishnah. The tagging in the lacunose text aligns the text and lacunae with the reference text. At a minimum, this allows us to identify “gaps” to be ignored and “gaps” to be processed. A reference text of the Bavot exists, and I am working on extending it further, but we are still working on the pointing mechanism.
  5. Search functionality (Naftali Cohn). Yes, but what? Ironically, I can envision complex searches (a particular abbreviation in texts in Sephardic hands) more easily than simple searches. What should a search for “Rabbi Meir” or “Prohibited” return?
  6. Other matters (Naftali Cohn). My December and January task is to start working on page-by-page and chapter-by-chapter views, especially now that my text sample includes extended runs of text. I’d also like to be able to generate apparatus or alignments for a whole chapter.
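
The suggestion in point 3, to leave out of the apparatus any reading that is identical with the base text, can be sketched roughly as follows. This is a minimal illustration in Python, not the project's code: the function name build_apparatus, the layout of the alignment table as a dictionary of rows, and the use of None for a gap are all assumptions made for the example.

```python
def build_apparatus(table, base):
    """Return apparatus entries that list only readings differing from the base.

    table: dict mapping a siglum to its row of aligned tokens; all rows have
           the same length, and None marks a gap (hypothetical layout).
    base:  siglum of the witness chosen as base text.
    """
    entries = []
    for position, lemma in enumerate(table[base]):
        variants = {}
        for siglum, row in table.items():
            # Skip the base itself and any witness that agrees with it.
            if siglum == base or row[position] == lemma:
                continue
            reading = row[position] if row[position] is not None else "om."
            variants.setdefault(reading, []).append(siglum)
        if variants:  # positions with full agreement produce no entry
            entries.append((position, lemma, variants))
    return entries


# Toy alignment table with three witnesses (placeholder tokens, not real data).
table = {
    "A": ["a", "b", "c"],
    "B": ["a", "x", "c"],
    "C": ["a", "x", None],
}
apparatus = build_apparatus(table, base="A")
```

Because agreement with the base is simply skipped, the apparatus grows only with the amount of actual variation, which is the point of the suggestion. Grouping witnesses into families would be a further step on top of this.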

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on November 13th, 2012.

The post Answering the Mail: Digital Mishnah Project Update appeared first on Maryland Institute for Technology in the Humanities.

Drowning in Texts https://mith.umd.edu/drowning-in-texts/ Wed, 24 Oct 2012 13:00:54 +0000

The comments on the Digital Mishnah demo deserve a full response (although the short response is: thank you and, in almost all cases, I agree). However, for this post I want to report on progress in getting and identifying texts for the extended demo. We have made the decision to build out from the sample chapter in Bava Metsi’a to all of tractate Neziqin (the “Bavot”), which gives us a base text of 30 chapters and 13,000-14,000 words to work with.

Michael Krupp has generously provided transcriptions of four orders for three manuscripts (Kaufmann, Parma de Rossi 138, and Cambridge Add. 470.1). The first is now available in an electronic version that is far better than what was available to Krupp when the transcriptions were made. The Cambridge transcription is presumably based on Lowe's nineteenth-century edition of the manuscript, and the Cambridge Libraries reported recently that the manuscript would be available online. (At least, that’s what the Genizah Unit said on Facebook on July 4.) So there is room for improving the texts, and there are resources available to do so. This should facilitate making substantial blocks of text available rather quickly. The problem is actually finding the time to encode the texts …

Meanwhile, with the participation of the Lieberman Institute, under the direction of Shamma Friedman and with the aid of Leor Jacoby, I am gradually filling out the corpus of texts available. I say gradually not because the Institute's transcribers are slow, but because our agreement is for them to provide transcriptions while I see to the conversion to XML.

Those in the “biz” know that Yad Izhak Ben-Zvi and the Friedenberg Genizah Project recently published a three-volume Thesaurus of Talmudic Manuscripts, edited by Sussman. The detailed information on joins makes it easier to prioritize fragments to transcribe. (It also leaves me feeling “scooped,” since my discoveries of joins were in most cases, possibly in all, anticipated by the Thesaurus, which was not yet available when I started working on this project.) On the basis of that catalog, the number of distinct shelfmarks for witnesses (once we include all the fragments of joined manuscripts where one or more fragments have text in the Bavot) runs to 200.

So, aside from wondering about next steps on the application that will drive the edition, I am drowning in texts. Happily, but drowning nonetheless.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on October 20th, 2012.

The post Drowning in Texts appeared first on Maryland Institute for Technology in the Humanities.

Digital Mishnah: Live Demo https://mith.umd.edu/digital-mishnah-live-demo/ Tue, 04 Sep 2012 12:00:50 +0000

I am pleased to say that with a lot of work on a lot of people’s part, there is now a live demo of the Digital Mishnah Project. The demo is just that: a demonstration of possible functionalities. This post will outline some of the features that were always meant to be temporary and some new planned or desired features, and then invite comments.

What will be changed

  • The selection of witnesses. Entering numerals is unwieldy. Ideally, users should be able to slide text “icons” around (as one does with a pivot table in Excel, for instance).
  • Output in browse functions. A single chapter was used for the demo version. Future versions will allow users to select specific chapters and/or specific ms pages and progress by page or chapter. Metadata should perhaps be hideable.
  • Output in collate functions. The demo groups output together; these are actually alternative functions.

Additional basic functionalities

  • Ability to download or print results.
  • Ability to compare longer texts (whole chapters).
  • Improved collation–and/or the ability to select alternative collation methods.

Desiderata

  • Statistical tools, such as multi-dimensional scaling and clustering, to group manuscripts and display results.
  • Since there will inevitably be errors in collation, ability to correct alignment and re-run various operations.
  • Dynamic synoptic view, in which two or more witnesses can be viewed in parallel columns, with the ability to highlight textual differences or other features.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on August 30, 2012.

The post Digital Mishnah: Live Demo appeared first on Maryland Institute for Technology in the Humanities.

Digital Mishnah: Summer Update https://mith.umd.edu/digital-mishnah-summer-update/ Tue, 24 Jul 2012 11:55:05 +0000

In addition to getting the demo ready to go live–it’s ready to go!–this summer’s agenda has been to add texts and add reference material.
We now have two sets of reference data ready to implement. The heavy lifting for this was done by Atara Siegel, an undergraduate at Stern College, who worked for me for several weeks this summer. Atara prepared the lists, and, for the newly expanded sample text (tractates Bava Qamma, Bava Metsi’a and Bava Batra) also linked the relevant words in the reference text to the names list.

  • Personal Names. This list is based on the list of Tannaim in the Mishnah in Albeck, Mavo la-mishnah, cross-referenced with the relevant names from Stemberger-Strack, Introduction to the Talmud and Midrash.
  • Place Names. This list is based on three sources: B-Z Segal, Ha-geografya ba-mishnah (conveniently available in digitized form), cross-referenced with Tsafrir, et al., Tabula Imperii Romani: Iudaea-Palaestina, and G. Reeg, Die Ortsnamen Israels nach der rabbinischen Literatur. (Note: Map references are given according to the Survey of Israel coordinates; we will have to find alternatives for non-Palestine sites.)

In addition, we continue to add to the corpus of texts. The last of the planned witnesses for Bava Metsi’a Chapter 2 (my initial sample text) will be done by the end of the summer, thanks to Bruce Roth, a graduate student at the Baltimore Hebrew Institute at Towson University, and student transcribers at Catholic University are preparing Genizah fragments. Working with the Lieberman Institute in Israel, I am preparing to have a number of witnesses to all three Bavot transcribed. We are starting with the Maimonides autograph and the Paris MS (Bibliothèque nationale de France, Heb 328-329).
I keep holding out hope that the state of the Naples first edition is good enough that one should be able to OCR the text, but my experiments thus far have been disappointing.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on July 10th, 2012.

The post Digital Mishnah: Summer Update appeared first on Maryland Institute for Technology in the Humanities.

Almost Ready for Prime Time https://mith.umd.edu/almost-ready-for-prime-time/ Fri, 25 May 2012 13:00:03 +0000

We now have two versions of a demo up and ready to run. Both allow a user to pull data from the witness files, containing manuscript transcriptions, select texts to compare, run the texts through a version of CollateX, then present the results as an alignment table (a “synopsis” or “partitur” in some text-critical dialects), and as a text with apparatus.

The second of these is still buggy (and the cause of both a couple of late nights and the lateness of this post, for which I apologize heartily to the nice people at MITH), but it does a couple of additional things:

  • Prioritization. While the ability to generate all sorts of different apparatus is a desideratum, at present what we can do is choose the order in which results are presented, and, in the case of presenting a text with apparatus, the first text chosen becomes the base text for comparison.
  • Tokenizing. I am now able to tokenize in two steps. The first produces “rich” tokens that retain data about the individual words (e.g., abbreviations, which should be compared based on their expanded text rather than on the abbreviation as written), as well as other data in the text (page breaks, etc.). From there we can create “regularized” tokens. For now I have regularized the tokens by removing all yods and waws. Additional candidates might include dealing with prepositions that are sometimes but not always attached in medieval Mishnah manuscripts (shel, e.g.), final aleph/heh, and final nun/mem. “Simple” tokens are passed to CollateX (or, we allow CollateX to process “rich” tokens) and the resulting collation output is merged with the rich tokens.
  • Presentation. Because the “rich” tokens retain information about the witness, it is possible to generate a “text-with-apparatus” in which the base text can be presented with formatting and contextual information that may be useful to the reader. (Disclaimer: Here is a big bug: The XSLT that joins the two lists of tokens inserts the non-words (page breaks etc.) in a position that is offset by one location. Any suggestions?)
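
To make the two-step tokenizing concrete, here is a minimal sketch in Python of rich tokens, regularization by dropping yod and waw, and re-joining a collation result by index. It is not the XSLT used in the demo. The RichToken class, its field names, and the string labels standing in for collation output are hypothetical, and keying everything on each token's position in the original list is only one possible way to keep non-words such as page breaks from drifting out of place.

```python
from dataclasses import dataclass

YOD, WAW = "\u05d9", "\u05d5"  # the two letters removed during regularization


@dataclass
class RichToken:
    written: str          # the form as written in the witness
    expanded: str = ""    # expansion, when the written form is an abbreviation
    is_word: bool = True  # False for page breaks and other non-words


def regularize(token):
    """Compare abbreviations by their expansion, then drop yod and waw."""
    text = token.expanded or token.written
    return text.replace(YOD, "").replace(WAW, "")


def simple_tokens(rich):
    """Return (index, regularized form) pairs for words only."""
    return [(i, regularize(t)) for i, t in enumerate(rich) if t.is_word]


def rejoin(rich, collated):
    """Attach each collation result to its rich token by original index.

    collated: (index, result) pairs, where index is the position that
    simple_tokens() recorded. Non-words get None and keep their place.
    """
    results = dict(collated)
    return [(token, results.get(i)) for i, token in enumerate(rich)]


rich = [
    RichToken("שנים"),
    RichToken("[page break]", is_word=False),
    RichToken("אוחזין"),
    RichToken("ר'", expanded="רבי"),
]
simple = simple_tokens(rich)
# Placeholder labels stand in for whatever the collation step returns.
merged = rejoin(rich, [(0, "match"), (2, "variant"), (3, "match")])
```

Because the simple tokens carry the index of the rich token they came from, the merge never has to count around the non-words, which is where an offset of one can creep in.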

Next up: modifying the demo to present multi-column synopses, and linking in Talmudic and Commentary citations.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on May 24th, 2012.

The post Almost Ready for Prime Time appeared first on Maryland Institute for Technology in the Humanities.

Housekeeping https://mith.umd.edu/haim-lapin-housekeeping/ Mon, 30 Apr 2012 13:30:23 +0000

The Site
I’ve now updated the “Examples of Work” page on digitalmishnah.org to include viewable samples. Thanks to Kirsten Keister for setting up the light box format to view the samples. The examples include two samples of work that processes more than one text (collation, synopsis) and a number of examples of manuscripts.

The Project
I’ve been working on two issues. One is pointing. I now have a complete set of pointers from the reference file (ref.xml) to the witness files for locating spans of damaged text and page and fragment beginnings and ends for fragmentary texts. Of course, because nothing is simple, the direction of all of these will have to be reversed, so that the individual witnesses point into the reference text.
In addition, I’ve improved the tokenization process, so that I can process “rich” tokens, which retain data about the word in question (e.g., that it is an abbreviation, or deleted, etc.) and hold a regularized spelling as well as the original, alongside simple tokens, and can re-join a collation based on simple tokens with the complex tokens.
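
Once the witnesses point into the reference text, the pointers can be used to tell damaged text inside a fragment from text that simply lies beyond it. The following is a rough Python sketch of that idea, under stated assumptions: the word addresses ("w1", "w2", ...) are made-up placeholders rather than the addressing scheme in ref.xml, and the function name and labels are hypothetical.

```python
def classify_reference_words(reference_ids, preserved_ids, first_id, last_id):
    """Label each reference word as 'text', 'lacuna', or 'outside'.

    reference_ids: ordered word addresses in the reference text
    preserved_ids: addresses the witness points to as legible text
    first_id, last_id: addresses where the fragment begins and ends
    """
    start = reference_ids.index(first_id)
    end = reference_ids.index(last_id)
    preserved = set(preserved_ids)
    labels = {}
    for position, word_id in enumerate(reference_ids):
        if word_id in preserved:
            labels[word_id] = "text"
        elif start <= position <= end:
            labels[word_id] = "lacuna"   # damaged span inside the fragment
        else:
            labels[word_id] = "outside"  # beyond the fragment's extent
    return labels


reference = ["w1", "w2", "w3", "w4", "w5", "w6"]
labels = classify_reference_words(reference, ["w2", "w4"], "w2", "w5")
```

A collation step could then skip the words labeled "outside" for this witness while still reporting the "lacuna" words as damaged.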

Text Geek Heaven
Along the way, I’ve discovered some joining Genizah fragments. The coolest by far on a technical, jigsaw-puzzle level is the four-way join between TS AS 78.69, TS AS 78.162, TS AS 78.235 and TS NS 329.286 (Cambridge). The four fragments adjoin yet another, TS E2.71. This will be featured as a Fragment of the Month of the Taylor-Schechter Genizah Research Unit. Look for it there!
Also cool, in that they join material from multiple cities, are:

  • TS E1.99 (Camb), MS heb. c.21/6, 8-11 (Oxf), TS F6.3 and Yevr. II A 294 (Pet), joining fragments from Cambridge, Oxford, and St. Petersburg, and:
  • TS AS 85.270 (Camb) and MS R2339, fol. 1 (JTS), joining fragments from Cambridge and New York.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on April 26th, 2012.

The post Housekeeping appeared first on Maryland Institute for Technology in the Humanities.

Progress, real but in small steps https://mith.umd.edu/progress-real-but-in-small-steps/ Mon, 12 Mar 2012 13:01:40 +0000

I had been holding off on this post until I could announce a new Digital Mishnah website, courtesy of MITH, and a new collation demo hosted on it, but that will have to wait for my next post, deo volente.

Since my last confession, I have:

  • Submitted a paper that details methods and progress to date. It’s for a Festschrift, and I’ve been asked not to state the venue openly, but can share a draft.
  • Thought a lot about (and only partly understand) multivariate statistics.
  • Completed the first round of markup for all the Genizah fragments for my sample chapter. A second round of markup linking the fragments to the reference text needs to be done (next bullet). Formatted versions of these texts will be viewable.
  • Started rethinking how to handle the encoding of highly fragmentary texts. In particular, I’ve found four pieces of a single sheet of text in two different locations in the Taylor-Schechter collection (TS AS 78.69 + TS AS 78.162 + TS AS 78.235 + TS NS 329.286; the sheet adjoins another single sheet from a third box, TS E2.71). For the present, we are encoding each fragment as a document, and recording the extent of the lacunae at the edges of the fragment as fitting within the smallest properly oriented rectangle that encloses the fragment. What needs doing is a pointing scheme that will point into the reference text.
  • Identified the next fragments to work on to expand the work to Tractate Neziqin (aka the Bavot), and started to recruit people to work on it.
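
The enclosing-rectangle convention for fragment edges mentioned in the list above can be illustrated with a little geometry. This Python sketch assumes that a fragment's outline is available as a list of (x, y) vertices, already rotated so that the lines of writing run parallel to the x axis. Both the outline data and the function names are hypothetical, and the project records the lacunae in its markup rather than computing areas.

```python
def enclosing_rectangle(outline):
    """Smallest axis-aligned rectangle around a fragment outline.

    outline: list of (x, y) vertices of a simple polygon, assumed to be
    oriented so that the writing runs parallel to the x axis.
    """
    xs = [x for x, _ in outline]
    ys = [y for _, y in outline]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(outline):
    """Area of a simple polygon, computed with the shoelace formula."""
    total = 0
    for (x1, y1), (x2, y2) in zip(outline, outline[1:] + outline[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def edge_lacuna_area(outline):
    """Area inside the enclosing rectangle that the fragment does not cover."""
    left, top, right, bottom = enclosing_rectangle(outline)
    return (right - left) * (bottom - top) - polygon_area(outline)


# A 4 x 3 sheet with one corner torn away (made-up coordinates).
outline = [(0, 0), (4, 0), (4, 2), (2, 3), (0, 3)]
rectangle = enclosing_rectangle(outline)
missing = edge_lacuna_area(outline)
```

Everything between the outline and the rectangle counts as lacuna at the edge of the fragment, and anything outside the rectangle is simply not part of that document.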

Next up: completing fragmentary texts, encoding the remaining Mishnah texts in the Babylonian Talmud mss., and learning some Java.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on March 11, 2012.

The post Progress, real but in small steps appeared first on Maryland Institute for Technology in the Humanities.

Thinking about the End Product https://mith.umd.edu/thinking-about-the-end-product/ Thu, 26 Jan 2012 15:12:24 +0000

Since my last post, I have been working on a grant application. This has afforded the opportunity of some stock taking. I’ve also had some very helpful conversations with scholars in the field: Juan Garcés and Matt Munson in Hebrew Biblical Studies, Tim Finney in New Testament and Desmond Schmidt in textual computing and classics.

1. Collation. Based on very simple normalization and tokenization and a few samples, CollateX will remain error-prone unless the algorithm changes significantly. Examples: (1) In a Mishnah section with repeated words, slight differences in spelling resulted in pushing a whole clause off to the second match. (2) In another passage, CollateX failed to diagnose a missing clause in the text and aligned non-matching tokens. My estimate is that currently the error rate is above 10% (for one passage it was about 15%). Better normalization will improve this result. This raises the question of whether the normalization (or, which may amount to the same thing, having CollateX ignore certain characters in comparison) can be carried out automatically, and what this would look like, or whether, as Desmond Schmidt assures me, the whole enterprise is wrongheaded.

2. Statistical measures, now done by hand, but ideally automated. I have now invested in a license for SPSS. This, and my old friend Excel, have allowed me to run some preliminary analyses. First: run collations on every Mishnah section in my sample chapter using a few representative witnesses. Transfer the output to Excel; manually fix the alignment (remember, high error rate). Then start flagging variations. I have opted for a method that is akin to what Schmidt and Tim Finney have used: effectively to create a master document with all possible readings, and use a binary encoding (1, 0) for each witness for whether the reading appears in a given witness. Use SPSS to generate a distance matrix, multi-dimensional scaling (MDS), and clustering. I have also experimented with sites providing a graphic interface to bioinformatics software (FastME and Phylip) to produce phylogenetic trees.
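
The binary encoding and the distance matrix described in point 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not a record of what SPSS computed: the witness sigla and reading labels are placeholders, and the distance used here (the share of readings on which two witnesses disagree) is simply one common choice for binary data.

```python
from itertools import combinations


def binary_matrix(readings_by_witness):
    """Encode each witness as a row of 1/0 flags over all attested readings."""
    all_readings = sorted(set().union(*readings_by_witness.values()))
    rows = {
        witness: [1 if reading in readings else 0 for reading in all_readings]
        for witness, readings in readings_by_witness.items()
    }
    return all_readings, rows


def distance_matrix(rows):
    """Pairwise proportion of readings on which two witnesses disagree."""
    distances = {}
    for a, b in combinations(rows, 2):
        disagreements = sum(x != y for x, y in zip(rows[a], rows[b]))
        distances[a, b] = distances[b, a] = disagreements / len(rows[a])
    return distances


# Placeholder data: "1a"/"1b" are competing readings at variation unit 1, etc.
witnesses = {
    "K": {"1a", "2a"},
    "P": {"1a", "2b"},
    "N": {"1b", "2b"},
}
readings, rows = binary_matrix(witnesses)
distances = distance_matrix(rows)
```

A matrix of this kind is the input that multi-dimensional scaling, clustering, or a tree-building program would then work from.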

The results were interesting enough that I wanted to see the results with more careful identification of variance (I’m doing these by hand, after all) and more witnesses. I used the sections with the fullest representation among witnesses (Chapter 2, Mishnah 1-2), choosing a total of 10 witnesses. The results I got were consistent with the larger text sample and fewer witnesses, but neither represented the accepted wisdom on the relationship between manuscripts. I therefore divided the cases between no-variation, substantive (different word, different gender, change in grammatical form), and orthographic (initial waw, matres lectionis, spacing between preposition and word). As an example, the Greek word emporia generated no fewer than six variant spellings, but all represented a recognizable version of the word.
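
The three-way division above was done by hand, but a crude automatic first pass can be sketched. The Python below treats readings as orthographic variants when they become identical after removing waw, yod, and spaces, and as substantive otherwise. That rule is my simplifying assumption for illustration: it is far blunter than a philologist's judgment and would need review, and the function names are hypothetical.

```python
MATRES = "\u05d5\u05d9"  # waw and yod


def normalize(reading):
    """Drop waw, yod, and whitespace: a deliberately rough orthographic filter."""
    return "".join(
        ch for ch in reading if ch not in MATRES and not ch.isspace()
    )


def classify(readings):
    """Classify the readings at one variation unit."""
    distinct = set(readings)
    if len(distinct) == 1:
        return "no-variation"
    if len({normalize(r) for r in distinct}) == 1:
        return "orthographic"
    return "substantive"


same = classify(["מצא", "מצא"])
plene = classify(["פירות", "פרות"])    # plene vs. defective spelling
spacing = classify(["של מי", "שלמי"])  # detached vs. attached preposition
gender = classify(["מצא", "מצאה"])     # masculine vs. feminine form
```

Flags produced this way could feed two separate binary matrices, one for substantive and one for orthographic variation, which is the split the charts below are based on.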

Now, there were some interesting results: the manuscripts thought to be of the “Palestinian type” clustered closely on substantive differences, considerably less so (and differently) on orthographic differences.

[Figure: MDS for Substantive Differences, 10 Witnesses]

[Figure: MDS for Orthographic Differences, 10 Witnesses]

[Figure: Rooted Tree (Phylip) for Substantive Differences, 10 Witnesses]

The lesson: Orthographic and substantive variations do not coincide, probably due to scribal decision-making (and inconsistency). Substantive differences seem to be better for grouping text families. (They may be easier to identify automatically as well: normalizing orthography to improve collation erases orthographic differences, by definition, while retaining non-orthographic ones.) But linguistic and orthographic differences are of research significance too. We may need a way for the user to flag readings to be compared.

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on January 25, 2012.

The post Thinking about the End Product appeared first on Maryland Institute for Technology in the Humanities.

New Output https://mith.umd.edu/new-output/ Tue, 03 Jan 2012 14:58:24 +0000

Only spammers seem to be noticing this blog, but for web-trolling software that might be interested in digital humanities and philology I thought I might add that I have updated the sample output from CollateX.

collatex-table-apparatus.html shows output from user-specified witnesses in the form of (1) an alignment table based on user-specified order, (2) an extracted base text (taking the first specified witness as the base text), and (3) a generated apparatus.

CollateX is not perfect. Some of the output problems are the result of tokenizing (the samples used were tokenized very coarsely) and can be fixed. Abbreviations and the phenomenon of connected or unconnected prepositions (של, also words such as כיצד) can also be fixed. But some errors have to do with how CollateX deals with edit distance. Not sure how we are going to handle this.
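
For readers who have not met the term, edit distance (here in its Levenshtein form) is the smallest number of single-character insertions, deletions, and substitutions needed to turn one string into another. The Python sketch below only illustrates the measure itself. I am not claiming that this is how CollateX computes or applies it, and the similarity threshold is an arbitrary assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, one table row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]


def near_match(a, b, threshold=0.34):
    """True when the distance is small relative to the longer token.

    The threshold is an arbitrary value chosen for this example.
    """
    longest = max(len(a), len(b)) or 1
    return edit_distance(a, b) / longest <= threshold


plene_vs_defective = edit_distance("פירות", "פרות")  # one letter apart
```

The difficulty for short Hebrew words is visible even here: a single added letter can be a spelling variant or a different word, and the distance alone cannot tell the two apart.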

Hayim Lapin is Robert H. Smith Professor of Jewish Studies and Professor in the Department of History at the University of Maryland. He currently is completing a faculty fellowship at MITH. This post originally appeared at Digital Mishnah on January 2, 2012.

The post New Output appeared first on Maryland Institute for Technology in the Humanities.
